9 resultados para hierarchical clustering

em Universidade Federal do Rio Grande do Norte(UFRN)


Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this work calibration models were constructed to determine the content of total lipids and moisture in powdered milk samples. For this, used the near-infrared spectroscopy by diffuse reflectance, combined with multivariate calibration. Initially, the spectral data were submitted to correction of multiplicative light scattering (MSC) and Savitzsky-Golay smoothing. Then, the samples were divided into subgroups by application of hierarchical clustering analysis of the classes (HCA) and Ward Linkage criterion. Thus, it became possible to build regression models by partial least squares (PLS) that allowed the calibration and prediction of the content total lipid and moisture, based on the values obtained by the reference methods of Soxhlet and 105 ° C, respectively . Therefore, conclude that the NIR had a good performance for the quantification of samples of powdered milk, mainly by minimizing the analysis time, not destruction of the samples and not waste. Prediction models for determination of total lipids correlated (R) of 0.9955, RMSEP of 0.8952, therefore the average error between the Soxhlet and NIR was ± 0.70%, while the model prediction to content moisture correlated (R) of 0.9184, RMSEP, 0.3778 and error of ± 0.76%

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Heavy metals can cause problems of human poisoning by ingestion of contaminated food, and the environment, a negative impact on the aquatic fauna and flora. And for the presence of these metals have been used for aquatic animals biomonitoramento environment. This research was done in order to assess the environmental impact of industrial and domestic sewage dumped in estuaries potiguares, from measures of heavy metals in mullet. The methods used for these determinations are those in the literature for analysis of food and water. Collections were 20 samples of mullet in several municipality of the state of Rio Grande do Norte, from the estuaries potiguares. Were analyzed the content of humidity, ash and heavy metals. The data were subjected to two methods of exploratory analysis: analysis of the main components (PCA), which provided a multivariate interpretation, showing that the samples are grouped according to similarities in the levels of metals and analysis of hierarchical groupings (HCA), producing similar results. These tests have proved useful for the treatment of the data producing information that would hardly viewed directly in the matrix of data. The analysis of the results shows the high levels of metallic species in samples Mugil brasiliensis collected in Estuaries /Potengi, Piranhas/Açu, Guaraíra / Papeba / Arês and Curimataú

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Peng was the first to work with the Technical DFA (Detrended Fluctuation Analysis), a tool capable of detecting auto-long-range correlation in time series with non-stationary. In this study, the technique of DFA is used to obtain the Hurst exponent (H) profile of the electric neutron porosity of the 52 oil wells in Namorado Field, located in the Campos Basin -Brazil. The purpose is to know if the Hurst exponent can be used to characterize spatial distribution of wells. Thus, we verify that the wells that have close values of H are spatially close together. In this work we used the method of hierarchical clustering and non-hierarchical clustering method (the k-mean method). Then compare the two methods to see which of the two provides the best result. From this, was the parameter � (index neighborhood) which checks whether a data set generated by the k- average method, or at random, so in fact spatial patterns. High values of � indicate that the data are aggregated, while low values of � indicate that the data are scattered (no spatial correlation). Using the Monte Carlo method showed that combined data show a random distribution of � below the empirical value. So the empirical evidence of H obtained from 52 wells are grouped geographically. By passing the data of standard curves with the results obtained by the k-mean, confirming that it is effective to correlate well in spatial distribution

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Hierarchical structure with nested nonlocal dependencies is a key feature of human language and can be identified theoretically in most pieces of tonal music. However, previous studies have argued against the perception of such structures in music. Here, we show processing of nonlocal dependencies in music. We presented chorales by J. S. Bach and modified versions inwhich the hierarchical structure was rendered irregular whereas the local structure was kept intact. Brain electric responses differed between regular and irregular hierarchical structures, in both musicians and nonmusicians. This finding indicates that, when listening to music, humans apply cognitive processes that are capable of dealing with longdistance dependencies resulting from hierarchically organized syntactic structures. Our results reveal that a brain mechanism fundamental for syntactic processing is engaged during the perception of music, indicating that processing of hierarchical structure with nested nonlocal dependencies is not just a key component of human language, but a multidomain capacity of human cognition.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The main objective of this study is to apply recently developed methods of physical-statistic to time series analysis, particularly in electrical induction s profiles of oil wells data, to study the petrophysical similarity of those wells in a spatial distribution. For this, we used the DFA method in order to know if we can or not use this technique to characterize spatially the fields. After obtain the DFA values for all wells, we applied clustering analysis. To do these tests we used the non-hierarchical method called K-means. Usually based on the Euclidean distance, the K-means consists in dividing the elements of a data matrix N in k groups, so that the similarities among elements belonging to different groups are the smallest possible. In order to test if a dataset generated by the K-means method or randomly generated datasets form spatial patterns, we created the parameter Ω (index of neighborhood). High values of Ω reveals more aggregated data and low values of Ω show scattered data or data without spatial correlation. Thus we concluded that data from the DFA of 54 wells are grouped and can be used to characterize spatial fields. Applying contour level technique we confirm the results obtained by the K-means, confirming that DFA is effective to perform spatial analysis

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this work we present a new clustering method that groups up points of a data set in classes. The method is based in a algorithm to link auxiliary clusters that are obtained using traditional vector quantization techniques. It is described some approaches during the development of the work that are based in measures of distances or dissimilarities (divergence) between the auxiliary clusters. This new method uses only two a priori information, the number of auxiliary clusters Na and a threshold distance dt that will be used to decide about the linkage or not of the auxiliary clusters. The number os classes could be automatically found by the method, that do it based in the chosen threshold distance dt, or it is given as additional information to help in the choice of the correct threshold. Some analysis are made and the results are compared with traditional clustering methods. In this work different dissimilarities metrics are analyzed and a new one is proposed based on the concept of negentropy. Besides grouping points of a set in classes, it is proposed a method to statistical modeling the classes aiming to obtain a expression to the probability of a point to belong to one of the classes. Experiments with several values of Na e dt are made in tests sets and the results are analyzed aiming to study the robustness of the method and to consider heuristics to the choice of the correct threshold. During this work it is explored the aspects of information theory applied to the calculation of the divergences. It will be explored specifically the different measures of information and divergence using the Rényi entropy. The results using the different metrics are compared and commented. The work also has appendix where are exposed real applications using the proposed method

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This work proposes a collaborative system for marking dangerous points in the transport routes and generation of alerts to drivers. It consisted of a proximity warning system for a danger point that is fed by the driver via a mobile device equipped with GPS. The system will consolidate data provided by several different drivers and generate a set of points common to be used in the warning system. Although the application is designed to protect drivers, the data generated by it can serve as inputs for the responsible to improve signage and recovery of public roads

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The use of clustering methods for the discovery of cancer subtypes has drawn a great deal of attention in the scientific community. While bioinformaticians have proposed new clustering methods that take advantage of characteristics of the gene expression data, the medical community has a preference for using classic clustering methods. There have been no studies thus far performing a large-scale evaluation of different clustering methods in this context. This work presents the first large-scale analysis of seven different clustering methods and four proximity measures for the analysis of 35 cancer gene expression data sets. Results reveal that the finite mixture of Gaussians, followed closely by k-means, exhibited the best performance in terms of recovering the true structure of the data sets. These methods also exhibited, on average, the smallest difference between the actual number of classes in the data sets and the best number of clusters as indicated by our validation criteria. Furthermore, hierarchical methods, which have been widely used by the medical community, exhibited a poorer recovery performance than that of the other methods evaluated. Moreover, as a stable basis for the assessment and comparison of different clustering methods for cancer gene expression data, this study provides a common group of data sets (benchmark data sets) to be shared among researchers and used for comparisons with new methods

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The main goal of this work is to investigate the suitability of applying cluster ensemble techniques (ensembles or committees) to gene expression data. More specifically, we will develop experiments with three diferent cluster ensembles methods, which have been used in many works in literature: coassociation matrix, relabeling and voting, and ensembles based on graph partitioning. The inputs for these methods will be the partitions generated by three clustering algorithms, representing diferent paradigms: kmeans, ExpectationMaximization (EM), and hierarchical method with average linkage. These algorithms have been widely applied to gene expression data. In general, the results obtained with our experiments indicate that the cluster ensemble methods present a better performance when compared to the individual techniques. This happens mainly for the heterogeneous ensembles, that is, ensembles built with base partitions generated with diferent clustering algorithms