6 results for Clustering Analysis
at Universidade Federal do Rio Grande do Norte (UFRN)
Abstract:
The main objective of this study is to apply recently developed methods of statistical physics to time series analysis, particularly to electrical induction profiles from oil wells, in order to study the petrophysical similarity of those wells across a spatial distribution. To this end, we used the DFA method to determine whether this technique can be used to characterize the fields spatially. After obtaining the DFA values for all wells, we applied clustering analysis, using the non-hierarchical method known as K-means. Usually based on the Euclidean distance, K-means divides the elements of a data matrix N into k groups so that the similarities among elements belonging to different groups are as small as possible. To test whether a dataset generated by the K-means method, or a randomly generated dataset, forms spatial patterns, we created the parameter Ω (neighborhood index). High values of Ω reveal more aggregated data, while low values of Ω indicate scattered data or data without spatial correlation. We thus concluded that the DFA data from the 54 wells are grouped and can be used to characterize spatial fields. Applying a contour-level technique, we confirmed the results obtained by K-means, which indicates that DFA is effective for spatial analysis.
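The K-means step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `kmeans`, the random initialization, and the sample DFA values are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd-style K-means with Euclidean distance (illustrative only)."""
    rng = random.Random(seed)
    # Assumption: start from k rows of the data picked at random.
    centers = rng.sample(points, k)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members;
        # an empty group keeps its previous center.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    return labels, centers


# Hypothetical DFA exponents for six wells (not taken from the study).
dfa_values = [(0.52,), (0.55,), (0.50,), (0.91,), (0.95,), (0.93,)]
labels, centers = kmeans(dfa_values, k=2)
print(labels)
```

With one DFA value per well, each point is a one-element tuple; the same function accepts points of any dimension, as long as all of them have the same length.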
Abstract:
In this work, calibration models were constructed to determine the total lipid and moisture content of powdered milk samples. For this purpose, near-infrared (NIR) diffuse reflectance spectroscopy was used, combined with multivariate calibration. Initially, the spectral data were subjected to multiplicative scatter correction (MSC) and Savitzky-Golay smoothing. The samples were then divided into subgroups by hierarchical cluster analysis (HCA) with the Ward linkage criterion. It thus became possible to build partial least squares (PLS) regression models for the calibration and prediction of total lipid and moisture content, based on the values obtained by the reference methods of Soxhlet and drying at 105 °C, respectively. We therefore conclude that NIR performed well for the quantification of powdered milk samples, mainly because it minimizes analysis time, does not destroy the samples, and generates no waste. The prediction model for total lipids had a correlation coefficient (R) of 0.9955 and an RMSEP of 0.8952, with an average error between the Soxhlet and NIR values of ±0.70%, while the prediction model for moisture content had a correlation coefficient (R) of 0.9184, an RMSEP of 0.3778, and an error of ±0.76%.
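The multiplicative scatter correction (MSC) step mentioned in this abstract can be sketched as follows. This is a minimal illustration under common assumptions, not the authors' code: the mean spectrum is taken as the reference, each spectrum is fitted to it by ordinary least squares, and the function name `msc` and the toy spectra are hypothetical.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum s is modeled as s = intercept + slope * ref, and the
    corrected spectrum is (s - intercept) / slope.
    """
    n = len(spectra)
    # Reference spectrum: the mean over all samples at each wavelength.
    ref = [sum(col) / n for col in zip(*spectra)]
    ref_mean = sum(ref) / len(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / len(s)
        sxy = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s))
        slope = sxy / sxx                      # least-squares slope
        intercept = s_mean - slope * ref_mean  # least-squares intercept
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


# Toy spectra (made up): the same underlying shape with different
# offsets and scale factors, which is the effect MSC is meant to remove.
base = [1.0, 2.0, 4.0, 3.0]
spectra = [
    [0.5 + 1.0 * b for b in base],
    [-0.2 + 2.0 * b for b in base],
    [0.1 + 0.5 * b for b in base],
]
corrected = msc(spectra)
```

In this toy case all corrected spectra coincide with the mean spectrum, because the only differences between samples were an offset and a scale factor.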
Abstract:
Heavy metals can cause human poisoning through the ingestion of contaminated food and, in the environment, have a negative impact on aquatic fauna and flora. Aquatic animals have therefore been used for environmental biomonitoring of these metals. This research was carried out to assess the environmental impact of industrial and domestic sewage discharged into the Potiguar estuaries (Rio Grande do Norte), based on measurements of heavy metals in mullet. The methods used for these determinations are those found in the literature for the analysis of food and water. Twenty samples of mullet were collected from the Potiguar estuaries, in several municipalities of the state of Rio Grande do Norte. Moisture, ash, and heavy metal contents were analyzed. The data were subjected to two methods of exploratory analysis: principal component analysis (PCA), which provided a multivariate interpretation showing that the samples group according to similarities in metal levels, and hierarchical cluster analysis (HCA), which produced similar results. These analyses proved useful for treating the data, yielding information that would be hard to see directly in the data matrix. The results show high levels of metallic species in samples of Mugil brasiliensis collected in the estuaries /Potengi, Piranhas/Açu, Guaraíra/Papeba/Arês, and Curimataú.
Abstract:
Measures of mortality are among the most important indicators of health conditions. Because the elderly account for the largest share of deaths, the study of mortality in this population is regarded as essential to understanding the health situation. The present study therefore aims to analyze the mortality profile of the population aged 60 to 69 years (younger elders) and older than 80 years (oldest old) in the state of Rio Grande do Norte (Brazil) from 2001 to 2011, and to identify its association with contextual factors and with variables describing the quality of the Mortality Information System (SIM). For this purpose, Proportional Mortality (MP) was calculated for the state, and the Age-Specific Mortality Rate (CMId), by chapter of the ICD-10, was calculated for the municipalities of Rio Grande do Norte, using data from the Mortality Information System (SIM) and the Brazilian Institute of Geography and Statistics (IBGE). To identify groups of municipalities with similar mortality profiles, the non-hierarchical K-means clustering method was applied, and Factor Analysis by Principal Component Analysis was used to reduce the contextual variables. The spatial distribution of these groups and factors was visualized using the Spatial Analysis of Areas technique. During the period investigated, 21,813 deaths of younger elders were recorded, with a predominance of deaths from circulatory diseases (32.75%) and neoplasms (22.9%). Among the oldest old, 50,637 deaths were observed, of which 35.26% were due to cardiovascular diseases and 17.27% to ill-defined causes. Clustering analysis produced three clusters for each of the two age groups, and Factor Analysis reduced the contextual variables to three factors; the sum of the factor scores was also considered.
Among the younger elders, the groups were named misinformation profile, development profile, and development paradox, and showed a statistically significant association with the education and the poverty and extreme poverty factors, the factorial sum, and the variable related to underreporting of deaths. The misinformation profile persisted in the oldest-old group, accompanied by the epidemiological transition profile and the epidemiological paradox, which were statistically associated with the development and health factor, as well as with the variables indicating SIM quality: the proportion of blank fields for schooling, and underreporting. It is proposed that the mortality profiles of the younger elders and the oldest old differ in the importance of the underlying causes and are influenced by different contextual aspects, with the 60 to 69 age group being more affected by such aspects. Health inequalities can be reduced by measures aimed at improving education and poverty levels, especially among younger elders, and by optimizing the use of health services, which is more closely associated with the health situation of the oldest old. Furthermore, it is important to improve the quality of information for both age groups.
Abstract:
In this work we present a new clustering method that groups the points of a data set into classes. The method is based on an algorithm that links auxiliary clusters obtained using traditional vector quantization techniques. Several approaches developed during the work are described, based on measures of distance or dissimilarity (divergence) between the auxiliary clusters. The new method uses only two pieces of a priori information: the number of auxiliary clusters Na and a threshold distance dt, which is used to decide whether or not auxiliary clusters are linked. The number of classes can be found automatically by the method, based on the chosen threshold distance dt, or it can be given as additional information to help in choosing the correct threshold. Some analyses are carried out and the results are compared with traditional clustering methods. Different dissimilarity metrics are analyzed, and a new one based on the concept of negentropy is proposed. Besides grouping the points of a set into classes, a method is proposed for statistically modeling the classes, in order to obtain an expression for the probability that a point belongs to one of the classes. Experiments with several values of Na and dt are run on test sets, and the results are analyzed to study the robustness of the method and to consider heuristics for choosing the correct threshold. Aspects of information theory applied to the calculation of the divergences are also explored, specifically the different measures of information and divergence based on the Rényi entropy. The results obtained with the different metrics are compared and discussed. The work also includes an appendix presenting real applications of the proposed method.
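The linking idea in this abstract, merging auxiliary clusters whenever they lie closer than the threshold dt, can be sketched roughly as below. This is only an illustration under loud assumptions: the auxiliary clusters are represented by centroids that are taken as given, the dissimilarity is plain Euclidean distance rather than the divergence or negentropy-based measures studied in the work, and the function name `link_auxiliary_clusters` is hypothetical.

```python
import math


def link_auxiliary_clusters(centroids, dt):
    """Return one class label per auxiliary cluster.

    Two auxiliary clusters are linked when the Euclidean distance between
    their centroids is below dt (an assumption of this sketch); classes
    are the connected groups of linked clusters.
    """
    n = len(centroids)
    parent = list(range(n))  # union-find forest over auxiliary clusters

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centroids[i], centroids[j]) < dt:
                parent[find(i)] = find(j)  # link the two groups

    roots = [find(i) for i in range(n)]
    # Relabel roots as consecutive class indices in order of appearance.
    class_of_root = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return [class_of_root[r] for r in roots]


# Hypothetical centroids of Na = 5 auxiliary clusters.
centroids = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10)]
labels = link_auxiliary_clusters(centroids, dt=1.5)
n_classes = len(set(labels))
```

Here the number of classes is not an input: it follows from dt, which mirrors the role the abstract gives to the threshold. A smaller dt leaves more auxiliary clusters unlinked and so yields more classes.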