20 results for K-Means Cluster
Abstract:
Mortality measures are among the most important indicators of health conditions. Because the elderly account for the largest share of deaths, studying mortality in this population is regarded as essential to understanding the health situation. The present study therefore aims to analyze the mortality profile of the population aged 60 to 69 (younger elders) and older than 80 years (oldest old) in the state of Rio Grande do Norte (Brazil) from 2001 to 2011, and to identify associations with contextual factors and with variables describing the quality of the Mortality Information System (SIM). For this purpose, Proportional Mortality (MP) was calculated for the state, and the Age-Specific Mortality Rate (CMId), by ICD-10 chapter, was calculated for the municipalities of Rio Grande do Norte, using data from the Mortality Information System (SIM) and the Brazilian Institute of Geography and Statistics (IBGE). To identify groups of municipalities with similar mortality profiles, the nonhierarchical K-means clustering method was applied, and Factor Analysis by Principal Components Analysis was used to reduce the contextual variables. The spatial distribution of these groups and factors was visualized using the Spatial Analysis of Areas technique. During the period investigated, 21,813 deaths were recorded among the younger elders, predominantly from circulatory diseases (32.75%) and neoplasms (22.9%). Among the oldest old, 50,637 deaths were observed, of which 35.26% were due to cardiovascular diseases and 17.27% to ill-defined causes. Cluster analysis produced three clusters for each of the two age groups, and Factor Analysis reduced the contextual variables to three factors; the sum of the factor scores was also considered.
Among the younger elders, the groups were named misinformation profile, development profile and development paradox; they showed a statistically significant association with the education and the poverty and extreme poverty factors, the factorial sum, and the variable related to underreporting of deaths. The misinformation profile also appeared in the oldest old group, accompanied by the epidemiological transition profile and the epidemiological paradox, which were statistically associated with the development and health factor, as well as with the variables that indicate SIM quality: the proportion of blank fields for schooling, and underreporting. It is proposed that the mortality profiles of the younger elders and the oldest old differ in the importance of the underlying causes and are influenced by different contextual aspects, with the 60 to 69 age group being more affected by such aspects. Health inequalities can be reduced by measures aimed at improving education levels and reducing poverty, especially among younger elders, and by optimizing the use of health services, which is more closely associated with the health situation of the oldest old. Furthermore, it is important to improve the quality of information for both age groups.
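The proportional mortality figures quoted in the abstract above (for example, the share of deaths due to circulatory diseases) follow the usual definition: deaths from one cause divided by all deaths, times 100. A minimal sketch of that calculation is given below; the function name and the death counts are hypothetical and are not taken from the study.

```python
def proportional_mortality(deaths_by_cause):
    """Return each cause's share of all deaths, as a percentage.

    `deaths_by_cause` maps a cause label (e.g. an ICD-10 chapter)
    to a death count. The labels and counts used here are made up.
    """
    total = sum(deaths_by_cause.values())
    if total == 0:
        raise ValueError("no deaths recorded; proportions are undefined")
    return {cause: 100.0 * n / total for cause, n in deaths_by_cause.items()}


# Hypothetical counts for one age group, for illustration only.
example_counts = {"circulatory": 50, "neoplasms": 30, "other": 20}
shares = proportional_mortality(example_counts)
for cause, pct in shares.items():
    print(f"{cause}: {pct:.2f}%")
```

Because the shares are computed against the total number of deaths, they always sum to 100% and do not depend on population size, which is what distinguishes them from the age-specific mortality rates also used in the study.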
Abstract:
The use of clustering methods for the discovery of cancer subtypes has drawn a great deal of attention in the scientific community. While bioinformaticians have proposed new clustering methods that take advantage of characteristics of the gene expression data, the medical community has a preference for using classic clustering methods. There have been no studies thus far performing a large-scale evaluation of different clustering methods in this context. This work presents the first large-scale analysis of seven different clustering methods and four proximity measures for the analysis of 35 cancer gene expression data sets. Results reveal that the finite mixture of Gaussians, followed closely by k-means, exhibited the best performance in terms of recovering the true structure of the data sets. These methods also exhibited, on average, the smallest difference between the actual number of classes in the data sets and the best number of clusters as indicated by our validation criteria. Furthermore, hierarchical methods, which have been widely used by the medical community, exhibited a poorer recovery performance than that of the other methods evaluated. Moreover, as a stable basis for the assessment and comparison of different clustering methods for cancer gene expression data, this study provides a common group of data sets (benchmark data sets) to be shared among researchers and used for comparisons with new methods.
Abstract:
The objective is to establish a methodology for monitoring oil spills on the sea surface in the Submerged Exploration Area of the Polo Region of Guamaré, in the State of Rio Grande do Norte, using orbital Synthetic Aperture Radar (SAR) images integrated with meteo-oceanographic products. This methodology was applied in the following stages: (1) the creation of a base map of the Exploration Area; (2) the processing of NOAA/AVHRR and ERS-2 images to generate meteo-oceanographic products; (3) the processing of RADARSAT-1 images for monitoring oil spills; (4) the integration of RADARSAT-1 images with NOAA/AVHRR and ERS-2 image products; and (5) the structuring of a database. The integration of the RADARSAT-1 image of the Potiguar Basin from 21.05.99 with the base map of the Exploration Area of the Polo Region of Guamaré, intended to identify the probable sources of oil spots, was used successfully to detect a probable oil spot next to the outlet of the submarine outfall in the Exploration Area of the Polo Region of Guamaré. To support the integration of RADARSAT-1 images with NOAA/AVHRR and ERS-2 image products, a methodology was developed for classifying the oil spills identified in RADARSAT-1 images. For this, the following unsupervised classification algorithms were tested: K-means, Fuzzy k-means and Isodata. These algorithms are part of the PCI Geomatics software, which was used for filtering the RADARSAT-1 images. To validate the results, the oil spills submitted to unsupervised classification were compared with the results of the Semivariogram Textural Classifier (STC). This classifier was developed specifically for oil spill classification and requires PCI software for the entire processing of RADARSAT-1 images. Finally, the classification results were analyzed through Visual Analysis, Calculation of Proportionality of Largeness, and Statistical Analysis.
Among the three classification algorithms tested, no significant differences were observed relative to the spills classified with the STC in any of the analyses considered. Therefore, considering all the procedures, the described methodology can be applied successfully using the unsupervised classifiers tested, reducing the time needed to identify and classify oil spills compared with the STC classifier.
Abstract:
The main objective of this study is to apply recently developed methods of statistical physics to time series analysis, particularly to electrical induction profiles from oil wells, in order to study the petrophysical similarity of those wells in a spatial distribution. For this, we used the DFA method to determine whether this technique can be used to characterize the fields spatially. After obtaining the DFA values for all wells, we applied cluster analysis. For these tests we used the non-hierarchical method called K-means. Usually based on the Euclidean distance, K-means consists of dividing the elements of a data matrix N into k groups, so that the similarities among elements belonging to different groups are the smallest possible. To test whether a dataset generated by the K-means method or randomly generated datasets form spatial patterns, we created the parameter Ω (index of neighborhood). High values of Ω reveal more aggregated data, and low values of Ω show scattered data or data without spatial correlation. We thus concluded that the DFA data from the 54 wells are grouped and can be used to characterize spatial fields. Applying the contour level technique, we confirmed the results obtained by K-means, confirming that DFA is effective for spatial analysis.
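The K-means procedure described in the abstract above (partitioning elements into k groups using Euclidean distance) can be sketched as follows. This is a minimal illustration of the standard Lloyd iteration, not the authors' implementation; the function name, the random initialization, and the one-dimensional input values are assumptions made for the example and do not come from the study.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with Euclidean distance.

    Returns (labels, centroids). Initial centroids are k points drawn
    at random from the data, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Hypothetical one-dimensional values standing in for per-well DFA results.
values = [(0.10,), (0.20,), (0.15,), (0.90,), (1.00,), (0.95,)]
labels, centroids = kmeans(values, k=2)
print(labels, centroids)
```

The result of K-means depends on the initial centroids, so in practice the algorithm is often run several times with different seeds and the partition with the lowest within-cluster distance is kept.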
Abstract:
The extent of the Brazilian Atlantic rainforest, a global biodiversity hotspot, has been reduced to less than 7% of its original range. Yet it contains one of the richest butterfly faunas in the world. Butterflies are commonly used as environmental indicators, mostly because of their strict association with host plants, microclimate and resource availability. This research describes the diversity, composition and species richness of frugivorous butterflies in a forest fragment in the Brazilian Northeast. It compares communities in different physiognomies and seasons. The climate in the study area is classified as tropical rainy, with two well-defined seasons. Butterflies were captured with 60 Van Someren-Rydon traps, randomly located within six different habitat units (10 traps per unit) that varied from very open (e.g. coconut plantation) to forest interior. Sampling was carried out between January and December 2008, for five days each month. I captured 12,090 individuals from 32 species. The most abundant species were Taygetis laches, Opsiphanes invirae and Hamadryas februa, which accounted for 70% of all captures. Similarity analysis identified two main groups, one of species associated with open or disturbed areas and a second of species associated with shaded areas. There was a strong seasonal component in species composition, with fewer species and lower abundance in the dry season and more species and higher abundance in the rainy season. K-means analysis indicates that the choice of habitat units overestimated faunal perceptions, suggesting less distinct units. The species Taygetis virgilia, Hamadryas chloe, Callicore pygas and Morpho achilles were associated with less disturbed habitats, while Yphthimoides sp, Historis odius, H. acheronta, Hamadryas feronia and Siderone marthesia likely indicate open or disturbed habitats.
This research provides important information for the conservation of frugivorous butterflies and will serve as a baseline for future environmental monitoring projects.