21 resultados para k-Means algorithm


Relevância:

80.00% 80.00%

Publicador:

Resumo:

The use of clustering methods for the discovery of cancer subtypes has drawn a great deal of attention in the scientific community. While bioinformaticians have proposed new clustering methods that take advantage of characteristics of the gene expression data, the medical community has a preference for using classic clustering methods. There have been no studies thus far performing a large-scale evaluation of different clustering methods in this context. This work presents the first large-scale analysis of seven different clustering methods and four proximity measures for the analysis of 35 cancer gene expression data sets. Results reveal that the finite mixture of Gaussians, followed closely by k-means, exhibited the best performance in terms of recovering the true structure of the data sets. These methods also exhibited, on average, the smallest difference between the actual number of classes in the data sets and the best number of clusters as indicated by our validation criteria. Furthermore, hierarchical methods, which have been widely used by the medical community, exhibited a poorer recovery performance than that of the other methods evaluated. Moreover, as a stable basis for the assessment and comparison of different clustering methods for cancer gene expression data, this study provides a common group of data sets (benchmark data sets) to be shared among researchers and used for comparisons with new methods

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Objective to establish a methodology for the oil spill monitoring on the sea surface, located at the Submerged Exploration Area of the Polo Region of Guamaré, in the State of Rio Grande do Norte, using orbital images of Synthetic Aperture Radar (SAR integrated with meteoceanographycs products. This methodology was applied in the following stages: (1) the creation of a base map of the Exploration Area; (2) the processing of NOAA/AVHRR and ERS-2 images for generation of meteoceanographycs products; (3) the processing of RADARSAT-1 images for monitoring of oil spills; (4) the integration of RADARSAT-1 images with NOAA/AVHRR and ERS-2 image products; and (5) the structuring of a data base. The Integration of RADARSAT-1 image of the Potiguar Basin of day 21.05.99 with the base map of the Exploration Area of the Polo Region of Guamaré for the identification of the probable sources of the oil spots, was used successfully in the detention of the probable spot of oil detected next to the exit to the submarine emissary in the Exploration Area of the Polo Region of Guamaré. To support the integration of RADARSAT-1 images with NOAA/AVHRR and ERS-2 image products, a methodology was developed for the classification of oil spills identified by RADARSAT-1 images. For this, the following algorithms of classification not supervised were tested: K-means, Fuzzy k-means and Isodata. These algorithms are part of the PCI Geomatics software, which was used for the filtering of RADARSAT-1 images. For validation of the results, the oil spills submitted to the unsupervised classification were compared to the results of the Semivariogram Textural Classifier (STC). The mentioned classifier was developed especially for oil spill classification purposes and requires PCI software for the whole processing of RADARSAT-1 images. After all, the results of the classifications were analyzed through Visual Analysis; Calculation of Proportionality of Largeness and Analysis Statistics. Amongst the three algorithms of classifications tested, it was noted that there were no significant alterations in relation to the spills classified with the STC, in all of the analyses taken into consideration. Therefore, considering all the procedures, it has been shown that the described methodology can be successfully applied using the unsupervised classifiers tested, resulting in a decrease of time in the identification and classification processing of oil spills, if compared with the utilization of the STC classifier

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The aim of the present study was to trace the mortality profile of the elderly in Brazil using two neighboring age groups: 60 to 69 years (young-old) and 80 years or more (oldest-old). To do this, we sought to characterize the trend and distinctions of different mortality profiles, as well as the quality of the data and associations with socioeconomic and sanitary conditions in the micro-regions of Brazil. Data was collected from the Mortality Information System (SIM) and the Brazilian Institute of Geography and Statistics (IBGE). Based on these data, the coefficients of mortality were calculated for the chapters of the International Disease Classification (ICD-10). A polynomial regression model was used to ascertain the trend of the main chapters. Non-hierarchical cluster analysis (K-Means) was used to obtain the profiles for different Brazilian micro-regions. Factorial analysis of the contextual variables was used to obtain the socio-economic and sanitary deprivation indices (IPSS). The trend of the CMId and of the ratio of its values in the two age groups confirmed a decrease in most of the indicators, particularly for badly-defined causes among the oldest-old. Among the young-old, the following profiles emerged: the Development Profile; the Modernity Profile; the Epidemiological Paradox Profile and the Ignorance Profile. Among the oldest-old, the latter three profiles were confirmed, in addition to the Low Mortality Rates Profile. When comparing the mean IPSS values in global terms, all of the groups were different in both of the age groups. The Ignorance Profile was compared with the other profiles using orthogonal contrasts. This profile differed from all of the others in isolation and in clusters. However, the mean IPSS was similar for the Low Mortality Rates Profile among the oldest-old. Furthermore, associations were found between the data quality indicators, the CMId for badly-defined causes, the general coefficient of mortality for each age group (CGMId) and the IPSS of the micro-regions. The worst rates were recorded in areas with the greatest socioeconomic and sanitary deprivation. The findings of the present study show that, despite the decrease in the mortality coefficients, there are notable differences in the profiles related to contextual conditions, including regional differences in data quality. These differences increase the vulnerability of the age groups studied and the health iniquities that are already present.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The main objective of this study is to apply recently developed methods of physical-statistic to time series analysis, particularly in electrical induction s profiles of oil wells data, to study the petrophysical similarity of those wells in a spatial distribution. For this, we used the DFA method in order to know if we can or not use this technique to characterize spatially the fields. After obtain the DFA values for all wells, we applied clustering analysis. To do these tests we used the non-hierarchical method called K-means. Usually based on the Euclidean distance, the K-means consists in dividing the elements of a data matrix N in k groups, so that the similarities among elements belonging to different groups are the smallest possible. In order to test if a dataset generated by the K-means method or randomly generated datasets form spatial patterns, we created the parameter Ω (index of neighborhood). High values of Ω reveals more aggregated data and low values of Ω show scattered data or data without spatial correlation. Thus we concluded that data from the DFA of 54 wells are grouped and can be used to characterize spatial fields. Applying contour level technique we confirm the results obtained by the K-means, confirming that DFA is effective to perform spatial analysis

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In recent years, the DFA introduced by Peng, was established as an important tool capable of detecting long-range autocorrelation in time series with non-stationary. This technique has been successfully applied to various areas such as: Econophysics, Biophysics, Medicine, Physics and Climatology. In this study, we used the DFA technique to obtain the Hurst exponent (H) of the profile of electric density profile (RHOB) of 53 wells resulting from the Field School of Namorados. In this work we want to know if we can or not use H to spatially characterize the spatial data field. Two cases arise: In the first a set of H reflects the local geology, with wells that are geographically closer showing similar H, and then one can use H in geostatistical procedures. In the second case each well has its proper H and the information of the well are uncorrelated, the profiles show only random fluctuations in H that do not show any spatial structure. Cluster analysis is a method widely used in carrying out statistical analysis. In this work we use the non-hierarchy method of k-means. In order to verify whether a set of data generated by the k-means method shows spatial patterns, we create the parameter Ω (index of neighborhood). High Ω shows more aggregated data, low Ω indicates dispersed or data without spatial correlation. With help of this index and the method of Monte Carlo. Using Ω index we verify that random cluster data shows a distribution of Ω that is lower than actual cluster Ω. Thus we conclude that the data of H obtained in 53 wells are grouped and can be used to characterize space patterns. The analysis of curves level confirmed the results of the k-means

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The extent of the Brazilian Atlantic rainforest, a global biodiversity hotspot, has been reduced to less than 7% of its original range. Yet, it contains one of the richest butterfly fauna in the world. Butterflies are commonly used as environmental indicators, mostly because of their strict association with host plants, microclimate and resource availability. This research describes diversity, composition and species richness of frugivorous butterflies in a forest fragment in the Brazilian Northeast. It compares communities in different physiognomies and seasons. The climate in the study area is classified as tropical rainy, with two well defined seasons. Butterfly captures were made with 60 Van Someren-Rydon traps, randomly located within six different habitat units (10 traps per unit) that varied from very open (e.g. coconut plantation) to forest interior. Sampling was made between January and December 2008, for five days each month. I captured 12090 individuals from 32 species. The most abundant species were Taygetis laches, Opsiphanes invirae and Hamadryas februa, which accounted for 70% of all captures. Similarity analysis identified two main groups, one of species associated with open or disturbed areas and a second by species associated with shaded areas. There was a strong seasonal component in species composition, with less species and lower abundance in the dry season and more species and higher abundance in the rainy season. K-means analysis indicates that choice of habitat units overestimated faunal perceptions, suggesting less distinct units. The species Taygetis virgilia, Hamadryas chloe, Callicore pygas e Morpho achilles were associated with less disturbed habitats, while Yphthimoides sp, Historis odius, H. acheronta, Hamadryas feronia e Siderone marthesia likey indicate open or disturbed habitats. This research brings important information for conservation of frugivorous butterflies, and will serve as baseline for future projects in environmental monitoring