925 resultados para cluster analysis
Resumo:
We describe a network module detection approach which combines a rapid and robust clustering algorithm with an objective measure of the coherence of the modules identified. The approach is applied to the network of genetic regulatory interactions surrounding the tumor suppressor gene p53. This algorithm identifies ten clusters in the p53 network, which are visually coherent and biologically plausible.
Resumo:
Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.
Resumo:
This paper describes the application of a new technique, rough clustering, to the problem of market segmentation. Rough clustering produces different solutions to k-means analysis because of the possibility of multiple cluster membership of objects. Traditional clustering methods generate extensional descriptions of groups, that show which objects are members of each cluster. Clustering techniques based on rough sets theory generate intensional descriptions, which outline the main characteristics of each cluster. In this study, a rough cluster analysis was conducted on a sample of 437 responses from a larger study of the relationship between shopping orientation (the general predisposition of consumers toward the act of shopping) and intention to purchase products via the Internet. The cluster analysis was based on five measures of shopping orientation: enjoyment, personalization, convenience, loyalty, and price. The rough clusters obtained provide interpretations of different shopping orientations present in the data without the restriction of attempting to fit each object into only one segment. Such descriptions can be an aid to marketers attempting to identify potential segments of consumers.
Resumo:
A culster analysis was performed on 78 cases of Alzheimer's disease (AD) to identify possible pathological subtypes of the disease. Data on 47 neuropathological variables, inculding features of the gross brain and the density and distribution of senile plaques (SP) and neurofibrillary tangles (NFT) were used to describe each case. Cluster analysis is a multivariate statistical method which combines together in groups, AD cases with the most similar neuropathological characteristics. The majority of cases (83%) were clustered into five such groups. The analysis suggested that an initial division of the 78 cases could be made into two major groups: (1) a large group (68%) in which the distribution of SP and NFT was restricted to a relatively small number of brain regions, and (2) a smaller group (15%) in which the lesions were more widely disseminated throughout the neocortex. Each of these groups could be subdivided on the degree of capillary amyloid angiopathy (CAA) present. In addition, those cases with a restricted development of SP/NFT and CAA could be divided further into an early and a late onset form. Familial AD cases did not cluster as a separate group but were either distributed between four of the five groups or were cases with unique combinations of pathological features not closely related to any of the groups. It was concluded that multivariate statistical methods may be of value in the classification of AD into subtypes. © 1994 Springer-Verlag.
Resumo:
Two contrasting multivariate statistical methods, viz., principal components analysis (PCA) and cluster analysis were applied to the study of neuropathological variations between cases of Alzheimer's disease (AD). To compare the two methods, 78 cases of AD were analyzed, each characterised by measurements of 47 neuropathological variables. Both methods of analysis revealed significant variations between AD cases. These variations were related primarily to differences in the distribution and abundance of senile plaques (SP) and neurofibrillary tangles (NFT) in the brain. Cluster analysis classified the majority of AD cases into five groups which could represent subtypes of AD. However, PCA suggested that variation between cases was more continuous with no distinct subtypes. Hence, PCA may be a more appropriate method than cluster analysis in the study of neuropathological variations between AD cases.
Resumo:
This thesis seeks to describe the development of an inexpensive and efficient clustering technique for multivariate data analysis. The technique starts from a multivariate data matrix and ends with graphical representation of the data and pattern recognition discriminant function. The technique also results in distances frequency distribution that might be useful in detecting clustering in the data or for the estimation of parameters useful in the discrimination between the different populations in the data. The technique can also be used in feature selection. The technique is essentially for the discovery of data structure by revealing the component parts of the data. lhe thesis offers three distinct contributions for cluster analysis and pattern recognition techniques. The first contribution is the introduction of transformation function in the technique of nonlinear mapping. The second contribution is the us~ of distances frequency distribution instead of distances time-sequence in nonlinear mapping, The third contribution is the formulation of a new generalised and normalised error function together with its optimal step size formula for gradient method minimisation. The thesis consists of five chapters. The first chapter is the introduction. The second chapter describes multidimensional scaling as an origin of nonlinear mapping technique. The third chapter describes the first developing step in the technique of nonlinear mapping that is the introduction of "transformation function". The fourth chapter describes the second developing step of the nonlinear mapping technique. This is the use of distances frequency distribution instead of distances time-sequence. The chapter also includes the new generalised and normalised error function formulation. Finally, the fifth chapter, the conclusion, evaluates all developments and proposes a new program. for cluster analysis and pattern recognition by integrating all the new features.
Resumo:
PURPOSE: Two common approaches to identify subgroups of patients with bipolar disorder are clustering methodology (mixture analysis) based on the age of onset, and a birth cohort analysis. This study investigates if a birth cohort effect will influence the results of clustering on the age of onset, using a large, international database. METHODS: The database includes 4037 patients with a diagnosis of bipolar I disorder, previously collected at 36 collection sites in 23 countries. Generalized estimating equations (GEE) were used to adjust the data for country median age, and in some models, birth cohort. Model-based clustering (mixture analysis) was then performed on the age of onset data using the residuals. Clinical variables in subgroups were compared. RESULTS: There was a strong birth cohort effect. Without adjusting for the birth cohort, three subgroups were found by clustering. After adjusting for the birth cohort or when considering only those born after 1959, two subgroups were found. With results of either two or three subgroups, the youngest subgroup was more likely to have a family history of mood disorders and a first episode with depressed polarity. However, without adjusting for birth cohort (three subgroups), family history and polarity of the first episode could not be distinguished between the middle and oldest subgroups. CONCLUSION: These results using international data confirm prior findings using single country data, that there are subgroups of bipolar I disorder based on the age of onset, and that there is a birth cohort effect. Including the birth cohort adjustment altered the number and characteristics of subgroups detected when clustering by age of onset. Further investigation is needed to determine if combining both approaches will identify subgroups that are more useful for research.
Resumo:
The paper treats the task for cluster analysis of a given assembly of objects on the basis of the information contained in the description table of these objects. Various methods of cluster analysis are briefly considered. Heuristic method and rules for classification of the given assembly of objects are presented for the cases when their division into classes and the number of classes is not known. The algorithm is checked by a test example and two program products (PP) – learning systems and software for company management. Analysis of the results is presented.
Resumo:
This study subdivides the Weddell Sea, Antarctica, into seafloor regions using multivariate statistical methods. These regions are categories used for comparing, contrasting and quantifying biogeochemical processes and biodiversity between ocean regions geographically but also regions under development within the scope of global change. The division obtained is characterized by the dominating components and interpreted in terms of ruling environmental conditions. The analysis uses 28 environmental variables for the sea surface, 25 variables for the seabed and 9 variables for the analysis between surface and bottom variables. The data were taken during the years 1983-2013. Some data were interpolated. The statistical errors of several interpolation methods (e.g. IDW, Indicator, Ordinary and Co-Kriging) with changing settings have been compared for the identification of the most reasonable method. The multivariate mathematical procedures used are regionalized classification via k means cluster analysis, canonical-correlation analysis and multidimensional scaling. Canonical-correlation analysis identifies the influencing factors in the different parts of the cove. Several methods for the identification of the optimum number of clusters have been tested. For the seabed 8 and 12 clusters were identified as reasonable numbers for clustering the Weddell Sea. For the sea surface the numbers 8 and 13 and for the top/bottom analysis 8 and 3 were identified, respectively. Additionally, the results of 20 clusters are presented for the three alternatives offering the first small scale environmental regionalization of the Weddell Sea. Especially the results of 12 clusters identify marine-influenced regions which can be clearly separated from those determined by the geological catchment area and the ones dominated by river discharge.
Resumo:
[EN]In this paper an architecture for an estimator of short-term wind farm power is proposed. The estimator is made up of a Linear Machine classifier and a set of k Multilayer Perceptrons, training each one for a specific subspace of the input space. The splitting of the input dataset into the k clusters is done using a k-means technique, obtaining the equivalent Linear Machine classifier from the cluster centroids...
Resumo:
A Flood Vulnerability Index (FloodVI) was developed using Principal Component Analysis (PCA) and a new aggregation method based on Cluster Analysis (CA). PCA simplifies a large number of variables into a few uncorrelated factors representing the social, economic, physical and environmental dimensions of vulnerability. CA groups areas that have the same characteristics in terms of vulnerability into vulnerability classes. The grouping of the areas determines their classification contrary to other aggregation methods in which the areas' classification determines their grouping. While other aggregation methods distribute the areas into classes, in an artificial manner, by imposing a certain probability for an area to belong to a certain class, as determined by the assumption that the aggregation measure used is normally distributed, CA does not constrain the distribution of the areas by the classes. FloodVI was designed at the neighbourhood level and was applied to the Portuguese municipality of Vila Nova de Gaia where several flood events have taken place in the recent past. The FloodVI sensitivity was assessed using three different aggregation methods: the sum of component scores, the first component score and the weighted sum of component scores. The results highlight the sensitivity of the FloodVI to different aggregation methods. Both sum of component scores and weighted sum of component scores have shown similar results. The first component score aggregation method classifies almost all areas as having medium vulnerability and finally the results obtained using the CA show a distinct differentiation of the vulnerability where hot spots can be clearly identified. The information provided by records of previous flood events corroborate the results obtained with CA, because the inundated areas with greater damages are those that are identified as high and very high vulnerability areas by CA. This supports the fact that CA provides a reliable FloodVI.
Resumo:
2016
Resumo:
The taxonomy of the N(2)-fixing bacteria belonging to the genus Bradyrhizobium is still poorly refined, mainly due to conflicting results obtained by the analysis of the phenotypic and genotypic properties. This paper presents an application of a method aiming at the identification of possible new clusters within a Brazilian collection of 119 Bradryrhizobium strains showing phenotypic characteristics of B. japonicum and B. elkanii. The stability was studied as a function of the number of restriction enzymes used in the RFLP-PCR analysis of three ribosomal regions with three restriction enzymes per region. The method proposed here uses Clustering algorithms with distances calculated by average-linkage clustering. Introducing perturbations using sub-sampling techniques makes the stability analysis. The method showed efficacy in the grouping of the species B. japonicum and B. elkanii. Furthermore, two new clusters were clearly defined, indicating possible new species, and sub-clusters within each detected cluster. (C) 2008 Elsevier B.V. All rights reserved.