853 resultados para Unsupervised clustering
Resumo:
Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes. Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm for which the E(expectation) and M(maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data. In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is considered too.
Resumo:
The Leximancer system is a relatively new method for transforming lexical co-occurrence information from natural language into semantic patterns in an unsupervised manner. It employs two stages of co-occurrence information extraction-semantic and relational-using a different algorithm for each stage. The algorithms used are statistical, but they employ nonlinear dynamics and machine learning. This article is an attempt to validate the output of Leximancer, using a set of evaluation criteria taken from content analysis that are appropriate for knowledge discovery tasks.
Resumo:
We have undertaken two-dimensional gel electrophoresis proteomic profiling on a series of cell lines with different recombinant antibody production rates. Due to the nature of gel-based experiments not all protein spots are detected across all samples in an experiment, and hence datasets are invariably incomplete. New approaches are therefore required for the analysis of such graduated datasets. We approached this problem in two ways. Firstly, we applied a missing value imputation technique to calculate missing data points. Secondly, we combined a singular value decomposition based hierarchical clustering with the expression variability test to identify protein spots whose expression correlates with increased antibody production. The results have shown that while imputation of missing data was a useful method to improve the statistical analysis of such data sets, this was of limited use in differentiating between the samples investigated, and highlighted a small number of candidate proteins for further investigation. (c) 2006 Elsevier B.V. All rights reserved.
Resumo:
Quality of life has been shown to be poor among people living with chronic hepatitis C However, it is not clear how this relates to the presence of symptoms and their severity. The aim of this study was to describe the typology of a broad array of symptoms that were attributed to hepatitis C virus (HCV) infection. Phase I used qualitative methods to identify symptoms. In Phase 2, 188 treatment-naive people living with HCV participated in a quantitative survey. The most prevalent symptom was physical tiredness (86%) followed by irritability (75%), depression (70%), mental tiredness (70%), and abdominal pain (68%). Temporal clustering of symptoms was reported in 62% of participants. Principal components analysis identified four symptom clusters: neuropsychiatric (mental tiredness, poor concentration, forgetfulness, depression, irritability, physical tiredness, and sleep problems); gastrointestinal (day sweats, nausea, food intolerance, night sweats, abdominal pain, poor appetite, and diarrhea); algesic (joint pain, muscle pain, and general body pain); and dysesthetic (noise sensitivity, light sensitivity, skin. problems, and headaches). These data demonstrate that symptoms are prevalent in treatment-naive people with HCV and support the hypothesis that symptom clustering occurs.
Resumo:
The texture segmentation techniques are diversified by the existence of several approaches. In this paper, we propose fuzzy features for the segmentation of texture image. For this purpose, a membership function is constructed to represent the effect of the neighboring pixels on the current pixel in a window. Using these membership function values, we find a feature by weighted average method for the current pixel. This is repeated for all pixels in the window treating each time one pixel as the current pixel. Using these fuzzy based features, we derive three descriptors such as maximum, entropy, and energy for each window. To segment the texture image, the modified mountain clustering that is unsupervised and fuzzy c-means clustering have been used. The performance of the proposed features is compared with that of fractal features.
Resumo:
In this paper we present an efficient k-Means clustering algorithm for two dimensional data. The proposed algorithm re-organizes dataset into a form of nested binary tree*. Data items are compared at each node with only two nearest means with respect to each dimension and assigned to the one that has the closer mean. The main intuition of our research is as follows: We build the nested binary tree. Then we scan the data in raster order by in-order traversal of the tree. Lastly we compare data item at each node to the only two nearest means to assign the value to the intendant cluster. In this way we are able to save the computational cost significantly by reducing the number of comparisons with means and also by the least use to Euclidian distance formula. Our results showed that our method can perform clustering operation much faster than the classical ones. © Springer-Verlag Berlin Heidelberg 2005
Resumo:
In patients with Pick's disease (PD), high densities of tau positive Pick bodies (PB) have been observed within the granule cell layer of the dentate gyrus. This study investigated the spatial patterns of PB along the granule cell layer in coronal sections of the hippocampus in eight patients with PD. In all patients, there was evidence of clustering of PB within the granule cell layer; however, there was considerable variation in the pattern of clustering. In five patients, the clusters of PB were regularly distributed along the dentate gyms, and in two of these patients, the smaller clusters were aggregated into larger superclusters. In three patients, a single large cluster of PB, more than 1200 μm in diameter, was present. Clustering of PB may reflect a primary degenerative process within the granule cells or the degeneration of pathways that project to the dentate gyrus.
Resumo:
This study tested three hypotheses: (1) that there is clustering of the neuronal cytoplasmic inclusions (NCI), astrocytic plaques (AP) and ballooned neurons (BN) in corticobasal degeneration (CBD), (2) that the clusters of NCI and BN are not spatially correlated, and (3) that the lesions are correlated with disease ‘stage’. In 50% of the regions, clusters of lesions were 400–800 µm in diameter and regularly distributed parallel to the tissue boundary. Clusters of NCI and BN were larger in laminae II/III and V/VI, respectively. In a third of regions, the clusters of BN and NCI were negatively spatially correlated. Cluster size of the BN in the parahippocampal gyrus (PHG) was positively correlated with disease ‘stage’. The data suggest the following: (1) degeneration of the cortico-cortical pathways in CBD, (2) clusters of NCI and BN may affect different anatomical pathways and (3) BN may develop after the NCI in the PHG.