812 results for Hierarchical clustering
Abstract:
Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward, as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, such as time-course experiments, by using time as a covariate, and to cross-sectional experiments, by using categorical covariates to represent the different experimental classes. Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm, for which the E (expectation) and M (maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data.
In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is also considered.
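The random-effects model itself is not reproduced here. As a minimal sketch of why EM for a normal mixture needs no Monte Carlo step, the Python below fits a plain k-component normal mixture with diagonal covariances (no random effects, no covariates), so both the E and M steps reduce to closed-form array operations. The function name `em_gaussian_mixture`, the farthest-point initialisation, the fixed iteration count and the small variance floor are our own assumptions, not the authors' choices.

```python
import numpy as np

def em_gaussian_mixture(X, k, n_iter=100, seed=0):
    """Fit a k-component normal mixture with diagonal covariances by EM.

    X: (n, p) array of gene profiles (hypothetical simplified setting:
    no random effects, no covariates). Returns weights, means, variances
    and the (n, k) matrix of posterior membership probabilities.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Farthest-point initialisation: one random profile, then the profile
    # farthest from all means chosen so far.
    means = [X[rng.integers(n)]]
    for _ in range(1, k):
        d2 = np.min([((X - m) ** 2).sum(axis=1) for m in means], axis=0)
        means.append(X[d2.argmax()])
    means = np.array(means, dtype=float)
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step (closed form): log pi_j + log N(x_i; mu_j, diag(var_j)).
        diff = X[:, None, :] - means[None, :, :]                  # (n, k, p)
        log_dens = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                           + diff ** 2 / variances[None, :, :]).sum(axis=2)
        log_w = np.log(weights)[None, :] + log_dens                # (n, k)
        log_w -= log_w.max(axis=1, keepdims=True)                  # numerical stability
        resp = np.exp(log_w)
        resp /= resp.sum(axis=1, keepdims=True)                    # posterior probabilities

        # M-step (closed form): weighted proportions, means and variances.
        nk = resp.sum(axis=0)                                      # (k,)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = (resp[:, :, None] * diff ** 2).sum(axis=0) / nk[:, None] + 1e-6

    return weights, means, variances, resp
```

Each gene can then be assigned to the component with the largest posterior probability, for example with `resp.argmax(axis=1)`.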
Abstract:
We have undertaken two-dimensional gel electrophoresis proteomic profiling on a series of cell lines with different recombinant antibody production rates. Due to the nature of gel-based experiments, not all protein spots are detected across all samples in an experiment, and hence datasets are invariably incomplete. New approaches are therefore required for the analysis of such graduated datasets. We approached this problem in two ways. Firstly, we applied a missing value imputation technique to calculate missing data points. Secondly, we combined a singular value decomposition (SVD)-based hierarchical clustering with the expression variability test to identify protein spots whose expression correlates with increased antibody production. The results have shown that while imputation of missing data was a useful method to improve the statistical analysis of such data sets, it was of limited use in differentiating between the samples investigated, and highlighted a small number of candidate proteins for further investigation. (c) 2006 Elsevier B.V. All rights reserved.
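The abstract does not name the imputation technique or give the details of the SVD-based clustering. A minimal sketch under stated assumptions: missing spots are filled with the mean of that spot's observed values (a simple stand-in, not necessarily the authors' method), each spot's profile is centred, spots are projected onto the leading singular vectors, and the projected scores are clustered with average linkage using SciPy's `linkage` and `fcluster`. The helper names `impute_row_means` and `svd_hierarchical_clusters` are hypothetical, and every spot is assumed to be detected in at least one sample.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def impute_row_means(X):
    """Replace NaNs (undetected spots) with the mean of that spot's observed values.

    Rows are protein spots, columns are samples (cell lines). Assumes each
    row has at least one observed value.
    """
    X = np.array(X, dtype=float)
    row_means = np.nanmean(X, axis=1)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = row_means[rows]
    return X

def svd_hierarchical_clusters(X, n_components=2, n_clusters=2):
    """Cluster spots on their leading SVD scores with average-linkage clustering."""
    Xc = X - X.mean(axis=1, keepdims=True)         # centre each spot's profile
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # spot coordinates on leading components
    Z = linkage(scores, method="average", metric="euclidean")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

Clustering on a few leading components rather than the full profiles is a design choice that damps noise in low-variance directions; with so few samples per spot, the number of components kept matters and would need checking on real data.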
Abstract:
Quality of life has been shown to be poor among people living with chronic hepatitis C. However, it is not clear how this relates to the presence of symptoms and their severity. The aim of this study was to describe the typology of a broad array of symptoms that were attributed to hepatitis C virus (HCV) infection. Phase 1 used qualitative methods to identify symptoms. In Phase 2, 188 treatment-naive people living with HCV participated in a quantitative survey. The most prevalent symptom was physical tiredness (86%), followed by irritability (75%), depression (70%), mental tiredness (70%), and abdominal pain (68%). Temporal clustering of symptoms was reported by 62% of participants. Principal components analysis identified four symptom clusters: neuropsychiatric (mental tiredness, poor concentration, forgetfulness, depression, irritability, physical tiredness, and sleep problems); gastrointestinal (day sweats, nausea, food intolerance, night sweats, abdominal pain, poor appetite, and diarrhea); algesic (joint pain, muscle pain, and general body pain); and dysesthetic (noise sensitivity, light sensitivity, skin problems, and headaches). These data demonstrate that symptoms are prevalent in treatment-naive people with HCV and support the hypothesis that symptom clustering occurs.
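As a rough illustration of how principal components analysis can sort items into clusters, the sketch below computes unrotated component loadings from the item correlation matrix and assigns each symptom to the component on which it loads most heavily. Whether the study rotated its components, and how it decided on four, is not stated here; the function name `pca_item_groups` and the largest-absolute-loading rule are our own assumptions.

```python
import numpy as np

def pca_item_groups(scores, n_components):
    """Assign each item (column) to the principal component with its largest |loading|.

    scores: (participants, items) array of symptom scores.
    Returns (groups, loadings), where groups[i] is the component index for item i.
    """
    corr = np.corrcoef(scores, rowvar=False)            # item-by-item correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)             # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]      # indices of the largest eigenvalues
    loadings = eigvecs[:, top] * np.sqrt(eigvals[top])  # unrotated component loadings
    groups = np.abs(loadings).argmax(axis=1)
    return groups, loadings
```

Unrotated components can mix items from different underlying clusters when their eigenvalues are close, which is one reason published analyses often apply a rotation before naming clusters.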
Abstract:
In this paper we present an efficient k-means clustering algorithm for two-dimensional data. The proposed algorithm reorganizes the dataset into a nested binary tree. At each node, a data item is compared with only the two nearest means with respect to each dimension and assigned to the one with the closer mean. The main intuition of our research is as follows. First, we build the nested binary tree. Then we scan the data in raster order by in-order traversal of the tree. Lastly, we compare the data item at each node with only the two nearest means to assign it to the appropriate cluster. In this way we significantly reduce the computational cost, both by reducing the number of comparisons with means and by minimizing use of the Euclidean distance formula. Our results showed that our method can perform clustering much faster than the classical methods. © Springer-Verlag Berlin Heidelberg 2005
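The nested binary tree and raster scan are not reproduced here. As a hypothetical sketch of the pruning idea only, the Python below keeps the means sorted along each axis, uses `bisect` to find the (at most) two means adjacent to a point in each sorted order, and computes Euclidean distances only to those candidates. Reading "two nearest means" as the two means bracketing the point on each axis is our assumption, and this assignment can differ from exact nearest-mean assignment in some configurations; all function names are our own.

```python
import bisect
import math

def two_nearest(sorted_vals, x):
    """Positions in sorted_vals of the (at most) two values bracketing x."""
    i = bisect.bisect_left(sorted_vals, x)
    return [j for j in (i - 1, i) if 0 <= j < len(sorted_vals)]

def assign_pruned(points, means):
    """Assign each 2-D point to a mean, comparing only candidate means that are
    adjacent to it along the x axis or along the y axis (at most four)."""
    by_x = sorted(range(len(means)), key=lambda m: means[m][0])
    by_y = sorted(range(len(means)), key=lambda m: means[m][1])
    xs = [means[m][0] for m in by_x]
    ys = [means[m][1] for m in by_y]
    labels = []
    for px, py in points:
        cands = {by_x[j] for j in two_nearest(xs, px)} | {by_y[j] for j in two_nearest(ys, py)}
        # Euclidean distance is computed only for the pruned candidate set.
        labels.append(min(cands, key=lambda m: math.dist((px, py), means[m])))
    return labels

def kmeans_2d(points, means, n_iter=20):
    """Lloyd-style k-means iterations using the pruned assignment step."""
    means = [tuple(m) for m in means]
    for _ in range(n_iter):
        labels = assign_pruned(points, means)
        new = []
        for k in range(len(means)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Keep the old mean if a cluster becomes empty.
            new.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else means[k])
        if new == means:
            break
        means = new
    return labels, means
```

The candidate set never exceeds four means regardless of k, which is where a saving in distance computations would come from.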
Abstract:
In patients with Pick's disease (PD), high densities of tau-positive Pick bodies (PB) have been observed within the granule cell layer of the dentate gyrus. This study investigated the spatial patterns of PB along the granule cell layer in coronal sections of the hippocampus in eight patients with PD. In all patients, there was evidence of clustering of PB within the granule cell layer; however, there was considerable variation in the pattern of clustering. In five patients, the clusters of PB were regularly distributed along the dentate gyrus, and in two of these patients, the smaller clusters were aggregated into larger superclusters. In three patients, a single large cluster of PB, more than 1200 μm in diameter, was present. Clustering of PB may reflect a primary degenerative process within the granule cells or the degeneration of pathways that project to the dentate gyrus.
Abstract:
This study tested three hypotheses: (1) that there is clustering of the neuronal cytoplasmic inclusions (NCI), astrocytic plaques (AP) and ballooned neurons (BN) in corticobasal degeneration (CBD), (2) that the clusters of NCI and BN are not spatially correlated, and (3) that the lesions are correlated with disease ‘stage’. In 50% of the regions, clusters of lesions were 400–800 µm in diameter and regularly distributed parallel to the tissue boundary. Clusters of NCI and BN were larger in laminae II/III and V/VI, respectively. In a third of regions, the clusters of BN and NCI were negatively spatially correlated. Cluster size of the BN in the parahippocampal gyrus (PHG) was positively correlated with disease ‘stage’. The data suggest the following: (1) degeneration of the cortico-cortical pathways in CBD, (2) clusters of NCI and BN may affect different anatomical pathways and (3) BN may develop after the NCI in the PHG.
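The abstract does not say how spatial correlation between the NCI and BN clusters was measured. One approach, assumed here, is to count each lesion type in contiguous sample fields along a strip parallel to the tissue boundary, sum the counts into blocks of increasing size, and compute Pearson's r at each block size. A negative r at a given size would suggest that the two sets of clusters are out of phase at that scale. The helper names are hypothetical and no significance test is included.

```python
import numpy as np

def block_sums(counts, size):
    """Sum counts from contiguous sample fields into non-overlapping blocks of `size` fields."""
    counts = np.asarray(counts, dtype=float)
    n = len(counts) // size * size            # drop an incomplete trailing block
    return counts[:n].reshape(-1, size).sum(axis=1)

def correlation_by_block_size(a, b, sizes=(1, 2, 4, 8)):
    """Pearson r between two lesion counts (e.g. NCI and BN) at several block sizes.

    Returns {block size in fields: r}. Sizes leaving fewer than 3 blocks are skipped;
    r is NaN if either series is constant at that size.
    """
    out = {}
    for s in sizes:
        xa, xb = block_sums(a, s), block_sums(b, s)
        if len(xa) >= 3:
            out[s] = float(np.corrcoef(xa, xb)[0, 1])
    return out
```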
Abstract:
In Alzheimer's disease (AD), neurofibrillary tangles (NFT) occur within neurons in both the upper and lower cortical laminae. Using a statistical method that estimates the size and spacing of NFT clusters along the cortex parallel to the pia mater, two hypotheses were tested: 1) that the cluster size and distribution of the NFT in gyri of the temporal lobe reflect degeneration of the feedforward (FF) and feedback (FB) cortico-cortical pathways, and 2) that there is a spatial relationship between the clusters of NFT in the upper and lower laminae. In 16 temporal lobe gyri from 10 cases of sporadic AD, NFT were present in both the upper and lower laminae in 11/16 (69%) gyri and in either the upper or lower laminae in 5/16 (31%) gyri. Clustering of the NFT was observed in all gyri. A significant peak-to-peak distance was observed in the upper laminae in 13/15 (87%) gyri and in the lower laminae in 8/12 (67%) gyri, suggesting a regularly repeating pattern of NFT clusters along the cortex. The regularly distributed clusters of NFT were between 500 and 800 μm in size, the estimated size of the cells of origin of the FF and FB cortico-cortical projections, in the upper laminae of 6/13 (46%) gyri and in the lower laminae of 2/8 (25%) gyri. Clusters of NFT in the upper laminae were spatially correlated (in phase) with those in the lower laminae in 5/16 (31%) gyri. The clustering patterns of the NFT are consistent with their formation in relation to the FF and FB cortico-cortical pathways. In most gyri, NFT clusters appeared to develop independently in the upper and lower laminae.
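The abstract does not spell out the statistical method. A minimal sketch, assuming a variance/mean (V/M) pattern analysis in the style of Greig-Smith's contiguous-quadrat method: NFT are counted in contiguous fields along the cortex, counts are summed into blocks of 1, 2, 4, ... fields, and V/M is computed at each block size. V/M above 1 indicates clustering, and the block size at which V/M peaks is taken as an estimate of cluster size. The 50 µm field width and the function names are assumptions, and the significance test a real analysis would need is omitted.

```python
import numpy as np

def variance_mean_profile(counts, field_um=50.0, sizes=(1, 2, 4, 8, 16)):
    """V/M ratio of lesion counts for blocks of contiguous sample fields.

    counts: lesion counts in contiguous fields along the cortex.
    field_um: hypothetical width of one sample field in micrometres.
    Returns {block width in micrometres: V/M}.
    """
    counts = np.asarray(counts, dtype=float)
    profile = {}
    for s in sizes:
        n = len(counts) // s * s                    # drop an incomplete trailing block
        blocks = counts[:n].reshape(-1, s).sum(axis=1)
        if len(blocks) >= 2 and blocks.mean() > 0:
            profile[s * field_um] = blocks.var(ddof=1) / blocks.mean()
    return profile

def estimated_cluster_size(profile):
    """Block width (micrometres) at which V/M peaks."""
    return max(profile, key=profile.get)
```

For example, counts of 5 in each of 8 consecutive fields followed by 8 empty fields, repeated along the strip, give a V/M peak at a block width of 8 fields, that is 400 µm under the assumed field width.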
Abstract:
Dementia with neurofilament inclusions (DNI) is a new disorder characterized clinically by early-onset dementia and histologically by the presence of intraneuronal inclusions immunopositive for neurofilament antigens but lacking tau and α-synuclein reactivity. We studied the clustering patterns of the neurofilament inclusions (NI) in regions of the temporal lobe in three cases of DNI to determine whether they have the same spatial patterns as inclusions in the tauopathies and α-synucleinopathies. The NI exhibited a clustered distribution (mean size of clusters 400 μm, range 50-800 μm, SD 687.8) in 24/28 of the areas studied. In 22 of these areas, the clusters exhibited a regular distribution along the tissue parallel to the pia mater or alveus. In 3 cortical areas, there was evidence of a more complex pattern in which the NI clusters were aggregated into larger superclusters. In 6 cortical areas, the size of the clusters approximated that of the cells of origin of the cortico-cortical pathways, but in the remaining areas cluster size was smaller than 400 μm. Despite the unique molecular profile of the NI, their spatial patterns are similar to those shown by filamentous neuronal inclusions in the tauopathies and α-synucleinopathies.