783 resultados para grid, clustering, statistical, clustering
Resumo:
This paper proposes an adaptive algorithm for clustering cumulative probability distribution functions (c.p.d.f.) of a continuous random variable, observed in different populations, into the minimum homogeneous clusters, making no parametric assumptions about the c.p.d.f.’s. The distance function for clustering c.p.d.f.’s that is proposed is based on the Kolmogorov–Smirnov two sample statistic. This test is able to detect differences in position, dispersion or shape of the c.p.d.f.’s. In our context, this statistic allows us to cluster the recorded data with a homogeneity criterion based on the whole distribution of each data set, and to decide whether it is necessary to add more clusters or not. In this sense, the proposed algorithm is adaptive as it automatically increases the number of clusters only as necessary; therefore, there is no need to fix in advance the number of clusters. The output of the algorithm are the common c.p.d.f. of all observed data in the cluster (the centroid) and, for each cluster, the Kolmogorov–Smirnov statistic between the centroid and the most distant c.p.d.f. The proposed algorithm has been used for a large data set of solar global irradiation spectra distributions. The results obtained enable to reduce all the information of more than 270,000 c.p.d.f.’s in only 6 different clusters that correspond to 6 different c.p.d.f.’s.
Resumo:
There is increasing evidence to support the notion that membrane proteins, instead of being isolated components floating in a fluid lipid environment, can be assembled into supramolecular complexes that take part in a variety of cooperative cellular functions. The interplay between lipid-protein and protein-protein interactions is expected to be a determinant factor in the assembly and dynamics of such membrane complexes. Here we report on a role of anionic phospholipids in determining the extent of clustering of KcsA, a model potassium channel. Assembly/disassembly of channel clusters occurs, at least partly, as a consequence of competing lipid-protein and protein-protein interactions at nonannular lipid binding sites on the channel surface and brings about profound changes in the gating properties of the channel. Our results suggest that these latter effects of anionic lipids are mediated via the Trp67–Glu71–Asp80 inactivation triad within the channel structure and its bearing on the selectivity filter.
Resumo:
"May 1980."
Resumo:
Originally presented as the author's thesis (M.A.), University of Illinois at Urbana-Champaign.
Resumo:
Project officer: William B. Fetters.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Resumo:
We consider the problem of assessing the number of clusters in a limited number of tissue samples containing gene expressions for possibly several thousands of genes. It is proposed to use a normal mixture model-based approach to the clustering of the tissue samples. One advantage of this approach is that the question on the number of clusters in the data can be formulated in terms of a test on the smallest number of components in the mixture model compatible with the data. This test can be carried out on the basis of the likelihood ratio test statistic, using resampling to assess its null distribution. The effectiveness of this approach is demonstrated on simulated data and on some microarray datasets, as considered previously in the bioinformatics literature. (C) 2004 Elsevier Inc. All rights reserved.
Resumo:
Objectives: The objectives of this study were to examine the extent of clustering of smoking, high levels of television watching, overweight, and high blood pressure among adolescents and whether this clustering varies by socioeconomic position and Cognitive function. Methods: This study was a cross-sectional analysis of 3613 (1742 females) participants of an Australian birth cohort who were examined at age 14. Results: Three hundred fifty-three (9.8%) of the participants had co-occurrence of three or four risk factors. Risk factors clustered in these adolescents with a greater number of participants than would be predicted by assumptions of independence having no risk factors and three or four risk factors. The extent of clustering tended to be greater in those from lower-income families and among those with lower cognitive function. The age-adjusted ratio of observed to expected cooccurrence of three or four risk factors was 2.70 (95% confidence interval [Cl], 1.80-4.06) among those from low-income families and 1.70 (95% Cl, 1.34-2.16) among those from more affluent families. The ratio among those with low Raven's scores (nonverbal reasoning) was 2.36 (95% Cl, 1.69-3.30) and among those with higher scores was 1.51 (95% Cl, 1.19-1.92); similar results for the WRAT 3 score (reading ability) were 2.69 (95% Cl, 1.85-3.94) and 1.68 (95% Cl, 1.34-2.11). Clustering did not differ by sex. Conclusion: Among adolescents, coronary heart disease risk factors cluster, and there is some evidence that this clustering is greater among those from families with low income and those who have lower cognitive function.
Resumo:
Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes. Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm for which the E(expectation) and M(maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data. In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is considered too.
Resumo:
Quality of life has been shown to be poor among people living with chronic hepatitis C However, it is not clear how this relates to the presence of symptoms and their severity. The aim of this study was to describe the typology of a broad array of symptoms that were attributed to hepatitis C virus (HCV) infection. Phase I used qualitative methods to identify symptoms. In Phase 2, 188 treatment-naive people living with HCV participated in a quantitative survey. The most prevalent symptom was physical tiredness (86%) followed by irritability (75%), depression (70%), mental tiredness (70%), and abdominal pain (68%). Temporal clustering of symptoms was reported in 62% of participants. Principal components analysis identified four symptom clusters: neuropsychiatric (mental tiredness, poor concentration, forgetfulness, depression, irritability, physical tiredness, and sleep problems); gastrointestinal (day sweats, nausea, food intolerance, night sweats, abdominal pain, poor appetite, and diarrhea); algesic (joint pain, muscle pain, and general body pain); and dysesthetic (noise sensitivity, light sensitivity, skin. problems, and headaches). These data demonstrate that symptoms are prevalent in treatment-naive people with HCV and support the hypothesis that symptom clustering occurs.
Resumo:
In this paper we present an efficient k-Means clustering algorithm for two dimensional data. The proposed algorithm re-organizes dataset into a form of nested binary tree*. Data items are compared at each node with only two nearest means with respect to each dimension and assigned to the one that has the closer mean. The main intuition of our research is as follows: We build the nested binary tree. Then we scan the data in raster order by in-order traversal of the tree. Lastly we compare data item at each node to the only two nearest means to assign the value to the intendant cluster. In this way we are able to save the computational cost significantly by reducing the number of comparisons with means and also by the least use to Euclidian distance formula. Our results showed that our method can perform clustering operation much faster than the classical ones. © Springer-Verlag Berlin Heidelberg 2005