64 results for Two-step Cluster Analysis
in University of Queensland eSpace - Australia
Abstract:
This paper considers a model-based approach to the clustering of tissue samples of a very large number of genes from microarray experiments. It is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. Frequently in practice, there are also clinical data available on those cases on which the tissue samples have been obtained. Here we investigate how to use the clinical data in conjunction with the microarray gene expression data to cluster the tissue samples. We propose two mixture model-based approaches in which the number of components in the mixture model corresponds to the number of clusters to be imposed on the tissue samples. One approach specifies the components of the mixture model to be the conditional distributions of the microarray data given the clinical data with the mixing proportions also conditioned on the latter data. Another takes the components of the mixture model to represent the joint distributions of the clinical and microarray data. The approaches are demonstrated on some breast cancer data, as studied recently in van't Veer et al. (2002).
Abstract:
The purpose of this study was to investigate the relationship between self-awareness, emotional distress, motivation, and outcome in adults with severe traumatic brain injury. A sample of 55 patients was selected from 120 consecutive patients with severe traumatic brain injury admitted to the rehabilitation unit of a large metropolitan public hospital. Subjects received multidisciplinary inpatient rehabilitation and different types of outpatient rehabilitation and community-based services according to availability and need. Measures used in the cluster analysis were the Patient Competency Rating Scale, Self-Awareness of Deficits Interview, Head Injury Behavior Scale, Change Assessment Questionnaire, the Beck Depression Inventory, and Beck Anxiety Inventory; outcome measures were the Disability Rating Scale, Community Integration Questionnaire, and Sickness Impact Profile. A three-cluster solution was selected, with groups labeled as high self-awareness (n = 23), low self-awareness (n = 23), and good recovery (n = 8). The high self-awareness cluster had significantly higher levels of self-awareness, motivation, and emotional distress than the low self-awareness cluster but did not differ significantly in outcome. Self-awareness after brain injury is associated with greater motivation to change behavior and higher levels of depression and anxiety; however, it was not clear that this heightened motivation actually led to any improvement in outcome. Rehabilitation timing and approach may need to be tailored to match the individual's level of self-awareness, motivation, and emotional distress.
Abstract:
Conventionally, protein structure prediction via threading relies on some nonoptimal method to align a protein sequence to each member of a library of known structures. We show how a score function (force field) can be modified so as to allow the direct application of a dynamic programming algorithm to the problem. This involves an approximation whose damage can be minimized by an optimization process during score function parameter determination. The method is compared to sequence to structure alignments using a more conventional pair-wise score function and the frozen approximation. The new method produces results comparable to the frozen approximation, but is faster and has fewer adjustable parameters. It is also free of memory of the template's original amino acid sequence, and does not suffer from a problem of nonconvergence, which can be shown to occur with the frozen approximation. Alignments generated by the simplified score function can then be ranked using a second score function with the approximations removed. (C) 1999 John Wiley & Sons, Inc.
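As a rough illustration of why a position-wise score function permits dynamic programming, the sketch below computes an optimal global sequence-to-structure alignment score. It assumes the score of placing residue i at template position j is already available in a table (`score[i][j]`) and that gaps carry a constant penalty. The function name, the linear gap model, and the toy numbers are assumptions made for illustration, not the authors' score function.

```python
def best_alignment_score(score, gap=-1.0):
    """Optimal global alignment score by dynamic programming.

    score[i][j] is the (assumed, precomputed) score for placing sequence
    residue i at template position j; gap is a constant gap penalty.
    """
    n = len(score)
    m = len(score[0]) if n else 0
    # best[i][j]: best score aligning the first i residues to the first j positions
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score[i - 1][j - 1],  # align residue i to position j
                best[i - 1][j] + gap,                      # residue i left unaligned
                best[i][j - 1] + gap,                      # template position j skipped
            )
    return best[n][m]


# Toy score table (illustrative values only)
toy_score = [[2.0, -1.0], [-1.0, 2.0]]
toy_result = best_alignment_score(toy_score)
```

The recurrence works only because each cell depends on the residue and the template position alone; a pairwise term that depends on which residues occupy other positions would break this independence, which is the difficulty the abstract's approximation is meant to sidestep.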
Abstract:
High quality MSS membranes were synthesised by single-step and two-step catalysed hydrolyses employing tetraethylorthosilicate (TEOS), absolute ethanol (EtOH), 1 M nitric acid (HNO3) and distilled water (H2O). The ²⁹Si NMR results showed that the two-step xerogels consistently had a greater contribution of silanol groups (Q³ and Q²) than the single-step xerogel. According to fractal theory, a high contribution of Q² and Q³ species is responsible for the formation of weakly branched systems, leading to a low pore volume of microporous dimension. The transport of diffusing gases in these membranes is shown to be activated, as the permeance increased with temperature. Although the He permeance of the single-step and two-step membranes is very similar, the two-step membranes' permselectivity (ideal separation factor) for He/CO2 (69-319) and He/CH4 (585-958) is one to two orders of magnitude higher than the single-step membranes' results of 2-7 and 69, respectively. The two-step membranes have a high activation energy for He and H2 permeance, in excess of 16 kJ mol⁻¹. The mobility energy for He permeance is three to six-fold higher for the two-step than the single-step membranes. As the mobility energy is higher for small pores than for large pores, and coupled with the permselectivity results, the two-step catalysed hydrolysis sol-gel process resulted in the formation of pore sizes in the region of 3 Å while the single-step process tended to produce slightly larger pores. (C) 2002 Elsevier Science B.V. All rights reserved.
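The two quantities quoted above can be sketched numerically. The snippet takes the ideal separation factor as the ratio of two single-gas permeances and, assuming an Arrhenius-type temperature dependence P = P0 * exp(-Ea / (R T)), estimates an apparent activation energy from permeances at two temperatures. The function names and all input values are hypothetical and chosen only for illustration; they are not data from the paper.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def ideal_separation_factor(permeance_a, permeance_b):
    """Permselectivity of gas A over gas B as a ratio of single-gas permeances."""
    return permeance_a / permeance_b


def apparent_activation_energy(p1, t1, p2, t2):
    """Apparent activation energy (J/mol) from permeances p1, p2 at temperatures
    t1, t2 (K), assuming P = P0 * exp(-Ea / (R * T))."""
    return R * math.log(p2 / p1) / (1.0 / t1 - 1.0 / t2)


# Hypothetical permeances generated from an assumed Ea of 16 kJ/mol
assumed_ea = 16000.0
p_low = math.exp(-assumed_ea / (R * 373.0))
p_high = math.exp(-assumed_ea / (R * 473.0))
recovered_ea = apparent_activation_energy(p_low, 373.0, p_high, 473.0)

# Hypothetical single-gas permeances (arbitrary but consistent units)
selectivity = ideal_separation_factor(6.9e-8, 1.0e-9)
```

A positive `recovered_ea` corresponds to permeance that rises with temperature, which is the sense in which the abstract calls transport "activated".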
Abstract:
Plasma levels of lipoprotein(a) [Lp(a)] are associated with cardiovascular risk (Danesh et al., 2000) and were long believed to be influenced by the LPA locus on chromosome 6q27 only. However, a recent report by Broeckel et al. (2002) suggested the presence of a second quantitative trait locus on chromosome 1 influencing Lp(a) levels. Using a two-locus model, we found no evidence for an additional Lp(a) locus on chromosome 1 in a linkage study among 483 dizygotic twin pairs.
Abstract:
Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
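The assignment rule described above can be sketched for a univariate normal mixture. The snippet assumes the mixture has already been fitted (for example by the EM algorithm), so the mixing proportions, means, and standard deviations are taken as given; it then computes each observation's posterior probabilities of component membership and assigns it to the component with the highest one. All names and numbers are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probabilities(x, weights, means, sds):
    """Posterior probability that x belongs to each of the g components."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


def assign_clusters(data, weights, means, sds):
    """Outright clustering: each observation goes to the component with the
    highest posterior probability."""
    labels = []
    for x in data:
        post = posterior_probabilities(x, weights, means, sds)
        labels.append(max(range(len(post)), key=post.__getitem__))
    return labels


# Assumed fitted parameters for g = 2 components (illustrative values)
weights = [0.5, 0.5]
means = [0.0, 5.0]
sds = [1.0, 1.0]
data = [-0.3, 0.8, 4.6, 5.9]
labels = assign_clusters(data, weights, means, sds)
```

Labels here are zero-based component indices; the posterior probabilities themselves can be kept alongside the labels when a measure of assignment uncertainty is wanted.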
Abstract:
This article reports a study of the effects of synthesis parameters on the preparation and formation of mesoporous titania nanopowders by employing a two-step sol-gel method. These materials displayed crystalline domains characteristic of anatase. The first step of the process involved the hydrolysis of titanium isopropoxide in a basic aqueous solution mediated by a neutral surfactant. The solid product obtained from step 1 was then treated in an acidified ethanol solution containing the same titanium precursor to thicken the pore walls. Low pH and higher loading of the Ti precursor in step 2 produced better mesoporosity and crystallinity of titanium dioxide polymorphs. The resultant powder exhibited a high surface area (73.8 m²/g) and large pore volume (0.17 cm³/g) with uniform mesopores. These materials are envisaged to be used as precursors for mesoporous titania films as a wide band gap semiconductor in dye-sensitized nanocrystalline TiO2 solar cells.
Abstract:
Normal mixture models are often used to cluster continuous data. However, conventional approaches for fitting these models will have problems in producing nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to the number of observations. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and comparing our results with another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, accurately classifying many of the wines by location.
Abstract:
This research adopts a resource allocation theoretical framework to generate predictions regarding the relationship between self-efficacy and task performance from two levels of analysis and specificity. Participants were given multiple trials of practice on an air traffic control task. Measures of task-specific self-efficacy and performance were taken at repeated intervals. The authors used multilevel analysis to demonstrate dynamic main effects, dynamic mediation and dynamic moderation. As predicted, the positive effects of overall task-specific self-efficacy and general self-efficacy on task performance strengthened throughout practice. In line with these dynamic main effects, the effect of general self-efficacy was mediated by overall task-specific self-efficacy; however, this pattern emerged over time. Finally, changes in task-specific self-efficacy were negatively associated with changes in performance at the within-person level; however, this effect only emerged towards the end of practice for individuals with high levels of overall task-specific self-efficacy. These novel findings emphasise the importance of conceptualising self-efficacy within a multi-level and multi-specificity framework and make a significant contribution to understanding the way this construct relates to task performance.
Abstract:
We describe a network module detection approach which combines a rapid and robust clustering algorithm with an objective measure of the coherence of the modules identified. The approach is applied to the network of genetic regulatory interactions surrounding the tumor suppressor gene p53. This algorithm identifies ten clusters in the p53 network, which are visually coherent and biologically plausible.
Abstract:
Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is not very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.
Abstract:
This paper describes the application of a new technique, rough clustering, to the problem of market segmentation. Rough clustering produces different solutions to k-means analysis because of the possibility of multiple cluster membership of objects. Traditional clustering methods generate extensional descriptions of groups that show which objects are members of each cluster. Clustering techniques based on rough set theory generate intensional descriptions, which outline the main characteristics of each cluster. In this study, a rough cluster analysis was conducted on a sample of 437 responses from a larger study of the relationship between shopping orientation (the general predisposition of consumers toward the act of shopping) and intention to purchase products via the Internet. The cluster analysis was based on five measures of shopping orientation: enjoyment, personalization, convenience, loyalty, and price. The rough clusters obtained provide interpretations of different shopping orientations present in the data without the restriction of attempting to fit each object into only one segment. Such descriptions can be an aid to marketers attempting to identify potential segments of consumers.
Abstract:
Inaccurate species identification confounds insect ecological studies. Examining aspects of Trichogramma ecology pertinent to the novel insect resistance management strategy for future transgenic cotton, Gossypium hirsutum L., production in the Ord River Irrigation Area (ORIA) of Western Australia required accurate differentiation between morphologically similar Trichogramma species. Established molecular diagnostic methods for Trichogramma identification use species-specific sequence differences in the internal transcribed spacer (ITS)-2 chromosomal region; yet difficulties arise in discerning polymerase chain reaction (PCR) fragments of similar base pair length by gel electrophoresis. This necessitates the restriction enzyme digestion of PCR-amplified ITS-2 fragments to readily differentiate Trichogramma australicum Girault and Trichogramma pretiosum Riley. To overcome the time and expense associated with a two-step diagnostic procedure, we developed a “one-step” multiplex PCR technique using species-specific primers designed to the ITS-2 region. This approach allowed for a high-throughput analysis of samples as part of ongoing ecological studies examining Trichogramma biological control potential in the ORIA where these two species occur in sympatry.