Digital Library

27 results for HIERARCHICAL CLUSTER ANALYSIS

in University of Queensland eSpace - Australia

Cluster analysis of self-awareness levels in adults with traumatic brain injury and relationship to outcome

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The purpose of this study was to investigate the relationship between self-awareness, emotional distress, motivation, and outcome in adults with severe traumatic brain injury. A sample of 55 patients was selected from 120 consecutive patients with severe traumatic brain injury admitted to the rehabilitation unit of a large metropolitan public hospital. Subjects received multidisciplinary inpatient rehabilitation and different types of outpatient rehabilitation and community-based services according to availability and need. Measures used in the cluster analysis were the Patient Competency Rating Scale, Self-Awareness of Deficits Interview, Head Injury Behavior Scale, Change Assessment Questionnaire, the Beck Depression Inventory, and Beck Anxiety Inventory; outcome measures were the Disability Rating Scale, Community Integration Questionnaire, and Sickness Impact Profile. A three-cluster solution was selected, with groups labeled as high self-awareness (n = 23), low self-awareness (n = 23), and good recovery (n = 8). The high self-awareness cluster had significantly higher levels of self-awareness, motivation, and emotional distress than the low self-awareness cluster but did not differ significantly in outcome. Self-awareness after brain injury is associated with greater motivation to change behavior and higher levels of depression and anxiety; however, it was not clear that this heightened motivation actually led to any improvement in outcome. Rehabilitation timing and approach may need to be tailored to match the individual's level of self-awareness, motivation, and emotional distress.

Mixture modelling for cluster analysis

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
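
The assignment rule described in this abstract can be illustrated with a minimal sketch. It assumes a univariate normal mixture whose parameters have already been fitted; the weights, means, standard deviations, and data values below are hypothetical and are not taken from the paper. Each observation is assigned to the component with the highest posterior probability.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probs(x, weights, means, sds):
    """Posterior probability that x belongs to each mixture component."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


def assign_clusters(data, weights, means, sds):
    """Outright clustering: pick the component with the highest posterior."""
    labels = []
    for x in data:
        post = posterior_probs(x, weights, means, sds)
        labels.append(max(range(len(post)), key=post.__getitem__))
    return labels


# Hypothetical fitted parameters for g = 2 components (illustration only).
weights = [0.5, 0.5]
means = [0.0, 5.0]
sds = [1.0, 1.0]

data = [-0.3, 0.8, 4.6, 5.9, 2.4]
labels = assign_clusters(data, weights, means, sds)
print(labels)
```

In practice the parameters would be estimated first, typically by the EM algorithm; the sketch covers only the final assignment step.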

Cluster analysis of high-dimensional data: A case study

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Normal mixture models are often used to cluster continuous data. However, conventional approaches for fitting these models will have problems in producing nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to the number of observations. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and comparing our results with another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, accurately classifying many of the wines by location.

Robust cluster analysis via mixture models

Relevance:

100.00% 100.00%

Publisher:

On the simultaneous use of clinical and microarray expression data in the cluster analysis of tissue samples

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper considers a model-based approach to the clustering of tissue samples of a very large number of genes from microarray experiments. It is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. Frequently in practice, there are also clinical data available on those cases on which the tissue samples have been obtained. Here we investigate how to use the clinical data in conjunction with the microarray gene expression data to cluster the tissue samples. We propose two mixture model-based approaches in which the number of components in the mixture model corresponds to the number of clusters to be imposed on the tissue samples. One approach specifies the components of the mixture model to be the conditional distributions of the microarray data given the clinical data with the mixing proportions also conditioned on the latter data. Another takes the components of the mixture model to represent the joint distributions of the clinical and microarray data. The approaches are demonstrated on some breast cancer data, as studied recently in van't Veer et al. (2002).

Cluster analysis of the p53 genetic regulatory network: topology and biology

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We describe a network module detection approach which combines a rapid and robust clustering algorithm with an objective measure of the coherence of the modules identified. The approach is applied to the network of genetic regulatory interactions surrounding the tumor suppressor gene p53. This algorithm identifies ten clusters in the p53 network, which are visually coherent and biologically plausible.

Issues of robustness and high dimensionality in cluster analysis

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is not very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.

A rough cluster analysis of shopping orientation data

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper describes the application of a new technique, rough clustering, to the problem of market segmentation. Rough clustering produces different solutions to k-means analysis because of the possibility of multiple cluster membership of objects. Traditional clustering methods generate extensional descriptions of groups that show which objects are members of each cluster. Clustering techniques based on rough sets theory generate intensional descriptions, which outline the main characteristics of each cluster. In this study, a rough cluster analysis was conducted on a sample of 437 responses from a larger study of the relationship between shopping orientation (the general predisposition of consumers toward the act of shopping) and intention to purchase products via the Internet. The cluster analysis was based on five measures of shopping orientation: enjoyment, personalization, convenience, loyalty, and price. The rough clusters obtained provide interpretations of different shopping orientations present in the data without the restriction of attempting to fit each object into only one segment. Such descriptions can be an aid to marketers attempting to identify potential segments of consumers.

Internationale Konjunkturverbunde

Relevance:

100.00% 100.00%

Publisher:

Abstract:

After conceptual clarification of the international business cycle and a review of the literature, a new indicator is proposed. This indicator refers to two time series only and allows for an internationally comparable quantification of a country's position in the business cycle. We then calculate time series of this indicator for 30 countries from 1970 to 2000. After some plausibility checks, we refer to these series to test a number of hypotheses. Cross correlations reveal a high degree of interconnectedness. Moreover, the number of highly positive correlations has increased over time, whereas the number of low and moderate correlations has decreased. A principal components analysis yields a first component that can be interpreted as the world business cycle. The further components suggest the existence of a Scandinavian-Anglo-Saxon business cycle as well as of another, smaller group of Anglo-Saxon countries that move together. This finding is replicated by a hierarchical cluster analysis, which in addition suggests a closely integrated group of non-Scandinavian and non-English speaking European countries plus Japan and Israel. Furthermore, there are indications of some, albeit weak, business cycle integration in Southeast Asia and in South America. The international business cycle is thus found to have a hierarchical structure.
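
As a rough illustration of how a hierarchical cluster analysis can group series by their cross correlations, the sketch below applies agglomerative clustering to a small correlation matrix. The abstract does not state the distance measure or linkage rule used, so the choice of distance = 1 - correlation and average linkage is an assumption, and the correlation values are made up for the example.

```python
def average_linkage(dist, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest average pairwise distance until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Made-up correlations between four hypothetical country series.
corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]

# Assumed dissimilarity: highly correlated series are close together.
dist = [[1.0 - r for r in row] for row in corr]

groups = average_linkage(dist, 2)
print(groups)
```

Recording the order and height of each merge, rather than stopping at a fixed number of clusters, would give the full hierarchy of groups.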

A DNA fingerprinting procedure for ultra high-throughput genetic analysis of insects

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Existing procedures for the generation of polymorphic DNA markers are not optimal for insect studies in which the organisms are often tiny and background molecular information is often non-existent. We have used a new high throughput DNA marker generation protocol called randomly amplified DNA fingerprints (RAF) to analyse the genetic variability in three separate strains of the stored grain pest, Rhyzopertha dominica. This protocol is quick, robust and reliable even though it requires minimal sample preparation, minute amounts of DNA and no prior molecular analysis of the organism. Arbitrarily selected oligonucleotide primers routinely produced approximately 50 scoreable polymorphic DNA markers between individuals of three independent field isolates of R. dominica. Multivariate cluster analysis using forty-nine arbitrarily selected polymorphisms generated from a single primer reliably separated individuals into three clades corresponding to their geographical origin. The resulting clades were quite distinct, with an average genetic difference of 37.5 +/- 6.0% between clades and of 21.0 +/- 7.1% between individuals within clades. As a prelude to future gene mapping efforts, we have also assessed the performance of RAF under conditions commonly used in gene mapping. In this analysis, fingerprints from pooled DNA samples accurately and reproducibly reflected RAF profiles obtained from individual DNA samples that had been combined to create the bulked samples.

Analysis of genetic diversity within Australian lucerne cultivars and implications for future genetic improvement

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Lucerne (Medicago sativa L.) is autotetraploid and predominantly allogamous. This complex breeding structure maximises the genetic diversity within lucerne populations, making it difficult to genetically discriminate between populations. The objective of this study was to evaluate the level of random genetic diversity within and between a selection of Australian-grown lucerne cultivars, with tetraploid M. falcata included as a possible divergent control source. This diversity was evaluated using random amplified polymorphic DNA (RAPDs). Nineteen plants from each of 10 cultivars were analysed. Using 11 RAPD primers, 96 polymorphic bands were scored as present or absent across the 190 individuals. Genetic similarity estimates (GSEs) of all pair-wise comparisons were calculated from these data. Mean GSEs within cultivars ranged from 0.43 to 0.51. Cultivar Venus (0.43) had the highest level of intra-population genetic diversity and cultivar Sequel HR (0.51) had the lowest level of intra-population genetic diversity. Mean GSEs between cultivars ranged from 0.31 to 0.49, which overlapped with values obtained for within-cultivar GSE, thus not allowing separation of the cultivars. The high level of intra- and inter-population diversity that was detected is most likely due to the breeding of synthetic cultivars using parents derived from a number of diverse sources. Cultivar-specific polymorphisms were only identified in the M. falcata source, which, like M. sativa, is outcrossing and autotetraploid. From a cluster analysis and a principal components analysis, it was clear that M. falcata was distinct from the other cultivars. The results indicate that the M. falcata accession tested has not been widely used in Australian lucerne breeding programs, and offers a means of introducing new genetic diversity into the lucerne gene pool. This provides a means of maximising heterozygosity, which is essential to maximising productivity in lucerne.
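
To make the idea of a pair-wise genetic similarity estimate concrete, the sketch below computes a similarity between band profiles scored as present (1) or absent (0) and averages it over all pairs of individuals. The abstract does not say which similarity coefficient was used, so the Jaccard coefficient here is an assumption, and the band profiles are invented for illustration.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Shared bands divided by bands present in at least one profile.
    The Jaccard coefficient is an assumed choice, not taken from the study."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return shared / either if either else 0.0


def mean_pairwise_similarity(profiles):
    """Mean similarity over all pair-wise comparisons within a group."""
    pairs = list(combinations(profiles, 2))
    return sum(jaccard_similarity(a, b) for a, b in pairs) / len(pairs)


# Invented presence/absence scores for three plants across six bands.
plants = [
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0],
]

mean_gse = mean_pairwise_similarity(plants)
print(round(mean_gse, 2))
```

A lower mean similarity within a group corresponds to higher diversity within it, which is how the abstract reads the within-cultivar values of 0.43 and 0.51.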

Model-based clustering in gene expression microarrays: An application to breast cancer data

Relevance:

90.00% 90.00%

Publisher:

Abstract:

In microarray studies, the application of clustering techniques is often used to derive meaningful insights into the data. In the past, hierarchical methods have been the primary clustering tool employed to perform this task. The hierarchical algorithms have been mainly applied heuristically to these cluster analysis problems. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data as studied recently in van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes, which is a non-standard problem because the number of genes greatly exceeds the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes down to a more computationally manageable size. The results from this analysis also emphasise the difficulty associated with the task of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.

Analysis of the acute postoperative pain experience following oral surgery: identification of 'unaffected', 'disabled' and 'depressed, anxious and disabled' patient clusters

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Background: Pain is defined as both a sensory and an emotional experience. Acute postoperative tooth extraction pain is assessed and treated as a physiological (sensory) pain while chronic pain is a biopsychosocial problem. The purpose of this study was to assess whether psychological and social changes occur in the acute pain state. Methods: A biopsychosocial pain questionnaire was completed by 438 subjects (165 males, 273 females) with acute postoperative pain at 24 hours following the surgical extraction of teeth and compared with 273 subjects (78 males, 195 females) with chronic orofacial pain. Statistical methods used a k-means cluster analysis. Results: Three clusters were identified in the acute pain group: 'unaffected', 'disabled' and 'depressed, anxious and disabled'. Psychosocial effects showed 24.8 per cent feeling 'distress/suffering' and 15.1 per cent 'sad and depressed'. Females reported higher pain intensity and more distress, depression and inadequate medication for pain relief (p

Comparison of virulence gene profiles of Escherichia coli strains isolated from healthy and diarrheic swine

Relevance:

90.00% 90.00%

Publisher:

Abstract:

A combination of uni- and multiplex PCR assays targeting 58 virulence genes (VGs) associated with Escherichia coli strains causing intestinal and extraintestinal disease in humans and other mammals was used to analyze the VG repertoire of 23 commensal E. coli isolates from healthy pigs and 52 clinical isolates associated with porcine neonatal diarrhea (ND) and postweaning diarrhea (PWD). The relationship between the presence and absence of VGs was interrogated using three statistical methods. According to the generalized linear model, 17 of 58 VGs were found to be significant (P < 0.05) in distinguishing between commensal and clinical isolates. Nine of the 17 genes represented by iha, hlyA, aidA, east1, aah, fimH, iroN(E. coli), traT, and saa have not been previously identified as important VGs in clinical porcine isolates in Australia. The remaining eight VGs code for fimbriae (F4, F5, F18, and F41) and toxins (STa, STh, LT, and Stx2), normally associated with porcine enterotoxigenic E. coli. Agglomerative hierarchical algorithm analysis grouped E. coli strains into subclusters based primarily on their serogroup. Multivariate analyses of clonal relationships based on the 17 VGs were collapsed into two-dimensional space by principal coordinate analysis. PWD clones were distributed in two quadrants, separated from ND and commensal clones, which tended to cluster within one quadrant. Clonal subclusters within quadrants were highly correlated with serogroups. These methods of analysis provide different perspectives in our attempts to understand how commensal and clinical porcine enterotoxigenic E. coli strains have evolved and are engaged in the dynamic process of losing or acquiring VGs within the pig population.

Using multivariate analysis to predict the behaviour of soils under effluent irrigation

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Onsite wastewater treatment systems aim to assimilate domestic effluent into the environment. Unfortunately, failure of such systems is common and inadequate effluent treatment can have serious environmental implications. The capacity of a particular soil to treat wastewater will change over time. The physical properties influence the rate of effluent movement through the soil and its chemical properties dictate the ability to renovate effluent. A research project was undertaken to determine the role that physical and chemical soil properties play in predicting the long-term behaviour of soil under effluent irrigation and to determine if they have a potential function as early indicators of adverse effects of effluent irrigation on treatment sustainability. Principal Component Analysis (PCA) and Cluster Analysis grouped the soils independently of their soil classifications and allowed us to distinguish the most suitable soils for sustainable long-term effluent irrigation and determine the most influential soil parameters to characterise them. Multivariate analysis allowed a clear distinction between soils based on the cation exchange capacities. This in turn correlated well with the soil mineralogy. Mixed mineralogy soils, in particular sodium- or magnesium-dominant soils, are the most susceptible to dispersion under effluent irrigation. The soil Exchangeable Sodium Percentage (ESP) was identified as a crucial parameter and was highly correlated with percentage clay, electrical conductivity, exchangeable sodium, exchangeable magnesium and low Ca:Mg ratios (less than 0.5).
