981 resultados para Mixture modelling
Resumo:
This article explores the use of probabilistic classification, namely finite mixture modelling, for identification of complex disease phenotypes, given cross-sectional data. In particular, if focuses on posterior probabilities of subgroup membership, a standard output of finite mixture modelling, and how the quantification of uncertainty in these probabilities can lead to more detailed analyses. Using a Bayesian approach, we describe two practical uses of this uncertainty: (i) as a means of describing a person’s membership to a single or multiple latent subgroups and (ii) as a means of describing identified subgroups by patient-centred covariates not included in model estimation. These proposed uses are demonstrated on a case study in Parkinson’s disease (PD), where latent subgroups are identified using multiple symptoms from the Unified Parkinson’s Disease Rating Scale (UPDRS).
Resumo:
We describe a strategy for Markov chain Monte Carlo analysis of non-linear, non-Gaussian state-space models involving batch analysis for inference on dynamic, latent state variables and fixed model parameters. The key innovation is a Metropolis-Hastings method for the time series of state variables based on sequential approximation of filtering and smoothing densities using normal mixtures. These mixtures are propagated through the non-linearities using an accurate, local mixture approximation method, and we use a regenerating procedure to deal with potential degeneracy of mixture components. This provides accurate, direct approximations to sequential filtering and retrospective smoothing distributions, and hence a useful construction of global Metropolis proposal distributions for simulation of posteriors for the set of states. This analysis is embedded within a Gibbs sampler to include uncertain fixed parameters. We give an example motivated by an application in systems biology. Supplemental materials provide an example based on a stochastic volatility model as well as MATLAB code.
Resumo:
Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
Resumo:
PURPOSE/OBJECTIVES: To identify latent classes of individuals with distinct quality-of-life (QOL) trajectories, to evaluate for differences in demographic characteristics between the latent classes, and to evaluate for variations in pro- and anti-inflammatory cytokine genes between the latent classes. DESIGN: Descriptive, longitudinal study. SETTING: Two radiation therapy departments located in a comprehensive cancer center and a community-based oncology program in northern California. SAMPLE: 168 outpatients with prostate, breast, brain, or lung cancer and 85 of their family caregivers (FCs). METHODS: Growth mixture modeling (GMM) was employed to identify latent classes of individuals based on QOL scores measured prior to, during, and for four months following completion of radiation therapy. Single nucleotide polymorphisms (SNPs) and haplotypes in 16 candidate cytokine genes were tested between the latent classes. Logistic regression was used to evaluate the relationships among genotypic and phenotypic characteristics and QOL GMM group membership. MAIN RESEARCH VARIABLES: QOL latent class membership and variations in cytokine genes. FINDINGS: Two latent QOL classes were found: higher and lower. Patients and FCs who were younger, identified with an ethnic minority group, had poorer functional status, or had children living at home were more likely to belong to the lower QOL class. After controlling for significant covariates, between-group differences were found in SNPs in interleukin 1 receptor 2 (IL1R2) and nuclear factor kappa beta 2 (NFKB2). For IL1R2, carrying one or two doses of the rare C allele was associated with decreased odds of belonging to the lower QOL class. For NFKB2, carriers with two doses of the rare G allele were more likely to belong to the lower QOL class. CONCLUSIONS: Unique genetic markers in cytokine genes may partially explain interindividual variability in QOL. IMPLICATIONS FOR NURSING: Determination of high-risk characteristics and unique genetic markers would allow for earlier identification of patients with cancer and FCs at higher risk for poorer QOL. Knowledge of these risk factors could assist in the development of more targeted clinical or supportive care interventions for those identified.
Resumo:
This thesis has investigated how to cluster a large number of faces within a multi-media corpus in the presence of large session variation. Quality metrics are used to select the best faces to represent a sequence of faces; and session variation modelling improves clustering performance in the presence of wide variations across videos. Findings from this thesis contribute to improving the performance of both face verification systems and the fully automated clustering of faces from a large video corpus.
Resumo:
Automatic gender classification has many security and commercial applications. Various modalities have been investigated for gender classification with face-based classification being the most popular. In some real-world scenarios the face may be partially occluded. In these circumstances a classification based on individual parts of the face known as local features must be adopted. We investigate gender classification using lip movements. We show for the first time that important gender specific information can be obtained from the way in which a person moves their lips during speech. Furthermore our study indicates that the lip dynamics during speech provide greater gender discriminative information than simply lip appearance. We also show that the lip dynamics and appearance contain complementary gender information such that a model which captures both traits gives the highest overall classification result. We use Discrete Cosine Transform based features and Gaussian Mixture Modelling to model lip appearance and dynamics and employ the XM2VTS database for our experiments. Our experiments show that a model which captures lip dynamics along with appearance can improve gender classification rates by between 16-21% compared to models of only lip appearance.
Resumo:
This thesis entitled Reliability Modelling and Analysis in Discrete time Some Concepts and Models Useful in the Analysis of discrete life time data.The present study consists of five chapters. In Chapter II we take up the derivation of some general results useful in reliability modelling that involves two component mixtures. Expression for the failure rate, mean residual life and second moment of residual life of the mixture distributions in terms of the corresponding quantities in the component distributions are investigated. Some applications of these results are also pointed out. The role of the geometric,Waring and negative hypergeometric distributions as models of life lengths in the discrete time domain has been discussed already. While describing various reliability characteristics, it was found that they can be often considered as a class. The applicability of these models in single populations naturally extends to the case of populations composed of sub-populations making mixtures of these distributions worth investigating. Accordingly the general properties, various reliability characteristics and characterizations of these models are discussed in chapter III. Inference of parameters in mixture distribution is usually a difficult problem because the mass function of the mixture is a linear function of the component masses that makes manipulation of the likelihood equations, leastsquare function etc and the resulting computations.very difficult. We show that one of our characterizations help in inferring the parameters of the geometric mixture without involving computational hazards. As mentioned in the review of results in the previous sections, partial moments were not studied extensively in literature especially in the case of discrete distributions. Chapters IV and V deal with descending and ascending partial factorial moments. Apart from studying their properties, we prove characterizations of distributions by functional forms of partial moments and establish recurrence relations between successive moments for some well known families. It is further demonstrated that partial moments are equally efficient and convenient compared to many of the conventional tools to resolve practical problems in reliability modelling and analysis. The study concludes by indicating some new problems that surfaced during the course of the present investigation which could be the subject for a future work in this area.
Resumo:
Background: Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically to understand the mosaic nature of bacterial genomes. Consequently a bottle neck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results: The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described and we have compared a number of methods available for characterising bacterial genomic diversity, for calculating the cut-off between gene presence and absence or divergence, and shown that a simple dynamic approach using a kernel density estimator performed better than both established, as well as a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion: After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.
Resumo:
Background: Psychotic phenomena appear to form a continuum with normal experience and beliefs, and may build on common emotional interpersonal concerns. Aims: We tested predictions that paranoid ideation is exponentially distributed and hierarchically arranged in the general population, and that persecutory ideas build on more common cognitions of mistrust, interpersonal sensitivity and ideas of reference. Method: Items were chosen from the Structured Clinical Interview for DSM-IV Axis II Disorders (SCID-II) questionnaire and the Psychosis Screening Questionnaire in the second British National Survey of Psychiatric Morbidity (n = 8580), to test a putative hierarchy of paranoid development using confirmatory factor analysis, latent class analysis and factor mixture modelling analysis. Results: Different types of paranoid ideation ranged in frequency from less than 2% to nearly 30%. Total scores on these items followed an almost perfect exponential distribution (r = 0.99). Our four a priori first-order factors were corroborated (interpersonal sensitivity; mistrust; ideas of reference; ideas of persecution). These mapped onto four classes of individual respondents: a rare, severe, persecutory class with high endorsement of all item factors, including persecutory ideation; a quasi-normal class with infrequent endorsement of interpersonal sensitivity, mistrust and ideas of reference, and no ideas of persecution; and two intermediate classes, characterised respectively by relatively high endorsement of items relating to mistrust and to ideas of reference. Conclusions: The paranoia continuum has implications for the aetiology, mechanisms and treatment of psychotic disorders, while confirming the lack of a clear distinction from normal experiences and processes.
Resumo:
Pós-graduação em Alimentos e Nutrição - FCFAR
Resumo:
This paper considers a model-based approach to the clustering of tissue samples of a very large number of genes from microarray experiments. It is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. Frequently in practice, there are also clinical data available on those cases on which the tissue samples have been obtained. Here we investigate how to use the clinical data in conjunction with the microarray gene expression data to cluster the tissue samples. We propose two mixture model-based approaches in which the number of components in the mixture model corresponds to the number of clusters to be imposed on the tissue samples. One approach specifies the components of the mixture model to be the conditional distributions of the microarray data given the clinical data with the mixing proportions also conditioned on the latter data. Another takes the components of the mixture model to represent the joint distributions of the clinical and microarray data. The approaches are demonstrated on some breast cancer data, as studied recently in van't Veer et al. (2002).
Resumo:
Dynamic positron emission tomography (PET) imaging can be used to track the distribution of injected radio-labelled molecules over time in vivo. This is a powerful technique, which provides researchers and clinicians the opportunity to study the status of healthy and pathological tissue by examining how it processes substances of interest. Widely used tracers include 18F-uorodeoxyglucose, an analog of glucose, which is used as the radiotracer in over ninety percent of PET scans. This radiotracer provides a way of quantifying the distribution of glucose utilisation in vivo. The interpretation of PET time-course data is complicated because the measured signal is a combination of vascular delivery and tissue retention effects. If the arterial time-course is known, the tissue time-course can typically be expressed in terms of a linear convolution between the arterial time-course and the tissue residue function. As the residue represents the amount of tracer remaining in the tissue, this can be thought of as a survival function; these functions been examined in great detail by the statistics community. Kinetic analysis of PET data is concerned with estimation of the residue and associated functionals such as ow, ux and volume of distribution. This thesis presents a Markov chain formulation of blood tissue exchange and explores how this relates to established compartmental forms. A nonparametric approach to the estimation of the residue is examined and the improvement in this model relative to compartmental model is evaluated using simulations and cross-validation techniques. The reference distribution of the test statistics, generated in comparing the models, is also studied. We explore these models further with simulated studies and an FDG-PET dataset from subjects with gliomas, which has previously been analysed with compartmental modelling. We also consider the performance of a recently proposed mixture modelling technique in this study.
Resumo:
Computer modelling promises to be an important tool for analysing and predicting interactions between trees within mixed species forest plantations. This study explored the use of an individual-based mechanistic model as a predictive tool for designing mixed species plantations of Australian tropical trees. The `spatially explicit individually based-forest simulator' (SeXI-FS) modelling system was used to describe the spatial interaction of individual tree crowns within a binary mixed-species experiment. The three-dimensional model was developed and verified with field data from three forest tree species grown in tropical Australia. The model predicted the interactions within monocultures and binary mixtures of Flindersia brayleyana, Eucalyptus pellita and Elaeocarpus grandis, accounting for an average of 42% of the growth variation exhibited by species in different treatments. The model requires only structural dimensions and shade tolerance as species parameters. By modelling interactions in existing tree mixtures, the model predicted both increases and reductions in the growth of mixtures (up to +/-50% of stem volume at 7 years) compared to monocultures. This modelling approach may be useful for designing mixed tree plantations.