995 resultados para Mixture Modelling


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We focus on mixtures of factor analyzers from the perspective of a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of n data points of dimension p, where p is large relative to n. The number of free parameters is controlled through the dimension of the latent factor space. By working in this reduced space, it allows a model for each component-covariance matrix with complexity lying between that of the isotropic and full covariance structure models. We shall illustrate the use of mixtures of factor analyzers in a practical example that considers the clustering of cell lines on the basis of gene expressions from microarray experiments. (C) 2002 Elsevier Science B.V. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

One hundred and twenty-five mineral grains from 45 visually pure K-bearing Mn oxide (hollandite group) samples collected from weathering profiles in the Mt Tabor region of central Queensland, Australia, were analysed by the Ar-40/Ar-39 laser probe technique. These K-Mn oxides precipitated mainly through a process of cavity filling (direct precipitation from weathering solution), with botryoidal texture formed by micrometric mineral bands. Well-defined and reproducible plateau ages have been obtained for most samples, ranging from 27.2 +/- 0.8 to 6.8 +/- 0.5 Ma (2 sigma). Statistical analysis of the geochronological results by mixture modelling suggests an episodic mineral precipitation history, with two major peaks at 20.2 +/- 0.22 Ma and 16.5 +/- 0.17 Ma. The geochronological results, when combined with information on paragenetic relationships and mineralogical textures obtained from petrographic, scanning electron microscopy, and electron microprobe investigations, indicate that warm and humid palaeoclimatic conditions favourable to intense chemical weathering prevailed in central Queensland from late Oligocene to middle Miocene, particularly in the early Miocene. These results, in conjunction with previous and ongoing investigations in NW and eastern Queensland, suggest that most of Queensland was dominated by humid climates during the Miocene. (C) 2002 Elsevier Science BN. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In microarray studies, the application of clustering techniques is often used to derive meaningful insights into the data. In the past, hierarchical methods have been the primary clustering tool employed to perform this task. The hierarchical algorithms have been mainly applied heuristically to these cluster analysis problems. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these. clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based -approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data as studied recently in van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes which is a non-standard problem because the number of genes greatly exceed the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes down to a more computationally manageable size. The results from this analysis also emphasise the difficulty associated with the task of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This thesis entitled Reliability Modelling and Analysis in Discrete time Some Concepts and Models Useful in the Analysis of discrete life time data.The present study consists of five chapters. In Chapter II we take up the derivation of some general results useful in reliability modelling that involves two component mixtures. Expression for the failure rate, mean residual life and second moment of residual life of the mixture distributions in terms of the corresponding quantities in the component distributions are investigated. Some applications of these results are also pointed out. The role of the geometric,Waring and negative hypergeometric distributions as models of life lengths in the discrete time domain has been discussed already. While describing various reliability characteristics, it was found that they can be often considered as a class. The applicability of these models in single populations naturally extends to the case of populations composed of sub-populations making mixtures of these distributions worth investigating. Accordingly the general properties, various reliability characteristics and characterizations of these models are discussed in chapter III. Inference of parameters in mixture distribution is usually a difficult problem because the mass function of the mixture is a linear function of the component masses that makes manipulation of the likelihood equations, leastsquare function etc and the resulting computations.very difficult. We show that one of our characterizations help in inferring the parameters of the geometric mixture without involving computational hazards. As mentioned in the review of results in the previous sections, partial moments were not studied extensively in literature especially in the case of discrete distributions. Chapters IV and V deal with descending and ascending partial factorial moments. Apart from studying their properties, we prove characterizations of distributions by functional forms of partial moments and establish recurrence relations between successive moments for some well known families. It is further demonstrated that partial moments are equally efficient and convenient compared to many of the conventional tools to resolve practical problems in reliability modelling and analysis. The study concludes by indicating some new problems that surfaced during the course of the present investigation which could be the subject for a future work in this area.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Background: Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically to understand the mosaic nature of bacterial genomes. Consequently a bottle neck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results: The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described and we have compared a number of methods available for characterising bacterial genomic diversity, for calculating the cut-off between gene presence and absence or divergence, and shown that a simple dynamic approach using a kernel density estimator performed better than both established, as well as a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion: After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Background: Psychotic phenomena appear to form a continuum with normal experience and beliefs, and may build on common emotional interpersonal concerns. Aims: We tested predictions that paranoid ideation is exponentially distributed and hierarchically arranged in the general population, and that persecutory ideas build on more common cognitions of mistrust, interpersonal sensitivity and ideas of reference. Method: Items were chosen from the Structured Clinical Interview for DSM-IV Axis II Disorders (SCID-II) questionnaire and the Psychosis Screening Questionnaire in the second British National Survey of Psychiatric Morbidity (n = 8580), to test a putative hierarchy of paranoid development using confirmatory factor analysis, latent class analysis and factor mixture modelling analysis. Results: Different types of paranoid ideation ranged in frequency from less than 2% to nearly 30%. Total scores on these items followed an almost perfect exponential distribution (r = 0.99). Our four a priori first-order factors were corroborated (interpersonal sensitivity; mistrust; ideas of reference; ideas of persecution). These mapped onto four classes of individual respondents: a rare, severe, persecutory class with high endorsement of all item factors, including persecutory ideation; a quasi-normal class with infrequent endorsement of interpersonal sensitivity, mistrust and ideas of reference, and no ideas of persecution; and two intermediate classes, characterised respectively by relatively high endorsement of items relating to mistrust and to ideas of reference. Conclusions: The paranoia continuum has implications for the aetiology, mechanisms and treatment of psychotic disorders, while confirming the lack of a clear distinction from normal experiences and processes.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Pós-graduação em Alimentos e Nutrição - FCFAR

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper considers a model-based approach to the clustering of tissue samples of a very large number of genes from microarray experiments. It is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. Frequently in practice, there are also clinical data available on those cases on which the tissue samples have been obtained. Here we investigate how to use the clinical data in conjunction with the microarray gene expression data to cluster the tissue samples. We propose two mixture model-based approaches in which the number of components in the mixture model corresponds to the number of clusters to be imposed on the tissue samples. One approach specifies the components of the mixture model to be the conditional distributions of the microarray data given the clinical data with the mixing proportions also conditioned on the latter data. Another takes the components of the mixture model to represent the joint distributions of the clinical and microarray data. The approaches are demonstrated on some breast cancer data, as studied recently in van't Veer et al. (2002).

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Dynamic positron emission tomography (PET) imaging can be used to track the distribution of injected radio-labelled molecules over time in vivo. This is a powerful technique, which provides researchers and clinicians the opportunity to study the status of healthy and pathological tissue by examining how it processes substances of interest. Widely used tracers include 18F-uorodeoxyglucose, an analog of glucose, which is used as the radiotracer in over ninety percent of PET scans. This radiotracer provides a way of quantifying the distribution of glucose utilisation in vivo. The interpretation of PET time-course data is complicated because the measured signal is a combination of vascular delivery and tissue retention effects. If the arterial time-course is known, the tissue time-course can typically be expressed in terms of a linear convolution between the arterial time-course and the tissue residue function. As the residue represents the amount of tracer remaining in the tissue, this can be thought of as a survival function; these functions been examined in great detail by the statistics community. Kinetic analysis of PET data is concerned with estimation of the residue and associated functionals such as ow, ux and volume of distribution. This thesis presents a Markov chain formulation of blood tissue exchange and explores how this relates to established compartmental forms. A nonparametric approach to the estimation of the residue is examined and the improvement in this model relative to compartmental model is evaluated using simulations and cross-validation techniques. The reference distribution of the test statistics, generated in comparing the models, is also studied. We explore these models further with simulated studies and an FDG-PET dataset from subjects with gliomas, which has previously been analysed with compartmental modelling. We also consider the performance of a recently proposed mixture modelling technique in this study.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A two-component survival mixture model is proposed to analyse a set of ischaemic stroke-specific mortality data. The survival experience of stroke patients after index stroke may be described by a subpopulation of patients in the acute condition and another subpopulation of patients in the chronic phase. To adjust for the inherent correlation of observations due to random hospital effects, a mixture model of two survival functions with random effects is formulated. Assuming a Weibull hazard in both components, an EM algorithm is developed for the estimation of fixed effect parameters and variance components. A simulation study is conducted to assess the performance of the two-component survival mixture model estimators. Simulation results confirm the applicability of the proposed model in a small sample setting. Copyright (C) 2004 John Wiley Sons, Ltd.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

While the standard models of concentration addition and independent action predict overall toxicity of multicomponent mixtures reasonably, interactions may limit the predictive capability when a few compounds dominate a mixture. This study was conducted to test if statistically significant systematic deviations from concentration addition (i.e. synergism/antagonism, dose ratio- or dose level-dependency) occur when two taxonomically unrelated species, the earthworm Eisenia fetida and the nematode Caenorhabditis elegans were exposed to a full range of mixtures of the similar acting neonicotinoid pesticides imidacloprid and thiacloprid. The effect of the mixtures on C. elegans was described significantly better (p<0.01) by a dose level-dependent deviation from the concentration addition model than by the reference model alone, while the reference model description of the effects on E. fetida could not be significantly improved. These results highlight that deviations from concentration addition are possible even with similar acting compounds, but that the nature of such deviations are species dependent. For improving ecological risk assessment of simple mixtures, this implies that the concentration addition model may need to be used in a probabilistic context, rather than in its traditional deterministic manner. Crown Copyright (C) 2008 Published by Elsevier Inc. All rights reserved.