973 results for Mixture-models


Relevance:

60.00%

Publisher:

Abstract:

Background: Recently, with access to low-toxicity biological and targeted therapies, evidence of the existence of a long-term survival subpopulation of cancer patients has emerged. We studied an unselected population with advanced lung cancer to look for evidence of multimodality in the survival distribution and to estimate the proportion of long-term survivors. Methods: We used survival data from 4944 patients with non-small-cell lung cancer (NSCLC), stages IIIb-IV at diagnosis, registered in the National Cancer Registry of Cuba (NCRC) between January 1998 and December 2006. We fitted one-component survival models and two-component mixture models to identify short- and long-term survivors. The Bayesian information criterion was used for model selection. Results: For all of the selected parametric distributions, the two-component model presented the best fit. The short-term survival subpopulation (median survival of almost 4 months) comprised 64% of patients. The long-term survival subpopulation comprised 35% of patients and showed a median survival of around 12 months. None of the short-term survivors was still alive at month 24, while 10% of the long-term survivors died after that point. Conclusions: There is a subgroup showing long-term evolution among patients with advanced lung cancer. As survival rates continue to improve with the new generation of therapies, prognostic models that account for short- and long-term survival subpopulations should be adopted in clinical research.
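A minimal sketch of the kind of two-component mixture fit the abstract describes, assuming exponential survival components and synthetic data (the NCRC registry data and the paper's chosen distributions are not reproduced here):

```python
# Illustrative only: EM for a two-component exponential mixture of survival
# times, separating short- and long-term survivors. All numbers are invented.
import math
import random

random.seed(0)
# Synthetic survival times (months): short-term mean ~4, long-term mean ~12
times = [random.expovariate(1 / 4.0) for _ in range(640)] + \
        [random.expovariate(1 / 12.0) for _ in range(360)]

# Mixture density: p * Exp(mean m1) + (1 - p) * Exp(mean m2)
p, m1, m2 = 0.5, 2.0, 20.0   # crude initial guesses
for _ in range(200):
    # E-step: responsibility of the short-term component for each time
    r = []
    for t in times:
        d1 = p * math.exp(-t / m1) / m1
        d2 = (1 - p) * math.exp(-t / m2) / m2
        r.append(d1 / (d1 + d2))
    # M-step: update mixing weight and component means
    p = sum(r) / len(r)
    m1 = sum(ri * t for ri, t in zip(r, times)) / sum(r)
    m2 = sum((1 - ri) * t for ri, t in zip(r, times)) / sum(1 - ri for ri in r)

print(round(p, 2), round(m1, 1), round(m2, 1))
```

In the paper, several parametric families are fitted this way and compared via the Bayesian information criterion; the exponential choice above is only for brevity.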


A mixture of Gaussians fit to a single curved or heavy-tailed cluster will report that the data contains many clusters. To produce more appropriate clusterings, we introduce a model which warps a latent mixture of Gaussians to produce nonparametric cluster shapes. The possibly low-dimensional latent mixture model allows us to summarize the properties of the high-dimensional clusters (or density manifolds) describing the data. The number of manifolds, as well as the shape and dimension of each manifold is automatically inferred. We derive a simple inference scheme for this model which analytically integrates out both the mixture parameters and the warping function. We show that our model is effective for density estimation, performs better than infinite Gaussian mixture models at recovering the true number of clusters, and produces interpretable summaries of high-dimensional datasets.
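A hedged sketch of the generative idea only (not the authors' implementation, which learns the warp and integrates it out): latent points are drawn from a simple Gaussian mixture and pushed through a nonlinear warp, producing curved clusters. The warp below is an arbitrary fixed choice.

```python
# Illustrative only: a low-dimensional latent Gaussian mixture warped into
# curved, non-Gaussian cluster shapes in observed space.
import math
import random

random.seed(1)

def sample_latent():
    # Latent mixture: two well-separated 1D Gaussians (true cluster count: 2)
    if random.random() < 0.5:
        return random.gauss(-3.0, 1.0), 0
    return random.gauss(3.0, 1.0), 1

def warp(z):
    # Arbitrary smooth warp into 2D: bends each latent cluster into an arc
    return (z, 0.3 * z * z + random.gauss(0.0, 0.1))

data = []
for _ in range(200):
    z, label = sample_latent()
    data.append((warp(z), label))

# Each observed cluster lies along a curved one-dimensional manifold in 2D:
# a mixture of Gaussians fit in observed space would need many components to
# cover each arc, while the latent mixture keeps the true cluster count at 2.
print(len(data))
```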


Semi-supervised clustering is the task of clustering data points into clusters where only a fraction of the points are labelled. The true number of clusters in the data is often unknown, yet most models require this parameter as an input. Dirichlet process mixture models are appealing because they can infer the number of clusters from the data. However, these models do not deal well with high-dimensional data and can encounter difficulties in inference. We present a novel nonparametric Bayesian kernel-based method to cluster data points without the need to prespecify the number of clusters or to model complicated densities from which the data points are assumed to be generated. The key insight is to use determinants of submatrices of a kernel matrix as a measure of how close together a set of points is. We explore some theoretical properties of the model and derive a natural Gibbs-based algorithm with MCMC hyperparameter learning. The model is implemented on a variety of synthetic and real-world data sets.
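A small numerical sketch of the key insight, under the assumption of an RBF kernel (the paper's actual kernel and algorithm are not reproduced): the determinant of the kernel submatrix of a point set shrinks toward zero as the points move closer together, so it acts as an inverse measure of tightness.

```python
# Illustrative only: kernel submatrix determinants for a tight vs a spread
# set of 1D points under an RBF kernel.
import math

def rbf(a, b, ls=1.0):
    # Squared-exponential kernel with lengthscale ls
    return math.exp(-((a - b) ** 2) / (2 * ls * ls))

def det3(m):
    # Determinant of a 3x3 matrix by cofactor expansion
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def kernel_det(points):
    k = [[rbf(a, b) for b in points] for a in points]
    return det3(k)

tight = kernel_det([0.0, 0.1, 0.2])    # nearly coincident points
spread = kernel_det([0.0, 5.0, 10.0])  # well-separated points
print(tight, spread)  # the tight set's determinant is near 0
```

Close points make the kernel matrix nearly rank-one (all entries near 1), driving the determinant toward 0; well-separated points make it nearly the identity, driving it toward 1.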


A probabilistic, nonlinear supervised learning model is proposed: the Specialized Mappings Architecture (SMA). The SMA employs a set of several forward mapping functions that are estimated automatically from training data. Each specialized function maps certain domains of the input space (e.g., image features) onto the output space (e.g., articulated body parameters). The SMA can model ambiguous, one-to-many mappings that may yield multiple valid output hypotheses. Once learned, the mapping functions generate a set of output hypotheses for a given input via a statistical inference procedure. The SMA inference procedure incorporates an inverse mapping, or feedback function, in evaluating the likelihood of each hypothesis. Possible feedback functions include computer graphics rendering routines that can generate images for given hypotheses. The SMA employs a variant of the Expectation-Maximization algorithm for simultaneous learning of the specialized domains and the mapping functions, along with approximate strategies for inference. The framework is demonstrated in a computer vision system that can estimate the articulated pose parameters of a human body or hand, given silhouettes from a single image. The accuracy and stability of the SMA are also tested using synthetic images of human bodies and hands, where ground truth is known.
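A toy stand-in for SMA-style inference (not the vision system): the "rendering" here is x = y², so the inverse is one-to-many (y = ±√x), two crude specialized maps each propose a hypothesis, and the feedback function re-renders each one and scores it against the input. The maps, feedback function and numbers are all invented for illustration.

```python
# Illustrative only: specialized forward maps plus a feedback (rendering)
# function that scores each output hypothesis by reconstruction error.

def specialist_pos(x):
    return 2 / 3 + x / 3        # rough linear fit to y = +sqrt(x) on [1, 4]

def specialist_neg(x):
    return -(2 / 3 + x / 3)     # rough linear fit to y = -sqrt(x) on [1, 4]

def render(y):
    # Feedback function: re-render the input feature a hypothesis implies
    return y * y

def infer(x):
    # Keep every hypothesis, ordered by feedback error, best first
    hyps = [specialist_pos(x), specialist_neg(x)]
    return sorted(hyps, key=lambda y: abs(render(y) - x))

print(infer(4.0))  # both hypotheses (about +2 and -2) render back to 4:
                   # a genuinely ambiguous, one-to-many input
```

The point of the example is the ambiguity: a silhouette-like feature that discards sign information leaves two equally valid pose hypotheses, which is exactly the situation the SMA's multiple-hypothesis inference is designed for.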


This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large datasets. An example context concerns common biological studies using high-throughput technologies that generate many very large datasets and require increasingly high-dimensional mixture models with large numbers of mixture components. We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in the ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms and software design can lead to vast speed-ups and, critically, enable statistical analyses that presently are not performed due to compute-time limitations in traditional computational environments. Supplemental materials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture modeling context. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
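A sketch of why mixture-model fitting maps well onto GPUs (illustrative, not from the paper's code): the per-observation responsibility computation depends only on that observation and the shared parameters, so the loop over data is embarrassingly parallel, one GPU thread per observation. Shown here as a plain map over a toy two-component 1D Gaussian mixture with invented parameters.

```python
# Illustrative only: the E-step of a Gaussian mixture, structured as an
# independent per-observation computation (the natural GPU kernel body).
import math

weights = [0.4, 0.6]   # invented mixture parameters
means = [0.0, 5.0]
sds = [1.0, 1.0]

def responsibilities(x):
    # Independent per-observation work: shared parameters in, probabilities out
    dens = [w * math.exp(-((x - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
            for w, m, s in zip(weights, means, sds)]
    total = sum(dens)
    return [d / total for d in dens]

data = [-0.5, 0.2, 4.8, 5.3]
resp = list(map(responsibilities, data))  # this map is the parallelizable step
print(resp[0][0] > 0.9, resp[2][1] > 0.9)
```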


We address the problem of non-linearity in 2D shape modelling of a particular articulated object: the human body. This issue is partially resolved by applying a different Point Distribution Model (PDM) depending on the viewpoint. The remaining non-linearity is handled using Gaussian Mixture Models (GMM). A dynamics-based clustering is proposed and carried out in the Pose Eigenspace. A fundamental question when clustering is how to determine the optimal number of clusters; in our view, the main aspect to be evaluated is the mean Gaussianity. This partitioning is then used to fit a GMM to each of the view-based PDMs, derived from a database of silhouettes and skeletons. Dynamic correspondences are then obtained between the Gaussian models of the 4 mixtures. Finally, we compare this approach with two other methods we previously developed to cope with non-linearity: a Nearest Neighbor (NN) classifier and Independent Component Analysis (ICA).


In this paper, we propose a multi-camera application capable of processing high-resolution images and extracting features based on color patterns on graphics processing units (GPUs). The goal is to work in real time under the uncontrolled environment of a sport event such as a football match. Since football players present diverse and complex color patterns, a Gaussian Mixture Model (GMM) is applied as the segmentation paradigm in order to analyze live sport images and video. Optimization techniques have also been applied to the C++ implementation using profiling tools focused on high performance. Time-consuming tasks were implemented on NVIDIA's CUDA platform, and later restructured and enhanced, speeding up the whole process significantly. Our resulting code is around 4-11 times faster on a low-cost GPU than a highly optimized C++ version on a central processing unit (CPU) over the same data. Real-time performance has been achieved, processing up to 64 frames per second. An important conclusion derived from our study is that the application scales with the number of cores on the GPU. © 2011 Springer-Verlag.
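A toy sketch of GMM-style color segmentation (not the paper's CUDA code): each class gets a single Gaussian over one color channel, and a pixel is assigned to the class with the highest likelihood. The class names, means and standard deviations are invented; a real system would model full RGB with several components per class.

```python
# Illustrative only: per-pixel maximum-likelihood class assignment using
# one Gaussian per class on the green channel.
import math

def gauss_pdf(x, mean, sd):
    return math.exp(-((x - mean) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

# Hypothetical (mean, sd) of the green channel for each class
classes = {"grass": (140.0, 15.0), "shirt": (40.0, 20.0)}

def segment(pixel_green):
    # Pick the class whose Gaussian gives this pixel the highest density
    return max(classes, key=lambda c: gauss_pdf(pixel_green, *classes[c]))

frame = [150, 135, 45, 30, 142]   # made-up green-channel values
labels = [segment(g) for g in frame]
print(labels)
```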


This paper proposes a discrete mixture model which assigns individuals, up to a probability, to either a class of random utility (RU) maximizers or a class of random regret (RR) minimizers, on the basis of their sequence of observed choices. Our proposed model advances the state of the art of RU-RR mixture models by (i) adding and simultaneously estimating a membership model which predicts the probability of belonging to the RU or RR class; (ii) adding a layer of random taste heterogeneity within each behavioural class; and (iii) deriving a welfare measure associated with the RU-RR mixture model that is consistent with referendum voting, which is the appropriate provision mechanism for such local public goods. The context of our empirical application is a stated choice experiment concerning traffic calming schemes. We find that the random parameter RU-RR mixture model not only outperforms its fixed-coefficient counterpart in terms of fit, as expected, but also in terms of the plausibility of the membership determinants of behavioural class. In line with psychological theories of regret, we find that, compared to respondents who are familiar with the choice context (i.e. the traffic calming scheme), unfamiliar respondents are more likely to be regret minimizers than utility maximizers. © 2014 Elsevier Ltd.
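A hedged sketch contrasting the two behavioural classes, using the standard single-attribute random-regret form RR_i = Σ_{j≠i} ln(1 + exp(β(x_j − x_i))) with invented numbers; the paper's full model additionally has a membership model and random taste heterogeneity, which are omitted here.

```python
# Illustrative only: logit choice probabilities for a utility maximizer
# (logit on utility) vs a regret minimizer (logit on minus regret).
import math

beta = 1.0
x = [3.0, 2.0, 1.0]   # invented attribute levels of three alternatives

def logit(vals):
    exps = [math.exp(v) for v in vals]
    s = sum(exps)
    return [e / s for e in exps]

def regret(i):
    # Regret of alternative i: summed log-sum comparisons against the others
    return sum(math.log(1 + math.exp(beta * (x[j] - x[i])))
               for j in range(len(x)) if j != i)

p_ru = logit([beta * xi for xi in x])                  # RU class
p_rr = logit([-regret(i) for i in range(len(x))])      # RR class
print(p_ru)
print(p_rr)
```

With a single attribute both classes rank the alternatives the same way; the behavioural difference shows up with multiple attributes, where regret penalizes compromise-avoiding alternatives differently than utility does.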


Generative algorithms for random graphs have yielded insights into the structure and evolution of real-world networks. Most networks exhibit a well-known set of properties, such as heavy-tailed degree distributions, clustering and community formation. Usually, random graph models consider only structural information, but many real-world networks also have labelled vertices and weighted edges. In this paper, we present a generative model for random graphs with discrete vertex labels and numeric edge weights. The weights are represented as a set of Beta Mixture Models (BMMs) with an arbitrary number of mixtures, which are learned from real-world networks. We propose a Bayesian Variational Inference (VI) approach, which yields an accurate estimation while keeping computation times tractable. We compare our approach to state-of-the-art random labelled graph generators and an earlier approach based on Gaussian Mixture Models (GMMs). Our results allow us to draw conclusions about the contribution of vertex labels and edge weights to graph structure.
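An illustrative sketch of the edge-weight part of such a generative model: weights drawn from a two-component Beta mixture. The component parameters below are invented; in the paper they would be learned from a real network via variational inference.

```python
# Illustrative only: sampling edge weights from a Beta Mixture Model (BMM).
import random

random.seed(3)
# (mixing weight, alpha, beta) per component
components = [(0.7, 2.0, 8.0),   # many light edges
              (0.3, 9.0, 2.0)]   # fewer heavy edges

def sample_edge_weight():
    # Pick a component by its mixing weight, then draw from its Beta density
    u, acc = random.random(), 0.0
    for w, a, b in components:
        acc += w
        if u <= acc:
            return random.betavariate(a, b)
    return random.betavariate(*components[-1][1:])

weights = [sample_edge_weight() for _ in range(1000)]
print(min(weights) > 0.0, max(weights) < 1.0)  # Beta weights live in (0, 1)
```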


During the last century, mean global temperatures have been increasing. According to the predictions, the temperature change is expected to exceed 1.5 °C in this century, and the warming is likely to continue. Freshwater ecosystems are among the most sensitive, mainly due to changes in the hydrologic cycle and consequent changes in several physico-chemical parameters (e.g. pH, dissolved oxygen). Alterations in the environmental parameters of freshwater systems are likely to affect the distribution, morphology, physiology and richness of a wide range of species, leading to important changes in ecosystem biodiversity and function. Moreover, they can also act as co-stressors in environments where organisms already have to cope with chemical contamination (such as pesticides), increasing the environmental risk due to potential interactions. Therefore, the objective of this work was to evaluate the effects of climate-change-related environmental parameters on the toxicity of pesticides to zebrafish embryos. The following environmental factors were studied: pH (3.0-12.0), dissolved oxygen level (0-8 mg/L) and UV radiation (0-500 mW/m²). The pesticides studied were the carbamate insecticide carbaryl and the benzimidazole fungicide carbendazim. Stressors were first tested separately in order to derive concentration- or intensity-response curves, and the effects of binary combinations (environmental factors x pesticides) were then studied by applying mixture models. Characterization of the zebrafish embryo response to environmental stress revealed that pH effects were fully established after 24 h of exposure and that survival was only affected at pH values below 5 and above 10. Low oxygen levels also affected embryo development at concentrations below 4 mg/L (delay, decreased heart rate and edema), and at concentrations below 0.5 mg/L survival was drastically reduced.
Continuous exposure to UV radiation showed a strong time-dependent impact on embryo survival, leading to 100% mortality after 72 hours of exposure. The toxicity of the pesticides carbaryl and carbendazim was characterized at several levels of biological organization, including developmental, biochemical and behavioural, allowing a mechanistic understanding of the effects and highlighting the usefulness of behavioural responses (locomotion) as a sensitive endpoint in ecotoxicology. Once the individual concentration-response relationship of each stressor was established, a combined toxicity study was conducted to evaluate the effects of pH on the toxicity of carbaryl. We have shown that pH can modify the toxicity of carbaryl. The conceptual model of concentration addition allowed a precise prediction of the joint effects of acid pH and carbaryl. Nevertheless, for the alkaline condition the predictions failed. Deviations from the model were, however, easy to explain, as high pH values favour the hydrolysis of carbaryl with the consequent formation of the more toxic degradation product 1-naphthol. Although in the present study such an explanatory process was easy to establish, for many other combinations the "interactive" nature is not so evident. In the context of climate change, few scenarios predict such an increase in the pH of aquatic systems; however, this was a first approach focused on lethal effects only. In a second-tier assessment, effects at the sublethal level would be sought, and it is expected that more subtle pH changes (more realistic in terms of climate change scenarios) may have effects at the physiological and biochemical levels, with possible long-term consequences for population fitness.
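A minimal numerical sketch of the concentration-addition (CA) reference model mentioned above, with invented EC50 values (the study's measured values are not reproduced): for a binary mixture in a fixed concentration ratio, CA predicts the mixture EC50 from Σ cᵢ / EC50ᵢ = 1.

```python
# Illustrative only: concentration-addition prediction of a mixture EC50.
ec50 = {"carbaryl": 10.0, "stressor": 4.0}       # hypothetical EC50s (mg/L)
fractions = {"carbaryl": 0.5, "stressor": 0.5}   # mixture ratio by concentration

# Total mixture concentration C solving sum(f_i * C / EC50_i) = 1
mixture_ec50 = 1.0 / sum(f / ec50[k] for k, f in fractions.items())
print(round(mixture_ec50, 3))  # 5.714 mg/L for these invented inputs
```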


Thesis presented as a partial requirement for the degree of Doctor in Statistics and Information Management at the Instituto Superior de Estatística e Gestão de Informação of the Universidade Nova de Lisboa


Affiliation: Département de Biochimie, Faculté de médecine, Université de Montréal


Zero-inflated models, both discrete and continuous, have a wide range of applications and their properties are well known. Although there is work on zero-deflated and zero-modified discrete models, the usual formulation of zero-inflated continuous models -- a mixture of a continuous density and a Dirac mass -- prevents generalizing them to cover zero-deflation. An alternative formulation of zero-inflated continuous models, which readily generalizes to the zero-deflated case, is presented here. Estimation is first addressed under the classical paradigm, and several methods for obtaining maximum likelihood estimators are proposed. The point-estimation problem is also considered from the Bayesian point of view. Classical and Bayesian hypothesis tests for determining whether data are zero-inflated or zero-deflated are presented. The estimation and testing methods are also evaluated through simulation studies and applied to aggregated precipitation data. The various methods agree on the zero-deflation of the data, demonstrating the relevance of the proposed model. We then consider the clustering of samples of zero-deflated data. Since such data are strongly non-normal, common methods for determining the number of clusters can be expected to perform poorly. We argue that Bayesian clustering, based on the marginal distribution of the observations, accounts for the particularities of the model and should therefore perform better. Several clustering methods are compared in a simulation study, and the proposed method is applied to aggregated precipitation data from 28 measuring stations in British Columbia.
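An illustrative sketch of the classic zero-inflated continuous formulation the thesis starts from (a point mass at zero mixed with a continuous density), with an exponential body and an invented zero probability; the thesis's contribution is an alternative formulation that also covers zero-deflation, which this classic mixture cannot express.

```python
# Illustrative only: sampling from a zero-inflated continuous distribution,
# e.g. daily precipitation with exact zeros on dry days.
import random

random.seed(4)
p_zero = 0.4   # hypothetical probability of an exact zero (a dry day)

def sample_precip():
    if random.random() < p_zero:
        return 0.0                      # Dirac mass at zero
    return random.expovariate(0.5)      # continuous positive part

draws = [sample_precip() for _ in range(2000)]
zero_frac = sum(d == 0.0 for d in draws) / len(draws)
print(round(zero_frac, 2))  # empirical zero fraction, close to p_zero
```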


This study addresses the use of mixture models to analyze behavioural and cognitive-ability data measured at several time points during children's development. The estimation of mixtures of multivariate normal distributions using the EM algorithm is explained in detail. This algorithm greatly simplifies the computations, since it allows the parameters of each group to be estimated separately, making it easier to model the covariance of the observations over time -- a point often neglected in mixture analyses. The study examines the consequences of misspecifying the covariance on the estimation of the number of groups in a mixture. The main consequence is overestimation of the number of groups, that is, estimating groups that do not exist. In particular, assuming independence of the observations over time when they were in fact correlated resulted in the estimation of several nonexistent groups. This overestimation of the number of groups also leads to overparameterization, that is, using more parameters than necessary to model the data. Finally, mixture models were fitted to behavioural and cognitive-ability data, first assuming a covariance structure and then assuming independence. In most cases, adding a covariance structure leads to fewer estimated groups and to simpler, clearer results.
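A back-of-the-envelope sketch (not from the thesis) of the overparameterization point: counting free parameters for a k-component mixture of T-dimensional multivariate normals shows that spurious extra groups under the independence (diagonal covariance) assumption quickly cost more parameters than modelling the covariance directly.

```python
# Illustrative only: free-parameter counts for multivariate normal mixtures.
def n_params(k, T, full_cov):
    # Per group: T means plus either a full T x T covariance (T(T+1)/2 free
    # entries) or a diagonal one (T variances); plus k - 1 mixing weights.
    cov = T * (T + 1) // 2 if full_cov else T
    return (k - 1) + k * (T + cov)

T = 4  # four measurement occasions
print(n_params(2, T, True))    # 2 groups with full covariance: 29 parameters
print(n_params(4, T, False))   # 4 "independence" groups: 35 parameters
```

So if ignoring the temporal correlation inflates 2 true groups into 4 estimated ones, the independence model ends up both wrong and more expensive.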


We present distribution-independent bounds on the generalization misclassification performance of a family of kernel classifiers with margin. Support Vector Machine (SVM) classifiers stem from this class of machines. The bounds are derived through computations of the $V_\gamma$ dimension of a family of loss functions to which the SVM loss belongs. Bounds that use functions of margin distributions (i.e. functions of the slack variables of the SVM) are also derived.