959 results for mixture models
Abstract:
This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large datasets. An example context concerns common biological studies using high-throughput technologies generating many, very large datasets and requiring increasingly high-dimensional mixture models with large numbers of mixture components. We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms and software design can lead to vast speed-up and, critically, enable statistical analyses that presently will not be performed due to compute time limitations in traditional computational environments. Supplemental materials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture modeling context. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
Abstract:
We address the problem of non-linearity in 2D shape modelling of a particular articulated object: the human body. This issue is partially resolved by applying a different Point Distribution Model (PDM) depending on the viewpoint. The remaining non-linearity is handled by using Gaussian Mixture Models (GMMs). A dynamic-based clustering is proposed and carried out in the Pose Eigenspace. A fundamental question when clustering is how to determine the optimal number of clusters. From our point of view, the main aspect to be evaluated is the mean Gaussianity. This partitioning is then used to fit a GMM to each of the view-based PDMs, derived from a database of silhouettes and skeletons. Dynamic correspondences are then obtained between the Gaussian models of the 4 mixtures. Finally, we compare this approach with two other methods we previously developed to cope with non-linearity: a Nearest Neighbor (NN) classifier and Independent Component Analysis (ICA).
Abstract:
In this paper, we propose a multi-camera application capable of processing high-resolution images and extracting features based on color patterns on graphics processing units (GPUs). The goal is to work in real time in the uncontrolled environment of a sporting event such as a football match. Since football players exhibit diverse and complex color patterns, a Gaussian Mixture Model (GMM) is applied as the segmentation paradigm in order to analyze live sports images and video. Optimization techniques have also been applied to the C++ implementation using profiling tools focused on high performance. Time-consuming tasks were implemented on NVIDIA's CUDA platform, and later restructured and enhanced, speeding up the whole process significantly. Our resulting code is around 4-11 times faster on a low-cost GPU than a highly optimized C++ version on a central processing unit (CPU) over the same data. Real-time operation has been achieved, processing up to 64 frames per second. An important conclusion derived from our study is the scalability of the application with the number of cores on the GPU. © 2011 Springer-Verlag.
Abstract:
This paper proposes a discrete mixture model which assigns individuals, up to a probability, to either a class of random utility (RU) maximizers or a class of random regret (RR) minimizers, on the basis of their sequence of observed choices. Our proposed model advances the state of the art of RU-RR mixture models by (i) adding and simultaneously estimating a membership model which predicts the probability of belonging to an RU or RR class; (ii) adding a layer of random taste heterogeneity within each behavioural class; and (iii) deriving a welfare measure associated with the RU-RR mixture model and consistent with referendum voting, which is the adequate mechanism of provision for such local public goods. The context of our empirical application is a stated choice experiment concerning traffic calming schemes. We find that the random parameter RU-RR mixture model outperforms its fixed coefficient counterpart not only in terms of fit, as expected, but also in terms of plausibility of the membership determinants of behavioural class. In line with psychological theories of regret, we find that, compared to respondents who are familiar with the choice context (i.e. the traffic calming scheme), unfamiliar respondents are more likely to be regret minimizers than utility maximizers. © 2014 Elsevier Ltd.
Abstract:
Generative algorithms for random graphs have yielded insights into the structure and evolution of real-world networks. Most networks exhibit a well-known set of properties, such as heavy-tailed degree distributions, clustering and community formation. Usually, random graph models consider only structural information, but many real-world networks also have labelled vertices and weighted edges. In this paper, we present a generative model for random graphs with discrete vertex labels and numeric edge weights. The weights are represented as a set of Beta Mixture Models (BMMs) with an arbitrary number of mixtures, which are learned from real-world networks. We propose a Bayesian Variational Inference (VI) approach, which yields an accurate estimation while keeping computation times tractable. We compare our approach to state-of-the-art random labelled graph generators and an earlier approach based on Gaussian Mixture Models (GMMs). Our results allow us to draw conclusions about the contribution of vertex labels and edge weights to graph structure.
Abstract:
During the last century, mean global temperatures have been increasing. According to predictions, the temperature change is expected to exceed 1.5 °C in this century and the warming is likely to continue. Freshwater ecosystems are among the most sensitive, mainly due to changes in the hydrologic cycle and consequently changes in several physico-chemical parameters (e.g. pH, dissolved oxygen). Alterations in environmental parameters of freshwater systems are likely to affect the distribution, morphology, physiology and richness of a wide range of species, leading to important changes in ecosystem biodiversity and function. Moreover, they can also act as co-stressors in environments where organisms already have to cope with chemical contamination (such as pesticides), increasing the environmental risk due to potential interactions. Therefore, the objective of this work was to evaluate the effects of climate change related environmental parameters on the toxicity of pesticides to zebrafish embryos. The following environmental factors were studied: pH (3.0-12.0), dissolved oxygen level (0-8 mg/L) and UV radiation (0-500 mW/m²). The pesticides studied were the carbamate insecticide carbaryl and the benzimidazole fungicide carbendazim. Stressors were first tested separately in order to derive concentration- or intensity-response curves, and then the effects of binary combinations (environmental factors × pesticides) were studied by applying mixture models. Characterization of the zebrafish embryo response to environmental stress revealed that pH effects were fully established after 24 h of exposure and that survival was only affected at pH values below 5 and above 10. Low oxygen levels also affected embryo development at concentrations below 4 mg/L (delay, heart rate decrease and edema), and at concentrations below 0.5 mg/L survival was drastically reduced.
Continuous exposure to UV radiation showed a strong time-dependent impact on embryo survival, leading to 100% mortality after 72 hours of exposure. The toxicity of the pesticides carbaryl and carbendazim was characterized at several levels of biological organization, including developmental, biochemical and behavioural, allowing a mechanistic understanding of the effects and highlighting the usefulness of behavioural responses (locomotion) as a sensitive endpoint in ecotoxicology. Once the individual concentration-response relationship of each stressor was established, a combined toxicity study was conducted to evaluate the effects of pH on the toxicity of carbaryl. We have shown that pH can modify the toxicity of the pesticide carbaryl. The conceptual model of concentration addition allowed a precise prediction of the toxicity of the joint effects of acid pH and carbaryl. Nevertheless, for alkaline conditions both concepts failed to predict the effects. Deviations from the model were, however, easy to explain, as high pH values favour the hydrolysis of carbaryl with the consequent formation of the more toxic degradation product 1-naphthol. Although in the present study such an explanatory process was easy to establish, for many other combinations the “interactive” nature is not so evident. In the context of climate change, few scenarios predict such an increase in the pH of aquatic systems; however, this was a first approach focused on lethal effects only. In a second-tier assessment, effects at the sublethal level would be sought, and it is expected that more subtle pH changes (more realistic in terms of climate change scenarios) may have an effect at physiological and biochemical levels, with possible long-term consequences for population fitness.
Abstract:
Thesis submitted in partial fulfilment of the requirements for the degree of Doctor in Statistics and Information Management at the Instituto Superior de Estatística e Gestão de Informação, Universidade Nova de Lisboa
Abstract:
Affiliation: Département de Biochimie, Faculté de médecine, Université de Montréal
Abstract:
Discrete and continuous zero-inflated models have a wide range of applications and their properties are well known. Although there is work on discrete zero-deflated and zero-modified models, the usual formulation of continuous zero-inflated models, a mixture of a continuous density and a Dirac mass, prevents them from being generalized to cover the zero-deflated case. An alternative formulation of continuous zero-inflated models, which can easily be generalized to the zero-deflated case, is presented here. Estimation is first addressed under the classical paradigm, and several methods for obtaining maximum likelihood estimators are proposed. The point estimation problem is also considered from the Bayesian point of view. Classical and Bayesian hypothesis tests for determining whether data are zero-inflated or zero-deflated are presented. The estimation and testing methods are also evaluated through simulation studies and applied to aggregated precipitation data. The various methods agree that the data are zero-deflated, demonstrating the relevance of the proposed model. We then consider the clustering of samples of zero-deflated data. Since such data are strongly non-normal, it is reasonable to expect that common methods for determining the number of clusters perform poorly. We argue that Bayesian clustering, based on the marginal distribution of the observations, would account for the particularities of the model, which would translate into better performance. Several clustering methods are compared through a simulation study, and the proposed method is applied to aggregated precipitation data from 28 measurement stations in British Columbia.
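As a rough illustration of the usual formulation this abstract refers to (a point mass at zero mixed with a continuous density), the sketch below fits such a model by maximum likelihood. The exponential density, the function name `fit_zero_inflated_exponential`, and the simulated data are assumptions made for the example; they are not taken from the thesis.

```python
import numpy as np

def fit_zero_inflated_exponential(x):
    """MLE for f(x) = pi * delta_0(x) + (1 - pi) * rate * exp(-rate * x).

    The likelihood factorises into a Bernoulli part (zero or not) and a
    continuous part (the positive values), so each can be fitted on its own.
    The exponential choice is an illustrative assumption.
    """
    x = np.asarray(x, dtype=float)
    is_zero = x == 0.0
    pi_hat = is_zero.mean()               # share of exact zeros
    rate_hat = 1.0 / x[~is_zero].mean()   # exponential MLE on the positives
    return pi_hat, rate_hat

# Simulated data: 30% exact zeros, otherwise exponential with rate 2.
rng = np.random.default_rng(0)
n = 20_000
zeros = rng.random(n) < 0.3
sample = np.where(zeros, 0.0, rng.exponential(scale=1 / 2.0, size=n))
pi_hat, rate_hat = fit_zero_inflated_exponential(sample)
```

In this formulation the weight on the point mass is a probability, so it cannot go below zero. That is why the formulation has no direct way to describe fewer zeros than the continuous part would give.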
Abstract:
This study addresses the use of mixture models to analyze behavioural and cognitive skills data measured at several time points during children's development. The estimation of mixtures of multivariate normal distributions using the EM algorithm is explained in detail. This algorithm greatly simplifies the computations, since it allows the parameters of each group to be estimated separately, which makes it easier to model the covariance of observations over time. This last point is often neglected in mixture analyses. This study examines the consequences of misspecifying the covariance on the estimation of the number of groups forming a mixture. The main consequence is overestimation of the number of groups, that is, groups that do not exist are estimated. In particular, assuming independence of observations over time when they were in fact correlated resulted in the estimation of several groups that did not exist. This overestimation of the number of groups also leads to overparameterization, that is, more parameters are used than are necessary to model the data. Finally, mixture models were estimated on behavioural and cognitive skills data. We estimated the mixtures assuming first a covariance structure and then independence. In most cases, adding a covariance structure results in fewer groups being estimated, and the results are simpler and clearer to interpret.
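The EM scheme for multivariate normal mixtures mentioned in this abstract can be sketched as follows. This is a minimal illustration with unrestricted covariance matrices, not the thesis's implementation; the function name `em_gmm`, the quantile-based initialisation, and the small ridge term `reg` are assumptions introduced for the example.

```python
import numpy as np

def em_gmm(X, k, n_iter=100, reg=1e-6):
    """Fit a k-component Gaussian mixture with full covariances by EM."""
    n, d = X.shape
    # Deterministic initialisation (an illustrative choice): means are points
    # at evenly spaced ranks of the first coordinate, covariances are pooled.
    order = np.argsort(X[:, 0])
    picks = np.linspace(0, n - 1, k + 2).astype(int)[1:-1]
    means = X[order[picks]].astype(float)
    covs = np.stack([np.cov(X, rowvar=False) + reg * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log(weight_j * N(x_i | mean_j, cov_j)) for every i, j.
        log_p = np.empty((n, k))
        for j in range(k):
            chol = np.linalg.cholesky(covs[j])
            z = np.linalg.solve(chol, (X - means[j]).T)  # whitened residuals
            log_det = 2.0 * np.sum(np.log(np.diag(chol)))
            log_p[:, j] = np.log(weights[j]) - 0.5 * (
                d * np.log(2 * np.pi) + log_det + np.sum(z**2, axis=0))
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)  # responsibilities, shape (n, k)
        # M-step: each group's parameters are updated separately.
        nk = resp.sum(axis=0)
        weights = nk / n
        for j in range(k):
            means[j] = resp[:, j] @ X / nk[j]
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    # Log-likelihood evaluated at the parameters used in the last E-step.
    return weights, means, covs, float(log_norm.sum())
```

Keeping only the diagonal of each `covs[j]` in the M-step would give the independence assumption the abstract warns about, while the full matrix lets observations be correlated over time.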
Abstract:
We present distribution-independent bounds on the generalization misclassification performance of a family of kernel classifiers with margin. Support Vector Machine (SVM) classifiers arise from this class of machines. The bounds are derived through computations of the $V_\gamma$ dimension of a family of loss functions to which the SVM loss belongs. Bounds that use functions of margin distributions (i.e. functions of the slack variables of SVM) are derived.
Abstract:
We establish a mathematical connection between the "Expectation-Maximization" (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite Gaussian mixtures. We show that the EM step in parameter space is obtained from the gradient via a projection matrix $P$, and we provide an explicit expression for the matrix. We then analyze the convergence of EM in terms of special properties of $P$ and provide new results analyzing the effect that $P$ has on the likelihood surface. Based on these mathematical results, we present a comparative discussion of the advantages and disadvantages of EM and other algorithms for the learning of Gaussian mixture models.
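The relationship between the EM step and the gradient can be checked numerically for the component means. The sketch below is an illustration under stated assumptions, not the paper's code. It uses the fact that the log-likelihood gradient with respect to mean $m_j$ is $\sum_t h_j(t)\,\Sigma_j^{-1}(x_t - m_j)$, where $h_j(t)$ are the posterior responsibilities, so premultiplying by $P_j = \Sigma_j / \sum_t h_j(t)$ and adding $m_j$ gives the usual EM mean update. The function names and test data are hypothetical, and only the mean block is covered, not the mixing proportions or covariances.

```python
import numpy as np

def responsibilities(X, weights, means, covs):
    """Posterior probability h_j(t) of component j for each observation t."""
    n, d = X.shape
    dens = np.empty((n, len(weights)))
    for j, (w, m, S) in enumerate(zip(weights, means, covs)):
        diff = X - m
        maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(S), diff)
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(S))
        dens[:, j] = w * np.exp(-0.5 * maha) / norm
    return dens / dens.sum(axis=1, keepdims=True)

def em_mean_step(X, h):
    """Standard EM update: responsibility-weighted average of the data."""
    return (h.T @ X) / h.sum(axis=0)[:, None]

def projected_gradient_mean_step(X, h, means, covs):
    """Current mean plus P_j times the log-likelihood gradient w.r.t. the mean."""
    new = np.empty_like(means)
    for j in range(len(means)):
        grad = np.linalg.inv(covs[j]) @ (h[:, j] @ (X - means[j]))
        P = covs[j] / h[:, j].sum()  # positive definite scaling matrix
        new[j] = means[j] + P @ grad
    return new

# Arbitrary data and parameters, chosen only for this demonstration.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
weights = np.array([0.4, 0.6])
means = np.array([[-1.0, 0.5], [1.0, -0.5]])
covs = np.array([[[1.0, 0.3], [0.3, 2.0]], [[0.5, -0.1], [-0.1, 0.8]]])
h = responsibilities(X, weights, means, covs)
em_means = em_mean_step(X, h)
pg_means = projected_gradient_mean_step(X, h, means, covs)
```

The two updates agree to floating-point precision, which shows the mean part of the EM step is a rescaled gradient step.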
Abstract:
Real-world learning tasks often involve high-dimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectives: the likelihood-based and the Bayesian. The goal is two-fold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihood-based framework, that handle clustering, classification, and function approximation from incomplete data in a principled and efficient manner. These algorithms are based on mixture modeling and make two distinct appeals to the Expectation-Maximization (EM) principle (Dempster, Laird, and Rubin 1977), both for the estimation of mixture components and for coping with the missing data.
Abstract:
We formulate density estimation as an inverse operator problem. We then use convergence results of empirical distribution functions to true distribution functions to develop an algorithm for multivariate density estimation. The algorithm is based upon a Support Vector Machine (SVM) approach to solving inverse operator problems. The algorithm is implemented and tested on simulated data from different distributions and different dimensionalities: Gaussians and Laplacians in $R^2$ and $R^{12}$. A comparison in performance is made with Gaussian Mixture Models (GMMs). Our algorithm does as well as or better than the GMMs for the simulations tested and has the added advantage of being automated with respect to parameters.
Abstract:
In most studies on civil wars, determinants of conflict have hitherto been explored assuming that the actors involved were either unitary or stable. However, if this intra-group homogeneity assumption does not hold, empirical econometric estimates may be biased. We address this issue using a Fixed Effects Finite Mixture Model (FE-FMM) approach, which provides a representation of heterogeneity when data originate from different latent classes and the affiliation is unknown. It allows sub-populations within a population to be identified, as well as the determinants of their behaviors. By combining various data sources for the period 2000-2005, we apply this methodology to the Colombian conflict. Our results highlight behavioral heterogeneity among guerrilla armed groups and their distinct economic correlates. By contrast, paramilitaries behave as a rather homogeneous group.