894 results for gaussian mixture model
Abstract:
We propose a model-based approach to unify clustering and network modeling using time-course gene expression data. Specifically, our approach uses a mixture model to cluster genes. Genes within the same cluster share a similar expression profile. The network is built over cluster-specific expression profiles using state-space models. We discuss the application of our model to simulated data as well as to time-course gene expression data arising from animal models of prostate cancer progression. The latter application shows that, with a combined statistical/bioinformatics analysis, we are able to extract gene-to-gene relationships supported by the literature as well as new plausible relationships.
Abstract:
In this paper, the goal of identifying disease subgroups based on differences in observed symptom profile is considered. Commonly referred to as phenotype identification, solutions to this task often involve the application of unsupervised clustering techniques. We investigate the application of a Dirichlet Process mixture (DPM) model for this task. This model is defined by the placement of the Dirichlet Process (DP) on the unknown components of a mixture model, allowing for the expression of uncertainty about the partitioning of observed data into homogeneous subgroups. To exemplify this approach, an application to phenotype identification in Parkinson’s disease (PD) is considered, with symptom profiles collected using the Unified Parkinson’s Disease Rating Scale (UPDRS). Keywords: Clustering; Dirichlet Process mixture; Parkinson’s disease; UPDRS
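The uncertainty over partitions that a DP prior expresses can be illustrated with its Chinese restaurant process representation. The sketch below is not the model fitted in the paper: it only draws a random partition from the prior, and the function name, the concentration parameter `alpha`, and the item count are assumptions chosen for illustration.

```python
import random

def sample_crp_partition(n_items, alpha, rng):
    """Draw one partition of n_items from a Chinese restaurant process.

    Item i joins existing cluster k with probability proportional to the
    cluster's current size, or opens a new cluster with probability
    proportional to alpha (the DP concentration parameter).
    """
    cluster_sizes = []
    assignments = []
    for _ in range(n_items):
        # Last weight corresponds to opening a new cluster.
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)
        else:
            cluster_sizes[k] += 1
        assignments.append(k)
    return assignments, cluster_sizes

# Hypothetical setting: 50 patients, alpha = 1.0, fixed seed for repeatability.
assignments, cluster_sizes = sample_crp_partition(50, 1.0, random.Random(0))
```

Repeating the draw with different seeds gives different numbers of subgroups, which is the sense in which the number of clusters is not fixed in advance.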
Abstract:
This study considered the problem of predicting survival, based on three alternative models: a single Weibull, a mixture of Weibulls and a cure model. Instead of the common procedure of choosing a single “best” model, where “best” is defined in terms of goodness of fit to the data, a Bayesian model averaging (BMA) approach was adopted to account for model uncertainty. This was illustrated using a case study in which the aim was the description of lymphoma cancer survival with covariates given by phenotypes and gene expression. The results of this study indicate that if the sample size is sufficiently large, one of the three models emerges as having the highest probability given the data, as indicated by the goodness-of-fit measure, the Bayesian information criterion (BIC). However, when the sample size was reduced, no single model was revealed as “best”, suggesting that a BMA approach would be appropriate. Although a BMA approach can compromise on goodness of fit to the data (when compared to the true model), it can provide robust predictions and facilitate more detailed investigation of the relationships between gene expression and patient survival. Keywords: Bayesian modelling; Bayesian model averaging; Cure model; Markov Chain Monte Carlo; Mixture model; Survival analysis; Weibull distribution
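One common way to turn BIC values into approximate posterior model probabilities, assuming equal prior probabilities and the convention that a lower BIC is better, is to weight each model by exp(-BIC/2). The sketch below illustrates that approximation only; the abstract does not say how the study computed its model probabilities, and every number here is hypothetical.

```python
import math

def bma_weights(bic_values):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior model probabilities and BIC = -2 log L + k log n,
    so that lower BIC means a better model.
    """
    best = min(bic_values)
    # Subtract the smallest BIC before exponentiating for numerical stability.
    unnormalised = [math.exp(-0.5 * (b - best)) for b in bic_values]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

def bma_prediction(weights, predictions):
    """Model-averaged prediction: weighted sum of per-model predictions."""
    return sum(w * p for w, p in zip(weights, predictions))

# Hypothetical BIC values for: single Weibull, Weibull mixture, cure model.
bics = [512.4, 509.1, 510.0]
weights = bma_weights(bics)

# Hypothetical survival probabilities predicted by each model for one patient.
predictions = [0.62, 0.55, 0.58]
averaged = bma_prediction(weights, predictions)
```

When the BIC values are close, as in this made-up example, no weight dominates and the averaged prediction draws on all three models.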
Abstract:
We present a novel approach for developing summary statistics for use in approximate Bayesian computation (ABC) algorithms using indirect inference. We embed this approach within a sequential Monte Carlo algorithm that is completely adaptive. This methodological development was motivated by an application involving data on macroparasite population evolution modelled with a trivariate Markov process. The main objective of the analysis is to compare inferences on the Markov process when considering two different indirect models. The two indirect models are based on a Beta-Binomial model and a three-component mixture of Binomials, with the former providing a better fit to the observed data.
Abstract:
The recently discovered twist phase is studied in the context of the full ten-parameter family of partially coherent general anisotropic Gaussian Schell-model beams. It is shown that the nonnegativity requirement on the cross-spectral density of the beam demands that the strength of the twist phase be bounded from above by the inverse of the transverse coherence area of the beam. The twist phase as a two-point function is shown to have the structure of the generalized Huygens kernel or Green's function of a first-order system. The ray-transfer matrix of this system is exhibited. Wolf-type coherent-mode decomposition of the twist phase is carried out. Imposition of the twist phase on an otherwise untwisted beam is shown to result in a linear transformation in the ray phase space of the Wigner distribution. Though this transformation preserves the four-dimensional phase-space volume, it is not symplectic and hence it can, when impressed on a Wigner distribution, push it out of the convex set of all bona fide Wigner distributions unless the original Wigner distribution was sufficiently deep into the interior of the set.
Abstract:
A Variable Endmember Constrained Least Square (VECLS) technique is proposed to account for endmember variability in the linear mixture model by incorporating the variance for each class, the signals of which vary from pixel to pixel due to changes in urban land cover (LC) structures. VECLS is first tested with computer-simulated three-class endmembers over four bands having small, medium and large variability at three different spatial resolutions. The technique is next validated with real datasets from IKONOS, Landsat ETM+ and MODIS. The results show that the correlation between actual and estimated proportions is higher by an average of 0.25 for the artificial datasets compared to a situation where variability is not considered. With IKONOS, Landsat ETM+ and MODIS data, the average correlation increased by 0.15 for 2 and 3 classes and by 0.19 for 4 classes, when compared to a single endmember per class. (C) 2013 COSPAR. Published by Elsevier Ltd. All rights reserved.
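For orientation, the plain linear mixture model that VECLS builds on treats each pixel as a weighted sum of endmember spectra, with non-negative weights that sum to one. The sketch below solves that constrained least-squares problem in closed form for the two-endmember case only. It does not include the per-class variance term that defines VECLS, and the function name and spectra are hypothetical.

```python
def unmix_two_endmembers(pixel, endmember_a, endmember_b):
    """Constrained least-squares abundances for a two-endmember linear mixture.

    Minimises ||pixel - (f * a + (1 - f) * b)||^2 subject to 0 <= f <= 1.
    With a single free fraction the problem is one-dimensional and convex,
    so clipping the unconstrained solution to [0, 1] is exact.
    """
    diff = [a - b for a, b in zip(endmember_a, endmember_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        raise ValueError("endmember spectra must differ")
    numer = sum((x - b) * d for x, b, d in zip(pixel, endmember_b, diff))
    frac_a = min(1.0, max(0.0, numer / denom))
    return frac_a, 1.0 - frac_a

# Hypothetical four-band reflectance spectra for two land-cover classes.
spectrum_a = [0.10, 0.20, 0.30, 0.40]
spectrum_b = [0.50, 0.40, 0.30, 0.20]

# A synthetic pixel made of 30% class A and 70% class B.
pixel = [0.3 * a + 0.7 * b for a, b in zip(spectrum_a, spectrum_b)]
frac_a, frac_b = unmix_two_endmembers(pixel, spectrum_a, spectrum_b)
```

With more than two classes the same objective needs a general constrained solver rather than a closed-form clip.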
Three-dimensional localization of multiple acoustic sources in shallow ocean with non-Gaussian noise
Abstract:
In this paper, a low-complexity algorithm SAGE-USL is presented for 3-dimensional (3-D) localization of multiple acoustic sources in a shallow ocean with non-Gaussian ambient noise, using a vertical and a horizontal linear array of sensors. In the proposed method, noise is modeled as a Gaussian mixture. Initial estimates of the unknown parameters (source coordinates, signal waveforms and noise parameters) are obtained by known/conventional methods, and a generalized expectation maximization algorithm is used to update the initial estimates iteratively. Simulation results indicate that convergence is reached in a small number of (<= 10) iterations. Initialization requires one 2-D search and one 1-D search, and the iterative updates require a sequence of 1-D searches. Therefore the computational complexity of the SAGE-USL algorithm is lower than that of conventional techniques such as 3-D MUSIC by several orders of magnitude. We also derive the Cramer-Rao Bound (CRB) for 3-D localization of multiple sources in a range-independent ocean. Simulation results are presented to show that the root-mean-square localization errors of SAGE-USL are close to the corresponding CRBs and significantly lower than those of 3-D MUSIC. (C) 2014 Elsevier Inc. All rights reserved.
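To make the noise model concrete, the sketch below fits a two-component, zero-mean Gaussian mixture to scalar noise samples with plain EM. It is not SAGE-USL: it estimates only noise parameters, uses ordinary rather than space-alternating generalized EM, and the zero-mean assumption, component count, and synthetic data are all illustrative choices.

```python
import math
import random

def normal_pdf(x, variance):
    """Density of a zero-mean Gaussian with the given variance."""
    return math.exp(-0.5 * x * x / variance) / math.sqrt(2.0 * math.pi * variance)

def em_zero_mean_mixture(samples, weights, variances, n_iter=50):
    """Plain EM for a mixture of zero-mean Gaussians (weights and variances)."""
    weights, variances = list(weights), list(variances)
    log_likelihoods = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        responsibilities = []
        log_likelihood = 0.0
        for x in samples:
            joint = [w * normal_pdf(x, v) for w, v in zip(weights, variances)]
            total = sum(joint)
            log_likelihood += math.log(total)
            responsibilities.append([j / total for j in joint])
        log_likelihoods.append(log_likelihood)
        # M-step: re-estimate mixing weights and variances.
        for k in range(len(weights)):
            n_k = sum(r[k] for r in responsibilities)
            weights[k] = n_k / len(samples)
            variances[k] = sum(r[k] * x * x for r, x in zip(responsibilities, samples)) / n_k
    return weights, variances, log_likelihoods

# Synthetic impulsive noise: mostly unit-variance background, with occasional
# high-variance outliers (about 10% of samples, standard deviation 5).
rng = random.Random(0)
samples = [rng.gauss(0.0, 5.0) if rng.random() < 0.1 else rng.gauss(0.0, 1.0)
           for _ in range(2000)]
est_weights, est_variances, log_likelihoods = em_zero_mean_mixture(
    samples, weights=[0.5, 0.5], variances=[0.5, 10.0])
```

The recorded log-likelihood should not decrease from one iteration to the next, which gives a simple convergence check.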
Abstract:
The use of mixture-model techniques for motion estimation and image sequence segmentation was discussed. Issues such as modeling of occlusion and uncovering, determining the relative depth of the objects in a scene, and estimating the number of objects in a scene were also investigated. The segmentation algorithm was found to be computationally demanding, but the computational requirements were reduced as the motion parameters and segmentation of the frame were initialized. The method provided a stable description, in which the addition and removal of objects from the description corresponded to the entry and exit of objects from the scene.
Abstract:
A mixture of Gaussians fit to a single curved or heavy-tailed cluster will report that the data contains many clusters. To produce more appropriate clusterings, we introduce a model which warps a latent mixture of Gaussians to produce nonparametric cluster shapes. The possibly low-dimensional latent mixture model allows us to summarize the properties of the high-dimensional clusters (or density manifolds) describing the data. The number of manifolds, as well as the shape and dimension of each manifold is automatically inferred. We derive a simple inference scheme for this model which analytically integrates out both the mixture parameters and the warping function. We show that our model is effective for density estimation, performs better than infinite Gaussian mixture models at recovering the true number of clusters, and produces interpretable summaries of high-dimensional datasets.
Abstract:
We introduce a Gaussian process model of functions which are additive. An additive function is one which decomposes into a sum of low-dimensional functions, each depending on only a subset of the input variables. Additive GPs generalize both Generalized Additive Models, and the standard GP models which use squared-exponential kernels. Hyperparameter learning in this model can be seen as Bayesian Hierarchical Kernel Learning (HKL). We introduce an expressive but tractable parameterization of the kernel function, which allows efficient evaluation of all input interaction terms, whose number is exponential in the input dimension. The additional structure discoverable by this model results in increased interpretability, as well as state-of-the-art predictive power in regression tasks.
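A kernel that sums over all interaction orders has exponentially many terms, but it can be evaluated cheaply because the order-d part is an elementary symmetric polynomial of the one-dimensional base kernels. The sketch below uses a standard dynamic-programming recurrence for those polynomials. The abstract does not say which recurrence the authors use, so this is one possible evaluation under that assumption, with hypothetical function names and parameter values.

```python
import math
from itertools import combinations

def elementary_symmetric(values):
    """Return [e_0, ..., e_D], where e_d sums the products of all size-d subsets."""
    e = [1.0] + [0.0] * len(values)
    for count, z in enumerate(values, start=1):
        # Update from high order to low so each value enters a product once.
        for d in range(count, 0, -1):
            e[d] += z * e[d - 1]
    return e

def se_kernel_1d(x, y, lengthscale):
    """One-dimensional squared-exponential base kernel."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)

def additive_kernel(x, y, lengthscales, order_variances):
    """Sum over interaction orders d of order_variances[d-1] * e_d(base kernels)."""
    base = [se_kernel_1d(a, b, l) for a, b, l in zip(x, y, lengthscales)]
    e = elementary_symmetric(base)
    return sum(v * e[d] for d, v in enumerate(order_variances, start=1))

def additive_kernel_bruteforce(x, y, lengthscales, order_variances):
    """Same quantity by explicit enumeration of subsets (exponential cost)."""
    base = [se_kernel_1d(a, b, l) for a, b, l in zip(x, y, lengthscales)]
    return sum(v * sum(math.prod(c) for c in combinations(base, d))
               for d, v in enumerate(order_variances, start=1))

# Hypothetical 3-D inputs and hyperparameters.
x, y = [0.1, 0.5, -0.3], [0.2, 0.1, 0.4]
lengthscales = [1.0, 0.5, 2.0]
order_variances = [1.0, 0.5, 0.25]
fast = additive_kernel(x, y, lengthscales, order_variances)
slow = additive_kernel_bruteforce(x, y, lengthscales, order_variances)
```

The recurrence costs O(D^2) operations per kernel evaluation, against the 2^D subsets visited by the brute-force version.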
Abstract:
MOTIVATION: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct (but often complementary) information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. RESULTS: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI's performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.
Abstract:
We investigate the Student-t process as an alternative to the Gaussian process as a non-parametric prior over functions. We derive closed form expressions for the marginal likelihood and predictive distribution of a Student-t process, by integrating away an inverse Wishart process prior over the covariance kernel of a Gaussian process model. We show surprising equivalences between different hierarchical Gaussian process models leading to Student-t processes, and derive a new sampling scheme for the inverse Wishart process, which helps elucidate these equivalences. Overall, we show that a Student-t process can retain the attractive properties of a Gaussian process - a non-parametric representation, analytic marginal and predictive distributions, and easy model selection through covariance kernels - but has enhanced flexibility, and predictive covariances that, unlike a Gaussian process, explicitly depend on the values of training observations. We verify empirically that a Student-t process is especially useful in situations where there are changes in covariance structure, or in applications such as Bayesian optimization, where accurate predictive covariances are critical for good performance. These advantages come at no additional computational cost over Gaussian processes.
Abstract:
We present an image-based approach to infer 3D structure parameters using a probabilistic "shape+structure" model. The 3D shape of a class of objects may be represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes can then be estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We augment the shape model to incorporate structural features of interest; novel examples with missing structure parameters may then be reconstructed to obtain estimates of these parameters. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a dataset of thousands of pedestrian images generated from a synthetic model, we can perform accurate inference of the 3D locations of 19 joints on the body based on observed silhouette contours from real images.
Abstract:
We address the problem of non-linearity in 2D shape modelling of a particular articulated object: the human body. This issue is partially resolved by applying a different Point Distribution Model (PDM) depending on the viewpoint. The remaining non-linearity is handled by using Gaussian Mixture Models (GMM). A dynamic-based clustering is proposed and carried out in the Pose Eigenspace. A fundamental question when clustering is how to determine the optimal number of clusters. From our point of view, the main aspect to be evaluated is the mean Gaussianity. This partitioning is then used to fit a GMM to each of the view-based PDMs, derived from a database of silhouettes and skeletons. Dynamic correspondences are then obtained between the Gaussian models of the 4 mixtures. Finally, we compare this approach with two other methods we previously developed to cope with non-linearity: a Nearest Neighbor (NN) classifier and Independent Component Analysis (ICA).