44 results for bayesian analysis
Abstract:
A nonparametric Bayesian extension of Factor Analysis (FA) is proposed in which observed data $\mathbf{Y}$ is modeled as a linear superposition, $\mathbf{G}$, of a potentially infinite number of hidden factors, $\mathbf{X}$. The Indian Buffet Process (IBP) is used as a prior on $\mathbf{G}$ to incorporate sparsity and to allow the number of latent features to be inferred. The model's utility for modeling gene expression data is investigated using randomly generated data sets based on a known sparse connectivity matrix for E. coli, and on three biological data sets of increasing complexity.
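A minimal sketch of the kind of generative process described above, assuming (beyond the abstract) a standard IBP sampling scheme with concentration parameter alpha and a plain binary loading matrix Z in place of the full weighted matrix $\mathbf{G}$; the dimensions and noise level below are invented for illustration.

    import numpy as np

    def sample_ibp(n_rows, alpha, rng):
        """Draw a sparse binary matrix from the Indian Buffet Process prior.

        Each row is a 'customer'; columns ('dishes') are latent features whose
        number is not fixed in advance."""
        Z = np.zeros((n_rows, 0), dtype=int)
        for i in range(n_rows):
            if Z.shape[1] > 0:
                # Keep an existing feature k with probability m_k / (i + 1).
                m = Z.sum(axis=0)
                Z[i, :] = rng.random(Z.shape[1]) < m / (i + 1)
            # New features for this row: Poisson(alpha / (i + 1)).
            k_new = rng.poisson(alpha / (i + 1))
            if k_new > 0:
                new_cols = np.zeros((n_rows, k_new), dtype=int)
                new_cols[i, :] = 1
                Z = np.hstack([Z, new_cols])
        return Z

    rng = np.random.default_rng(0)
    n_genes, n_samples, alpha = 50, 20, 3.0
    Z = sample_ibp(n_genes, alpha, rng)            # sparse connectivity (stand-in for G)
    X = rng.normal(size=(Z.shape[1], n_samples))   # hidden factor activities
    Y = Z @ X + 0.1 * rng.normal(size=(n_genes, n_samples))  # observed expression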
Abstract:
In this paper, an introduction to Bayesian methods in signal processing is given. The paper starts by considering the important issues of model selection and parameter estimation, and derives analytic expressions for the model probabilities of two simple models. The idea of marginal estimation of certain model parameters is then introduced, and expressions are derived for the marginal probability densities of frequencies in white Gaussian noise; a Bayesian approach to general changepoint analysis is also given. Numerical integration methods based on Markov chain Monte Carlo techniques, and the Gibbs sampler in particular, are then introduced.
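The abstract closes with the Gibbs sampler; the following sketch, not taken from the paper, illustrates the technique on a toy conjugate model (normal data with unknown mean and variance), with priors and iteration counts chosen arbitrarily.

    import numpy as np

    rng = np.random.default_rng(1)
    y = rng.normal(2.0, 1.5, size=200)     # toy data: unknown mean and variance
    n, ybar = y.size, y.mean()

    # Priors: mu ~ N(0, 10^2), 1/sigma^2 ~ Gamma(a0, rate=b0) (conjugate choices)
    mu0, tau0_sq = 0.0, 100.0
    a0, b0 = 1.0, 1.0

    mu, prec = 0.0, 1.0
    samples = []
    for it in range(5000):
        # mu | sigma^2, y : normal full conditional
        var_n = 1.0 / (1.0 / tau0_sq + n * prec)
        mean_n = var_n * (mu0 / tau0_sq + prec * n * ybar)
        mu = rng.normal(mean_n, np.sqrt(var_n))
        # 1/sigma^2 | mu, y : gamma full conditional
        a_n = a0 + 0.5 * n
        b_n = b0 + 0.5 * np.sum((y - mu) ** 2)
        prec = rng.gamma(a_n, 1.0 / b_n)   # numpy parameterizes by scale = 1/rate
        samples.append((mu, 1.0 / prec))

    post = np.array(samples[1000:])        # discard burn-in
    print(post.mean(axis=0))               # posterior means of (mu, sigma^2)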
Abstract:
Condition-based maintenance is concerned with the collection and interpretation of data to support maintenance decisions. The non-intrusive nature of vibration data enables the monitoring of enclosed systems such as gearboxes. It remains a significant challenge to analyze vibration data that are generated under fluctuating operating conditions. This is especially true for situations where relatively little prior knowledge regarding the specific gearbox is available. It is therefore investigated how an adaptive time series model, which is based on Bayesian model selection, may be used to remove the non-fault-related components in the structural response of a gear assembly and obtain a residual signal that is robust to fluctuating operating conditions. A statistical framework is subsequently proposed which may be used to interpret the structure of the residual signal in order to facilitate an intuitive understanding of the condition of the gear system. The proposed methodology is investigated on both simulated and experimental data from a single-stage gearbox. © 2011 Elsevier Ltd. All rights reserved.
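As a rough illustration of the residual idea, the sketch below substitutes a least-squares autoregressive fit for the Bayesian-model-selection-based adaptive model of the abstract: the predictable part of the vibration signal is removed and the impulsive, fault-related content is largely preserved in the residual. The signal parameters are invented.

    import numpy as np

    def ar_residual(x, order):
        """Fit an AR(order) model by least squares and return the residual signal.
        A simplified stand-in for the adaptive time series model of the abstract."""
        # Lagged design matrix: row t -> [x[t-1], ..., x[t-order]]
        X = np.column_stack([x[order - k - 1 : len(x) - k - 1] for k in range(order)])
        y = x[order:]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y - X @ coef

    rng = np.random.default_rng(2)
    t = np.arange(4096) / 4096.0
    gear_mesh = np.sin(2 * np.pi * 300 * t)        # deterministic mesh component
    fault = 0.3 * (rng.random(t.size) < 0.002)     # sparse impulsive fault
    signal = gear_mesh + fault + 0.05 * rng.normal(size=t.size)
    resid = ar_residual(signal, order=30)          # impulses remain in the residual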
Abstract:
We present in this paper a new multivariate probabilistic approach to Acoustic Pulse Recognition (APR) for tangible interface applications. This model uses Principal Component Analysis (PCA) in a probabilistic framework to classify tapping pulses with a high degree of variability. It was found that this model achieves higher robustness to pulse variability than simpler template matching methods, particularly when allowed to train on data containing high variability. © 2011 IEEE.
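A hedged sketch of the general approach of classifying pulses with per-class probabilistic PCA models, assuming scikit-learn's PCA (whose score_samples method returns log-likelihoods under the probabilistic PCA model); the feature representation, class labels, and dimensions are hypothetical and not taken from the paper.

    import numpy as np
    from sklearn.decomposition import PCA

    def fit_class_models(pulse_features, labels, n_components=5):
        """Fit one probabilistic PCA model per tapping-pulse class."""
        models = {}
        for c in np.unique(labels):
            m = PCA(n_components=n_components)
            m.fit(pulse_features[labels == c])
            models[c] = m
        return models

    def classify(models, x):
        """Assign a pulse to the class whose PPCA model gives the highest log-likelihood."""
        x = np.atleast_2d(x)
        scores = {c: m.score_samples(x)[0] for c, m in models.items()}
        return max(scores, key=scores.get)

    # Hypothetical usage: rows of `features` are fixed-length pulse descriptors.
    rng = np.random.default_rng(3)
    features = rng.normal(size=(200, 40))
    labels = rng.integers(0, 4, size=200)      # e.g. four tapping locations
    models = fit_class_models(features, labels)
    predicted = classify(models, features[0])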
Abstract:
We consider the general problem of constructing nonparametric Bayesian models on infinite-dimensional random objects, such as functions, infinite graphs or infinite permutations. The problem has generated much interest in machine learning, where it is treated heuristically, but has not been studied in full generality in nonparametric Bayesian statistics, which tends to focus on models over probability distributions. Our approach applies a standard tool of stochastic process theory, the construction of stochastic processes from their finite-dimensional marginal distributions. The main contribution of the paper is a generalization of the classic Kolmogorov extension theorem to conditional probabilities. This extension allows a rigorous construction of nonparametric Bayesian models from systems of finite-dimensional, parametric Bayes equations. Using this approach, we show (i) how existence of a conjugate posterior for the nonparametric model can be guaranteed by choosing conjugate finite-dimensional models in the construction, (ii) how the mapping to the posterior parameters of the nonparametric model can be explicitly determined, and (iii) that the construction of conjugate models in essence requires the finite-dimensional models to be in the exponential family. As an application of our constructive framework, we derive a model on infinite permutations, the nonparametric Bayesian analogue of a model recently proposed for the analysis of rank data.
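For reference, the classical Kolmogorov consistency conditions being generalized read as follows; the paper's contribution is an analogous statement for conditional probabilities rather than for these marginal distributions.

    % A family of finite-dimensional distributions \mu_{t_1,\dots,t_k} extends to a
    % stochastic process iff it is consistent under permutation and marginalization:
    \mu_{t_{\pi(1)},\dots,t_{\pi(k)}}\bigl(A_{\pi(1)}\times\cdots\times A_{\pi(k)}\bigr)
        = \mu_{t_1,\dots,t_k}\bigl(A_1\times\cdots\times A_k\bigr),
    \qquad
    \mu_{t_1,\dots,t_k,t_{k+1}}\bigl(A_1\times\cdots\times A_k\times\Omega\bigr)
        = \mu_{t_1,\dots,t_k}\bigl(A_1\times\cdots\times A_k\bigr),
    % for every permutation \pi of \{1,\dots,k\} and all measurable sets A_1,\dots,A_k.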
Abstract:
We present a new haplotype-based approach for inferring local genetic ancestry of individuals in an admixed population. Most existing approaches for local ancestry estimation ignore the latent genetic relatedness between ancestral populations and treat them as independent. In this article, we exploit such information by building an inheritance model that describes both the ancestral populations and the admixed population jointly in a unified framework. Based on an assumption that the common hypothetical founder haplotypes give rise to both the ancestral and the admixed population haplotypes, we employ an infinite hidden Markov model to characterize each ancestral population and further extend it to generate the admixed population. Through an effective utilization of the population structural information under a principled nonparametric Bayesian framework, the resulting model is significantly less sensitive to the choice and the amount of training data for ancestral populations than state-of-the-art algorithms. We also improve the robustness under deviation from common modeling assumptions by incorporating population-specific scale parameters that allow variable recombination rates in different populations. Our method is applicable to an admixed population from an arbitrary number of ancestral populations and also performs competitively in terms of spurious ancestry proportions under a general multiway admixture assumption. We validate the proposed method by simulation under various admixing scenarios and present empirical analysis results from a worldwide-distributed dataset from the Human Genome Diversity Project.
Abstract:
We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which is available for download from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper. These are available from the following URL: https://sites.google.com/site/randomisedbhc/.
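The heart of BHC is a marginal-likelihood comparison between merging two clusters and keeping them separate. The sketch below illustrates that comparison for one-dimensional Gaussian data with a conjugate prior; it omits the Dirichlet-process prior over partitions used by the actual method and does not reproduce the interface of the BHC Bioconductor package.

    import numpy as np
    from scipy.stats import multivariate_normal

    def log_marginal(x, sigma=1.0, tau=5.0, mu0=0.0):
        """Log marginal likelihood of 1-D data under
           x_i | theta ~ N(theta, sigma^2),  theta ~ N(mu0, tau^2),
        obtained by integrating theta out analytically."""
        n = x.size
        cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
        return multivariate_normal.logpdf(x, mean=np.full(n, mu0), cov=cov)

    def should_merge(xa, xb):
        """Bayes-factor test: does one shared cluster explain the pooled data
        better than two separate clusters? (A simplification of the BHC criterion.)"""
        merged = log_marginal(np.concatenate([xa, xb]))
        separate = log_marginal(xa) + log_marginal(xb)
        return merged > separate

    rng = np.random.default_rng(4)
    a = rng.normal(0.0, 1.0, size=30)
    b = rng.normal(0.2, 1.0, size=30)   # close to a: merge expected
    c = rng.normal(6.0, 1.0, size=30)   # far from a: no merge expected
    print(should_merge(a, b), should_merge(a, c))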
Abstract:
We consider a method for approximate inference in hidden Markov models (HMMs). The method circumvents the need to evaluate conditional densities of observations given the hidden states. It may be considered an instance of Approximate Bayesian Computation (ABC), and it involves the introduction of auxiliary variables valued in the same space as the observations. The quality of the approximation may be controlled to arbitrary precision through a parameter ε > 0. We provide theoretical results which quantify, in terms of ε, the ABC error in approximation of expectations of additive functionals with respect to the smoothing distributions. Under regularity assumptions, this error is bounded in terms of ε and n, where n is the number of time steps over which smoothing is performed. For numerical implementation, we adopt the forward-only sequential Monte Carlo (SMC) scheme of [14] and quantify the combined error from the ABC and SMC approximations. This forms some of the first quantitative results for ABC methods which jointly treat the ABC and simulation errors, with a finite number of data and simulated samples. © Taylor & Francis Group, LLC.
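To make the ABC idea concrete, here is a sketch of a basic ABC bootstrap particle filter on a toy linear-Gaussian HMM: the observation density is never evaluated; each particle instead draws an auxiliary pseudo-observation and is weighted by whether it falls within ε of the recorded observation. This illustrates filtering only, not the forward-only smoothing scheme of [14] used in the paper, and all model parameters are invented.

    import numpy as np

    rng = np.random.default_rng(5)

    def simulate_hmm(T, phi=0.9, sx=0.5, sy=0.3):
        """Toy linear-Gaussian HMM used only to generate data for the sketch."""
        x = np.zeros(T); y = np.zeros(T)
        for t in range(1, T):
            x[t] = phi * x[t - 1] + sx * rng.normal()
            y[t] = x[t] + sy * rng.normal()
        return x, y

    def abc_bootstrap_filter(y, n_particles=2000, eps=0.2, phi=0.9, sx=0.5, sy=0.3):
        """ABC bootstrap particle filter: each particle simulates a pseudo-observation
        and its weight is the indicator that it lies within eps of y_t, so eps
        controls the quality of the approximation."""
        T = y.size
        particles = np.zeros(n_particles)
        means = np.zeros(T)
        for t in range(1, T):
            particles = phi * particles + sx * rng.normal(size=n_particles)
            pseudo = particles + sy * rng.normal(size=n_particles)  # auxiliary variables
            w = (np.abs(pseudo - y[t]) < eps).astype(float)
            if w.sum() == 0:                    # degenerate step: fall back to uniform
                w = np.ones(n_particles)
            w /= w.sum()
            means[t] = np.sum(w * particles)
            particles = particles[rng.choice(n_particles, n_particles, p=w)]  # resample
        return means

    x, y = simulate_hmm(300)
    est = abc_bootstrap_filter(y)
    print(np.mean((est[1:] - x[1:]) ** 2))      # rough filtering error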
Abstract:
The design and construction of deep excavations in an urban environment is often governed by serviceability limit states related to the risk of damage to adjacent buildings. In current practice, the assessment of excavation-induced building damage has followed a deterministic approach. This paper presents a component/system reliability analysis framework to assess the probability that specified threshold design criteria for multiple serviceability limit states are exceeded. A recently developed Bayesian probabilistic framework is used to update the predictions of ground movements in the later stages of excavation based on the recorded deformation measurements. An example is presented to show how the serviceability performance of excavation problems can be assessed based on the component/system reliability analysis. © 2011 ASCE.
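A minimal, hypothetical illustration of the Bayesian updating step: a model-based prediction of maximum settlement (treated as a normal prior) is combined with a recorded deformation measurement, and the updated distribution is used to evaluate the probability of exceeding a serviceability threshold. All numbers below are made up.

    import numpy as np
    from scipy.stats import norm

    def update_normal(prior_mean, prior_sd, obs, obs_sd):
        """Conjugate normal update: combine a model-based prediction (prior) with a
        field measurement (likelihood) into a posterior prediction."""
        prior_prec, obs_prec = 1.0 / prior_sd**2, 1.0 / obs_sd**2
        post_var = 1.0 / (prior_prec + obs_prec)
        post_mean = post_var * (prior_prec * prior_mean + obs_prec * obs)
        return post_mean, np.sqrt(post_var)

    # Hypothetical numbers: the design model predicts 25 mm maximum settlement
    # (s.d. 8 mm); a monitoring reading at an early stage suggests 32 mm with
    # 5 mm measurement uncertainty.
    mean, sd = update_normal(25.0, 8.0, 32.0, 5.0)
    threshold = 40.0                              # serviceability limit, mm
    p_exceed = norm.sf(threshold, loc=mean, scale=sd)
    print(mean, sd, p_exceed)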
Abstract:
Some amount of differential settlement occurs even in the most uniform soil deposit, but it is extremely difficult to estimate because of the natural heterogeneity of the soil. The compression response of the soil and its variability must be characterised in order to estimate the probability of the differential settlement exceeding a certain threshold value. The work presented in this paper introduces a probabilistic framework to address this issue in a rigorous manner, while preserving the format of a typical geotechnical settlement analysis. In order to avoid dealing with different approaches for each category of soil, a simplified unified compression model is used to characterise the nonlinear compression behavior of soils of varying gradation through a single constitutive law. The Bayesian updating rule is used to incorporate information from three different laboratory datasets in the computation of the statistics (estimates of the means and covariance matrix) of the compression model parameters, as well as of the uncertainty inherent in the model.
Abstract:
Vibration and acoustic analysis at higher frequencies faces two challenges: computing the response without using an excessive number of degrees of freedom, and quantifying its uncertainty due to small spatial variations in geometry, material properties and boundary conditions. Efficient models make use of the observation that when the response of a decoupled vibro-acoustic subsystem is sufficiently sensitive to uncertainty in such spatial variations, the local statistics of its natural frequencies and mode shapes saturate to universal probability distributions. This holds irrespective of the causes that underlie these spatial variations and thus leads to a nonparametric description of uncertainty. This work deals with the identification of uncertain parameters in such models by using experimental data. One of the difficulties is that both experimental errors and modeling errors, due to the nonparametric uncertainty that is inherent in the model type, are present. This is tackled by employing a Bayesian inference strategy. The prior probability distribution of the uncertain parameters is constructed using the maximum entropy principle. The likelihood function that is subsequently computed takes the experimental information, the experimental errors and the modeling errors into account. The posterior probability distribution, which is computed with the Markov Chain Monte Carlo method, provides a full uncertainty quantification of the identified parameters, and indicates how well their uncertainty is reduced, with respect to the prior information, by the experimental data. © 2013 Taylor & Francis Group, London.
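As a generic sketch of the workflow described above (not the authors' vibro-acoustic model): a maximum-entropy prior on a bounded parameter reduces to a uniform distribution when only bounds are known, a Gaussian likelihood lumps experimental and modeling errors into one standard deviation, and random-walk Metropolis sampling produces the posterior. The forward model, bounds and data below are placeholders.

    import numpy as np

    rng = np.random.default_rng(6)

    # Hypothetical setup: one uncertain damping-like parameter with known bounds.
    lo, hi = 0.01, 0.10                        # maximum-entropy prior on [lo, hi] is uniform
    data = np.array([1.02, 0.97, 1.05])        # measured responses (illustrative)
    sigma = 0.08                               # combined experimental + modeling error

    def model(theta):
        """Placeholder forward model mapping the parameter to predicted responses."""
        return np.full(data.shape, 20.0 * theta + 0.5)

    def log_post(theta):
        if not (lo <= theta <= hi):
            return -np.inf                     # outside the prior support
        resid = data - model(theta)
        return -0.5 * np.sum(resid**2) / sigma**2   # flat prior adds only a constant

    theta, lp = 0.05, log_post(0.05)
    chain = []
    for it in range(20000):
        prop = theta + 0.005 * rng.normal()    # random-walk proposal
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain.append(theta)

    posterior = np.array(chain[5000:])         # discard burn-in
    print(posterior.mean(), posterior.std())   # reduced uncertainty vs. the prior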