928 results for Bayesian nonparametric


Relevance: 20.00%

Publisher:

Abstract:

Despite its importance, choosing the structural form of the kernel in nonparametric regression remains a black art. We define a space of kernel structures which are built compositionally by adding and multiplying a small number of base kernels. We present a method for searching over this space of structures which mirrors the scientific discovery process. The learned structures can often decompose functions into interpretable components and enable long-range extrapolation on time-series datasets. Our structure search method outperforms many widely used kernels and kernel combination methods on a variety of prediction tasks.
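The compositional search space described above can be sketched in a few lines. The following is a minimal, illustrative Python sketch of the grammar (add or multiply a base kernel at each step) and a greedy search loop; the `toy` scoring function is a stand-in for the GP marginal likelihood the paper actually uses, and the base-kernel names are just labels.

```python
BASE = ["SE", "Per", "Lin", "RQ"]  # squared-exponential, periodic, linear, rational quadratic

def expand(expr):
    """All one-step expansions of a structure S: (S + B) and (S * B) for each base kernel B."""
    return [f"({expr} + {b})" for b in BASE] + [f"({expr} * {b})" for b in BASE]

def greedy_search(score, depth=3):
    """Greedy structure search: keep the best-scoring expansion until no improvement."""
    best = max(BASE, key=score)
    for _ in range(depth - 1):
        challenger = max(expand(best), key=score)
        if score(challenger) <= score(best):
            break
        best = challenger
    return best

# Toy score (stand-in for model evidence): favours periodic terms, penalises complexity.
toy = lambda e: e.count("SE") + 2 * e.count("Per") - 0.5 * e.count("(")
best_structure = greedy_search(toy)
```

In the paper the score is the (hyperparameter-optimised) marginal likelihood with a complexity penalty, and each expansion is refit before scoring; the skeleton of the search is the same.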

Relevance: 20.00%

Publisher:

Abstract:

We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is included in the R package BHC, available for download from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper; these are available from https://sites.google.com/site/randomisedbhc/.
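To illustrate the general randomisation idea (not the BHC algorithm itself, which is Bayesian and model-based), here is a minimal stdlib Python sketch of the subsample-then-assign pattern often used to accelerate hierarchical clustering: cluster a random subsample exactly, then attach the remaining points to the nearest cluster centroid. All names and the toy data are illustrative.

```python
import random, math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(c):
    return [sum(col) / len(c) for col in zip(*c)]

def cluster_exact(points, k):
    """Naive agglomerative clustering down to k clusters (merging nearest centroids)."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def randomised_cluster(points, k, m, seed=0):
    """Cluster a random subsample of size m exactly, assign the rest to nearest centroid."""
    rng = random.Random(seed)
    sample = rng.sample(points, m)
    clusters = cluster_exact(sample, k)
    centroids = [centroid(c) for c in clusters]
    for p in points:
        if p not in sample:
            clusters[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
    return clusters
```

The exact step costs are incurred only on the subsample, which is where the speed-up comes from; the paper's randomised BHC applies an analogous idea within its Bayesian clustering framework.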

Relevance: 20.00%

Publisher:

Abstract:

We present a nonparametric Bayesian method for disease subtype discovery in multi-dimensional cancer data. Our method can simultaneously analyse a wide range of data types, allowing for both agreement and disagreement between their underlying clustering structure. It includes feature selection and infers the most likely number of disease subtypes, given the data. We apply the method to 277 glioblastoma samples from The Cancer Genome Atlas, for which there are gene expression, copy number variation, methylation and microRNA data. We identify 8 distinct consensus subtypes and study their prognostic value for death, new tumour events, progression and recurrence. The consensus subtypes are prognostic of tumour recurrence (log-rank p-value of $3.6 \times 10^{-4}$ after correction for multiple hypothesis tests). This is driven principally by the methylation data (log-rank p-value of $2.0 \times 10^{-3}$) but the effect is strengthened by the other 3 data types, demonstrating the value of integrating multiple data types. Of particular note is a subtype of 47 patients characterised by very low levels of methylation. This subtype has very low rates of tumour recurrence and no new events in 10 years of follow-up. We also identify a small gene expression subtype of 6 patients that shows particularly poor survival outcomes. Additionally, we note a consensus subtype that shows a highly distinctive data signature and suggest that it is therefore a biologically distinct subtype of glioblastoma. The code is available from https://sites.google.com/site/multipledatafusion/
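The log-rank test used above to assess the prognostic value of the subtypes is standard survival-analysis machinery. As a reference point, here is a minimal stdlib Python implementation of the two-group log-rank test (the paper compares multiple subtypes, but the two-group statistic shows the core computation); the toy inputs in the test are illustrative, not the paper's data.

```python
import math

def logrank(times1, times2, events1=None, events2=None):
    """Two-group log-rank test. events: 1 = event observed, 0 = censored."""
    if events1 is None:
        events1 = [1] * len(times1)
    if events2 is None:
        events2 = [1] * len(times2)
    data = [(t, e, 0) for t, e in zip(times1, events1)] + \
           [(t, e, 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    O1 = E1 = V = 0.0
    for t in event_times:
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 0)  # at risk in group 1
        n2 = sum(1 for tt, _, g in data if tt >= t and g == 1)  # at risk in group 2
        d1 = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 0)
        d2 = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 1)
        n, d = n1 + n2, d1 + d2
        O1 += d1                      # observed events in group 1
        E1 += d * n1 / n              # expected under the null of equal hazards
        if n > 1:                     # hypergeometric variance term
            V += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (O1 - E1) ** 2 / V
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square(1 df) tail probability
    return chi2, p
```

For identical groups the statistic is zero; for well-separated survival the p-value is small, which is the pattern the reported subtype p-values reflect.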

Relevance: 20.00%

Publisher:

Abstract:

We consider a method for approximate inference in hidden Markov models (HMMs). The method circumvents the need to evaluate conditional densities of observations given the hidden states. It may be considered an instance of Approximate Bayesian Computation (ABC) and it involves the introduction of auxiliary variables valued in the same space as the observations. The quality of the approximation may be controlled to arbitrary precision through a parameter ε > 0. We provide theoretical results which quantify, in terms of ε, the ABC error in approximation of expectations of additive functionals with respect to the smoothing distributions. Under regularity assumptions, we characterise how this error scales with ε and with n, the number of time steps over which smoothing is performed. For numerical implementation, we adopt the forward-only sequential Monte Carlo (SMC) scheme of [14] and quantify the combined error from the ABC and SMC approximations. These are among the first quantitative results for ABC methods which jointly treat the ABC and simulation errors, with a finite number of data and simulated samples. © Taylor & Francis Group, LLC.
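The key ABC trick, replacing evaluation of the observation density with simulation of pseudo-observations and an ε-ball acceptance check, can be sketched as a bootstrap particle filter. This is a minimal stdlib Python illustration of the idea on a linear-Gaussian toy model (not the paper's forward-only smoothing scheme, and all model parameters are illustrative):

```python
import random

def abc_particle_filter(ys, n_particles=2000, eps=0.5, a=0.9, sx=1.0, seed=1):
    """ABC bootstrap filter for x_t = a*x_{t-1} + N(0, sx^2), y_t = x_t + N(0, 1).
    The observation density is never evaluated: each particle simulates a
    pseudo-observation u and gets weight 1{|u - y_t| < eps}."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n_particles)]
    means = []
    for y in ys:
        xs = [a * x + rng.gauss(0, sx) for x in xs]       # propagate through dynamics
        us = [x + rng.gauss(0, 1) for x in xs]            # pseudo-observations
        w = [1.0 if abs(u - y) < eps else 0.0 for u in us]
        if sum(w) == 0:                                    # all rejected: fall back to prior
            w = [1.0] * n_particles
        xs = rng.choices(xs, weights=w, k=n_particles)     # multinomial resampling
        means.append(sum(xs) / n_particles)
    return means
```

Shrinking eps tightens the ABC approximation at the cost of fewer accepted particles; this is the ε/Monte Carlo trade-off that the paper's combined error bounds quantify.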

Relevance: 20.00%

Publisher:

Abstract:

Numerical integration is a key component of many problems in scientific computing, statistical modelling, and machine learning. Bayesian Quadrature is a model-based method for numerical integration which, relative to standard Monte Carlo methods, offers increased sample efficiency and a more robust estimate of the uncertainty in the estimated integral. We propose a novel Bayesian Quadrature approach for numerical integration when the integrand is non-negative, such as the case of computing the marginal likelihood, predictive distribution, or normalising constant of a probabilistic model. Our approach approximately marginalises the quadrature model's hyperparameters in closed form, and introduces an active learning scheme to optimally select function evaluations, as opposed to using Monte Carlo samples. We demonstrate our method on both a number of synthetic benchmarks and a real scientific problem from astronomy.
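For orientation, here is a minimal stdlib Python sketch of *plain* Bayesian Quadrature with a squared-exponential kernel against a N(0,1) base measure: place a GP on the integrand, condition on function evaluations, and read off the posterior mean of the integral as z^T K^{-1} f, where z is the closed-form kernel mean. Note this baseline does not enforce non-negativity or marginalise hyperparameters, which are the contributions of the approach described above; node locations and length-scale are illustrative.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def bq_estimate(f, nodes, ell=0.6, jitter=1e-8):
    """BQ posterior mean of E[f(X)], X ~ N(0,1), with an SE kernel of length-scale ell."""
    k = lambda a, b: math.exp(-(a - b) ** 2 / (2 * ell ** 2))
    K = [[k(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(nodes)]
         for i, a in enumerate(nodes)]
    # closed-form kernel mean z_i = integral of k(x, x_i) against the N(0,1) measure
    z = [ell / math.sqrt(ell ** 2 + 1) * math.exp(-x ** 2 / (2 * (ell ** 2 + 1)))
         for x in nodes]
    alpha = solve(K, [f(x) for x in nodes])
    return sum(zi * ai for zi, ai in zip(z, alpha))

nodes = [-3 + 0.5 * i for i in range(13)]
est = bq_estimate(lambda x: math.exp(-x ** 2), nodes)  # true value is 1/sqrt(3)
```

With a dozen well-placed evaluations the estimate is already close to the true value 1/√3, illustrating the sample efficiency the abstract contrasts with Monte Carlo.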

Relevance: 20.00%

Publisher:

Abstract:

The ground movements induced by the construction of supported excavation systems are generally predicted by empirical/semi-empirical methods in the design stage. However, these methods cannot account for the site-specific conditions and for information that becomes available as an excavation proceeds. A Bayesian updating methodology is proposed to update the predictions of ground movements in the later stages of excavation based on recorded deformation measurements. As an application, the proposed framework is used to predict the three-dimensional deformation shapes at four incremental excavation stages of an actual supported excavation project. © 2011 Taylor & Francis Group, London.
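The staged-updating idea can be sketched with a conjugate normal-normal update on a model-bias factor: start from a prior that the empirical prediction is unbiased, then sharpen the posterior as deformation measurements from completed stages arrive. This is a minimal stdlib Python illustration of Bayesian updating in this setting, not the paper's three-dimensional framework; all numbers are hypothetical.

```python
def update(mu, var, obs, obs_var):
    """Conjugate normal-normal update of the model-bias factor (known obs variance)."""
    post_var = 1 / (1 / var + 1 / obs_var)
    post_mu = post_var * (mu / var + obs / obs_var)
    return post_mu, post_var

# Prior: the empirical method is unbiased (bias factor 1.0), but we are unsure.
mu, var = 1.0, 0.25
predicted = [12.0, 18.0, 25.0, 31.0]   # empirical wall-deflection predictions per stage (mm, hypothetical)
measured  = [14.5, 21.0, 29.8]         # measurements from the three completed stages (mm, hypothetical)
for p, m in zip(predicted, measured):
    ratio = m / p                       # observed bias at this stage
    mu, var = update(mu, var, ratio, 0.01)
updated_final = mu * predicted[-1]      # revised prediction for the final stage
```

After three stages the posterior concentrates on a bias factor of roughly 1.19, so the final-stage prediction is revised upward from the purely empirical value, which is exactly the benefit of conditioning on site-specific measurements as excavation proceeds.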

Relevance: 20.00%

Publisher:

Abstract:

The ground movements induced by the construction of supported excavation systems are generally predicted in the design stage by empirical/semi-empirical methods. However, these methods cannot account for the site-specific conditions and for information that becomes available as an excavation proceeds. A Bayesian updating methodology is proposed to update the predictions of ground movements in the later stages of excavation based on recorded deformation measurements. As an application, the proposed framework is used to predict the three-dimensional deformation shapes at four incremental excavation stages of an actual supported excavation project. Copyright © ASCE 2011.

Relevance: 20.00%

Publisher:

Abstract:

Some amount of differential settlement occurs even in the most uniform soil deposit, but it is extremely difficult to estimate because of the natural heterogeneity of the soil. The compression response of the soil and its variability must be characterised in order to estimate the probability of the differential settlement exceeding a certain threshold value. The work presented in this paper introduces a probabilistic framework to address this issue in a rigorous manner, while preserving the format of a typical geotechnical settlement analysis. In order to avoid dealing with different approaches for each category of soil, a simplified unified compression model is used to characterise the nonlinear compression behaviour of soils of varying gradation through a single constitutive law. The Bayesian updating rule is used to incorporate information from three different laboratory datasets in the computation of the statistics (estimates of the means and covariance matrix) of the compression model parameters, as well as of the uncertainty inherent in the model.

Relevance: 20.00%

Publisher:

Abstract:

We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated, and how predictions about actions can be made, in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. This step includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. As an illustration, the method is applied to learning a human controller.
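As a small, self-contained illustration of posterior simulation in this kind of problem, here is a random-walk Metropolis-Hastings sampler over a single policy parameter given state/action data, with a softmax (logistic) action model and a N(0,1) prior. This is a generic MH sketch, not the paper's parameter-expanded sampler, and the data and model are illustrative.

```python
import random, math

def loglik(theta, data):
    """Log-likelihood of state/action pairs under pi(a=1 | s) = sigmoid(theta * s)."""
    ll = 0.0
    for s, a in data:
        p = 1 / (1 + math.exp(-theta * s))
        ll += math.log(p if a == 1 else 1 - p)
    return ll

def mh(data, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings with a N(0,1) prior on theta."""
    rng = random.Random(seed)
    logpost = lambda t: loglik(t, data) - 0.5 * t * t
    theta, lp = 0.0, logpost(0.0)
    samples = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0, step)
        lp_prop = logpost(prop)
        if math.log(rng.random()) < lp_prop - lp:   # accept/reject
            theta, lp = prop, lp_prop
        samples.append(theta)
    return samples

# Data consistent with a controller that prefers action 1 when s > 0 (illustrative).
data = [(1.0, 1), (0.5, 1), (2.0, 1), (-1.0, 0), (-0.5, 0), (-2.0, 0), (0.2, 1), (-0.3, 0)]
samples = mh(data)
post_mean = sum(samples[1000:]) / len(samples[1000:])  # discard burn-in
```

The posterior mean of theta comes out clearly positive, i.e., the sampler recovers the controller's preference from state/action data alone; the paper's contribution is doing this within a full MDP-derived model where the naive sampler mixes poorly without the parameter expansion step.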

Relevance: 20.00%

Publisher:

Abstract:

We introduce a conceptually novel structured prediction model, GPstruct, which is kernelized, non-parametric and Bayesian, by design. We motivate the model with respect to existing approaches, among others, conditional random fields (CRFs), maximum margin Markov networks (M3N), and structured support vector machines (SVMstruct), which embody only a subset of its properties. We present an inference procedure based on Markov Chain Monte Carlo. The framework can be instantiated for a wide range of structured objects such as linear chains, trees, grids, and other general graphs. As a proof of concept, the model is benchmarked on several natural language processing tasks and a video gesture segmentation task involving a linear chain structure. We show prediction accuracies for GPstruct which are comparable to or exceeding those of CRFs and SVMstruct.