263 results for Nonparametric Estimation
in Cambridge University Engineering Department Publications Database
Abstract:
We present a new haplotype-based approach for inferring local genetic ancestry of individuals in an admixed population. Most existing approaches for local ancestry estimation ignore the latent genetic relatedness between ancestral populations and treat them as independent. In this article, we exploit such information by building an inheritance model that describes both the ancestral populations and the admixed population jointly in a unified framework. Based on an assumption that the common hypothetical founder haplotypes give rise to both the ancestral and the admixed population haplotypes, we employ an infinite hidden Markov model to characterize each ancestral population and further extend it to generate the admixed population. Through an effective utilization of the population structural information under a principled nonparametric Bayesian framework, the resulting model is significantly less sensitive to the choice and the amount of training data for ancestral populations than state-of-the-art algorithms. We also improve the robustness under deviation from common modeling assumptions by incorporating population-specific scale parameters that allow variable recombination rates in different populations. Our method is applicable to an admixed population from an arbitrary number of ancestral populations and also performs competitively in terms of spurious ancestry proportions under a general multiway admixture assumption. We validate the proposed method by simulation under various admixing scenarios and present empirical analysis results from a worldwide-distributed dataset from the Human Genome Diversity Project.
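As a very rough illustration of the underlying idea, the sketch below runs a standard forward-backward pass over a small finite set of ancestry states along a chromosome. The paper's model is an infinite hidden Markov model over founder haplotypes, which is not reproduced here; the number of ancestry states K, the single switch probability, and the per-site emission log-likelihoods are all illustrative assumptions.

```python
# Minimal, simplified sketch of local-ancestry decoding with a finite HMM.
# Not the paper's infinite HMM; parameters below are illustrative assumptions.
import numpy as np

def forward_backward(log_emission, switch_prob):
    """log_emission: (num_sites, K) log-likelihood of each ancestry at each site."""
    T, K = log_emission.shape
    log_trans = np.full((K, K), np.log(switch_prob / (K - 1)))
    np.fill_diagonal(log_trans, np.log(1.0 - switch_prob))

    log_alpha = np.zeros((T, K))
    log_alpha[0] = -np.log(K) + log_emission[0]          # uniform prior at site 0
    for t in range(1, T):
        log_alpha[t] = log_emission[t] + \
            np.logaddexp.reduce(log_alpha[t - 1][:, None] + log_trans, axis=0)

    log_beta = np.zeros((T, K))
    for t in range(T - 2, -1, -1):
        log_beta[t] = np.logaddexp.reduce(
            log_trans + log_emission[t + 1] + log_beta[t + 1], axis=1)

    log_post = log_alpha + log_beta
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    return np.exp(log_post)    # posterior ancestry probabilities at every site
```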
Abstract:
We present the Gaussian process density sampler (GPDS), an exchangeable generative model for use in nonparametric Bayesian density estimation. Samples drawn from the GPDS are consistent with exact, independent samples from a distribution defined by a density that is a transformation of a function drawn from a Gaussian process prior. Our formulation allows us to infer an unknown density from data using Markov chain Monte Carlo, which gives samples from the posterior distribution over density functions and from the predictive distribution on data space. We describe two such MCMC methods. Both methods also allow inference of the hyperparameters of the Gaussian process.
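As a hedged illustration of the generative mechanism, the sketch below draws proposals from a simple base density and accepts each with probability given by a sigmoid of a lazily extended Gaussian process draw. The kernel, base density, and jitter are assumptions for illustration, not the paper's exact construction.

```python
# Sketch of a GPDS-style generative process: propose from a base density,
# accept with probability sigmoid(g(x)), extending the GP draw by conditioning.
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gpds_sample(n_samples, rng=np.random.default_rng(0)):
    xs, gs, out = [], [], []
    while len(out) < n_samples:
        x = rng.normal()                      # proposal from the base density N(0, 1)
        if xs:
            X = np.array(xs)
            K = rbf_kernel(X, X) + 1e-8 * np.eye(len(xs))
            k = rbf_kernel(X, np.array([x]))[:, 0]
            w = np.linalg.solve(K, k)
            mean = w @ np.array(gs)
            var = max(rbf_kernel(np.array([x]), np.array([x]))[0, 0] - k @ w, 1e-10)
        else:
            mean, var = 0.0, 1.0
        g = rng.normal(mean, np.sqrt(var))    # extend the GP draw to the new point
        xs.append(x); gs.append(g)
        if rng.random() < 1.0 / (1.0 + np.exp(-g)):   # accept w.p. sigmoid(g(x))
            out.append(x)
    return np.array(out)
```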
Abstract:
The mixture of factor analyzers (MFA) model allows data to be modeled as a mixture of Gaussians with a reduced parametrization. We present the formulation of a nonparametric form of the MFA model, the Dirichlet process MFA (DPMFA). The proposed model can be used for density estimation or clustering of high-dimensional data. We utilize the DPMFA for clustering the action potentials of different neurons from extracellular recordings, a problem known as spike sorting. The DPMFA model is compared to the Dirichlet process mixture of Gaussians model (DPGMM), which has higher computational complexity. We show that DPMFA has modeling performance similar to DPGMM in lower dimensions and is able to work in higher dimensions.
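To make the model structure concrete, here is a minimal generative sketch of a finite mixture of factor analyzers, the building block that the DPMFA extends with a Dirichlet process prior over components; all dimensions and parameter values are illustrative assumptions.

```python
# Generative sketch of a finite mixture of factor analyzers (illustrative values).
import numpy as np

rng = np.random.default_rng(1)
D, q, K, N = 10, 2, 3, 500                     # observed dim, latent dim, components, samples

weights = rng.dirichlet(np.ones(K))            # mixture weights
mus     = rng.normal(0, 5, size=(K, D))        # component means
Lambdas = rng.normal(size=(K, D, q))           # factor loading matrices
psis    = rng.gamma(2.0, 0.5, size=(K, D))     # diagonal noise variances

z = rng.choice(K, size=N, p=weights)           # component assignments
latents = rng.normal(size=(N, q))              # low-dimensional factors
X = np.einsum('ndq,nq->nd', Lambdas[z], latents) \
    + mus[z] + rng.normal(size=(N, D)) * np.sqrt(psis[z])
# X now holds N samples from a K-component MFA in D dimensions
```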
Abstract:
A mixture of Gaussians fit to a single curved or heavy-tailed cluster will report that the data contains many clusters. To produce more appropriate clusterings, we introduce a model which warps a latent mixture of Gaussians to produce nonparametric cluster shapes. The possibly low-dimensional latent mixture model allows us to summarize the properties of the high-dimensional clusters (or density manifolds) describing the data. The number of manifolds, as well as the shape and dimension of each manifold, is automatically inferred. We derive a simple inference scheme for this model which analytically integrates out both the mixture parameters and the warping function. We show that our model is effective for density estimation, performs better than infinite Gaussian mixture models at recovering the true number of clusters, and produces interpretable summaries of high-dimensional datasets.
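The following toy sketch illustrates the generative idea: latent points come from a low-dimensional mixture of Gaussians and are pushed through a nonlinear warp drawn from a Gaussian process prior, producing curved, non-Gaussian cluster shapes. The kernel, dimensions, and noise level are illustrative assumptions.

```python
# Generative sketch of a warped latent mixture of Gaussians (illustrative setup).
import numpy as np

rng = np.random.default_rng(2)
N, K, D_latent, D_obs = 300, 2, 1, 2

means = np.array([[-2.0], [2.0]])
z = rng.choice(K, size=N)
latent = means[z] + 0.5 * rng.normal(size=(N, D_latent))    # latent GMM draw

# Draw a warp f: R^1 -> R^2 from a GP prior, evaluated at the latent points.
d = latent[:, 0][:, None] - latent[:, 0][None, :]
Kmat = np.exp(-0.5 * d ** 2) + 1e-6 * np.eye(N)
L = np.linalg.cholesky(Kmat)
observed = L @ rng.normal(size=(N, D_obs))                  # warped, curved clusters
observed += 0.05 * rng.normal(size=(N, D_obs))              # observation noise
```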
Abstract:
Modern technology has allowed real-time data collection in a variety of domains, ranging from environmental monitoring to healthcare. Consequently, there is a growing need for algorithms capable of performing inferential tasks in an online manner, continuously revising their estimates to reflect the current status of the underlying process. In particular, we are interested in constructing online and temporally adaptive classifiers capable of handling the possibly drifting decision boundaries arising in streaming environments. We first make a quadratic approximation to the log-likelihood that yields a recursive algorithm for fitting logistic regression online. We then suggest a novel way of equipping this framework with self-tuning forgetting factors. The resulting scheme is capable of tracking changes in the underlying probability distribution, adapting the decision boundary appropriately and hence maintaining high classification accuracy in dynamic or unstable environments. We demonstrate the scheme's effectiveness in both real and simulated streaming environments.
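A compact sketch of the recursive idea, assuming a fixed forgetting factor (the paper's self-tuning mechanism is not reproduced): each observation triggers a single Newton-style step on a discounted quadratic approximation of the log-likelihood, with the inverse Hessian maintained via a rank-one (Sherman-Morrison) update. All symbols and values are illustrative.

```python
# Recursive logistic regression with exponential forgetting (illustrative sketch).
import numpy as np

class OnlineLogisticRegression:
    def __init__(self, dim, lam=0.98, init_cov=10.0):
        self.w = np.zeros(dim)                 # current weight estimate
        self.P = init_cov * np.eye(dim)        # approx. inverse Hessian of the discounted loss
        self.lam = lam                         # fixed forgetting factor in (0, 1]

    def update(self, x, y):
        """One online step with feature vector x and label y in {0, 1}."""
        p = 1.0 / (1.0 + np.exp(-self.w @ x))      # current predictive probability
        r = max(p * (1.0 - p), 1e-6)               # curvature of the local quadratic
        P = self.P / self.lam                      # forgetting inflates past information
        Px = P @ x
        # Sherman-Morrison update of the inverse Hessian with the new observation
        self.P = P - np.outer(Px, Px) * (r / (1.0 + r * (x @ Px)))
        self.w = self.w + self.P @ x * (y - p)     # Newton-style correction step
        return p

# usage: clf = OnlineLogisticRegression(dim=3)
#        for x_t, y_t in stream: p_t = clf.update(x_t, y_t)
```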
Abstract:
We present methods for fixed-lag smoothing using sequential importance sampling (SIS) on a discrete non-linear, non-Gaussian state-space system with unknown parameters. Our particular application is in the field of digital communication systems. Each input data point is taken from a finite set of symbols. We represent the transmission medium as a fixed filter with a finite impulse response (FIR), hence a discrete state-space system is formed. Conventional Markov chain Monte Carlo (MCMC) techniques such as the Gibbs sampler are unsuitable for this task because they can only process data in batch. Since the data arrive sequentially, it is natural to process them in this way. In addition, many communication systems are interactive, so there is a maximum level of latency that can be tolerated before a symbol is decoded. We demonstrate this method by simulation and compare its performance to existing techniques.
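A schematic sketch of the fixed-lag SIS idea in this setting is given below: particles carry hypothesised symbol histories, are weighted by the Gaussian likelihood of the received sample under a known FIR channel, resampled when the weights degenerate, and the symbol transmitted a fixed lag earlier is decoded by a weighted vote. The channel taps, lag, symbol alphabet, and particle count are illustrative assumptions.

```python
# Fixed-lag smoothing by SIS for symbols through a known FIR channel (sketch).
import numpy as np

def fixed_lag_sis(received, taps, symbols=(-1, 1), n_particles=200,
                  lag=3, noise_std=0.5, rng=np.random.default_rng(3)):
    taps = np.asarray(taps, dtype=float)
    L, T = len(taps), len(received)
    paths = np.zeros((n_particles, T), dtype=int)   # hypothesised symbol histories
    logw = np.zeros(n_particles)
    decoded = np.full(T, symbols[0])

    for t in range(T):
        # propose the new symbol from a uniform prior for every particle
        paths[:, t] = rng.integers(len(symbols), size=n_particles)
        # predicted channel output for each particle: FIR over the last L symbols
        start = max(0, t - L + 1)
        window = np.array(symbols)[paths[:, start:t + 1]]
        pred = window @ taps[:t - start + 1][::-1]
        logw += -0.5 * ((received[t] - pred) / noise_std) ** 2
        w = np.exp(logw - logw.max()); w /= w.sum()
        # resample whole histories when the effective sample size degenerates
        if 1.0 / np.sum(w ** 2) < n_particles / 2:
            idx = rng.choice(n_particles, size=n_particles, p=w)
            paths, logw = paths[idx], np.zeros(n_particles)
            w = np.full(n_particles, 1.0 / n_particles)
        # fixed-lag decision: smooth the symbol transmitted `lag` steps ago
        if t >= lag:
            votes = np.bincount(paths[:, t - lag], weights=w,
                                minlength=len(symbols))
            decoded[t - lag] = symbols[int(np.argmax(votes))]
    return decoded   # the last `lag` positions are left undecided in this sketch
```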
Abstract:
We develop methods for performing filtering and smoothing in non-linear non-Gaussian dynamical models. The methods rely on a particle cloud representation of the filtering distribution which evolves through time using importance sampling and resampling ideas. In particular, novel techniques are presented for generation of random realisations from the joint smoothing distribution and for MAP estimation of the state sequence. Realisations of the smoothing distribution are generated in a forward-backward procedure, while the MAP estimation procedure can be performed in a single forward pass of the Viterbi algorithm applied to a discretised version of the state space. An application to spectral estimation for time-varying autoregressions is described.
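The following rough sketch, on a deliberately simple scalar random-walk model, illustrates the forward-backward procedure for drawing one realisation from the joint smoothing distribution: a forward particle filter is followed by a backward pass that resamples stored particles with weights reweighted by the transition density. The model and all parameter values are assumptions for illustration only.

```python
# Forward particle filter followed by backward simulation of a smoothing trajectory.
import numpy as np

def particle_filter_and_backward_sample(y, n_particles=500, obs_std=1.0,
                                        trans_std=0.5, rng=np.random.default_rng(4)):
    T = len(y)
    parts = np.zeros((T, n_particles))
    weights = np.zeros((T, n_particles))
    x = rng.normal(size=n_particles)
    for t in range(T):
        x = x + trans_std * rng.normal(size=n_particles)       # propagate particles
        logw = -0.5 * ((y[t] - x) / obs_std) ** 2               # weight by likelihood
        w = np.exp(logw - logw.max()); w /= w.sum()
        parts[t], weights[t] = x, w
        x = x[rng.choice(n_particles, size=n_particles, p=w)]   # resample

    # backward simulation of one realisation from the joint smoothing distribution
    traj = np.zeros(T)
    j = rng.choice(n_particles, p=weights[-1])
    traj[-1] = parts[-1, j]
    for t in range(T - 2, -1, -1):
        logb = np.log(weights[t] + 1e-300) \
             - 0.5 * ((traj[t + 1] - parts[t]) / trans_std) ** 2
        b = np.exp(logb - logb.max()); b /= b.sum()
        traj[t] = parts[t, rng.choice(n_particles, p=b)]
    return traj
```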
Abstract:
In the present study, we report a fast and nondestructive estimation of the hydrogen content of hydrogenated amorphous carbon (a-C:H) films using visible Raman spectroscopy. Hydrogenated diamondlike carbon films were deposited by the plasma enhanced chemical vapor deposition, plasma beam source, and integrated distributed electron cyclotron resonance techniques. Methane and acetylene were used as source gases, resulting in different hydrogen content and sp2/sp3 fraction. Ultraviolet-visible (UV-Vis) spectroscopic ellipsometry (1.5-5 eV) as well as UV-Vis spectroscopy provided the optical band gap (Tauc gap). The sp2/sp3 fraction and the hydrogen content were independently estimated by electron energy loss spectroscopy and by elastic recoil detection analysis-Rutherford backscattering, respectively. The Raman spectra acquired in the visible region using the 488 nm line show the superposition of Raman features on a photoluminescence (PL) background. The direct relationship between the sp2 content and the optical band gap has been confirmed. The difference in the PL background for samples with the same optical band gap (sp2 content) but different hydrogen content was demonstrated, and an empirical relationship between the slope of the PL background of the visible Raman spectra and the corresponding hydrogen content was extracted.
Abstract:
The inhomogeneous Poisson process is a point process that has varying intensity across its domain (usually time or space). For nonparametric Bayesian modeling, the Gaussian process is a useful way to place a prior distribution on this intensity. The combination of a Poisson process and GP is known as a Gaussian Cox process, or doubly-stochastic Poisson process. Likelihood-based inference in these models requires an intractable integral over an infinite-dimensional random function. In this paper we present the first approach to Gaussian Cox processes in which it is possible to perform inference without introducing approximations or finite-dimensional proxy distributions. We call our method the Sigmoidal Gaussian Cox Process, which uses a generative model for Poisson data to enable tractable inference via Markov chain Monte Carlo. We compare our methods to competing methods on synthetic data and apply it to several real-world data sets.
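As a small illustration of the generative construction, the sketch below draws candidate events from a homogeneous Poisson process with a fixed upper-bound rate and thins each candidate with probability given by a sigmoid of a Gaussian process draw evaluated at the candidates. The kernel, domain, and rate are illustrative assumptions.

```python
# Generative sketch of a sigmoidal Gaussian Cox process on [0, T] (illustrative).
import numpy as np

def sample_sgcp(lambda_max=50.0, T=10.0, lengthscale=1.0,
                rng=np.random.default_rng(5)):
    n = rng.poisson(lambda_max * T)                   # candidate events on [0, T]
    x = np.sort(rng.uniform(0, T, size=n))
    d = x[:, None] - x[None, :]
    K = np.exp(-0.5 * (d / lengthscale) ** 2) + 1e-6 * np.eye(n)
    g = np.linalg.cholesky(K) @ rng.normal(size=n)    # GP draw at the candidates
    keep = rng.random(n) < 1.0 / (1.0 + np.exp(-g))   # sigmoidal thinning
    return x[keep]                                    # retained events of the Cox process
```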
Abstract:
Approximate Bayesian computation (ABC) is a popular technique for analysing data for complex models where the likelihood function is intractable. It involves using simulation from the model to approximate the likelihood, with this approximate likelihood then being used to construct an approximate posterior. In this paper, we consider methods that estimate the parameters by maximizing the approximate likelihood used in ABC. We give a theoretical analysis of the asymptotic properties of the resulting estimator. In particular, we derive results analogous to those of consistency and asymptotic normality for standard maximum likelihood estimation. We also discuss how sequential Monte Carlo methods provide a natural method for implementing our likelihood-based ABC procedures.
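A toy sketch of the likelihood-based ABC idea follows: at each candidate parameter, the likelihood is approximated by kernel-smoothing the distance between simulated and observed summary statistics, and the resulting approximate likelihood is maximised over a grid. The simulator, summary statistic, and bandwidth are illustrative assumptions, not the paper's procedure.

```python
# Toy maximum-likelihood ABC: kernel-smoothed simulated summaries, grid maximisation.
import numpy as np

def abc_log_likelihood(theta, obs_summary, n_sims=2000, eps=0.05,
                       rng=np.random.default_rng(6)):
    sims = rng.normal(theta, 1.0, size=(n_sims, 20))       # toy simulator
    summaries = sims.mean(axis=1)                          # summary statistic
    kern = np.exp(-0.5 * ((summaries - obs_summary) / eps) ** 2)
    return np.log(kern.mean() + 1e-300)                    # ABC likelihood estimate

observed = np.random.default_rng(7).normal(0.3, 1.0, size=20)   # toy "observed" data
grid = np.linspace(-1.0, 1.0, 41)
loglik = [abc_log_likelihood(th, observed.mean()) for th in grid]
theta_hat = grid[int(np.argmax(loglik))]                   # approximate MLE
```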