938 resultados para DNA Sequence, Hidden Markov Model, Bayesian Model, Sensitive Analysis, Markov Chain Monte Carlo


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work addresses the problem of estimating the optimal value function in a Markov Decision Process from observed state-action pairs. We adopt a Bayesian approach to inference, which allows both the model to be estimated and predictions about actions to be made in a unified framework, providing a principled approach to mimicry of a controller on the basis of observed data. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from theposterior distribution over the optimal value function. This step includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. As an illustration, the method is applied to learning a human controller.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated, and how predictions about actions can be made, in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. This step includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. As an illustration, the method is applied to learning a human controller.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Vibration and acoustic analysis at higher frequencies faces two challenges: computing the response without using an excessive number of degrees of freedom, and quantifying its uncertainty due to small spatial variations in geometry, material properties and boundary conditions. Efficient models make use of the observation that when the response of a decoupled vibro-acoustic subsystem is sufficiently sensitive to uncertainty in such spatial variations, the local statistics of its natural frequencies and mode shapes saturate to universal probability distributions. This holds irrespective of the causes that underly these spatial variations and thus leads to a nonparametric description of uncertainty. This work deals with the identification of uncertain parameters in such models by using experimental data. One of the difficulties is that both experimental errors and modeling errors, due to the nonparametric uncertainty that is inherent to the model type, are present. This is tackled by employing a Bayesian inference strategy. The prior probability distribution of the uncertain parameters is constructed using the maximum entropy principle. The likelihood function that is subsequently computed takes the experimental information, the experimental errors and the modeling errors into account. The posterior probability distribution, which is computed with the Markov Chain Monte Carlo method, provides a full uncertainty quantification of the identified parameters, and indicates how well their uncertainty is reduced, with respect to the prior information, by the experimental data. © 2013 Taylor & Francis Group, London.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider the problem of variable selection in regression modeling in high-dimensional spaces where there is known structure among the covariates. This is an unconventional variable selection problem for two reasons: (1) The dimension of the covariate space is comparable, and often much larger, than the number of subjects in the study, and (2) the covariate space is highly structured, and in some cases it is desirable to incorporate this structural information in to the model building process. We approach this problem through the Bayesian variable selection framework, where we assume that the covariates lie on an undirected graph and formulate an Ising prior on the model space for incorporating structural information. Certain computational and statistical problems arise that are unique to such high-dimensional, structured settings, the most interesting being the phenomenon of phase transitions. We propose theoretical and computational schemes to mitigate these problems. We illustrate our methods on two different graph structures: the linear chain and the regular graph of degree k. Finally, we use our methods to study a specific application in genomics: the modeling of transcription factor binding sites in DNA sequences. © 2010 American Statistical Association.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Statistical methods of inference typically require the likelihood function to be computable in a reasonable amount of time. The class of “likelihood-free” methods termed Approximate Bayesian Computation (ABC) is able to eliminate this requirement, replacing the evaluation of the likelihood with simulation from it. Likelihood-free methods have gained in efficiency and popularity in the past few years, following their integration with Markov Chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) in order to better explore the parameter space. They have been applied primarily to estimating the parameters of a given model, but can also be used to compare models. Here we present novel likelihood-free approaches to model comparison, based upon the independent estimation of the evidence of each model under study. Key advantages of these approaches over previous techniques are that they allow the exploitation of MCMC or SMC algorithms for exploring the parameter space, and that they do not require a sampler able to mix between models. We validate the proposed methods using a simple exponential family problem before providing a realistic problem from human population genetics: the comparison of different demographic models based upon genetic data from the Y chromosome.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we propose a hybrid hazard regression model with threshold stress which includes the proportional hazards and the accelerated failure time models as particular cases. To express the behavior of lifetimes the generalized-gamma distribution is assumed and an inverse power law model with a threshold stress is considered. For parameter estimation we develop a sampling-based posterior inference procedure based on Markov Chain Monte Carlo techniques. We assume proper but vague priors for the parameters of interest. A simulation study investigates the frequentist properties of the proposed estimators obtained under the assumption of vague priors. Further, some discussions on model selection criteria are given. The methodology is illustrated on simulated and real lifetime data set.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A data set of a commercial Nellore beef cattle selection program was used to compare breeding models that assumed or not markers effects to estimate the breeding values, when a reduced number of animals have phenotypic, genotypic and pedigree information available. This herd complete data set was composed of 83,404 animals measured for weaning weight (WW), post-weaning gain (PWG), scrotal circumference (SC) and muscle score (MS), corresponding to 116,652 animals in the relationship matrix. Single trait analyses were performed by MTDFREML software to estimate fixed and random effects solutions using this complete data. The additive effects estimated were assumed as the reference breeding values for those animals. The individual observed phenotype of each trait was adjusted for fixed and random effects solutions, except for direct additive effects. The adjusted phenotype composed of the additive and residual parts of observed phenotype was used as dependent variable for models' comparison. Among all measured animals of this herd, only 3160 animals were genotyped for 106 SNP markers. Three models were compared in terms of changes on animals' rank, global fit and predictive ability. Model 1 included only polygenic effects, model 2 included only markers effects and model 3 included both polygenic and markers effects. Bayesian inference via Markov chain Monte Carlo methods performed by TM software was used to analyze the data for model comparison. Two different priors were adopted for markers effects in models 2 and 3, the first prior assumed was a uniform distribution (U) and, as a second prior, was assumed that markers effects were distributed as normal (N). Higher rank correlation coefficients were observed for models 3_U and 3_N, indicating a greater similarity of these models animals' rank and the rank based on the reference breeding values. Model 3_N presented a better global fit, as demonstrated by its low DIC. The best models in terms of predictive ability were models 1 and 3_N. Differences due prior assumed to markers effects in models 2 and 3 could be attributed to the better ability of normal prior in handle with collinear effects. The models 2_U and 2_N presented the worst performance, indicating that this small set of markers should not be used to genetically evaluate animals with no data, since its predictive ability is restricted. In conclusion, model 3_N presented a slight superiority when a reduce number of animals have phenotypic, genotypic and pedigree information. It could be attributed to the variation retained by markers and polygenic effects assumed together and the normal prior assumed to markers effects, that deals better with the collinearity between markers. (C) 2012 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Euastacus crayfish are endemic to freshwater ecosystems of the eastern coast of Australia. While recent evolutionary studies have focused on a few of these species, here we provide a comprehensive phylogenetic estimate of relationships among the species within the genus. We sequenced three mitochondrial gene regions (COI, 16S, and 12S) and one nuclear region (28S) from 40 species of the genus Euastacus, as well as one undescribed species. Using these data, we estimated the phylogenetic relationships within the genus using maximum-likelihood, parsimony, and Bayesian Markov Chain Monte Carlo analyses. Using Bayes factors to test different model hypotheses, we found that the best phylogeny supports monophyletic groupings of all but two recognized species and suggests a widespread ancestor that diverged by vicariance. We also show that Eitastacus and Astacopsis are most likely monophyletic sister genera. We use the resulting phylogeny as a framework to test biogeographic hypotheses relating to the diversification of the genus. (c) 2005 Elsevier Inc. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Eukaryotic genomes display segmental patterns of variation in various properties, including GC content and degree of evolutionary conservation. DNA segmentation algorithms are aimed at identifying statistically significant boundaries between such segments. Such algorithms may provide a means of discovering new classes of functional elements in eukaryotic genomes. This paper presents a model and an algorithm for Bayesian DNA segmentation and considers the feasibility of using it to segment whole eukaryotic genomes. The algorithm is tested on a range of simulated and real DNA sequences, and the following conclusions are drawn. Firstly, the algorithm correctly identifies non-segmented sequence, and can thus be used to reject the null hypothesis of uniformity in the property of interest. Secondly, estimates of the number and locations of change-points produced by the algorithm are robust to variations in algorithm parameters and initial starting conditions and correspond to real features in the data. Thirdly, the algorithm is successfully used to segment human chromosome 1 according to GC content, thus demonstrating the feasibility of Bayesian segmentation of eukaryotic genomes. The software described in this paper is available from the author's website (www.uq.edu.au/similar to uqjkeith/) or upon request to the author.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The ERS-1 satellite carries a scatterometer which measures the amount of radiation scattered back toward the satellite by the ocean's surface. These measurements can be used to infer wind vectors. The implementation of a neural network based forward model which maps wind vectors to radar backscatter is addressed. Input noise cannot be neglected. To account for this noise, a Bayesian framework is adopted. However, Markov Chain Monte Carlo sampling is too computationally expensive. Instead, gradient information is used with a non-linear optimisation algorithm to find the maximum em a posteriori probability values of the unknown variables. The resulting models are shown to compare well with the current operational model when visualised in the target space.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The ERS-1 satellite carries a scatterometer which measures the amount of radiation scattered back toward the satellite by the ocean's surface. These measurements can be used to infer wind vectors. The implementation of a neural network based forward model which maps wind vectors to radar backscatter is addressed. Input noise cannot be neglected. To account for this noise, a Bayesian framework is adopted. However, Markov Chain Monte Carlo sampling is too computationally expensive. Instead, gradient information is used with a non-linear optimisation algorithm to find the maximum em a posteriori probability values of the unknown variables. The resulting models are shown to compare well with the current operational model when visualised in the target space.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Introduced in this paper is a Bayesian model for isolating the resonant frequency from combustion chamber resonance. The model shown in this paper focused on characterising the initial rise in the resonant frequency to investigate the rise of in-cylinder bulk temperature associated with combustion. By resolving the model parameters, it is possible to determine: the start of pre-mixed combustion, the start of diffusion combustion, the initial resonant frequency, the resonant frequency as a function of crank angle, the in-cylinder bulk temperature as a function of crank angle and the trapped mass as a function of crank angle. The Bayesian method allows for individual cycles to be examined without cycle-averaging|allowing inter-cycle variability studies. Results are shown for a turbo-charged, common-rail compression ignition engine run at 2000 rpm and full load.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Phase-type distributions represent the time to absorption for a finite state Markov chain in continuous time, generalising the exponential distribution and providing a flexible and useful modelling tool. We present a new reversible jump Markov chain Monte Carlo scheme for performing a fully Bayesian analysis of the popular Coxian subclass of phase-type models; the convenient Coxian representation involves fewer parameters than a more general phase-type model. The key novelty of our approach is that we model covariate dependence in the mean whilst using the Coxian phase-type model as a very general residual distribution. Such incorporation of covariates into the model has not previously been attempted in the Bayesian literature. A further novelty is that we also propose a reversible jump scheme for investigating structural changes to the model brought about by the introduction of Erlang phases. Our approach addresses more questions of inference than previous Bayesian treatments of this model and is automatic in nature. We analyse an example dataset comprising lengths of hospital stays of a sample of patients collected from two Australian hospitals to produce a model for a patient's expected length of stay which incorporates the effects of several covariates. This leads to interesting conclusions about what contributes to length of hospital stay with implications for hospital planning. We compare our results with an alternative classical analysis of these data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis addresses computational challenges arising from Bayesian analysis of complex real-world problems. Many of the models and algorithms designed for such analysis are ‘hybrid’ in nature, in that they are a composition of components for which their individual properties may be easily described but the performance of the model or algorithm as a whole is less well understood. The aim of this research project is to after a better understanding of the performance of hybrid models and algorithms. The goal of this thesis is to analyse the computational aspects of hybrid models and hybrid algorithms in the Bayesian context. The first objective of the research focuses on computational aspects of hybrid models, notably a continuous finite mixture of t-distributions. In the mixture model, an inference of interest is the number of components, as this may relate to both the quality of model fit to data and the computational workload. The analysis of t-mixtures using Markov chain Monte Carlo (MCMC) is described and the model is compared to the Normal case based on the goodness of fit. Through simulation studies, it is demonstrated that the t-mixture model can be more flexible and more parsimonious in terms of number of components, particularly for skewed and heavytailed data. The study also reveals important computational issues associated with the use of t-mixtures, which have not been adequately considered in the literature. The second objective of the research focuses on computational aspects of hybrid algorithms for Bayesian analysis. Two approaches will be considered: a formal comparison of the performance of a range of hybrid algorithms and a theoretical investigation of the performance of one of these algorithms in high dimensions. For the first approach, the delayed rejection algorithm, the pinball sampler, the Metropolis adjusted Langevin algorithm, and the hybrid version of the population Monte Carlo (PMC) algorithm are selected as a set of examples of hybrid algorithms. Statistical literature shows how statistical efficiency is often the only criteria for an efficient algorithm. In this thesis the algorithms are also considered and compared from a more practical perspective. This extends to the study of how individual algorithms contribute to the overall efficiency of hybrid algorithms, and highlights weaknesses that may be introduced by the combination process of these components in a single algorithm. The second approach to considering computational aspects of hybrid algorithms involves an investigation of the performance of the PMC in high dimensions. It is well known that as a model becomes more complex, computation may become increasingly difficult in real time. In particular the importance sampling based algorithms, including the PMC, are known to be unstable in high dimensions. This thesis examines the PMC algorithm in a simplified setting, a single step of the general sampling, and explores a fundamental problem that occurs in applying importance sampling to a high-dimensional problem. The precision of the computed estimate from the simplified setting is measured by the asymptotic variance of the estimate under conditions on the importance function. Additionally, the exponential growth of the asymptotic variance with the dimension is demonstrated and we illustrates that the optimal covariance matrix for the importance function can be estimated in a special case.