950 results for Generalized Gibbs Sampler
Abstract:
Eukaryotic genomes display segmental patterns of variation in various properties, including GC content and degree of evolutionary conservation. DNA segmentation algorithms are aimed at identifying statistically significant boundaries between such segments. Such algorithms may provide a means of discovering new classes of functional elements in eukaryotic genomes. This paper presents a model and an algorithm for Bayesian DNA segmentation and considers the feasibility of using it to segment whole eukaryotic genomes. The algorithm is tested on a range of simulated and real DNA sequences, and the following conclusions are drawn. Firstly, the algorithm correctly identifies non-segmented sequence, and can thus be used to reject the null hypothesis of uniformity in the property of interest. Secondly, estimates of the number and locations of change-points produced by the algorithm are robust to variations in algorithm parameters and initial starting conditions and correspond to real features in the data. Thirdly, the algorithm is successfully used to segment human chromosome 1 according to GC content, thus demonstrating the feasibility of Bayesian segmentation of eukaryotic genomes. The software described in this paper is available from the author's website (www.uq.edu.au/~uqjkeith/) or upon request to the author.
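As a rough illustration of the change-point idea in this abstract, the following sketch computes the exact posterior over a single change-point in a binary GC indicator sequence. It is not the paper's algorithm, which handles multiple change-points; the Beta(1, 1) priors, the uniform prior on the change-point location, and the names `log_marginal` and `changepoint_posterior` are assumptions made only for this example.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(successes, trials):
    """Log marginal likelihood of one segment of Bernoulli observations.

    Assumes a Beta(1, 1) prior on the segment's GC probability, so the
    marginal likelihood of a specific sequence is B(s + 1, f + 1).
    """
    return log_beta(successes + 1, trials - successes + 1)


def changepoint_posterior(is_gc):
    """Posterior over the location of a single change-point.

    is_gc is a list of 0/1 flags (1 means the base is G or C).
    Entry i of the result is the posterior probability that the first
    segment has length i + 1, under a uniform prior on the location.
    """
    n = len(is_gc)
    prefix = [0]
    for flag in is_gc:
        prefix.append(prefix[-1] + flag)

    log_weights = []
    for k in range(1, n):  # k is the length of the left segment
        left = log_marginal(prefix[k], k)
        right = log_marginal(prefix[n] - prefix[k], n - k)
        log_weights.append(left + right)

    # Normalise in a numerically stable way.
    top = max(log_weights)
    weights = [math.exp(v - top) for v in log_weights]
    total = sum(weights)
    return [w / total for w in weights]
```

For a sequence such as thirty GC bases followed by thirty AT bases, the posterior mode falls at the true boundary. A multiple change-point model would instead have to sample the number and positions of boundaries, for example by MCMC.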
Abstract:
A recent development of the Markov chain Monte Carlo (MCMC) technique is the emergence of MCMC samplers that allow transitions between different models. Such samplers make possible a range of computational tasks involving models, including model selection, model evaluation, model averaging and hypothesis testing. An example of this type of sampler is the reversible jump MCMC sampler, which is a generalization of the Metropolis-Hastings algorithm. Here, we present a new MCMC sampler of this type. The new sampler is a generalization of the Gibbs sampler, but somewhat surprisingly, it also turns out to encompass as particular cases all of the well-known MCMC samplers, including those of Metropolis, Barker, and Hastings. Moreover, the new sampler generalizes the reversible jump MCMC. It therefore appears to be a very general framework for MCMC sampling. This paper describes the new sampler and illustrates its use in three applications in Computational Biology, specifically determination of consensus sequences, phylogenetic inference and delineation of isochores via multiple change-point analysis.
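To make the relationship between the classical samplers named in this abstract more concrete, the sketch below compares the Metropolis and Barker acceptance rules on a small discrete target. It does not implement the generalized Gibbs sampler itself. The three-state target, the uniform symmetric proposal, and the function names are assumptions chosen for illustration.

```python
def metropolis_accept(ratio):
    """Metropolis acceptance probability for a symmetric proposal."""
    return min(1.0, ratio)


def barker_accept(ratio):
    """Barker acceptance probability for a symmetric proposal."""
    return ratio / (1.0 + ratio)


def transition_matrix(target, accept):
    """Transition matrix of the chain on states 0..n-1.

    The proposal picks one of the other n - 1 states uniformly, and
    `accept` maps the target ratio pi(j) / pi(i) to a probability.
    """
    n = len(target)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                matrix[i][j] = accept(target[j] / target[i]) / (n - 1)
        # Rejected proposals leave the chain where it is.
        matrix[i][i] = 1.0 - sum(matrix[i])
    return matrix


def stationary_gap(target, matrix):
    """Largest absolute entry of (target @ matrix - target)."""
    n = len(target)
    return max(
        abs(sum(target[i] * matrix[i][j] for i in range(n)) - target[j])
        for j in range(n)
    )


TARGET = [0.2, 0.3, 0.5]
METROPOLIS_GAP = stationary_gap(TARGET, transition_matrix(TARGET, metropolis_accept))
BARKER_GAP = stationary_gap(TARGET, transition_matrix(TARGET, barker_accept))
```

Both gaps come out at the level of floating-point rounding, which shows that the two acceptance rules leave the same target distribution invariant even though Barker's rule accepts less often.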
Abstract:
There are several versions of the lognormal distribution in the statistical literature; one is based on the exponential transformation of the generalized normal distribution (GN). This paper presents a Bayesian analysis of the generalized lognormal distribution (logGN), assuming independent non-informative Jeffreys priors for the parameters, together with a procedure for implementing the Gibbs sampler to obtain the posterior distributions of the parameters. The results are used to analyze failure time models with right-censored and uncensored data. The proposed method is illustrated using actual failure time data of computers.
Abstract:
Generalized linear mixed models with semiparametric random effects are useful in a wide variety of Bayesian applications. When the random effects arise from a mixture of Dirichlet process (MDP) model, normal base measures and Gibbs sampling procedures based on the Pólya urn scheme are often used to simulate posterior draws. These algorithms are applicable in the conjugate case when (for a normal base measure) the likelihood is normal. In the non-conjugate case, the algorithms proposed by MacEachern and Müller (1998) and Neal (2000) are often applied to generate posterior samples. Some common problems associated with simulation algorithms for non-conjugate MDP models include convergence and mixing difficulties. This paper proposes an algorithm based on the Pólya urn scheme that extends the Gibbs sampling algorithms to non-conjugate models with normal base measures and exponential family likelihoods. The algorithm proceeds by making Laplace approximations to the likelihood function, thereby reducing the procedure to that of conjugate normal MDP models. To ensure the validity of the stationary distribution in the non-conjugate case, the proposals are accepted or rejected by a Metropolis-Hastings step. In the special case where the data are normally distributed, the algorithm is identical to the Gibbs sampler.
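The Pólya urn scheme referred to in this abstract can be sketched as follows. This is only the prior urn for a Dirichlet process with a normal base measure, not the posterior sampler or the Laplace/Metropolis-Hastings step the paper proposes; the concentration parameter `alpha`, the standard normal base measure, and the function names are assumptions for illustration.

```python
import random


def polya_urn_draws(n, alpha, base_draw, rng):
    """Draw n values from a Dirichlet process via the Polya urn scheme.

    At step i (0-based), a fresh value is drawn from the base measure
    with probability alpha / (alpha + i); otherwise one of the i
    previous draws is copied uniformly at random.
    """
    draws = []
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            draws.append(base_draw(rng))
        else:
            draws.append(rng.choice(draws))
    return draws


def standard_normal(rng):
    """Assumed base measure for this example: N(0, 1)."""
    return rng.gauss(0.0, 1.0)


# Example: 50 random effects that share values, and hence form clusters.
example = polya_urn_draws(50, alpha=1.0, base_draw=standard_normal,
                          rng=random.Random(0))
n_clusters = len(set(example))
```

Small values of `alpha` yield few distinct values (few clusters), while large values make nearly every draw new. In a Gibbs sampler for the conjugate case, the same urn probabilities are reweighted by the likelihood of each observation.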
Abstract:
This study aimed to verify the existence of genotype x environment interaction, in the form of heterogeneity of variances for milk production in buffaloes, and its impact on the genetic evaluation of the animals, using Bayesian inference by means of the Gibbs sampler. A total of 5,484 milk production records were used, from 2,994 predominantly Murrah buffalo cows, daughters of 150 sires mated with 1,130 dams, with calvings between 1974 and 2004. The records came from the Programa de Melhoramento Genético dos Bubalinos (PROMEBUL), with additional records from the herd of EMBRAPA Amazônia Oriental (EAO), located in Belém, Pará. Herd-year of calving classes were established and, according to the standard deviation of each class, the milk production records were classified into classes of high and low phenotypic standard deviation. The data were then analyzed both disregarding and considering the standard deviation classes. The model included the fixed effects of herd-year class and month of calving, the covariates age of the female at calving and lactation length, and the random effects of animal, permanent environment and temporary environment. A uniform prior distribution was assumed for the fixed effects, and inverse chi-square and inverted Wishart prior distributions were assumed for the (co)variance components. The observed means and standard deviations for milk production in the high and low standard deviation classes and in the overall analysis were 1870.21±758.78, 1900.50±587.76 and 1885.48±677.98, respectively. The posterior means of the variance components were larger in the high standard deviation class. The heritability obtained in the high standard deviation class was close to the value observed in the overall analysis and lower than the value found in the low phenotypic standard deviation class.
The genetic correlation for milk production between the standard deviation classes was 0.58. The Spearman correlations between the breeding values for milk production obtained in the overall analysis and those obtained in the high and low standard deviation classes were 0.94 and 0.93, respectively, for all sires. For a sample of the 10 best sires, the same correlations were 0.94 and 0.47, respectively. These results reveal heterogeneity of variances among herds, and this heterogeneity results from environmental factors, which may lead to an erroneous ranking of the genetically best sires for milk production.
Abstract:
The paper investigates a Bayesian hierarchical model for the analysis of categorical longitudinal data from a large social survey of immigrants to Australia. Data for each subject are observed on three separate occasions, or waves, of the survey. One of the features of the data set is that observations for some variables are missing for at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response and then subsequent terms are introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which allows missing data for both the response and the explanatory variables to be imputed at each iteration of the algorithm, given some appropriate prior distributions. After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia.
Abstract:
A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Pólya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the Pólya-Gamma random variable is developed.
Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.
The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.
The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age-specific latent natural ability class and a performance-enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of a player deviates significantly from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.
All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the Pólya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.
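One plausible form of a Gaussian approximation to the Pólya-Gamma random variable is moment matching, sketched below. The abstract does not say how the dissertation's approximation is constructed, so this is an assumption rather than the author's method. The mean and variance formulas are the standard moments of PG(b, c); the function names and the small-`c` thresholds are choices made for this example.

```python
import math
import random


def pg_mean(b, c):
    """Mean of a Polya-Gamma PG(b, c) variable: b / (2c) * tanh(c / 2)."""
    if abs(c) < 1e-6:
        return b / 4.0  # limit as c -> 0
    return b / (2.0 * c) * math.tanh(c / 2.0)


def pg_variance(b, c):
    """Variance of PG(b, c): b / (4 c^3) * (sinh(c) - c) / cosh(c / 2)^2."""
    if abs(c) < 1e-3:
        return b / 24.0  # limit as c -> 0, avoids cancellation in sinh(c) - c
    return b / (4.0 * c ** 3) * (math.sinh(c) - c) / math.cosh(c / 2.0) ** 2


def pg_gaussian_draw(b, c, rng):
    """Approximate PG(b, c) draw from a moment-matched normal.

    Intended only for large b (high counts), where PG(b, c) is a sum of
    many independent terms and is close to Gaussian. Very large |c|
    would overflow math.sinh and needs separate handling.
    """
    return rng.gauss(pg_mean(b, c), math.sqrt(pg_variance(b, c)))


# Example: one approximate draw for a count of 500 trials.
omega = pg_gaussian_draw(500, 1.2, random.Random(1))
```

Because the variance grows linearly in `b` while the mean also grows linearly, the relative spread shrinks like one over the square root of `b`, which is why the approximation is most reasonable when counts are high.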