849 results for GIBBS SAMPLER
Abstract:
Eukaryotic genomes display segmental patterns of variation in various properties, including GC content and degree of evolutionary conservation. DNA segmentation algorithms are aimed at identifying statistically significant boundaries between such segments. Such algorithms may provide a means of discovering new classes of functional elements in eukaryotic genomes. This paper presents a model and an algorithm for Bayesian DNA segmentation and considers the feasibility of using it to segment whole eukaryotic genomes. The algorithm is tested on a range of simulated and real DNA sequences, and the following conclusions are drawn. Firstly, the algorithm correctly identifies non-segmented sequence, and can thus be used to reject the null hypothesis of uniformity in the property of interest. Secondly, estimates of the number and locations of change-points produced by the algorithm are robust to variations in algorithm parameters and initial starting conditions and correspond to real features in the data. Thirdly, the algorithm is successfully used to segment human chromosome 1 according to GC content, thus demonstrating the feasibility of Bayesian segmentation of eukaryotic genomes. The software described in this paper is available from the author's website (www.uq.edu.au/~uqjkeith/) or upon request to the author.
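To make the idea of a change-point in GC content concrete, here is a minimal sketch of a much simpler problem than the one the paper addresses: the exact posterior over the position of a single change-point in a sequence, treating each base as GC or not-GC. The uniform prior on the position, the Beta(1,1) priors on each segment's GC probability, and the function names are assumptions for illustration only; this is not the paper's model or its sampling algorithm.

```python
from math import lgamma, exp


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(gc, n):
    """Log marginal likelihood of a segment with `gc` G/C bases out of `n`,
    integrating the segment's GC probability over a Beta(1,1) prior."""
    return log_beta(gc + 1, n - gc + 1)


def changepoint_posterior(seq):
    """Posterior over a single change-point position k = 1..len(seq)-1.

    Position k means the first segment is seq[:k] and the second is seq[k:].
    The returned list is indexed so that entry i corresponds to k = i + 1.
    Assumes a uniform prior over k and len(seq) >= 2.
    """
    is_gc = [1 if base in "GC" else 0 for base in seq.upper()]
    n = len(is_gc)
    total = sum(is_gc)
    log_post = []
    left = 0  # running count of G/C bases in seq[:k]
    for k in range(1, n):
        left += is_gc[k - 1]
        log_post.append(log_marginal(left, k) + log_marginal(total - left, n - k))
    # Normalise in a numerically stable way.
    m = max(log_post)
    weights = [exp(v - m) for v in log_post]
    z = sum(weights)
    return [w / z for w in weights]


# Usage: an AT-only stretch followed by a GC-only stretch.
posterior = changepoint_posterior("A" * 50 + "G" * 50)
most_probable_k = max(range(len(posterior)), key=posterior.__getitem__) + 1
```

With multiple change-points the posterior can no longer be enumerated this cheaply, which is one reason sampling-based methods are used for whole-genome segmentation.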
Abstract:
The aim was to verify the existence of genotype x environment interaction, in the form of heterogeneity of variances for milk yield in buffaloes, and its impact on the genetic evaluation of the animals, using Bayesian inference via the Gibbs sampler. A total of 5,484 milk yield records were used, from 2,994 predominantly Murrah buffalo cows, daughters of 150 sires mated with 1,130 dams, with calvings between 1974 and 2004. The records came from the Programa de Melhoramento Genético dos Bubalinos (PROMEBUL), supplemented with records from the herd of EMBRAPA Amazônia Oriental (EAO), located in Belém, Pará. Herd-year-of-calving classes were established and, according to the standard deviation of each class, the milk yield records were classified into classes of high and low phenotypic standard deviation. The data were then analysed both ignoring and accounting for the standard deviation classes. The model included the fixed effects of herd-year class and month of calving, the covariates age of the cow at calving and lactation length, and the random effects of animal, permanent environment and temporary environment. A uniform prior distribution was assumed for the fixed effects, and inverse chi-square and inverse Wishart prior distributions were assumed for the (co)variance components. The observed means and standard deviations for milk yield in the high and low standard deviation classes and in the overall analysis were 1870.21±758.78, 1900.50±587.76 and 1885.48±677.98, respectively. The posterior means of the variance components were larger in the high standard deviation class. The heritability obtained in the high standard deviation class was close to the value observed in the overall analysis and lower than the value found in the low phenotypic standard deviation class. 
The genetic correlation for milk yield between the standard deviation classes was 0.58. The Spearman correlations between the breeding values for milk yield obtained in the overall analysis and those obtained in the high and low standard deviation classes were 0.94 and 0.93, respectively, for all sires. For a sample of the 10 best sires, the same correlations were 0.94 and 0.47, respectively. These results reveal heterogeneity of variances among herds; this heterogeneity results from environmental factors and can lead to an erroneous ranking of the genetically best sires for milk yield.
Abstract:
A recent development of the Markov chain Monte Carlo (MCMC) technique is the emergence of MCMC samplers that allow transitions between different models. Such samplers make possible a range of computational tasks involving models, including model selection, model evaluation, model averaging and hypothesis testing. An example of this type of sampler is the reversible jump MCMC sampler, which is a generalization of the Metropolis-Hastings algorithm. Here, we present a new MCMC sampler of this type. The new sampler is a generalization of the Gibbs sampler, but somewhat surprisingly, it also turns out to encompass as particular cases all of the well-known MCMC samplers, including those of Metropolis, Barker, and Hastings. Moreover, the new sampler generalizes the reversible jump MCMC. It therefore appears to be a very general framework for MCMC sampling. This paper describes the new sampler and illustrates its use in three applications in Computational Biology, specifically determination of consensus sequences, phylogenetic inference and delineation of isochores via multiple change-point analysis.
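For readers unfamiliar with the standard Gibbs sampler that this abstract takes as its starting point, the following is a minimal sketch of the classical algorithm (not the generalized sampler the paper introduces). It samples from a bivariate standard normal with correlation `rho` by alternately drawing each coordinate from its full conditional distribution. The target distribution, function name, and default settings are assumptions chosen purely for illustration.

```python
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a bivariate standard normal with correlation rho.

    The full conditionals are x | y ~ N(rho * y, 1 - rho^2) and
    y | x ~ N(rho * x, 1 - rho^2); each sweep updates x, then y.
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho * rho) ** 0.5  # standard deviation of each conditional
    x, y = 0.0, 0.0                     # arbitrary starting point
    samples = []
    for i in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # draw x given the current y
        y = rng.gauss(rho * x, cond_sd)  # draw y given the new x
        if i >= burn_in:                 # discard the burn-in sweeps
            samples.append((x, y))
    return samples


# Usage: draw 20,000 correlated pairs after burn-in.
draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
```

Every draw is accepted because each update samples exactly from a conditional of the target; samplers such as Metropolis-Hastings instead propose a move and accept it with some probability.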
Abstract:
Statistical modeling of traffic crashes has been of interest to researchers for decades. Over the most recent decade many crash models have accounted for extra-variation in crash counts, that is, variation over and above that accounted for by the Poisson density. The extra-variation (or dispersion) is theorized to capture unaccounted-for variation in crashes across sites. The majority of studies have assumed fixed dispersion parameters in over-dispersed crash models, which is tantamount to assuming that unaccounted-for variation is proportional to the expected crash count. Miaou and Lord [Miaou, S.P., Lord, D., 2003. Modeling traffic crash-flow relationships for intersections: dispersion parameter, functional form, and Bayes versus empirical Bayes methods. Transport. Res. Rec. 1840, 31–40] challenged the fixed dispersion parameter assumption, and examined various dispersion parameter relationships when modeling urban signalized intersection accidents in Toronto. They suggested that further work is needed to determine the appropriateness of the findings for rural as well as other intersection types, to corroborate their findings, and to explore alternative dispersion functions. This study builds upon the work of Miaou and Lord by exploring additional dispersion functions and using an independent data set, which provides an opportunity to corroborate their findings. Data from Georgia are used in this study. A Bayesian modeling approach with non-informative priors is adopted, using sampling-based estimation via Markov chain Monte Carlo (MCMC) and the Gibbs sampler. A total of eight model specifications were developed; four of them employed traffic flows as explanatory factors in the mean structure, while the remaining four included geometric factors in addition to major and minor road traffic flows. The models were compared and contrasted using the significance of coefficients, standard deviance, chi-square goodness-of-fit, and deviance information criterion (DIC) statistics. 
The findings indicate that the modeling of the dispersion parameter, which essentially explains the extra-variance structure, depends greatly on how the mean structure is modeled. In the presence of a well-defined mean function, the extra-variance structure generally becomes insignificant, i.e. the variance structure is a simple function of the mean. It appears that extra-variation is a function of covariates when the mean structure (expected crash count) is poorly specified and suffers from omitted variables. In contrast, when sufficient explanatory variables are used to model the mean (expected crash count), extra-Poisson variation is not significantly related to these variables. If these results are generalizable, they suggest that model specification may be improved by testing extra-variation functions for significance. They also suggest that known influences on expected crash counts are likely to be different from factors that might help to explain unaccounted-for variation in crashes across sites.
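The abstract does not state the variance function it uses. As a hedged illustration of the difference between a fixed and a covariate-dependent dispersion parameter, the sketch below assumes one common Poisson-gamma (negative binomial) parameterization, Var(y) = mu + mu^2 / phi, together with a log-linear dispersion function phi = exp(gamma . z). Both functional forms, and the function and argument names, are assumptions for illustration rather than the specifications estimated in the study.

```python
from math import exp


def nb_variance(mu, phi):
    """Variance of a crash count with mean mu under an assumed Poisson-gamma
    model with inverse dispersion phi: Var(y) = mu + mu^2 / phi.
    As phi grows, the variance approaches the Poisson value mu."""
    return mu + mu * mu / phi


def dispersion_from_covariates(gamma, z):
    """Assumed log-linear dispersion function: phi = exp(sum_j gamma_j * z_j).
    With only an intercept (z = [1.0]) this reduces to a fixed dispersion."""
    return exp(sum(g * v for g, v in zip(gamma, z)))


# Usage: fixed dispersion versus dispersion that depends on a site covariate.
fixed_phi = dispersion_from_covariates([0.5], [1.0])
site_phi = dispersion_from_covariates([0.5, -0.2], [1.0, 3.0])
var_fixed = nb_variance(mu=4.0, phi=fixed_phi)
var_site = nb_variance(mu=4.0, phi=site_phi)
```

Testing whether the non-intercept coefficients in `gamma` differ from zero corresponds to the abstract's suggestion of testing extra-variation functions for significance.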
Abstract:
Statisticians, along with other scientists, have made significant computational advances that enable the estimation of formerly intractable statistical models. The Bayesian inference framework, combined with Markov chain Monte Carlo estimation methods such as the Gibbs sampler, enables the estimation of discrete choice models such as the multinomial logit (MNL) model. MNL models are frequently applied in transportation research to model choice outcomes such as mode, destination, or route choices, or to model categorical outcomes such as crash outcomes. Recent developments allow for the modification of the potentially limiting assumptions of MNL, such as the independence from irrelevant alternatives (IIA) property. However, relatively little transportation-related research has focused on Bayesian MNL models, the tractability of which is of great value to researchers and practitioners alike. This paper addresses MNL model specification issues in the Bayesian framework, such as the value of including prior information on parameters, allowing for nonlinear covariate effects, and extensions to random parameter models, which relax the usual limiting IIA assumption. This paper also provides an example that demonstrates, using route-choice data, the considerable potential of the Bayesian MNL approach for many transportation applications. The paper concludes with a discussion of the pros and cons of this Bayesian approach and identifies when its application is worthwhile.
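The IIA property mentioned above follows directly from the form of the MNL choice probabilities, P(i) = exp(V_i) / sum_j exp(V_j). The sketch below computes these probabilities and shows that the ratio of two alternatives' probabilities does not change when a third alternative is removed. The alternative names and utility values are made up for illustration, and the function name is hypothetical.

```python
from math import exp


def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities from a dict of systematic utilities.

    Subtracting the maximum utility before exponentiating avoids overflow
    and leaves the probabilities unchanged.
    """
    m = max(utilities.values())
    exp_u = {alt: exp(v - m) for alt, v in utilities.items()}
    z = sum(exp_u.values())
    return {alt: e / z for alt, e in exp_u.items()}


# Usage: illustrative utilities for three travel modes (made-up values).
full = mnl_probabilities({"car": 1.0, "bus": 0.5, "rail": 0.2})
reduced = mnl_probabilities({"car": 1.0, "bus": 0.5})  # rail removed

# Under IIA both ratios equal exp(1.0 - 0.5), whatever else is in the choice set.
ratio_full = full["car"] / full["bus"]
ratio_reduced = reduced["car"] / reduced["bus"]
```

Random parameter extensions let the utility coefficients vary across decision makers, so the population-level probability ratios no longer have this fixed form.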
Abstract:
The measurement error model is a well-established statistical method for regression problems in the medical sciences, although it is rarely used in ecological studies. While the situations in which it is appropriate may be less common in ecology, there are instances in which its use may benefit prediction and the estimation of parameters of interest. We have chosen to explore this topic using a conditional independence model in a Bayesian framework using a Gibbs sampler, as this gives a great deal of flexibility, allowing us to analyse a number of different models without losing generality. Using simulations and two examples, we show how the conditional independence model can be used in ecology, and when it is appropriate.