950 results for Generalized Gibbs sampler


100.00% 100.00%



The generalized Gibbs sampler (GGS) is a recently developed Markov chain Monte Carlo (MCMC) technique that enables Gibbs-like sampling of state spaces that lack a convenient representation in terms of a fixed coordinate system. This paper describes a new sampler, called the tree sampler, which uses the GGS to sample from a state space consisting of phylogenetic trees. The tree sampler is useful for a wide range of phylogenetic applications, including Bayesian, maximum likelihood, and maximum parsimony methods. A fast new algorithm to search for a maximum parsimony phylogeny is presented, using the tree sampler in the context of simulated annealing. The mathematics underlying the algorithm is explained and its time complexity is analyzed. The method is tested on two large data sets consisting of 123 sequences and 500 sequences, respectively. The new algorithm is shown to compare very favorably in terms of speed and accuracy to the program DNAPARS from the PHYLIP package.


100.00% 100.00%



Eukaryotic genomes display segmental patterns of variation in various properties, including GC content and degree of evolutionary conservation. DNA segmentation algorithms are aimed at identifying statistically significant boundaries between such segments. Such algorithms may provide a means of discovering new classes of functional elements in eukaryotic genomes. This paper presents a model and an algorithm for Bayesian DNA segmentation and considers the feasibility of using it to segment whole eukaryotic genomes. The algorithm is tested on a range of simulated and real DNA sequences, and the following conclusions are drawn. Firstly, the algorithm correctly identifies non-segmented sequence, and can thus be used to reject the null hypothesis of uniformity in the property of interest. Secondly, estimates of the number and locations of change-points produced by the algorithm are robust to variations in algorithm parameters and initial starting conditions and correspond to real features in the data. Thirdly, the algorithm is successfully used to segment human chromosome 1 according to GC content, thus demonstrating the feasibility of Bayesian segmentation of eukaryotic genomes. The software described in this paper is available from the author's website (www.uq.edu.au/~uqjkeith/) or upon request to the author.


100.00% 100.00%



A recent development of the Markov chain Monte Carlo (MCMC) technique is the emergence of MCMC samplers that allow transitions between different models. Such samplers make possible a range of computational tasks involving models, including model selection, model evaluation, model averaging and hypothesis testing. An example of this type of sampler is the reversible jump MCMC sampler, which is a generalization of the Metropolis-Hastings algorithm. Here, we present a new MCMC sampler of this type. The new sampler is a generalization of the Gibbs sampler, but somewhat surprisingly, it also turns out to encompass as particular cases all of the well-known MCMC samplers, including those of Metropolis, Barker, and Hastings. Moreover, the new sampler generalizes the reversible jump MCMC. It therefore appears to be a very general framework for MCMC sampling. This paper describes the new sampler and illustrates its use in three applications in Computational Biology, specifically determination of consensus sequences, phylogenetic inference and delineation of isochores via multiple change-point analysis.
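As a small illustration of two of the classical samplers named in this abstract, the sketch below contrasts the Metropolis and Barker acceptance rules inside the same random-walk chain. It is not the generalized sampler the paper describes; the function names, the standard normal target and the step size are all assumptions chosen for the example.

```python
import math
import random


def metropolis_accept(ratio):
    """Metropolis acceptance probability for a symmetric proposal."""
    return min(1.0, ratio)


def barker_accept(ratio):
    """Barker acceptance probability for a symmetric proposal."""
    return ratio / (1.0 + ratio)


def random_walk_chain(log_target, accept, n_steps, step=1.0, x0=0.0, seed=0):
    """Random-walk chain that uses the supplied acceptance rule."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        # Target density ratio pi(y) / pi(x); capped to avoid overflow.
        ratio = math.exp(min(700.0, log_target(y) - log_target(x)))
        if rng.random() < accept(ratio):
            x = y
        samples.append(x)
    return samples


def log_std_normal(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


if __name__ == "__main__":
    for rule in (metropolis_accept, barker_accept):
        chain = random_walk_chain(log_std_normal, rule, 20000)
        print(rule.__name__, sum(chain) / len(chain))
```

Both rules leave the same target distribution invariant. Barker's rule accepts less often than Metropolis's for every density ratio, so its chain tends to move more slowly.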


100.00% 100.00%



The local thermodynamics of a system with long-range interactions in d dimensions is studied using the mean-field approximation. Long-range interactions are introduced through pair interaction potentials that decay as a power law in the interparticle distance. We compute the local entropy, Helmholtz free energy, and grand potential per particle in the microcanonical, canonical, and grand canonical ensembles, respectively. From the local entropy per particle we obtain the local equation of state of the system by using the condition of local thermodynamic equilibrium. This local equation of state has the form of the ideal gas equation of state, but with the density depending on the potential characterizing long-range interactions. By volume integration of the relation between the different thermodynamic potentials at the local level, we find the corresponding equation satisfied by the potentials at the global level. It is shown that the potential energy enters as a thermodynamic variable that modifies the global thermodynamic potentials. As a result, we find a generalized Gibbs-Duhem equation that relates the potential energy to the temperature, pressure, and chemical potential. For the marginal case where the power of the decaying interaction potential is equal to the dimension of the space, the usual Gibbs-Duhem equation is recovered. As examples of the application of this equation, we consider spatially uniform interaction potentials and the self-gravitating gas. We also point out a close relationship with the thermodynamics of small systems.
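For reference, the usual Gibbs-Duhem equation that the abstract says is recovered in the marginal case can be written as follows for a single-component system. The notation is assumed here rather than taken from the paper: entropy S, temperature T, volume V, pressure P, particle number N and chemical potential μ.

```latex
S\,\mathrm{d}T - V\,\mathrm{d}P + N\,\mathrm{d}\mu = 0
```

The generalized form, in which the potential energy enters as an additional thermodynamic variable, is given in the paper itself and is not reproduced here.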


100.00% 100.00%



There are several versions of the lognormal distribution in the statistical literature; one is based on the exponential transformation of the generalized normal (GN) distribution. This paper presents a Bayesian analysis of the generalized lognormal distribution (logGN), considering independent non-informative Jeffreys prior distributions for the parameters, and describes the procedure for implementing the Gibbs sampler to obtain the posterior distributions of the parameters. The results are used to analyze failure time models with right-censored and uncensored data. The proposed method is illustrated using actual failure time data of computers.


100.00% 100.00%



Generalized linear mixed models with semiparametric random effects are useful in a wide variety of Bayesian applications. When the random effects arise from a mixture of Dirichlet process (MDP) model, normal base measures and Gibbs sampling procedures based on the Pólya urn scheme are often used to simulate posterior draws. These algorithms are applicable in the conjugate case when (for a normal base measure) the likelihood is normal. In the non-conjugate case, the algorithms proposed by MacEachern and Müller (1998) and Neal (2000) are often applied to generate posterior samples. Some common problems associated with simulation algorithms for non-conjugate MDP models include convergence and mixing difficulties. This paper proposes an algorithm based on the Pólya urn scheme that extends the Gibbs sampling algorithms to non-conjugate models with normal base measures and exponential family likelihoods. The algorithm proceeds by making Laplace approximations to the likelihood function, thereby reducing the procedure to that of conjugate normal MDP models. To ensure the validity of the stationary distribution in the non-conjugate case, the proposals are accepted or rejected by a Metropolis-Hastings step. In the special case where the data are normally distributed, the algorithm is identical to the Gibbs sampler.


90.00% 90.00%



To verify the existence of genotype x environment interaction, in the form of heterogeneity of variances for milk yield in buffaloes, and its impact on the genetic evaluation of the animals, Bayesian inference via the Gibbs sampler was applied to 5,484 milk yield records from 2,994 predominantly Murrah buffalo cows, daughters of 150 sires mated with 1,130 dams, with calvings between 1974 and 2004. The records came from the Programa de Melhoramento Genético dos Bubalinos (PROMEBUL), with the addition of records from the herd of EMBRAPA Amazônia Oriental (EAO), located in Belém, Pará. Herd-year-of-calving classes were established and, according to the standard deviation of each class, the milk yield records were classified into classes of high and low phenotypic standard deviation. The data were then analyzed both ignoring and considering the standard deviation classes. The model included the fixed effects of herd-year class and month of calving, the covariates age of the female at calving and lactation length, and the random effects of animal, permanent environment and temporary environment. A uniform prior distribution was assumed for the fixed effects, and inverse chi-square and inverse Wishart prior distributions were assumed for the (co)variance components. The observed means and standard deviations for milk yield in the high and low standard deviation classes and in the overall analysis were 1870.21±758.78, 1900.50±587.76 and 1885.48±677.98, respectively. The posterior means of the variance components were larger in the high standard deviation class. The heritability obtained in the high standard deviation class was close to the value observed in the overall analysis and lower than the value found in the low phenotypic standard deviation class.
The genetic correlation for milk yield between the standard deviation classes was 0.58. The Spearman correlations between the breeding values for milk yield obtained in the overall analysis and those obtained in the high and low standard deviation classes were 0.94 and 0.93, respectively, for all sires. For a sample of the 10 best sires, the same correlations were 0.94 and 0.47, respectively. These results reveal heterogeneity of variances among herds. This heterogeneity results from environmental factors, which can lead to an erroneous ranking of the genetically best sires for milk yield.


90.00% 90.00%



The paper investigates a Bayesian hierarchical model for the analysis of categorical longitudinal data from a large social survey of immigrants to Australia. Data for each subject are observed on three separate occasions, or waves, of the survey. One of the features of the data set is that observations for some variables are missing for at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response and then subsequent terms are introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which allows missing data for both the response and the explanatory variables to be imputed at each iteration of the algorithm, given some appropriate prior distributions. After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia.
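The idea of imputing missing data at each Gibbs iteration can be sketched on a much simpler model than the hierarchical multinomial one in this abstract. The toy example below assumes normal observations with known standard deviation, a flat prior on the mean, and values missing completely at random. The function name and data are hypothetical.

```python
import random
import statistics


def gibbs_with_imputation(observed, n_missing, n_iter, sigma=1.0, seed=0):
    """Gibbs sampler for a normal mean with missing observations.

    Alternates between (1) imputing the missing values given the current
    mean and (2) drawing the mean given the completed data set.
    Assumes a flat prior on the mean and a known standard deviation.
    """
    rng = random.Random(seed)
    n = len(observed) + n_missing
    mu = statistics.fmean(observed)  # starting value
    draws = []
    for _ in range(n_iter):
        # Step 1: impute each missing value from N(mu, sigma^2).
        imputed = [rng.gauss(mu, sigma) for _ in range(n_missing)]
        # Step 2: draw mu from its full conditional N(ybar, sigma^2 / n).
        ybar = (sum(observed) + sum(imputed)) / n
        mu = rng.gauss(ybar, sigma / n ** 0.5)
        draws.append(mu)
    return draws


if __name__ == "__main__":
    draws = gibbs_with_imputation([1.0, 2.0, 3.0, 2.0, 2.0], 3, 20000)
    print(statistics.fmean(draws), statistics.pvariance(draws))
```

In this toy setting the imputed values carry no extra information, so the draws of the mean should match the posterior based on the observed values alone (mean equal to the observed average, variance sigma squared over the number of observed values). That gives a simple check on the sampler.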


90.00% 90.00%



A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Pólya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the Pólya-Gamma random variable is developed.

Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.

The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.

The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age specific latent natural ability class and a performance enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of the player significantly deviates from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.

All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the Pólya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.


90.00% 90.00%



The emergence of hydrodynamic features in off-equilibrium (1 + 1)-dimensional integrable quantum systems has been the object of increasing attention in recent years. In this master’s thesis, we combine Thermodynamic Bethe Ansatz (TBA) techniques for finite-temperature quantum field theories with the Generalized Hydrodynamics (GHD) picture to provide a theoretical and numerical analysis of Zamolodchikov’s staircase model both at thermal equilibrium and in inhomogeneous generalized Gibbs ensembles. The staircase model is a diagonal (1 + 1)-dimensional integrable scattering theory with the remarkable property of roaming between infinitely many critical points when moving along a renormalization group trajectory. Namely, the finite-temperature dimensionless ground-state energy of the system approaches the central charges of all the minimal unitary conformal field theories (CFTs) M_p as the temperature varies. Within the GHD framework we develop a detailed study of the staircase model’s hydrodynamics and compare its quite surprising features to those displayed by a class of non-diagonal massless models flowing between adjacent points in the M_p series. Finally, employing both TBA and GHD techniques, we generalize to higher-spin local and quasi-local conserved charges the results obtained by B. Doyon and D. Bernard [1] for the steady-state energy current in off-equilibrium conformal field theories.


80.00% 80.00%



A common breeding strategy is to carry out basic studies to investigate the hypothesis of a single gene controlling the trait (major gene) with or without polygenes of minor effect. In this study we used Bayesian inference to fit genetic additive-dominance models of inheritance to plant breeding experiments with multiple generations. Normal densities with different means, according to the major gene genotype, were considered in a linear model in which the design matrix of the genetic effects had unknown coefficients (which were estimated on an individual basis). An actual data set from an inheritance study of parthenocarpy in zucchini (Cucurbita pepo L.) was used for illustration. Model fitting included posterior probabilities for all individual genotypes. The analysis agrees with results in the literature, but this approach was far more efficient than previous alternatives, which assumed that the design matrix was known for the generations. Parthenocarpy in zucchini is controlled by a major gene with an important additive effect and partial dominance.


80.00% 80.00%



The objective of this work was to use the Bayesian method to fit Wood's model to milk yield data from Saanen goats. Two groups of animals, in first and second lactation, were considered. Samples from the marginal posterior distributions of the Wood model parameters and of the production functions derived from those parameters (peak yield, time of peak yield, persistency and total milk yield) were obtained with the Gibbs sampler algorithm. Inferences were made for each population, and the results showed differences in the rate of decline in yield after the peak and in persistency, indicating higher yield in second-lactation animals. A data simulation study was carried out to evaluate the Bayesian method under different covariance matrix structures for the parameters. The results of that study indicate that the method is efficient for studying lactation curves when the covariance matrix shows high correlation among the parameters.
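To make the derived production functions concrete, here is a minimal sketch of Wood's lactation curve in its common incomplete-gamma form, y(t) = a t^b exp(-c t), with the peak time and peak yield that follow from it. The abstract does not state the parameterization used, so this form, the function names and the parameter values are assumptions for illustration. Persistency and total yield are left out because their definitions vary between studies.

```python
import math


def wood_yield(t, a, b, c):
    """Milk yield at time t under Wood's curve y(t) = a * t**b * exp(-c * t)."""
    return a * t ** b * math.exp(-c * t)


def wood_peak_time(b, c):
    """Time of peak yield, found by setting dy/dt = 0, which gives t = b / c."""
    return b / c


def wood_peak_yield(a, b, c):
    """Yield at the peak: a * (b / c)**b * exp(-b)."""
    return a * (b / c) ** b * math.exp(-b)


if __name__ == "__main__":
    # Hypothetical parameter values, not estimates from the study.
    a, b, c = 2.0, 0.2, 0.005
    print(wood_peak_time(b, c), wood_peak_yield(a, b, c))
```

In a Bayesian fit, applying these functions to each posterior draw of (a, b, c) gives posterior samples of the derived quantities directly.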


80.00% 80.00%



Markov chain Monte Carlo (MCMC) methods are used to sample from probability distributions. These techniques rely on running Markov chains whose stationary distributions are the distributions to be sampled. Because they are easy to apply, they are among the most widely used approaches in the statistical community, particularly in Bayesian analysis. They are very popular tools for sampling from complex and/or high-dimensional probability distributions. Since the first MCMC method appeared in 1953 (the Metropolis method, see [10]), interest in these methods and the range of available algorithms have grown year after year. Although the Metropolis-Hastings algorithm (see [8]) can be regarded as one of the most general MCMC algorithms, it is also one of the simplest to understand and explain, which makes it an ideal algorithm for beginners. It has been further developed by several researchers. The multiple-try Metropolis (MTM) algorithm, introduced into the statistical literature by [9], is considered an interesting development in this area, but unfortunately its implementation is very costly in terms of time. Recently, a new algorithm was developed by [1]: the revisited multiple-try Metropolis algorithm (revisited MTM), which defines the standard MTM method mentioned above within the framework of the Metropolis-Hastings algorithm on an extended space. The aim of this work is, first, to present MCMC methods and then to study and analyze the Metropolis-Hastings algorithm and the standard MTM, to give readers a better understanding of how these methods are implemented.
A second aim is to study the prospects and drawbacks of the revisited MTM algorithm to see whether it meets the expectations of the statistical community. Finally, we attempt to tackle the sedentariness problem of the revisited MTM algorithm, which gives rise to an entirely new algorithm. This new algorithm performs well when the number of candidates generated at each iteration is small, but its performance degrades as the number of candidates grows.
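The standard multiple-try Metropolis step discussed in this abstract can be sketched as follows. This is a minimal version under stated assumptions, not the revisited MTM or the new algorithm from the thesis: it uses a symmetric Gaussian proposal and the simple weight choice w(y) = π(y), and the function names, target and tuning values are hypothetical.

```python
import math
import random


def mtm_step(x, log_target, k, step, rng):
    """One multiple-try Metropolis step (symmetric proposal, weights = target)."""
    # Draw k candidates around the current state and weight them by the target.
    candidates = [x + rng.gauss(0.0, step) for _ in range(k)]
    cand_w = [math.exp(log_target(y)) for y in candidates]
    total_cand = sum(cand_w)
    if total_cand == 0.0:
        return x  # all candidates have negligible density; stay put
    # Select one candidate with probability proportional to its weight.
    y = rng.choices(candidates, weights=cand_w, k=1)[0]
    # Draw k - 1 reference points around y and add the current state.
    refs = [y + rng.gauss(0.0, step) for _ in range(k - 1)] + [x]
    total_ref = sum(math.exp(log_target(z)) for z in refs)
    # Generalized acceptance ratio of summed weights.
    if rng.random() < min(1.0, total_cand / total_ref):
        return y
    return x


def mtm_chain(log_target, n_steps, k=5, step=2.0, x0=0.0, seed=0):
    """Run an MTM chain and return the visited states."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        x = mtm_step(x, log_target, k, step, rng)
        samples.append(x)
    return samples


if __name__ == "__main__":
    chain = mtm_chain(lambda x: -0.5 * x * x, 10000)
    print(sum(chain) / len(chain))
```

Each step evaluates the target 2k - 1 times in addition to the current state, which is the time cost the abstract refers to. With k = 1 the step reduces to ordinary random-walk Metropolis.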


80.00% 80.00%



We present a model of market participation in which the presence of non-negligible fixed costs leads to random censoring of the traditional double-hurdle model. Fixed costs arise when household resources must be devoted a priori to the decision to participate in the market. These costs, usually of time, are manifested in non-negligible minimum-efficient supplies and supply correspondence that requires modification of the traditional Tobit regression. The costs also complicate econometric estimation of household behavior. These complications are overcome by application of the Gibbs sampler. The algorithm thus derived provides robust estimates of the fixed-costs, double-hurdle model. The model and procedures are demonstrated in an application to milk market participation in the Ethiopian highlands.


80.00% 80.00%



Scrotal circumference data from 47,605 Nellore young bulls, measured at around 18 mo of age (SC18), were analyzed simultaneously with 27,924 heifer pregnancy (HP) and 80,831 stayability (STAY) records to estimate their additive genetic relationships. Additionally, the possibility that economically relevant traits measured directly in females could replace SC18 as a selection criterion was verified. Heifer pregnancy was defined as the observation that a heifer conceived and remained pregnant, which was assessed by rectal palpation at 60 d. Females were exposed to sires for the first time at about 14 mo of age (between 11 and 16 mo). Stayability was defined as whether or not a cow calved every year up to 5 yr of age, when the opportunity to breed was provided. A Bayesian linear-threshold-threshold analysis via Gibbs sampler was used to estimate the variance and covariance components of the multitrait model. Heritability estimates were 0.42 +/- 0.01, 0.53 +/- 0.03, and 0.10 +/- 0.01, for SC18, HP, and STAY, respectively. The genetic correlation estimates were 0.29 +/- 0.05, 0.19 +/- 0.05, and 0.64 +/- 0.07 between SC18 and HP, SC18 and STAY, and HP and STAY, respectively. The residual correlation estimate between HP and STAY was -0.08 +/- 0.03. The heritability values indicate the existence of considerable genetic variance for SC18 and HP traits. However, genetic correlations between SC18 and the female reproductive traits analyzed in the present study can only be considered moderate. The small residual correlation between HP and STAY suggests that environmental effects common to both traits are not major. The large heritability estimate for HP and the high genetic correlation between HP and STAY obtained in the present study confirm that EPD for HP can be used to select bulls for the production of precocious, fertile, and long-lived daughters. 
Moreover, SC18 could be incorporated in multitrait analysis to improve the prediction accuracy for HP genetic merit of young bulls.