975 resultados para markov chains monte carlo methods
Resumo:
L'Anàlisi de la supervivència s'utilitza en diferents camps per analitzar el temps transcorregut entre dos esdeveniments. El que distingeix l'anàlisi de la supervivència d'altres àrees de l'estadística és que les dades normalment estan censurades. La censura en un interval apareix quan l'esdeveniment final d'interès no és directament observable i només se sap que el temps de fallada està en un interval concret. Un esquema de censura més complex encara apareix quan tant el temps inicial com el temps final estan censurats en un interval. Aquesta situació s'anomena doble censura. En aquest article donem una descripció formal d'un mètode bayesà paramètric per a l'anàlisi de dades censurades en un interval i dades doblement censurades així com unes indicacions clares de la seva utilització o pràctica. La metodologia proposada s'ilustra amb dades d'una cohort de pacients hemofílics que es varen infectar amb el virus VIH a principis dels anys 1980's.
Resumo:
In recent work we have developed a novel variational inference method for partially observed systems governed by stochastic differential equations. In this paper we provide a comparison of the Variational Gaussian Process Smoother with an exact solution computed using a Hybrid Monte Carlo approach to path sampling, applied to a stochastic double well potential model. It is demonstrated that the variational smoother provides us a very accurate estimate of mean path while conditional variance is slightly underestimated. We conclude with some remarks as to the advantages and disadvantages of the variational smoother. © 2008 Springer Science + Business Media LLC.
Resumo:
A RET network consists of a network of photo-active molecules called chromophores that can participate in inter-molecular energy transfer called resonance energy transfer (RET). RET networks are used in a variety of applications including cryptographic devices, storage systems, light harvesting complexes, biological sensors, and molecular rulers. In this dissertation, we focus on creating a RET device called closed-diffusive exciton valve (C-DEV) in which the input to output transfer function is controlled by an external energy source, similar to a semiconductor transistor like the MOSFET. Due to their biocompatibility, molecular devices like the C-DEVs can be used to introduce computing power in biological, organic, and aqueous environments such as living cells. Furthermore, the underlying physics in RET devices are stochastic in nature, making them suitable for stochastic computing in which true random distribution generation is critical.
In order to determine a valid configuration of chromophores for the C-DEV, we developed a systematic process based on user-guided design space pruning techniques and built-in simulation tools. We show that our C-DEV is 15x better than C-DEVs designed using ad hoc methods that rely on limited data from prior experiments. We also show ways in which the C-DEV can be improved further and how different varieties of C-DEVs can be combined to form more complex logic circuits. Moreover, the systematic design process can be used to search for valid chromophore network configurations for a variety of RET applications.
We also describe a feasibility study for a technique used to control the orientation of chromophores attached to DNA. Being able to control the orientation can expand the design space for RET networks because it provides another parameter to tune their collective behavior. While results showed limited control over orientation, the analysis required the development of a mathematical model that can be used to determine the distribution of dipoles in a given sample of chromophore constructs. The model can be used to evaluate the feasibility of other potential orientation control techniques.
Resumo:
Yksi keskeisimmistä tehtävistä matemaattisten mallien tilastollisessa analyysissä on mallien tuntemattomien parametrien estimointi. Tässä diplomityössä ollaan kiinnostuneita tuntemattomien parametrien jakaumista ja niiden muodostamiseen sopivista numeerisista menetelmistä, etenkin tapauksissa, joissa malli on epälineaarinen parametrien suhteen. Erilaisten numeeristen menetelmien osalta pääpaino on Markovin ketju Monte Carlo -menetelmissä (MCMC). Nämä laskentaintensiiviset menetelmät ovat viime aikoina kasvattaneet suosiotaan lähinnä kasvaneen laskentatehon vuoksi. Sekä Markovin ketjujen että Monte Carlo -simuloinnin teoriaa on esitelty työssä siinä määrin, että menetelmien toimivuus saadaan perusteltua. Viime aikoina kehitetyistä menetelmistä tarkastellaan etenkin adaptiivisia MCMC menetelmiä. Työn lähestymistapa on käytännönläheinen ja erilaisia MCMC -menetelmien toteutukseen liittyviä asioita korostetaan. Työn empiirisessä osuudessa tarkastellaan viiden esimerkkimallin tuntemattomien parametrien jakaumaa käyttäen hyväksi teoriaosassa esitettyjä menetelmiä. Mallit kuvaavat kemiallisia reaktioita ja kuvataan tavallisina differentiaaliyhtälöryhminä. Mallit on kerätty kemisteiltä Lappeenrannan teknillisestä yliopistosta ja Åbo Akademista, Turusta.
Resumo:
Multi-dimensional classification (MDC) is the supervised learning problem where an instance is associated with multiple classes, rather than with a single class, as in traditional classification problems. Since these classes are often strongly correlated, modeling the dependencies between them allows MDC methods to improve their performance – at the expense of an increased computational cost. In this paper we focus on the classifier chains (CC) approach for modeling dependencies, one of the most popular and highest-performing methods for multi-label classification (MLC), a particular case of MDC which involves only binary classes (i.e., labels). The original CC algorithm makes a greedy approximation, and is fast but tends to propagate errors along the chain. Here we present novel Monte Carlo schemes, both for finding a good chain sequence and performing efficient inference. Our algorithms remain tractable for high-dimensional data sets and obtain the best predictive performance across several real data sets.
Resumo:
Markov chain Monte Carlo (MCMC) is a methodology that is gaining widespread use in the phylogenetics community and is central to phylogenetic software packages such as MrBayes. An important issue for users of MCMC methods is how to select appropriate values for adjustable parameters such as the length of the Markov chain or chains, the sampling density, the proposal mechanism, and, if Metropolis-coupled MCMC is being used, the number of heated chains and their temperatures. Although some parameter settings have been examined in detail in the literature, others are frequently chosen with more regard to computational time or personal experience with other data sets. Such choices may lead to inadequate sampling of tree space or an inefficient use of computational resources. We performed a detailed study of convergence and mixing for 70 randomly selected, putatively orthologous protein sets with different sizes and taxonomic compositions. Replicated runs from multiple random starting points permit a more rigorous assessment of convergence, and we developed two novel statistics, delta and epsilon, for this purpose. Although likelihood values invariably stabilized quickly, adequate sampling of the posterior distribution of tree topologies took considerably longer. Our results suggest that multimodality is common for data sets with 30 or more taxa and that this results in slow convergence and mixing. However, we also found that the pragmatic approach of combining data from several short, replicated runs into a metachain to estimate bipartition posterior probabilities provided good approximations, and that such estimates were no worse in approximating a reference posterior distribution than those obtained using a single long run of the same length as the metachain. Precision appears to be best when heated Markov chains have low temperatures, whereas chains with high temperatures appear to sample trees with high posterior probabilities only rarely. [Bayesian phylogenetic inference; heating parameter; Markov chain Monte Carlo; replicated chains.]
Resumo:
We present quasi-Monte Carlo analogs of Monte Carlo methods for some linear algebra problems: solving systems of linear equations, computing extreme eigenvalues, and matrix inversion. Reformulating the problems as solving integral equations with a special kernels and domains permits us to analyze the quasi-Monte Carlo methods with bounds from numerical integration. Standard Monte Carlo methods for integration provide a convergence rate of O(N^(−1/2)) using N samples. Quasi-Monte Carlo methods use quasirandom sequences with the resulting convergence rate for numerical integration as good as O((logN)^k)N^(−1)). We have shown theoretically and through numerical tests that the use of quasirandom sequences improves both the magnitude of the error and the convergence rate of the considered Monte Carlo methods. We also analyze the complexity of considered quasi-Monte Carlo algorithms and compare them to the complexity of the analogous Monte Carlo and deterministic algorithms.
Resumo:
This thesis is concerned with the state and parameter estimation in state space models. The estimation of states and parameters is an important task when mathematical modeling is applied to many different application areas such as the global positioning systems, target tracking, navigation, brain imaging, spread of infectious diseases, biological processes, telecommunications, audio signal processing, stochastic optimal control, machine learning, and physical systems. In Bayesian settings, the estimation of states or parameters amounts to computation of the posterior probability density function. Except for a very restricted number of models, it is impossible to compute this density function in a closed form. Hence, we need approximation methods. A state estimation problem involves estimating the states (latent variables) that are not directly observed in the output of the system. In this thesis, we use the Kalman filter, extended Kalman filter, Gauss–Hermite filters, and particle filters to estimate the states based on available measurements. Among these filters, particle filters are numerical methods for approximating the filtering distributions of non-linear non-Gaussian state space models via Monte Carlo. The performance of a particle filter heavily depends on the chosen importance distribution. For instance, inappropriate choice of the importance distribution can lead to the failure of convergence of the particle filter algorithm. In this thesis, we analyze the theoretical Lᵖ particle filter convergence with general importance distributions, where p ≥2 is an integer. A parameter estimation problem is considered with inferring the model parameters from measurements. For high-dimensional complex models, estimation of parameters can be done by Markov chain Monte Carlo (MCMC) methods. In its operation, the MCMC method requires the unnormalized posterior distribution of the parameters and a proposal distribution. In this thesis, we show how the posterior density function of the parameters of a state space model can be computed by filtering based methods, where the states are integrated out. This type of computation is then applied to estimate parameters of stochastic differential equations. Furthermore, we compute the partial derivatives of the log-posterior density function and use the hybrid Monte Carlo and scaled conjugate gradient methods to infer the parameters of stochastic differential equations. The computational efficiency of MCMC methods is highly depend on the chosen proposal distribution. A commonly used proposal distribution is Gaussian. In this kind of proposal, the covariance matrix must be well tuned. To tune it, adaptive MCMC methods can be used. In this thesis, we propose a new way of updating the covariance matrix using the variational Bayesian adaptive Kalman filter algorithm.
Resumo:
This article presents a statistical method for detecting recombination in DNA sequence alignments, which is based on combining two probabilistic graphical models: (1) a taxon graph (phylogenetic tree) representing the relationship between the taxa, and (2) a site graph (hidden Markov model) representing interactions between different sites in the DNA sequence alignments. We adopt a Bayesian approach and sample the parameters of the model from the posterior distribution with Markov chain Monte Carlo, using a Metropolis-Hastings and Gibbs-within-Gibbs scheme. The proposed method is tested on various synthetic and real-world DNA sequence alignments, and we compare its performance with the established detection methods RECPARS, PLATO, and TOPAL, as well as with two alternative parameter estimation schemes.
Resumo:
Many scientific and engineering applications involve inverting large matrices or solving systems of linear algebraic equations. Solving these problems with proven algorithms for direct methods can take very long to compute, as they depend on the size of the matrix. The computational complexity of the stochastic Monte Carlo methods depends only on the number of chains and the length of those chains. The computing power needed by inherently parallel Monte Carlo methods can be satisfied very efficiently by distributed computing technologies such as Grid computing. In this paper we show how a load balanced Monte Carlo method for computing the inverse of a dense matrix can be constructed, show how the method can be implemented on the Grid, and demonstrate how efficiently the method scales on multiple processors. (C) 2007 Elsevier B.V. All rights reserved.
Resumo:
Many well-established statistical methods in genetics were developed in a climate of severe constraints on computational power. Recent advances in simulation methodology now bring modern, flexible statistical methods within the reach of scientists having access to a desktop workstation. We illustrate the potential advantages now available by considering the problem of assessing departures from Hardy-Weinberg (HW) equilibrium. Several hypothesis tests of HW have been established, as well as a variety of point estimation methods for the parameter which measures departures from HW under the inbreeding model. We propose a computational, Bayesian method for assessing departures from HW, which has a number of important advantages over existing approaches. The method incorporates the effects-of uncertainty about the nuisance parameters--the allele frequencies--as well as the boundary constraints on f (which are functions of the nuisance parameters). Results are naturally presented visually, exploiting the graphics capabilities of modern computer environments to allow straightforward interpretation. Perhaps most importantly, the method is founded on a flexible, likelihood-based modelling framework, which can incorporate the inbreeding model if appropriate, but also allows the assumptions of the model to he investigated and, if necessary, relaxed. Under appropriate conditions, information can be shared across loci and, possibly, across populations, leading to more precise estimation. The advantages of the method are illustrated by application both to simulated data and to data analysed by alternative methods in the recent literature.
Resumo:
Regulatory authorities in many countries, in order to maintain an acceptable balance between appropriate customer service qualities and costs, are introducing a performance-based regulation. These regulations impose penalties, and in some cases rewards, which introduce a component of financial risk to an electric power utility due to the uncertainty associated with preserving a specific level of system reliability. In Brazil, for instance, one of the reliability indices receiving special attention by the utilities is the Maximum Continuous Interruption Duration per customer (MCID). This paper describes a chronological Monte Carlo simulation approach to evaluate probability distributions of reliability indices, including the MCID, and the corresponding penalties. In order to get the desired efficiency, modern computational techniques are used for modeling (UML -Unified Modeling Language) as well as for programming (Object- Oriented Programming). Case studies on a simple distribution network and on real Brazilian distribution systems are presented and discussed. © Copyright KTH 2006.
Resumo:
Latent class regression models are useful tools for assessing associations between covariates and latent variables. However, evaluation of key model assumptions cannot be performed using methods from standard regression models due to the unobserved nature of latent outcome variables. This paper presents graphical diagnostic tools to evaluate whether or not latent class regression models adhere to standard assumptions of the model: conditional independence and non-differential measurement. An integral part of these methods is the use of a Markov Chain Monte Carlo estimation procedure. Unlike standard maximum likelihood implementations for latent class regression model estimation, the MCMC approach allows us to calculate posterior distributions and point estimates of any functions of parameters. It is this convenience that allows us to provide the diagnostic methods that we introduce. As a motivating example we present an analysis focusing on the association between depression and socioeconomic status, using data from the Epidemiologic Catchment Area study. We consider a latent class regression analysis investigating the association between depression and socioeconomic status measures, where the latent variable depression is regressed on education and income indicators, in addition to age, gender, and marital status variables. While the fitted latent class regression model yields interesting results, the model parameters are found to be invalid due to the violation of model assumptions. The violation of these assumptions is clearly identified by the presented diagnostic plots. These methods can be applied to standard latent class and latent class regression models, and the general principle can be extended to evaluate model assumptions in other types of models.
Resumo:
Markov chain Monte Carlo is a method of producing a correlated sample in order to estimate features of a complicated target distribution via simple ergodic averages. A fundamental question in MCMC applications is when should the sampling stop? That is, when are the ergodic averages good estimates of the desired quantities? We consider a method that stops the MCMC sampling the first time the width of a confidence interval based on the ergodic averages is less than a user-specified value. Hence calculating Monte Carlo standard errors is a critical step in assessing the output of the simulation. In particular, we consider the regenerative simulation and batch means methods of estimating the variance of the asymptotic normal distribution. We describe sufficient conditions for the strong consistency and asymptotic normality of both methods and investigate their finite sample properties in a variety of examples.