988 results for Monte Carlo Algorithms
Abstract:
Markov chain Monte Carlo (MCMC) methods are methods for sampling from probability distributions. These techniques rely on running Markov chains whose stationary distributions are the distributions to be sampled. Because they are easy to apply, they are among the most widely used approaches in the statistical community, particularly in Bayesian analysis. They are very popular tools for sampling from complex and/or high-dimensional probability distributions. Since the appearance of the first MCMC method in 1953 (the Metropolis method, see [10]), interest in these methods and the range of available algorithms have grown year after year. Although the Metropolis-Hastings algorithm (see [8]) can be regarded as one of the most general MCMC algorithms, it is also one of the simplest to understand and explain, which makes it an ideal algorithm for beginners. It has been further developed by several researchers. The multiple-try Metropolis (MTM) algorithm, introduced into the statistical literature by [9], is considered an interesting development in this area, but unfortunately its implementation is very costly (in terms of time). Recently, a new algorithm was developed by [1]. This is the revisited multiple-try Metropolis algorithm (revisited MTM), which defines the standard MTM method mentioned above within the framework of the Metropolis-Hastings algorithm on an extended space. The objective of this work is, first, to present MCMC methods, and then to study and analyze the Metropolis-Hastings and standard MTM algorithms so as to give readers a better understanding of how these methods are implemented.
A second objective is to study the prospects and drawbacks of the revisited MTM algorithm, to see whether it meets the expectations of the statistical community. Finally, we attempt to combat the stickiness problem of the revisited MTM algorithm (the chain's tendency to stay in place), which gives rise to an entirely new algorithm. This new algorithm performs well when the number of candidates generated at each iteration is small, but its performance degrades as the number of candidates grows.
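As an illustration of the simplest algorithm this abstract names, the following is a minimal sketch of random-walk Metropolis, the special case of Metropolis-Hastings with a symmetric proposal. It is not the MTM or revisited MTM algorithm studied in the thesis. The function names, the standard normal target, the step size and the seed are all assumptions chosen for the example.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_steps, step=2.4, seed=0):
    """Random-walk Metropolis: Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, log_p_x = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_steps):
        # Propose a candidate by perturbing the current state.
        y = x + rng.gauss(0.0, step)
        log_p_y = log_target(y)
        # Symmetric proposal, so the acceptance ratio is target(y) / target(x).
        # 1.0 - rng.random() lies in (0, 1], which keeps the logarithm finite.
        if math.log(1.0 - rng.random()) < log_p_y - log_p_x:
            x, log_p_x = y, log_p_y
            accepted += 1
        # The current state is recorded whether or not the candidate was accepted.
        samples.append(x)
    return samples, accepted / n_steps


def log_std_normal(x):
    """Log-density of a standard normal, up to an additive constant (assumed target)."""
    return -0.5 * x * x


samples, acc_rate = metropolis_hastings(log_std_normal, x0=0.0, n_steps=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"acceptance rate={acc_rate:.2f}, mean={mean:.3f}, variance={var:.3f}")
```

The sampler only needs the target up to a normalizing constant, which is why the log-density omits it. A chain that rejects most candidates repeats the same state many times, which is the kind of stickiness the abstract refers to.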
Abstract:
The goal of this thesis is the acceleration of numerical calculations of QCD observables, both at leading order and next–to–leading order in the coupling constant. In particular, the optimization of helicity and spin summation in the context of VEGAS Monte Carlo algorithms is investigated. In the literature, two such methods are mentioned but without detailed analyses. Only one of these methods can be used at next–to–leading order. This work presents a total of five different methods that replace the helicity sums with a Monte Carlo integration. This integration can be combined with the existing phase space integral, in the hope that this causes less overhead than the complete summation. For three of these methods, an extension to existing subtraction terms is developed which is required to enable next–to–leading order calculations. All methods are analyzed with respect to efficiency, accuracy, and ease of implementation before they are compared with each other. In this process, one method shows clear advantages in relation to all others.
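The idea of replacing a helicity sum with a Monte Carlo integration can be sketched on a toy problem. The sketch below is not one of the five methods of the thesis: `toy_weight` is a hypothetical stand-in for a squared helicity amplitude, and the uniform sampling of helicity configurations is just the most basic choice one could make.

```python
import itertools
import random


def toy_weight(helicities):
    """Hypothetical stand-in for a squared helicity amplitude (not a QCD amplitude)."""
    return (1.0 + sum(helicities)) ** 2


def exact_helicity_sum(n_legs):
    """Full summation over all 2^n helicity configurations."""
    return sum(toy_weight(h) for h in itertools.product((-1, 1), repeat=n_legs))


def sampled_helicity_sum(n_legs, n_samples, seed=0):
    """Unbiased Monte Carlo estimate of the same sum from random configurations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        h = [rng.choice((-1, 1)) for _ in range(n_legs)]
        total += toy_weight(h)
    # Each configuration is drawn with probability 2^-n, so the sample mean
    # is rescaled by the number of configurations, 2^n.
    return (2 ** n_legs) * total / n_samples


exact = exact_helicity_sum(8)
estimate = sampled_helicity_sum(8, 20000)
print(f"exact={exact:.1f}, estimate={estimate:.1f}")
```

In a real calculation the random helicity choice would be drawn together with each phase space point, so the helicity sampling adds no separate loop. The cost of the full sum grows as 2^n with the number of legs, while the cost per sampled point does not.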
Abstract:
In this article we propose an exact and efficient simulation algorithm for the generalized von Mises circular distribution of order two. It is an acceptance-rejection algorithm with a piecewise linear envelope based on the local extrema and the inflexion points of the generalized von Mises density of order two. We show that these points can be obtained from the roots of polynomials of degrees four and eight, which can be easily obtained by the methods of Ferrari and Weierstrass. A comparative study with the von Neumann acceptance-rejection algorithm, the ratio-of-uniforms method and a Markov chain Monte Carlo algorithm shows that this new method is generally the most efficient.
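For orientation, here is a minimal sketch of the plain von Neumann acceptance-rejection baseline with a constant (uniform) envelope, not the piecewise linear envelope proposed in the article. It assumes the usual parametrization of the order-two density, proportional to exp(k1 cos(theta - mu1) + k2 cos 2(theta - mu2)) with k1, k2 >= 0. The function names and parameter values are illustrative.

```python
import math
import random


def gvm2_unnormalized(theta, mu1, mu2, k1, k2):
    """Unnormalized generalized von Mises density of order two (assumed parametrization)."""
    return math.exp(k1 * math.cos(theta - mu1) + k2 * math.cos(2.0 * (theta - mu2)))


def sample_gvm2(n, mu1, mu2, k1, k2, seed=0):
    """Acceptance-rejection with a constant envelope over [0, 2*pi)."""
    rng = random.Random(seed)
    # Both cosines are at most 1, so exp(k1 + k2) bounds the density for k1, k2 >= 0.
    bound = math.exp(k1 + k2)
    draws, proposals = [], 0
    while len(draws) < n:
        theta = rng.uniform(0.0, 2.0 * math.pi)
        proposals += 1
        # Accept with probability density(theta) / bound.
        if rng.random() * bound <= gvm2_unnormalized(theta, mu1, mu2, k1, k2):
            draws.append(theta)
    return draws, n / proposals


draws, acc_rate = sample_gvm2(5000, mu1=0.0, mu2=0.0, k1=1.0, k2=0.5)
mean_cos = sum(math.cos(t) for t in draws) / len(draws)
mean_sin = sum(math.sin(t) for t in draws) / len(draws)
print(f"acceptance rate={acc_rate:.2f}, mean cos={mean_cos:.3f}, mean sin={mean_sin:.3f}")
```

The acceptance rate of a constant envelope falls as k1 and k2 grow, because the density concentrates while the envelope stays flat. An envelope that follows the shape of the density more closely wastes fewer proposals.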
Abstract:
Stochastic simulation is an important and practical technique for computing probabilities of rare events, like the payoff probability of a financial option, the probability that a queue exceeds a certain level or the probability of ruin of the insurer's risk process. Rare events occur so infrequently that they cannot be reasonably recorded during a standard simulation procedure: specific simulation algorithms that counteract the rarity of the event being simulated are required. An important algorithm in this context is based on changing the sampling distribution and is called importance sampling. Optimal Monte Carlo algorithms for computing rare event probabilities are either logarithmically efficient or possess bounded relative error.
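The change of sampling distribution described above can be sketched for a textbook case: estimating P(Z > c) for a standard normal Z by sampling from a normal shifted to the threshold c. The example problem, the choice of proposal and the function name are assumptions made for illustration and are not taken from the abstract.

```python
import math
import random


def rare_tail_probability(threshold, n_samples, seed=0):
    """Importance sampling estimate of P(Z > threshold) for Z ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Sample from N(threshold, 1), under which the event is no longer rare.
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Likelihood ratio phi(x) / phi(x - threshold) corrects for the shift.
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples


threshold = 4.0
estimate = rare_tail_probability(threshold, 20000)
exact = 0.5 * math.erfc(threshold / math.sqrt(2.0))
print(f"estimate={estimate:.3e}, exact={exact:.3e}")
```

With naive sampling from N(0, 1), 20000 draws would see the event less than once on average, so the estimate would usually be exactly zero. Under the shifted proposal about half the draws land in the event, and the weights rescale their contribution.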
Abstract:
We introduce gradient-domain rendering for Monte Carlo image synthesis. While previous gradient-domain Metropolis Light Transport sought to distribute more samples in areas of high gradients, we show, in contrast, that estimating image gradients is also possible using standard (non-Metropolis) Monte Carlo algorithms, and furthermore, that even without changing the sample distribution, this often leads to significant error reduction. This broadens the applicability of gradient rendering considerably. To gain insight into the conditions under which gradient-domain sampling is beneficial, we present a frequency analysis that compares Monte Carlo sampling of gradients followed by Poisson reconstruction to traditional Monte Carlo sampling. Finally, we describe Gradient-Domain Path Tracing (G-PT), a relatively simple modification of the standard path tracing algorithm that can yield far superior results.
Abstract:
If a regenerative process is represented as semi-regenerative, we derive formulae enabling us to calculate basic characteristics associated with the first occurrence time starting from corresponding characteristics for the semi-regenerative process. Recursive equations, integral equations, and Monte-Carlo algorithms are proposed for practical solving of the problem.
Abstract:
MSC subject classification: 65C05, 65U05.
Abstract:
2002 Mathematics Subject Classification: 65C05
Abstract:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis-Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis-Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov Chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
Abstract:
This paper presents a computational implementation of an evolutionary algorithm (EA) for the problem of reconfiguring radial distribution systems. The developed module considers power quality indices such as long duration interruptions and customer process disruptions due to voltage sags, by using the Monte Carlo simulation method. Power quality costs are modeled in the mathematical problem formulation and added to the cost of network losses. The proposed EA encoding uses a decimal representation. The EA operators considered for the reconfiguration algorithm, namely selection, recombination and mutation, are analyzed herein. A number of selection procedures are analyzed, namely tournament, elitism and a mixed technique using both elitism and tournament. The recombination operator was developed by considering a chromosome structure representation that maps the network branches and system radiality, and another structure that takes into account the network topology and feasibility of network operation to exchange genetic material. The topologies of the initial population are randomly generated, with radial configurations produced through the Prim and Kruskal algorithms, which rapidly build minimum spanning trees. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
This work investigates the impact of treating breast cancer using different radiation therapy (RT) techniques, namely forwardly-planned intensity-modulated RT (f-IMRT), inversely-planned IMRT and dynamic conformal arc RT (DCART), and their effects on whole-breast irradiation and on the undesirable irradiation of the surrounding healthy tissues. Two algorithms of the iPlan BrainLAB treatment planning system were compared: Pencil Beam Convolution (PBC) and commercial Monte Carlo (iMC). Seven left-sided breast cancer patients who had undergone breast-conserving surgery were enrolled in the study. For each patient, four RT techniques were applied: f-IMRT, IMRT using 2 fields and 5 fields (IMRT2 and IMRT5, respectively), and DCART. The dose distributions in the planned target volume (PTV) and the dose to the organs at risk (OAR) were compared by analyzing dose–volume histograms; further statistical analysis was performed using IBM SPSS v20 software. For PBC, all techniques provided adequate coverage of the PTV. However, statistically significant dose differences were observed between the techniques, in the PTV, OAR and also in the pattern of dose distribution spreading into normal tissues. IMRT5 and DCART spread low doses into greater volumes of normal tissue, right breast, right lung and heart than tangential techniques. However, IMRT5 plans improved distributions for the PTV, exhibiting better conformity and homogeneity in the target and reduced high dose percentages in ipsilateral OAR. DCART did not present advantages over any of the techniques investigated. Differences were also found when comparing the calculation algorithms: PBC estimated higher doses for the PTV, ipsilateral lung and heart than the iMC algorithm predicted.
Abstract:
Identifying the order of an Autoregressive Moving Average (ARMA) model by the usual graphical method is subjective. Hence, there is a need to develop a technique that identifies the order without graphical inspection of the series autocorrelations. To avoid subjectivity, this thesis focuses on determining the order of the ARMA model using Reversible Jump Markov Chain Monte Carlo (RJMCMC). The RJMCMC selects the model from a set of models suggested by better fit, standard deviation errors and the frequency of accepted data. Alongside a detailed analysis of the classical Box-Jenkins modeling methodology, the thesis focuses on its integration with MCMC algorithms through parameter estimation and model fitting of ARMA models. This helps to verify how well the MCMC algorithms can treat ARMA models, by comparing the results with the graphical method. The MCMC approach was found to produce better results than the classical time series approach.
Abstract:
This paper addresses the probabilistic analysis of corrosion initiation time in reinforced concrete structures exposed to chloride ion penetration. Structural durability is an important criterion that must be evaluated in every type of structure, especially when these structures are built in aggressive atmospheres. For reinforced concrete members, the chloride diffusion process is widely used to evaluate durability. By modelling this phenomenon, corrosion of the reinforcement can therefore be better estimated and prevented. Corrosion begins when a threshold chloride concentration is reached at the reinforcing steel bars. Despite the robustness of several models proposed in the literature, deterministic approaches fail to predict the corrosion initiation time accurately because of the inherent randomness observed in this process. In this regard, durability can be represented more realistically using probabilistic approaches. A probabilistic analysis of chloride ion penetration is presented in this paper. Chloride ion penetration is simulated using Fick's second law of diffusion. This law represents the chloride diffusion process, considering time-dependent effects. The probability of failure is calculated using Monte Carlo simulation and the First Order Reliability Method (FORM) with a direct coupling approach. Some examples are considered in order to study these phenomena, and a simplified method is proposed to determine optimal values for concrete cover.
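The Monte Carlo part of such an analysis can be sketched as follows. The sketch assumes the standard closed-form solution of Fick's second law for a semi-infinite medium with constant surface concentration, C(x, t) = C_s (1 - erf(x / (2 sqrt(D t)))), and declares failure when the concentration at the cover depth reaches a threshold. Every distribution and numerical value below is hypothetical and not taken from the paper, and FORM is not shown.

```python
import math
import random


def chloride_concentration(depth_mm, t_years, d_mm2_per_year, c_surface):
    """Closed-form solution of Fick's second law with constant surface concentration."""
    z = depth_mm / (2.0 * math.sqrt(d_mm2_per_year * t_years))
    return c_surface * (1.0 - math.erf(z))


def initiation_probability(t_years, n_samples=20000, seed=0):
    """Fraction of sampled structures whose corrosion has initiated by time t."""
    rng = random.Random(seed)
    c_surface, c_threshold = 3.5, 0.9  # hypothetical concentrations, kg/m^3
    failures = 0
    for _ in range(n_samples):
        # Hypothetical lognormal diffusion coefficient with median 30 mm^2/year.
        d = rng.lognormvariate(math.log(30.0), 0.3)
        # Hypothetical normal cover depth (mean 40 mm), floored at 1 mm.
        cover = max(1.0, rng.gauss(40.0, 8.0))
        if chloride_concentration(cover, t_years, d, c_surface) >= c_threshold:
            failures += 1
    return failures / n_samples


p10 = initiation_probability(10.0)
p50 = initiation_probability(50.0)
print(f"P(initiation by 10 years)={p10:.3f}, P(initiation by 50 years)={p50:.3f}")
```

Because both calls use the same seed, they draw identical samples, and the concentration at a fixed depth only increases with time, so the estimated probability cannot decrease from 10 to 50 years. Repeating the calculation for several mean cover depths is one simple way to see how the cover affects the failure probability.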
Abstract:
The inherently stochastic character of most of the physical quantities involved in engineering models has led to an ever-increasing interest in probabilistic analysis. Many approaches to stochastic analysis have been proposed. However, it is widely acknowledged that the only universal method available to solve accurately any kind of stochastic mechanics problem is Monte Carlo simulation. One of the key parts in the implementation of this technique is the accurate and efficient generation of samples of the random processes and fields involved in the problem at hand. In the present thesis an original method for the simulation of homogeneous, multi-dimensional, multi-variate, non-Gaussian random fields is proposed. The algorithm has proved to be very accurate in matching both the target spectrum and the marginal probability. The computational efficiency and robustness are also very good, even when dealing with strongly non-Gaussian distributions. What is more, the resulting samples possess all the relevant, well-defined and desired properties of “translation fields”, including crossing rates and distributions of extremes. The topic of the second part of the thesis lies in the field of non-destructive parametric structural identification. Its objective is to evaluate the mechanical characteristics of constituent bars in existing truss structures, using static loads and strain measurements. In the cases of missing data and of damage that affects only a small portion of the bar, Genetic Algorithms have proved to be an effective tool to solve the problem.
Abstract:
In this thesis we present techniques that can be used to speed up the calculation of perturbative matrix elements for observables with many legs ($n = 3, 4, 5, 6, 7, \ldots$). We investigate several ways to achieve this, including the use of Monte Carlo methods, the leading-color approximation, numerically less precise but faster operations, and SSE-vectorization. An important idea is the use of "random polarizations" for which we derive subtraction terms for the real corrections in next-to-leading order calculations. We present the effectiveness of all these methods in the context of electron-positron scattering to $n$ jets, $n$ ranging from two to seven.