10 resultados para Markov-chain Monte Carlo
em Helda - Digital Repository of University of Helsinki
Resumo:
Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of biological sciences such as evolution and cell functioning. Currently the field of genetics is under a rapid development because of the recent advances in technologies by which molecular data can be obtained from living organisms. In order that most information from such data can be extracted, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on the modeling of the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo algorithm (MCMC). The example analyses consider estimation of the population structure, relatedness structure (both at the level of whole genomes as well as at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where also a quantitative phenotype has been measured from the contemporary individuals. In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.
Resumo:
Markov random fields (MRF) are popular in image processing applications to describe spatial dependencies between image units. Here, we take a look at the theory and the models of MRFs with an application to improve forest inventory estimates. Typically, autocorrelation between study units is a nuisance in statistical inference, but we take an advantage of the dependencies to smooth noisy measurements by borrowing information from the neighbouring units. We build a stochastic spatial model, which we estimate with a Markov chain Monte Carlo simulation method. The smooth values are validated against another data set increasing our confidence that the estimates are more accurate than the originals.
Resumo:
In this thesis the use of the Bayesian approach to statistical inference in fisheries stock assessment is studied. The work was conducted in collaboration of the Finnish Game and Fisheries Research Institute by using the problem of monitoring and prediction of the juvenile salmon population in the River Tornionjoki as an example application. The River Tornionjoki is the largest salmon river flowing into the Baltic Sea. This thesis tackles the issues of model formulation and model checking as well as computational problems related to Bayesian modelling in the context of fisheries stock assessment. Each article of the thesis provides a novel method either for extracting information from data obtained via a particular type of sampling system or for integrating the information about the fish stock from multiple sources in terms of a population dynamics model. Mark-recapture and removal sampling schemes and a random catch sampling method are covered for the estimation of the population size. In addition, a method for estimating the stock composition of a salmon catch based on DNA samples is also presented. For most of the articles, Markov chain Monte Carlo (MCMC) simulation has been used as a tool to approximate the posterior distribution. Problems arising from the sampling method are also briefly discussed and potential solutions for these problems are proposed. Special emphasis in the discussion is given to the philosophical foundation of the Bayesian approach in the context of fisheries stock assessment. It is argued that the role of subjective prior knowledge needed in practically all parts of a Bayesian model should be recognized and consequently fully utilised in the process of model formulation.
Resumo:
Elucidating the mechanisms responsible for the patterns of species abundance, diversity, and distribution within and across ecological systems is a fundamental research focus in ecology. Species abundance patterns are shaped in a convoluted way by interplays between inter-/intra-specific interactions, environmental forcing, demographic stochasticity, and dispersal. Comprehensive models and suitable inferential and computational tools for teasing out these different factors are quite limited, even though such tools are critically needed to guide the implementation of management and conservation strategies, the efficacy of which rests on a realistic evaluation of the underlying mechanisms. This is even more so in the prevailing context of concerns over climate change progress and its potential impacts on ecosystems. This thesis utilized the flexible hierarchical Bayesian modelling framework in combination with the computer intensive methods known as Markov chain Monte Carlo, to develop methodologies for identifying and evaluating the factors that control the structure and dynamics of ecological communities. These methodologies were used to analyze data from a range of taxa: macro-moths (Lepidoptera), fish, crustaceans, birds, and rodents. Environmental stochasticity emerged as the most important driver of community dynamics, followed by density dependent regulation; the influence of inter-specific interactions on community-level variances was broadly minor. This thesis contributes to the understanding of the mechanisms underlying the structure and dynamics of ecological communities, by showing directly that environmental fluctuations rather than inter-specific competition dominate the dynamics of several systems. This finding emphasizes the need to better understand how species are affected by the environment and acknowledge species differences in their responses to environmental heterogeneity, if we are to effectively model and predict their dynamics (e.g. for management and conservation purposes). The thesis also proposes a model-based approach to integrating the niche and neutral perspectives on community structure and dynamics, making it possible for the relative importance of each category of factors to be evaluated in light of field data.
Resumo:
Aerosols impact the planet and our daily lives through various effects, perhaps most notably those related to their climatic and health-related consequences. While there are several primary particle sources, secondary new particle formation from precursor vapors is also known to be a frequent, global phenomenon. Nevertheless, the formation mechanism of new particles, as well as the vapors participating in the process, remain a mystery. This thesis consists of studies on new particle formation specifically from the point of view of numerical modeling. A dependence of formation rate of 3 nm particles on the sulphuric acid concentration to the power of 1-2 has been observed. This suggests nucleation mechanism to be of first or second order with respect to the sulphuric acid concentration, in other words the mechanisms based on activation or kinetic collision of clusters. However, model studies have had difficulties in replicating the small exponents observed in nature. The work done in this thesis indicates that the exponents may be lowered by the participation of a co-condensing (and potentially nucleating) low-volatility organic vapor, or by increasing the assumed size of the critical clusters. On the other hand, the presented new and more accurate method for determining the exponent indicates high diurnal variability. Additionally, these studies included several semi-empirical nucleation rate parameterizations as well as a detailed investigation of the analysis used to determine the apparent particle formation rate. Due to their high proportion of the earth's surface area, oceans could potentially prove to be climatically significant sources of secondary particles. In the lack of marine observation data, new particle formation events in a coastal region were parameterized and studied. Since the formation mechanism is believed to be similar, the new parameterization was applied in a marine scenario. The work showed that marine CCN production is feasible in the presence of additional vapors contributing to particle growth. Finally, a new method to estimate concentrations of condensing organics was developed. The algorithm utilizes a Markov chain Monte Carlo method to determine the required combination of vapor concentrations by comparing a measured particle size distribution with one from an aerosol dynamics process model. The evaluation indicated excellent agreement against model data, and initial results with field data appear sound as well.
Resumo:
A better understanding of the limiting step in a first order phase transition, the nucleation process, is of major importance to a variety of scientific fields ranging from atmospheric sciences to nanotechnology and even to cosmology. This is due to the fact that in most phase transitions the new phase is separated from the mother phase by a free energy barrier. This barrier is crossed in a process called nucleation. Nowadays it is considered that a significant fraction of all atmospheric particles is produced by vapor-to liquid nucleation. In atmospheric sciences, as well as in other scientific fields, the theoretical treatment of nucleation is mostly based on a theory known as the Classical Nucleation Theory. However, the Classical Nucleation Theory is known to have only a limited success in predicting the rate at which vapor-to-liquid nucleation takes place at given conditions. This thesis studies the unary homogeneous vapor-to-liquid nucleation from a statistical mechanics viewpoint. We apply Monte Carlo simulations of molecular clusters to calculate the free energy barrier separating the vapor and liquid phases and compare our results against the laboratory measurements and Classical Nucleation Theory predictions. According to our results, the work of adding a monomer to a cluster in equilibrium vapour is accurately described by the liquid drop model applied by the Classical Nucleation Theory, once the clusters are larger than some threshold size. The threshold cluster sizes contain only a few or some tens of molecules depending on the interaction potential and temperature. However, the error made in modeling the smallest of clusters as liquid drops results in an erroneous absolute value for the cluster work of formation throughout the size range, as predicted by the McGraw-Laaksonen scaling law. By calculating correction factors to Classical Nucleation Theory predictions for the nucleation barriers of argon and water, we show that the corrected predictions produce nucleation rates that are in good comparison with experiments. For the smallest clusters, the deviation between the simulation results and the liquid drop values are accurately modelled by the low order virial coefficients at modest temperatures and vapour densities, or in other words, in the validity range of the non-interacting cluster theory by Frenkel, Band and Bilj. Our results do not indicate a need for a size dependent replacement free energy correction. The results also indicate that Classical Nucleation Theory predicts the size of the critical cluster correctly. We also presents a new method for the calculation of the equilibrium vapour density, surface tension size dependence and planar surface tension directly from cluster simulations. We also show how the size dependence of the cluster surface tension in equimolar surface is a function of virial coefficients, a result confirmed by our cluster simulations.
Resumo:
A better understanding of the limiting step in a first order phase transition, the nucleation process, is of major importance to a variety of scientific fields ranging from atmospheric sciences to nanotechnology and even to cosmology. This is due to the fact that in most phase transitions the new phase is separated from the mother phase by a free energy barrier. This barrier is crossed in a process called nucleation. Nowadays it is considered that a significant fraction of all atmospheric particles is produced by vapor-to liquid nucleation. In atmospheric sciences, as well as in other scientific fields, the theoretical treatment of nucleation is mostly based on a theory known as the Classical Nucleation Theory. However, the Classical Nucleation Theory is known to have only a limited success in predicting the rate at which vapor-to-liquid nucleation takes place at given conditions. This thesis studies the unary homogeneous vapor-to-liquid nucleation from a statistical mechanics viewpoint. We apply Monte Carlo simulations of molecular clusters to calculate the free energy barrier separating the vapor and liquid phases and compare our results against the laboratory measurements and Classical Nucleation Theory predictions. According to our results, the work of adding a monomer to a cluster in equilibrium vapour is accurately described by the liquid drop model applied by the Classical Nucleation Theory, once the clusters are larger than some threshold size. The threshold cluster sizes contain only a few or some tens of molecules depending on the interaction potential and temperature. However, the error made in modeling the smallest of clusters as liquid drops results in an erroneous absolute value for the cluster work of formation throughout the size range, as predicted by the McGraw-Laaksonen scaling law. By calculating correction factors to Classical Nucleation Theory predictions for the nucleation barriers of argon and water, we show that the corrected predictions produce nucleation rates that are in good comparison with experiments. For the smallest clusters, the deviation between the simulation results and the liquid drop values are accurately modelled by the low order virial coefficients at modest temperatures and vapour densities, or in other words, in the validity range of the non-interacting cluster theory by Frenkel, Band and Bilj. Our results do not indicate a need for a size dependent replacement free energy correction. The results also indicate that Classical Nucleation Theory predicts the size of the critical cluster correctly. We also presents a new method for the calculation of the equilibrium vapour density, surface tension size dependence and planar surface tension directly from cluster simulations. We also show how the size dependence of the cluster surface tension in equimolar surface is a function of virial coefficients, a result confirmed by our cluster simulations.