965 results for Markov-chain Monte Carlo
Abstract:
Mark Pagel, Andrew Meade (2004). A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Systematic Biology, 53(4), 571-581. RAE2008
Abstract:
Partial occlusions are commonplace in a variety of real-world computer vision applications: surveillance, intelligent environments, assistive robotics, autonomous navigation, etc. Although occlusion-handling methods have been proposed, most tend to break down when confronted with numerous occluders in a scene. In this paper, a layered image-plane representation for tracking people through substantial occlusions is proposed. An image-plane representation of motion around an object is associated with a pre-computed graphical model, which can be instantiated efficiently during online tracking. A global state and observation space is obtained by linking transitions between layers. A Reversible Jump Markov Chain Monte Carlo approach is used to infer the number of people and track them online. The method outperforms two state-of-the-art methods for tracking over extended occlusions on videos of a parking lot with numerous vehicles and of a laboratory with many desks and workstations.
Abstract:
A novel method that combines shape-based object recognition and image segmentation is proposed for shape retrieval from images. Given a shape prior represented in a multi-scale curvature form, the proposed method identifies the target objects in images by grouping oversegmented image regions. The problem is formulated in a unified probabilistic framework and solved by a stochastic Markov Chain Monte Carlo (MCMC) mechanism. In this way, object segmentation and recognition are accomplished simultaneously. Within each sampling move during the simulation process, probabilistic region grouping operations are influenced by both the image information and the shape similarity constraint. The latter constraint is measured by a partial shape matching process. A generalized parallel algorithm by Barbu and Zhu, combined with a large sampling jump and other implementation improvements, greatly speeds up the overall stochastic process. The proposed method supports the segmentation and recognition of multiple occluded objects in images. Experimental results are provided for both synthetic and real images.
Abstract:
We consider the problem of variable selection in regression modeling in high-dimensional spaces where there is known structure among the covariates. This is an unconventional variable selection problem for two reasons: (1) the dimension of the covariate space is comparable to, and often much larger than, the number of subjects in the study, and (2) the covariate space is highly structured, and in some cases it is desirable to incorporate this structural information into the model building process. We approach this problem through the Bayesian variable selection framework, where we assume that the covariates lie on an undirected graph and formulate an Ising prior on the model space for incorporating structural information. Certain computational and statistical problems arise that are unique to such high-dimensional, structured settings, the most interesting being the phenomenon of phase transitions. We propose theoretical and computational schemes to mitigate these problems. We illustrate our methods on two different graph structures: the linear chain and the regular graph of degree k. Finally, we use our methods to study a specific application in genomics: the modeling of transcription factor binding sites in DNA sequences. © 2010 American Statistical Association.
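A minimal sketch of an Ising prior on the model space, assuming one common parameterization with binary inclusion indicators $\gamma_i$, a sparsity parameter a and a coupling parameter b over the graph edges E: $p(\gamma) \propto \exp(a \sum_i \gamma_i + b \sum_{(i,j) \in E} \gamma_i \gamma_j)$. The function names and the single-site Gibbs sampler are illustrative assumptions, not the authors' code; the sampler draws from the prior alone, whereas a full variable-selection sampler would also include the regression likelihood in each conditional.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def linear_chain_edges(n):
    """Edges (i, i+1) of a linear chain over n covariates."""
    return [(i, i + 1) for i in range(n - 1)]


def ising_log_prior(gamma, edges, a, b):
    """Unnormalized log p(gamma) = a * sum(gamma_i) + b * sum over edges of gamma_i * gamma_j."""
    return a * sum(gamma) + b * sum(gamma[i] * gamma[j] for i, j in edges)


def gibbs_sample_ising(n, edges, a, b, sweeps, rng):
    """Single-site Gibbs sampling from the Ising prior only (no likelihood term)."""
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    gamma = [0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Conditional log-odds of gamma_i = 1 given the current neighbours.
            log_odds = a + b * sum(gamma[j] for j in neighbours[i])
            gamma[i] = 1 if rng.random() < sigmoid(log_odds) else 0
    return gamma


if __name__ == "__main__":
    rng = random.Random(0)
    edges = linear_chain_edges(20)
    draw = gibbs_sample_ising(20, edges, a=-2.0, b=3.0, sweeps=200, rng=rng)
    print(draw, ising_log_prior(draw, edges, -2.0, 3.0))
```

A positive b raises the conditional log-odds of including a covariate when its graph neighbours are included, which is how the structural information enters the prior.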
Abstract:
We develop a model for stochastic processes with random marginal distributions. Our model relies on a stick-breaking construction for the marginal distribution of the process, and introduces dependence across locations by using a latent Gaussian copula model as the mechanism for selecting the atoms. The resulting latent stick-breaking process (LaSBP) induces a random partition of the index space, with points closer in space having a higher probability of being in the same cluster. We develop an efficient and straightforward Markov chain Monte Carlo (MCMC) algorithm for computation and discuss applications in financial econometrics and ecology. This article has supplementary material online.
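A minimal sketch of the two ingredients named above, assuming a truncated stick-breaking prior with $v_k \sim \mathrm{Beta}(1, \alpha)$ and a one-dimensional grid of locations whose latent Gaussian field is a stationary AR(1) sequence. Each location's latent value is mapped to a uniform through the standard normal CDF and then to an atom through the cumulative weights, so nearby locations with similar latent values tend to share an atom. The truncation level, the Beta(1, α) choice, the AR(1) correlation and all names are illustrative assumptions; this is not necessarily the exact LaSBP construction, and the MCMC algorithm is not shown.

```python
import math
import random


def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking: v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k} (1 - v_j).

    The last weight absorbs the leftover stick so the K weights sum to one.
    """
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def select_atom(z, weights):
    """Map a latent N(0, 1) value to an atom index via u = Phi(z) and the cumulative weights."""
    u = std_normal_cdf(z)
    cumulative = 0.0
    for k, w in enumerate(weights):
        cumulative += w
        if u <= cumulative:
            return k
    return len(weights) - 1  # guard against rounding when u is very close to 1


def ar1_latent_field(n, rho, rng):
    """Stationary AR(1) sequence with N(0, 1) marginals on a 1-D grid of locations."""
    z = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        z.append(rho * z[-1] + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
    return z


if __name__ == "__main__":
    rng = random.Random(42)
    weights = stick_breaking_weights(alpha=2.0, K=20, rng=rng)
    labels = [select_atom(z, weights) for z in ar1_latent_field(30, rho=0.95, rng=rng)]
    print(labels)
```

With rho near 1, neighbouring locations receive nearly equal latent values and therefore usually the same atom, which produces the spatial clustering described in the abstract; with rho = 0 the atoms are chosen independently at each location.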
Abstract:
We describe a strategy for Markov chain Monte Carlo analysis of non-linear, non-Gaussian state-space models involving batch analysis for inference on dynamic, latent state variables and fixed model parameters. The key innovation is a Metropolis-Hastings method for the time series of state variables based on sequential approximation of filtering and smoothing densities using normal mixtures. These mixtures are propagated through the non-linearities using an accurate, local mixture approximation method, and we use a regenerating procedure to deal with potential degeneracy of mixture components. This provides accurate, direct approximations to sequential filtering and retrospective smoothing distributions, and hence a useful construction of global Metropolis proposal distributions for simulation of posteriors for the set of states. This analysis is embedded within a Gibbs sampler to include uncertain fixed parameters. We give an example motivated by an application in systems biology. Supplemental materials provide an example based on a stochastic volatility model as well as MATLAB code.
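Global Metropolis proposals built from filtering and smoothing approximations can be read as independence proposals: the candidate does not depend on the current state, so the acceptance probability is $\min(1, w(y)/w(x))$ with importance weight $w = \pi / q$. A minimal one-dimensional sketch, assuming a hand-picked two-component normal mixture as the proposal (in the paper the mixtures are propagated through the state-space model over the whole state trajectory, which is not reproduced here, nor is the surrounding Gibbs sampler); the class and function names are hypothetical.

```python
import math
import random


def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


class NormalMixture:
    """Finite mixture of normals used as a fixed (state-independent) proposal."""

    def __init__(self, weights, means, sds):
        self.weights, self.means, self.sds = weights, means, sds

    def sample(self, rng):
        k = rng.choices(range(len(self.weights)), weights=self.weights)[0]
        return rng.gauss(self.means[k], self.sds[k])

    def log_pdf(self, x):
        # Log-sum-exp over components for numerical stability.
        terms = [math.log(w) + log_normal_pdf(x, m, s)
                 for w, m, s in zip(self.weights, self.means, self.sds)]
        top = max(terms)
        return top + math.log(sum(math.exp(t - top) for t in terms))


def independence_mh(log_target, proposal, x0, n_iter, rng):
    """Independence Metropolis-Hastings: accept y with prob min(1, w(y) / w(x)), w = target / proposal."""
    x = x0
    log_w_x = log_target(x) - proposal.log_pdf(x)
    samples, accepted = [], 0
    for _ in range(n_iter):
        y = proposal.sample(rng)
        log_w_y = log_target(y) - proposal.log_pdf(y)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_w_y - log_w_x:
            x, log_w_x = y, log_w_y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_iter


if __name__ == "__main__":
    rng = random.Random(0)
    proposal = NormalMixture([0.5, 0.5], [0.0, 2.0], [1.5, 1.5])
    # Unnormalized log density of a N(1, 1) target.
    draws, rate = independence_mh(lambda x: -0.5 * (x - 1.0) ** 2, proposal, 0.0, 20000, rng)
    print(sum(draws) / len(draws), rate)
```

The proposal here has heavier tails than the target, which keeps the importance weights bounded; an independence proposal that is too narrow can leave the chain stuck for long stretches.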
Abstract:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TFs) and nucleosomes, and the genome. Different high-throughput techniques have been developed to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often provides only partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has been used mainly to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). By incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of the protein binding landscape and that the most accurate inference comes from modeling all available datasets.
This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of the protein-DNA interaction landscape through data integration.
Abstract:
We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation is to represent the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible new representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate $k$-way posterior probabilities of matches across records, and propagate the uncertainty of record linkage into later analyses. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previously proposed methods of record linkage, despite the high-dimensional parameter space. We assess our results on real and simulated data.
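The bipartite representation can be sketched as a linkage map from each record, identified here by a hypothetical (file, index) pair, to a latent individual. Two records match exactly when they map to the same individual, so pairwise and $k$-way posterior match probabilities are frequencies over MCMC draws of that map. This sketch assumes the draws are already available; it does not implement the authors' linear-time hybrid sampler, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def pairwise_match_probabilities(samples):
    """Posterior probability that two records share a latent individual.

    samples: list of dicts mapping record id -> latent individual id (one dict per MCMC draw).
    """
    counts = Counter()
    for linkage in samples:
        clusters = {}
        for record, individual in linkage.items():
            clusters.setdefault(individual, []).append(record)
        for members in clusters.values():
            for pair in combinations(sorted(members), 2):
                counts[pair] += 1
    return {pair: c / len(samples) for pair, c in counts.items()}


def set_match_probability(samples, records):
    """k-way probability that all given records point to one latent individual."""
    hits = sum(1 for linkage in samples if len({linkage[r] for r in records}) == 1)
    return hits / len(samples)


if __name__ == "__main__":
    # Records are hypothetical (file, index) pairs; latent labels are arbitrary per draw.
    draws = [
        {("A", 0): 7, ("B", 0): 7, ("B", 1): 3},
        {("A", 0): 1, ("B", 0): 2, ("B", 1): 2},
    ]
    print(pairwise_match_probabilities(draws))
    print(set_match_probability(draws, [("A", 0), ("B", 0)]))
```

Because only co-membership is counted, the estimates do not depend on how latent individuals happen to be labelled in each draw, and two records from the same file that share an individual are flagged as within-file duplicates. The pairwise loop is quadratic in cluster size, which is fine for summarizing draws but is not how the sampler itself scales.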
Abstract:
The lesser sandeel Ammodytes marinus is a key species in the North Sea ecosystem, transferring energy from planktonic producers to top predators. Previous studies have shown a long-term decline in the size of 0-group sandeels in the western North Sea, but they were unable to pinpoint the mechanism (later hatching, slower growth or changes in size-dependent mortality) or cause. To investigate the first 2 possibilities, we combined 2 independent time series of sandeel size, namely data from chick-feeding Atlantic puffins Fratercula arctica and from the Continuous Plankton Recorder (CPR), in a novel statistical model implemented using Markov Chain Monte Carlo (MCMC). The model estimated annual mean length on 1 July, as well as hatching date and growth rate for sandeels from 1973 to 2006. Mean length-at-date declined by 22% over this period, corresponding to a 60% decrease in energy content, with a sharper decline since 2002. Up to the mid-1990s, the decline was associated with a trend towards later hatching. Subsequently, hatching became earlier again, and the continued trend towards smaller size appears to have been driven by lower growth rates, particularly in the most recent years, although we could not rule out changes in size-dependent mortality. Our findings point to major changes in key aspects of sandeel life history, which we consider most likely to be due to direct and indirect temperature-related changes over a range of biotic factors, including the seasonal distribution of copepods and intra- and inter-specific competition with planktivorous fish. The results have implications both for the many predators of sandeels and for age and size of maturation in this aggregation of North Sea sandeels.
Abstract:
Raised bog peat deposits form important archives for reconstructing past changes in climate. Precise and reliable age models are of vital importance for interpreting such archives. We propose enhanced, Markov chain Monte Carlo based methods for obtaining age models from radiocarbon-dated peat cores, based on the assumption of piecewise linear accumulation. Included are automatic choice of sections, a measure of the goodness of fit and outlier downweighting. The approach is illustrated by using a peat core from the Netherlands.
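Under the piecewise linear accumulation assumption, the age at any depth follows by linear interpolation between section boundaries, and each section has a constant accumulation rate. A minimal sketch of that deterministic part, assuming hypothetical boundary depths (cm) and calendar ages (cal yr BP); the MCMC step that infers the boundary ages from calibrated radiocarbon dates, the automatic choice of sections, the goodness-of-fit measure and the outlier downweighting described above are not shown.

```python
from bisect import bisect_right


def piecewise_linear_age(depth, boundary_depths, boundary_ages):
    """Age at `depth`, interpolating linearly within the section that contains it."""
    if not boundary_depths[0] <= depth <= boundary_depths[-1]:
        raise ValueError("depth lies outside the modelled core")
    # Index of the section's upper boundary; the deepest boundary belongs to the last section.
    k = min(bisect_right(boundary_depths, depth) - 1, len(boundary_depths) - 2)
    d0, d1 = boundary_depths[k], boundary_depths[k + 1]
    a0, a1 = boundary_ages[k], boundary_ages[k + 1]
    return a0 + (a1 - a0) * (depth - d0) / (d1 - d0)


def accumulation_rates(boundary_depths, boundary_ages):
    """Constant accumulation rate (depth units per year) of each section."""
    return [
        (boundary_depths[k + 1] - boundary_depths[k]) / (boundary_ages[k + 1] - boundary_ages[k])
        for k in range(len(boundary_depths) - 1)
    ]


if __name__ == "__main__":
    depths = [0.0, 50.0, 100.0]   # hypothetical section boundaries, cm
    ages = [0.0, 1000.0, 3000.0]  # hypothetical ages at those boundaries, cal yr BP
    print(piecewise_linear_age(75.0, depths, ages), accumulation_rates(depths, ages))
```

In an MCMC age model, a function like this would be evaluated at the depths of the dated samples for each proposed set of boundary ages, and the resulting modelled ages compared with the calibrated radiocarbon dates in the likelihood.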
Abstract:
Some of the first results are reported from RISE, a new fast camera mounted on the Liverpool Telescope and designed primarily to obtain high time-resolution light curves of transiting extrasolar planets for transit timing. A full and a partial transit of WASP-3 are presented, and a Markov-Chain Monte Carlo analysis is used to update the parameters from the discovery paper. This yields a planetary radius of $1.29^{+0.05}_{-0.12}\,R_J$ and therefore a density of $0.82^{+0.14}_{-0.09}\,\rho_J$, consistent with previous results. The inclination is $85.06^{+0.16}_{-0.15}$ deg, in agreement with the previously determined value but with significantly improved precision. Central transit times are consistent with the ephemeris given in the discovery paper; however, a new ephemeris calculated using the longer baseline gives $T_c(0) = 2\,454\,605.55915 \pm 0.00023$ HJD and $P = 1.846835 \pm 0.000002$ days.
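A linear ephemeris predicts the central time of transit number n as $T_c(n) = T_c(0) + nP$. A small sketch using the WASP-3 values quoted above; the uncertainty propagation assumes the errors on $T_c(0)$ and $P$ are uncorrelated, which is an assumption (fitted values are generally correlated unless the reference epoch sits near the middle of the baseline), and the function names are hypothetical.

```python
import math

# WASP-3 ephemeris as quoted in the abstract above.
T0_HJD = 2454605.55915
T0_ERR = 0.00023
PERIOD_DAYS = 1.846835
PERIOD_ERR = 0.000002


def predict_transit(epoch, t0=T0_HJD, t0_err=T0_ERR, period=PERIOD_DAYS, period_err=PERIOD_ERR):
    """Central time T_c(n) = T_c(0) + n * P and its 1-sigma error, assuming uncorrelated errors."""
    time = t0 + epoch * period
    err = math.sqrt(t0_err ** 2 + (epoch * period_err) ** 2)
    return time, err


def nearest_epoch(time, t0=T0_HJD, period=PERIOD_DAYS):
    """Transit number closest to an observed mid-transit time."""
    return round((time - t0) / period)


if __name__ == "__main__":
    print(predict_transit(100))
```

The period term grows linearly with epoch, so after enough cycles it dominates the timing uncertainty; this is one reason a longer observing baseline tightens the ephemeris.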
Abstract:
We present nine newly observed transits of TrES-3, taken as part of a transit timing program using the RISE instrument on the Liverpool Telescope. A Markov-Chain Monte Carlo analysis was used to determine the planet-to-star radius ratio and inclination of the system, which were found to be $R_p/R_\star = 0.1664^{+0.0011}_{-0.0018}$ and $i = 81.73^{+0.13}_{-0.04}$ deg, respectively, consistent with previous results. The central transit times and uncertainties were also calculated, using a residual-permutation algorithm as an independent check on the errors. A re-analysis of eight previously published TrES-3 light curves was conducted to determine the transit times and uncertainties using consistent techniques. Although the transit times are not consistent with a linear ephemeris ($\chi^2 = 35.07$ for 15 degrees of freedom), we interpret this as the result of systematics in the light curves rather than a real transit timing variation, because the light curves that show the largest deviation from a constant period either have relatively little out-of-transit coverage or have clear systematics. A new ephemeris was calculated from the transit times: $T_c(0) = 2454632.62610 \pm 0.00006$ HJD and $P = 1.3061864 \pm 0.0000005$ days. The transit times were then used to place upper mass limits on a potential perturbing planet as a function of its period ratio, showing that our data are sufficiently sensitive to have probed sub-Earth-mass planets in both interior and exterior 2:1 resonances, assuming that the additional planet is in an initially circular orbit.
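The linear-ephemeris test amounts to a weighted least-squares fit of transit time against epoch, $t_n = T_c(0) + nP$, followed by $\chi^2 = \sum_n [(t_n - T_c(0) - nP)/\sigma_n]^2$ with N - 2 degrees of freedom. The quoted 15 degrees of freedom is consistent with 17 transit times (9 new plus 8 re-analysed) and two fitted parameters, and 35.07/15 gives a reduced $\chi^2$ of about 2.3. A minimal sketch of such a fit, with hypothetical names; it is not the authors' pipeline and does not include the residual-permutation error estimates.

```python
def fit_linear_ephemeris(epochs, times, sigmas):
    """Weighted least-squares fit of t_n = T0 + n * P; returns (T0, P, chi2, dof)."""
    # Work relative to the first time so the large HJD values do not swamp the sums.
    t_ref = times[0]
    y = [t - t_ref for t in times]
    w = [1.0 / s ** 2 for s in sigmas]
    s_w = sum(w)
    s_n = sum(wi * n for wi, n in zip(w, epochs))
    s_y = sum(wi * yi for wi, yi in zip(w, y))
    s_nn = sum(wi * n * n for wi, n in zip(w, epochs))
    s_ny = sum(wi * n * yi for wi, n, yi in zip(w, epochs, y))
    # Closed-form solution of the 2x2 weighted normal equations.
    delta = s_w * s_nn - s_n ** 2
    period = (s_w * s_ny - s_n * s_y) / delta
    t0_rel = (s_nn * s_y - s_n * s_ny) / delta
    chi2 = sum(((yi - t0_rel - period * n) / s) ** 2 for yi, n, s in zip(y, epochs, sigmas))
    return t_ref + t0_rel, period, chi2, len(times) - 2


if __name__ == "__main__":
    # Hypothetical, noise-free times generated from the ephemeris quoted above.
    epochs = [0, 1, 2, 5, 10]
    times = [2454632.62610 + n * 1.3061864 for n in epochs]
    print(fit_linear_ephemeris(epochs, times, [1e-4] * len(epochs)))
```

A $\chi^2$ well above the degrees of freedom signals either real timing variations or underestimated errors; the abstract attributes it to light-curve systematics.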
Abstract:
We report the discovery of WASP-10b, a new transiting extrasolar planet (ESP) discovered by the Wide Angle Search for Planets (WASP) Consortium and confirmed using radial velocity data from the FIbre-fed Echelle Spectrograph on the Nordic Optical Telescope and from SOPHIE. A 3.09-d period, 29 mmag transit depth and 2.36 h duration are derived for WASP-10b using WASP and high-precision photometric observations. Simultaneous fitting to the photometric and radial velocity data using a Markov Chain Monte Carlo procedure leads to a planet radius of $1.28\,R_J$, a mass of $2.96\,M_J$ and an eccentricity of $\approx 0.06$. WASP-10b is one of the more massive transiting ESPs, and we compare its characteristics to the current sample of transiting ESPs, for which there is little information for masses greater than $\approx 2\,M_J$ and non-zero eccentricities. WASP-10's host star, GSC 2752-00114 (USNO-B1.0 1214-0586164), is among the fainter stars in the WASP sample, with V = 12.7 and a spectral type of K5. This result shows promise for future late-type dwarf star surveys.
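Two quick consistency checks on the quoted numbers, as a hedged sketch. Bulk density in Jupiter units is $\rho/\rho_J = (M/M_J)/(R/R_J)^3$, which gives about $1.41\,\rho_J$ for $2.96\,M_J$ and $1.28\,R_J$. A transit depth of $\Delta m$ magnitudes corresponds to a fractional flux drop $1 - 10^{-0.4\Delta m}$, roughly $(R_p/R_\star)^2$ if limb darkening is ignored, so 29 mmag suggests $R_p/R_\star \approx 0.16$. The no-limb-darkening approximation and the function names are assumptions; the published values come from the full MCMC fit.

```python
import math


def density_jupiter_units(mass_mj, radius_rj):
    """Bulk density relative to Jupiter: (M / M_J) / (R / R_J)^3."""
    return mass_mj / radius_rj ** 3


def radius_ratio_from_depth(depth_mmag):
    """Rough R_p / R_star from a transit depth in millimagnitudes, ignoring limb darkening."""
    flux_drop = 1.0 - 10.0 ** (-0.4 * depth_mmag / 1000.0)
    return math.sqrt(flux_drop)


if __name__ == "__main__":
    print(density_jupiter_units(2.96, 1.28))  # WASP-10b mass and radius quoted above
    print(radius_ratio_from_depth(29.0))      # WASP-10b transit depth quoted above
```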
Abstract:
The IntCal04 and Marine04 radiocarbon calibration curves have been updated from 12 cal kBP (cal kBP is here defined as thousands of calibrated years before AD 1950), and extended to 50 cal kBP, utilizing newly available data sets that meet the IntCal Working Group criteria for pristine corals and other carbonates and for quantification of uncertainty in both the ¹⁴C and calendar timescales as established in 2002. No change was made to the curves from 0–12 cal kBP. The curves were constructed using a Markov chain Monte Carlo (MCMC) implementation of the random walk model used for IntCal04 and Marine04. The new curves were ratified at the 20th International Radiocarbon Conference in June 2009 and are available in the Supplemental Material at www.radiocarbon.org.