12 resultados para State-space models

em Duke University


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We describe a strategy for Markov chain Monte Carlo analysis of non-linear, non-Gaussian state-space models involving batch analysis for inference on dynamic, latent state variables and fixed model parameters. The key innovation is a Metropolis-Hastings method for the time series of state variables based on sequential approximation of filtering and smoothing densities using normal mixtures. These mixtures are propagated through the non-linearities using an accurate, local mixture approximation method, and we use a regenerating procedure to deal with potential degeneracy of mixture components. This provides accurate, direct approximations to sequential filtering and retrospective smoothing distributions, and hence a useful construction of global Metropolis proposal distributions for simulation of posteriors for the set of states. This analysis is embedded within a Gibbs sampler to include uncertain fixed parameters. We give an example motivated by an application in systems biology. Supplemental materials provide an example based on a stochastic volatility model as well as MATLAB code.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.

We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.

We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.

Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.

This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Po ́lya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the P ́olya- Gamma random variable is developed.

Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.

The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.

The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age specific latent natural ability class and a performance enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of the player significantly deviates from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.

All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the P ́olya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Multi-output Gaussian processes provide a convenient framework for multi-task problems. An illustrative and motivating example of a multi-task problem is multi-region electrophysiological time-series data, where experimentalists are interested in both power and phase coherence between channels. Recently, the spectral mixture (SM) kernel was proposed to model the spectral density of a single task in a Gaussian process framework. This work develops a novel covariance kernel for multiple outputs, called the cross-spectral mixture (CSM) kernel. This new, flexible kernel represents both the power and phase relationship between multiple observation channels. The expressive capabilities of the CSM kernel are demonstrated through implementation of 1) a Bayesian hidden Markov model, where the emission distribution is a multi-output Gaussian process with a CSM covariance kernel, and 2) a Gaussian process factor analysis model, where factor scores represent the utilization of cross-spectral neural circuits. Results are presented for measured multi-region electrophysiological data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The problem of social diffusion has animated sociological thinking on topics ranging from the spread of an idea, an innovation or a disease, to the foundations of collective behavior and political polarization. While network diffusion has been a productive metaphor, the reality of diffusion processes is often muddier. Ideas and innovations diffuse differently from diseases, but, with a few exceptions, the diffusion of ideas and innovations has been modeled under the same assumptions as the diffusion of disease. In this dissertation, I develop two new diffusion models for "socially meaningful" contagions that address two of the most significant problems with current diffusion models: (1) that contagions can only spread along observed ties, and (2) that contagions do not change as they spread between people. I augment insights from these statistical and simulation models with an analysis of an empirical case of diffusion - the use of enterprise collaboration software in a large technology company. I focus the empirical study on when people abandon innovations, a crucial, and understudied aspect of the diffusion of innovations. Using timestamped posts, I analyze when people abandon software to a high degree of detail.

To address the first problem, I suggest a latent space diffusion model. Rather than treating ties as stable conduits for information, the latent space diffusion model treats ties as random draws from an underlying social space, and simulates diffusion over the social space. Theoretically, the social space model integrates both actor ties and attributes simultaneously in a single social plane, while incorporating schemas into diffusion processes gives an explicit form to the reciprocal influences that cognition and social environment have on each other. Practically, the latent space diffusion model produces statistically consistent diffusion estimates where using the network alone does not, and the diffusion with schemas model shows that introducing some cognitive processing into diffusion processes changes the rate and ultimate distribution of the spreading information. To address the second problem, I suggest a diffusion model with schemas. Rather than treating information as though it is spread without changes, the schema diffusion model allows people to modify information they receive to fit an underlying mental model of the information before they pass the information to others. Combining the latent space models with a schema notion for actors improves our models for social diffusion both theoretically and practically.

The empirical case study focuses on how the changing value of an innovation, introduced by the innovations' network externalities, influences when people abandon the innovation. In it, I find that people are least likely to abandon an innovation when other people in their neighborhood currently use the software as well. The effect is particularly pronounced for supervisors' current use and number of supervisory team members who currently use the software. This case study not only points to an important process in the diffusion of innovation, but also suggests a new approach -- computerized collaboration systems -- to collecting and analyzing data on organizational processes.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper focuses on the nature of jamming, as seen in two-dimensional frictional granular systems consisting of photoelastic particles. The photoelastic technique is unique at this time, in its capability to provide detailed particle-scale information on forces and kinematic quantities such as particle displacements and rotations. These experiments first explore isotropic stress states near point J through measurements of the mean contact number per particle, Z, and the pressure, P as functions of the packing fraction, . In this case, the experiments show some but not all aspects of jamming, as expected on the basis of simulations and models that typically assume conservative, hence frictionless, forces between particles. Specifically, there is a rapid growth in Z, at a reasonable which we identify with as c. It is possible to fit Z and P, to power law expressions in - c above c, and to obtain exponents that are in agreement with simulations and models. However, the experiments differ from theory on several points, as typified by the rounding that is observed in Z and P near c. The application of shear to these same 2D granular systems leads to phenomena that are qualitatively different from the standard picture of jamming. In particular, there is a range of packing fractions below c, where the application of shear strain at constant leads to jammed stress-anisotropic states, i.e. they have a non-zero shear stress, τ. The application of shear strain to an initially isotropically compressed (hence jammed) state, does not lead to an unjammed state per se. Rather, shear strain at constant first leads to an increase of both τ and P. Additional strain leads to a succession of jammed states interspersed with relatively localized failures of the force network leading to other stress-anisotropic states that are jammed at typically somewhat lower stress. The locus of jammed states requires a state space that involves not only and τ, but also P. P, τ, and Z are all hysteretic functions of shear strain for fixed . However, we find that both P and τ are roughly linear functions of Z for strains large enough to jam the system. This implies that these shear-jammed states satisfy a Coulomb like-relation, τ = μP. © 2010 The Royal Society of Chemistry.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The advances in three related areas of state-space modeling, sequential Bayesian learning, and decision analysis are addressed, with the statistical challenges of scalability and associated dynamic sparsity. The key theme that ties the three areas is Bayesian model emulation: solving challenging analysis/computational problems using creative model emulators. This idea defines theoretical and applied advances in non-linear, non-Gaussian state-space modeling, dynamic sparsity, decision analysis and statistical computation, across linked contexts of multivariate time series and dynamic networks studies. Examples and applications in financial time series and portfolio analysis, macroeconomics and internet studies from computational advertising demonstrate the utility of the core methodological innovations.

Chapter 1 summarizes the three areas/problems and the key idea of emulating in those areas. Chapter 2 discusses the sequential analysis of latent threshold models with use of emulating models that allows for analytical filtering to enhance the efficiency of posterior sampling. Chapter 3 examines the emulator model in decision analysis, or the synthetic model, that is equivalent to the loss function in the original minimization problem, and shows its performance in the context of sequential portfolio optimization. Chapter 4 describes the method for modeling the steaming data of counts observed on a large network that relies on emulating the whole, dependent network model by independent, conjugate sub-models customized to each set of flow. Chapter 5 reviews those advances and makes the concluding remarks.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Free energy calculations are a computational method for determining thermodynamic quantities, such as free energies of binding, via simulation.

Currently, due to computational and algorithmic limitations, free energy calculations are limited in scope.

In this work, we propose two methods for improving the efficiency of free energy calculations.

First, we expand the state space of alchemical intermediates, and show that this expansion enables us to calculate free energies along lower variance paths.

We use Q-learning, a reinforcement learning technique, to discover and optimize paths at low computational cost.

Second, we reduce the cost of sampling along a given path by using sequential Monte Carlo samplers.

We develop a new free energy estimator, pCrooks (pairwise Crooks), a variant on the Crooks fluctuation theorem (CFT), which enables decomposition of the variance of the free energy estimate for discrete paths, while retaining beneficial characteristics of CFT.

Combining these two advancements, we show that for some test models, optimal expanded-space paths have a nearly 80% reduction in variance relative to the standard path.

Additionally, our free energy estimator converges at a more consistent rate and on average 1.8 times faster when we enable path searching, even when the cost of path discovery and refinement is considered.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Numerical approximation of the long time behavior of a stochastic di.erential equation (SDE) is considered. Error estimates for time-averaging estimators are obtained and then used to show that the stationary behavior of the numerical method converges to that of the SDE. The error analysis is based on using an associated Poisson equation for the underlying SDE. The main advantages of this approach are its simplicity and universality. It works equally well for a range of explicit and implicit schemes, including those with simple simulation of random variables, and for hypoelliptic SDEs. To simplify the exposition, we consider only the case where the state space of the SDE is a torus, and we study only smooth test functions. However, we anticipate that the approach can be applied more widely. An analogy between our approach and Stein's method is indicated. Some practical implications of the results are discussed. Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This dissertation examined the response to termination of CO2 enrichment of a forest ecosystem exposed to long-term elevated atmospheric CO2 condition, and aimed at investigating responses and their underlying mechanisms of two important factors of carbon cycle in the ecosystem, stomatal conductance and soil respiration. Because the contribution of understory vegetation to the entire ecosystem grew with time, we first investigated the effect of elevated CO2 on understory vegetation. Potential growth enhancing effect of elevated CO2 were not observed, and light seemed to be a limiting factor. Secondly, we examined the importance of aerodynamic conductance to determine canopy conductance, and found that its effect can be negligible. Responses of stomatal conductance and soil respiration were assessed using Bayesian state space model. In two years after the termination of CO2 enrichment, stomatal conductance in formerly elevated CO2 returned to ambient level, while soil respiration became smaller than ambient level and did not recovered to ambient in two years.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A new modality for preventing HIV transmission is emerging in the form of topical microbicides. Some clinical trials have shown some promising results of these methods of protection while other trials have failed to show efficacy. Due to the relatively novel nature of microbicide drug transport, a rigorous, deterministic analysis of that transport can help improve the design of microbicide vehicles and understand results from clinical trials. This type of analysis can aid microbicide product design by helping understand and organize the determinants of drug transport and the potential efficacies of candidate microbicide products.

Microbicide drug transport is modeled as a diffusion process with convection and reaction effects in appropriate compartments. This is applied here to vaginal gels and rings and a rectal enema, all delivering the microbicide drug Tenofovir. Although the focus here is on Tenofovir, the methods established in this dissertation can readily be adapted to other drugs, given knowledge of their physical and chemical properties, such as the diffusion coefficient, partition coefficient, and reaction kinetics. Other dosage forms such as tablets and fiber meshes can also be modeled using the perspective and methods developed here.

The analyses here include convective details of intravaginal flows by both ambient fluid and spreading gels with different rheological properties and applied volumes. These are input to the overall conservation equations for drug mass transport in different compartments. The results are Tenofovir concentration distributions in time and space for a variety of microbicide products and conditions. The Tenofovir concentrations in the vaginal and rectal mucosal stroma are converted, via a coupled reaction equation, to concentrations of Tenofovir diphosphate, which is the active form of the drug that functions as a reverse transcriptase inhibitor against HIV. Key model outputs are related to concentrations measured in experimental pharmacokinetic (PK) studies, e.g. concentrations in biopsies and blood. A new measure of microbicide prophylactic functionality, the Percent Protected, is calculated. This is the time dependent volume of the entire stroma (and thus fraction of host cells therein) in which Tenofovir diphosphate concentrations equal or exceed a target prophylactic value, e.g. an EC50.

Results show the prophylactic potentials of the studied microbicide vehicles against HIV infections. Key design parameters for each are addressed in application of the models. For a vaginal gel, fast spreading at small volume is more effective than slower spreading at high volume. Vaginal rings are shown to be most effective if inserted and retained as close to the fornix as possible. Because of the long half-life of Tenofovir diphosphate, temporary removal of the vaginal ring (after achieving steady state) for up to 24h does not appreciably diminish Percent Protected. However, full steady state (for the entire stromal volume) is not achieved until several days after ring insertion. Delivery of Tenofovir to the rectal mucosa by an enema is dominated by surface area of coated mucosa and whether the interiors of rectal crypts are filled with the enema fluid. For the enema 100% Percent Protected is achieved much more rapidly than for vaginal products, primarily because of the much thinner epithelial layer of the mucosa. For example, 100% Percent Protected can be achieved with a one minute enema application, and 15 minute wait time.

Results of these models have good agreement with experimental pharmacokinetic data, in animals and clinical trials. They also improve upon traditional, empirical PK modeling, and this is illustrated here. Our deterministic approach can inform design of sampling in clinical trials by indicating time periods during which significant changes in drug concentrations occur in different compartments. More fundamentally, the work here helps delineate the determinants of microbicide drug delivery. This information can be the key to improved, rational design of microbicide products and their dosage regimens.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Light rainfall is the baseline input to the annual water budget in mountainous landscapes through the tropics and at mid-latitudes. In the Southern Appalachians, the contribution from light rainfall ranges from 50-60% during wet years to 80-90% during dry years, with convective activity and tropical cyclone input providing most of the interannual variability. The Southern Appalachians is a region characterized by rich biodiversity that is vulnerable to land use/land cover changes due to its proximity to a rapidly growing population. Persistent near surface moisture and associated microclimates observed in this region has been well documented since the colonization of the area in terms of species health, fire frequency, and overall biodiversity. The overarching objective of this research is to elucidate the microphysics of light rainfall and the dynamics of low level moisture in the inner region of the Southern Appalachians during the warm season, with a focus on orographically mediated processes. The overarching research hypothesis is that physical processes leading to and governing the life cycle of orographic fog, low level clouds, and precipitation, and their interactions, are strongly tied to landform, land cover, and the diurnal cycles of flow patterns, radiative forcing, and surface fluxes at the ridge-valley scale. The following science questions will be addressed specifically: 1) How do orographic clouds and fog affect the hydrometeorological regime from event to annual scale and as a function of terrain characteristics and land cover?; 2) What are the source areas, governing processes, and relevant time-scales of near surface moisture convergence patterns in the region?; and 3) What are the four dimensional microphysical and dynamical characteristics, including variability and controlling factors and processes, of fog and light rainfall? The research was conducted with two major components: 1) ground-based high-quality observations using multi-sensor platforms and 2) interpretive numerical modeling guided by the analysis of the in situ data collection. Findings illuminate a high level of spatial – down to the ridge scale - and temporal – from event to annual scale - heterogeneity in observations, and a significant impact on the hydrological regime as a result of seeder-feeder interactions among fog, low level clouds, and stratiform rainfall that enhance coalescence efficiency and lead to significantly higher rainfall rates at the land surface. Specifically, results show that enhancement of an event up to one order of magnitude in short-term accumulation can occur as a result of concurrent fog presence. Results also show that events are modulated strongly by terrain characteristics including elevation, slope, geometry, and land cover. These factors produce interactions between highly localized flows and gradients of temperature and moisture with larger scale circulations. Resulting observations of DSD and rainfall patterns are stratified by region and altitude and exhibit clear diurnal and seasonal cycles.