7 results for State space formulation
at Duke University
Abstract:
A class of multi-process models is developed for collections of time-indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Pólya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the Pólya-Gamma random variable is developed.
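The Gaussian approximation mentioned above can be illustrated by moment matching. For integer b, PG(b, c) is a sum of b independent PG(1, c) variables, so its distribution approaches a Gaussian as b grows, and a normal draw with the exact PG mean and variance is a natural stand-in when counts are high. The Python sketch below is an illustration under that assumption, not the dissertation's own construction; the names `pg_moments` and `sample_pg_gaussian` and the positivity clip are hypothetical choices.

```python
import math
import random

def pg_moments(b, c):
    """Mean and variance of a Polya-Gamma PG(b, c) random variable."""
    c = abs(c)  # PG(b, c) depends on c only through |c|
    if c < 1e-3:
        # Small-c limit (avoids 0/0): E = b/4, Var = b/24.
        return b / 4.0, b / 24.0
    t = math.tanh(c / 2.0)
    mean = b * t / (2.0 * c)
    # Var = b / (4 c^3) * (sinh(c) - c) * sech^2(c/2), rewritten using
    # sinh(c) sech^2(c/2) = 2 tanh(c/2) and sech^2 = 1 - tanh^2 to avoid overflow.
    var = b / (4.0 * c ** 3) * (2.0 * t - c * (1.0 - t * t))
    return mean, var

def sample_pg_gaussian(b, c, rng=random):
    """Moment-matched Gaussian stand-in for a PG(b, c) draw; sensible only for large b."""
    mean, var = pg_moments(b, c)
    draw = rng.gauss(mean, math.sqrt(var))
    # PG variables are positive but a Gaussian is not, so clip as a crude safeguard.
    return max(draw, 1e-12)

# Example: a count-heavy Gibbs step with b = 500 trials and linear predictor c = 1.5.
rng = random.Random(1)
omega = sample_pg_gaussian(500, 1.5, rng)
```

In a Pólya-Gamma Gibbs sampler, a draw like `omega` would replace the exact PG draw for an observation with many trials; for small b the exact sampler remains preferable.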
Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.
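As a rough illustration of a state-space model on multinomial natural parameters, the Python sketch below lets K-1 natural parameters follow a Gaussian random walk, maps them to probabilities with a softmax that treats the last vocabulary term as a reference category, and draws word counts for one document per time step. The random-walk evolution, the reference-category link, and all names and values (`sigma`, `n_words`) are assumptions for illustration; the dissertation's theme model is richer than this single-series toy.

```python
import math
import random

def softmax_with_reference(eta):
    """Map K-1 natural parameters to K probabilities (last category is the reference, eta_K = 0)."""
    logits = list(eta) + [0.0]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def simulate_dynamic_multinomial(T, n_words, K, sigma=0.3, seed=0):
    """Simulate T documents whose term probabilities drift via a random walk on eta."""
    rng = random.Random(seed)
    eta = [0.0] * (K - 1)
    probs_path, counts_path = [], []
    for _ in range(T):
        # State evolution: eta_t = eta_{t-1} + Gaussian noise (induces autocorrelation).
        eta = [e + rng.gauss(0.0, sigma) for e in eta]
        p = softmax_with_reference(eta)
        # Observation: a "bag of words" of n_words draws from the K vocabulary terms.
        draws = rng.choices(range(K), weights=p, k=n_words)
        counts = [draws.count(k) for k in range(K)]
        probs_path.append(p)
        counts_path.append(counts)
    return probs_path, counts_path
```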
The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series that record neuron-level firing patterns in rhesus monkeys. The monkeys are exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which a neuron's firing pattern fluctuates between features associated with each sound in isolation.
The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age-specific latent natural ability class and a performance-enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When a player's performance deviates significantly from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.
All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the Pólya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.
Abstract:
This work addresses advances in three related areas (state-space modeling, sequential Bayesian learning, and decision analysis), with attention to the statistical challenges of scalability and associated dynamic sparsity. The key theme that ties the three areas together is Bayesian model emulation: solving challenging analysis and computational problems using creative model emulators. This idea drives theoretical and applied advances in non-linear, non-Gaussian state-space modeling, dynamic sparsity, decision analysis, and statistical computation, across the linked contexts of multivariate time series and dynamic network studies. Examples and applications in financial time series and portfolio analysis, macroeconomics, and internet studies from computational advertising demonstrate the utility of the core methodological innovations.
Chapter 1 summarizes the three areas and the key idea of emulation in each. Chapter 2 discusses the sequential analysis of latent threshold models using emulating models that allow analytical filtering, which improves the efficiency of posterior sampling. Chapter 3 examines the emulator model in decision analysis, the synthetic model, which is equivalent to the loss function in the original minimization problem, and shows its performance in sequential portfolio optimization. Chapter 4 describes a method for modeling streaming count data observed on a large network; the method emulates the full, dependent network model with independent, conjugate sub-models customized to each flow. Chapter 5 reviews these advances and offers concluding remarks.
Abstract:
Multi-output Gaussian processes provide a convenient framework for multi-task problems. An illustrative and motivating example of a multi-task problem is multi-region electrophysiological time-series data, where experimentalists are interested in both power and phase coherence between channels. Recently, the spectral mixture (SM) kernel was proposed to model the spectral density of a single task in a Gaussian process framework. This work develops a novel covariance kernel for multiple outputs, called the cross-spectral mixture (CSM) kernel. This new, flexible kernel represents both the power and phase relationship between multiple observation channels. The expressive capabilities of the CSM kernel are demonstrated through implementation of 1) a Bayesian hidden Markov model, where the emission distribution is a multi-output Gaussian process with a CSM covariance kernel, and 2) a Gaussian process factor analysis model, where factor scores represent the utilization of cross-spectral neural circuits. Results are presented for measured multi-region electrophysiological data.
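For orientation, the single-output spectral mixture kernel that the CSM kernel builds on is usually written as a sum of Gaussian components in the frequency domain, k(τ) = Σ_q w_q exp(-2π²τ²v_q) cos(2πτμ_q), with weights w_q, centre frequencies μ_q, and spectral variances v_q. The Python sketch below evaluates that one-dimensional form only; it does not implement the cross-spectral (multi-output, phase-shifted) extension described in the abstract, and the example frequencies are hypothetical.

```python
import math

def spectral_mixture_kernel(tau, weights, means, variances):
    """1-D spectral mixture kernel: a sum of Gaussians in the frequency domain.

    k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi tau mu_q)
    """
    return sum(
        w * math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v) * math.cos(2.0 * math.pi * tau * mu)
        for w, mu, v in zip(weights, means, variances)
    )

def gram_matrix(times, weights, means, variances):
    """Covariance matrix of a stationary GP with the SM kernel at the given time points."""
    return [[spectral_mixture_kernel(s - t, weights, means, variances) for t in times]
            for s in times]

# Hypothetical two-component spectrum with peaks near 4 Hz and 8 Hz (tau in seconds).
weights, means, variances = [1.0, 0.5], [4.0, 8.0], [0.5, 1.0]
times = [i / 100.0 for i in range(5)]
K = gram_matrix(times, weights, means, variances)
```

Each component contributes a cosine at its centre frequency that decays with lag at a rate set by its spectral variance, which is why the kernel can represent power at several frequency bands at once.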
Abstract:
This dissertation examined the response of a forest ecosystem exposed to long-term elevated atmospheric CO2 to the termination of CO2 enrichment, and investigated the responses, and their underlying mechanisms, of two important components of the ecosystem carbon cycle: stomatal conductance and soil respiration. Because the contribution of understory vegetation to the entire ecosystem grew with time, we first investigated the effect of elevated CO2 on understory vegetation. No growth-enhancing effect of elevated CO2 was observed, and light appeared to be a limiting factor. Second, we examined the importance of aerodynamic conductance in determining canopy conductance and found that its effect could be neglected. Responses of stomatal conductance and soil respiration were assessed using a Bayesian state space model. In the two years after the termination of CO2 enrichment, stomatal conductance under formerly elevated CO2 returned to the ambient level, while soil respiration fell below the ambient level and did not recover to it within those two years.
Abstract:
Free energy calculations are a computational method for determining thermodynamic quantities, such as free energies of binding, via simulation.
Currently, due to computational and algorithmic limitations, free energy calculations are limited in scope.
In this work, we propose two methods for improving the efficiency of free energy calculations.
First, we expand the state space of alchemical intermediates, and show that this expansion enables us to calculate free energies along lower variance paths.
We use Q-learning, a reinforcement learning technique, to discover and optimize paths at low computational cost.
Second, we reduce the cost of sampling along a given path by using sequential Monte Carlo samplers.
We develop a new free energy estimator, pCrooks (pairwise Crooks), a variant of the Crooks fluctuation theorem (CFT) that enables decomposition of the variance of the free energy estimate for discrete paths while retaining the beneficial characteristics of CFT.
Combining these two advancements, we show that for some test models, optimal expanded-space paths have a nearly 80% reduction in variance relative to the standard path.
Additionally, our free energy estimator converges at a more consistent rate and on average 1.8 times faster when we enable path searching, even when the cost of path discovery and refinement is considered.
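The abstract does not describe the expanded state space, the cost signal, or the Q-learning setup, so the following Python sketch is only a toy illustration of tabular Q-learning for path search. It assumes a hypothetical 3x3 grid of intermediates indexed by an electrostatic step e and a steric step s, where each move advances one index and incurs an invented cost standing in for that step's contribution to estimator variance. With a uniformly random behaviour policy (valid because Q-learning is off-policy), the learned greedy path is the minimum-cost route from (0, 0) to (2, 2).

```python
import random

# Hypothetical 3x3 grid of alchemical states (e, s). All costs are invented.
N = 3
START, GOAL = (0, 0), (N - 1, N - 1)

def edge_cost(state, nxt):
    """Hypothetical per-step "variance cost" (illustrative numbers only)."""
    (e0, s0), (e1, s1) = state, nxt
    if e1 > e0:                      # electrostatic step: cost falls as s increases
        return 1.0 + 2.0 * (N - 1 - s0)
    return 1.0 + 0.5 * e0            # steric step: cost rises slightly with e

def actions(state):
    """Allowed moves: advance e or advance s by one step."""
    e, s = state
    moves = []
    if e < N - 1:
        moves.append((e + 1, s))
    if s < N - 1:
        moves.append((e, s + 1))
    return moves

def q_learning(episodes=2000, alpha=0.5, seed=0):
    rng = random.Random(seed)
    q = {}  # q[(state, next_state)] -> estimated return (negative cost-to-go)
    for _ in range(episodes):
        state = START
        while state != GOAL:
            # Off-policy: explore uniformly at random, learn the greedy policy.
            nxt = rng.choice(actions(state))
            reward = -edge_cost(state, nxt)
            future = max((q.get((nxt, a), 0.0) for a in actions(nxt)), default=0.0)
            old = q.get((state, nxt), 0.0)
            q[(state, nxt)] = old + alpha * (reward + future - old)
            state = nxt
    return q

def greedy_path(q):
    """Follow the highest-valued move from START until GOAL."""
    path, state = [START], START
    while state != GOAL:
        state = max(actions(state), key=lambda a: q.get((state, a), 0.0))
        path.append(state)
    return path
```

In this toy, taking both steric steps first costs 4 units in total, versus 14 for taking both electrostatic steps first, so `greedy_path(q_learning())` should follow the steric steps first.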
Abstract:
Recent research into resting-state functional magnetic resonance imaging (fMRI) has shown that the brain is highly active during rest. This thesis uses blood oxygenation level dependent (BOLD) signals to investigate the spatial and temporal functional network information found within resting-state data, and aims to assess the feasibility of extracting functional connectivity networks with different methods, as well as the dynamic variability captured by some of those methods. Furthermore, this work examines whether valid networks can be produced from a sparsely sampled subset of the original data.
In this work we use four main methods: independent component analysis (ICA), principal component analysis (PCA), correlation, and a point-processing technique. Each method comes with its own assumptions, strengths, and limitations for exploring how resting-state components interact in space and time.
Correlation is perhaps the simplest technique. With it, resting-state patterns are identified by how similar each region's time profile is to a seed region's time profile. However, this method requires a seed region and can identify only one resting-state network at a time. This simple correlation technique reproduces the resting-state network using data from a single subject's scan session as well as from 16 subjects.
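Seed-based correlation can be sketched in a few lines of NumPy: average the time courses of the seed voxels, then compute the Pearson correlation of every voxel with that seed time course. The time-by-voxel array layout, the boolean `seed_mask`, and the function name are assumptions for illustration; real pipelines also preprocess the BOLD data (motion correction, filtering, and so on) before this step.

```python
import numpy as np

def seed_correlation_map(data, seed_mask):
    """Pearson correlation of every voxel's time course with the mean seed time course.

    data: array of shape (n_timepoints, n_voxels); seed_mask: boolean array (n_voxels,).
    Voxels with zero variance would divide by zero and should be removed beforehand.
    """
    seed_ts = data[:, seed_mask].mean(axis=1)
    # z-score over time (population SD), so correlation = mean of elementwise products
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    zs = (seed_ts - seed_ts.mean()) / seed_ts.std()
    return z.T @ zs / data.shape[0]
```

Reshaping the returned vector back onto the brain volume gives the correlation map for that seed; a different network needs a different seed, which is the one-network-at-a-time limitation noted above.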
Independent component analysis, the second technique, is implemented in established software programs. ICA can extract multiple components from a data set in a single analysis. The disadvantage is that the resting-state networks it produces are all independent of each other, and it assumes that the spatial pattern of functional connectivity is the same across all time points. ICA successfully reproduces resting-state connectivity patterns for both a single subject and a 16-subject concatenated data set.
Principal component analysis compresses the dimensionality of the data by finding the directions in which the data vary most. This method uses the same basic matrix math as ICA, with a few important differences that will be outlined later in this text. With this method, distinct functional connectivity patterns are sometimes identifiable, but with a large amount of noise and variability.
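A minimal PCA of a time-by-voxel BOLD matrix can be computed from the singular value decomposition of the time-centred data: the right singular vectors give spatial maps, the scaled left singular vectors give their time courses, and the squared singular values give the share of variance each component explains. This NumPy sketch assumes that layout and uses hypothetical names; it is not the thesis's implementation. One commonly cited difference from ICA is that PCA only decorrelates the data along orthogonal directions of maximal variance, whereas ICA seeks statistically independent, non-Gaussian sources (often after a PCA reduction step).

```python
import numpy as np

def pca(data, n_components):
    """PCA of a (time x voxel) matrix via SVD of the time-centred data."""
    centred = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)               # fraction of variance per component
    spatial_maps = vt[:n_components]                  # (n_components, n_voxels)
    time_courses = u[:, :n_components] * s[:n_components]  # (n_timepoints, n_components)
    return spatial_maps, time_courses, explained[:n_components]
```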
To begin investigating the dynamics of functional connectivity, the correlation technique is used to compare the first and second halves of a scan session. Minor differences are discernible between the correlation results of the two halves. Further, a sliding-window technique is implemented to track the correlation coefficients over time for several window sizes. This technique shows that the correlation with the seed region is not static across the scan.
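The sliding-window analysis can be sketched as repeated Pearson correlations over consecutive windows; setting the window length to half the scan and the step equal to the window recovers the first-half versus second-half comparison. The function name and the `window`/`step` parameters are hypothetical, and the window sizes used in the thesis are not given in the abstract.

```python
import numpy as np

def sliding_window_correlation(x, y, window, step=1):
    """Pearson correlation of x and y inside windows of `window` samples, every `step` samples."""
    starts = range(0, len(x) - window + 1, step)
    return np.array([np.corrcoef(x[s:s + window], y[s:s + window])[0, 1] for s in starts])
```

For example, with x as a voxel's time course and y as the seed time course, plotting the returned values against window start time shows whether the coupling drifts during the scan; shorter windows track faster changes but give noisier estimates.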
The last method introduced, a point-processing method, is one of the more novel techniques because it does not require analysis of the full continuous time series. Here, network information is extracted from brief occurrences of high- or low-amplitude signals within a seed region. Because point processing uses fewer time points from the data, the statistical power of the results is lower, and default mode network (DMN) patterns vary more between subjects. In addition to improved computational efficiency, the benefit of a point-process method is that the patterns produced for different seed regions do not have to be independent of one another.
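One common way to implement a point-process analysis, sketched below as an assumption rather than the thesis's exact procedure, is to z-score the seed time course, mark the frames where it crosses above +threshold or below -threshold, and average the z-scored whole-brain frames at those events. The 1 SD threshold, the crossing rule, and the names are hypothetical choices.

```python
import numpy as np

def point_process_maps(data, seed_mask, threshold=1.0):
    """Average z-scored frames at upward (high) and downward (low) seed threshold crossings.

    data: (n_timepoints, n_voxels) array; seed_mask: boolean (n_voxels,).
    Returns the high-event map, low-event map, and the event frame indices.
    """
    seed = data[:, seed_mask].mean(axis=1)
    z = (seed - seed.mean()) / seed.std()
    # A crossing is the first frame beyond the threshold after a frame that was not.
    up = np.flatnonzero((z[1:] > threshold) & (z[:-1] <= threshold)) + 1
    down = np.flatnonzero((z[1:] < -threshold) & (z[:-1] >= -threshold)) + 1
    zdata = (data - data.mean(axis=0)) / data.std(axis=0)
    return zdata[up].mean(axis=0), zdata[down].mean(axis=0), up, down
```

Only the event frames enter each average, which is where both the efficiency gain and the loss of statistical power described above come from.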
This work compares four distinct methods of identifying functional connectivity patterns. ICA is currently used by many scientists studying functional connectivity patterns. The PCA technique is not well suited to the level of noise and the distribution of these data sets. The correlation technique is simple and obtains good results; however, a seed region is needed, and the method assumes that the DMN regions are correlated throughout the entire scan. Examining the dynamic aspects of correlation revealed changing patterns of correlation over time. Finally, the point-processing method produces promising results in identifying functional connectivity networks using only low- and high-amplitude BOLD signals.
Abstract:
Electrostatic interaction is a strong force that attracts positively and negatively charged molecules to each other. Such an interaction is formed between positively charged polycationic polymers and negatively charged nucleic acids. In this dissertation, the electrostatic attraction between polycationic polymers and nucleic acids is exploited for applications in oral gene delivery and nucleic acid scavenging. An enhanced nanoparticle for oral gene delivery of a human Factor IX (hFIX) plasmid is developed using the polycationic polysaccharide, chitosan (Ch), in combination with protamine sulfate (PS) to treat hemophilia B. For nucleic acid scavenging purposes, the development of an effective nucleic acid scavenging nanofiber platform is described for dampening hyper-inflammation and reducing the formation of biofilms.
Non-viral gene therapy may be an attractive alternative to chronic protein replacement therapy. Orally administered non-viral gene vectors have been investigated for more than a decade, with little progress made beyond the initial studies. Oral administration has many benefits over intravenous injection, including patient compliance and overall cost; however, effective oral gene delivery systems remain elusive. To date, only chitosan carriers have demonstrated successful oral gene delivery, owing to chitosan's stability via the oral route. In this study, we increase the transfection efficiency of the chitosan gene carrier by adding protamine sulfate to the nanoparticle formulation. Adding protamine sulfate to the chitosan nanoparticles results in up to 42-fold higher in vitro transfection efficiency than chitosan nanoparticles without protamine sulfate. Therapeutic levels of hFIX protein, ranging from 3 to 132 ng/mL, are detected in vivo in 5/12 mice after oral delivery of Ch/PS/phFIX nanoparticles, compared with levels below 4 ng/mL in 1/12 mice given Ch/phFIX nanoparticles. These results indicate that protamine sulfate enhances the transfection efficiency of chitosan and should be considered an effective ternary component for oral gene delivery applications.
Dying cells release nucleic acids (NA) and NA complexes that activate the inflammatory pathways of immune cells. Sustained activation of these pathways contributes to chronic inflammation related to autoimmune diseases, including systemic lupus erythematosus, rheumatoid arthritis, and inflammatory bowel disease. Studies have shown that certain soluble, cationic polymers can scavenge extracellular nucleic acids and inhibit RNA- and DNA-mediated activation of Toll-like receptors (TLRs) and inflammation. In this study, the cationic polymers are incorporated onto insoluble nanofibers, enabling local scavenging of negatively charged pro-inflammatory species, such as damage-associated molecular pattern (DAMP) molecules, in the extracellular space while reducing the cytotoxicity associated with unwanted internalization of soluble cationic polymers. In vitro data show that electrospun nanofibers grafted with cationic polymers, termed nucleic acid scavenging nanofibers (NASFs), can scavenge nucleic acid-based agonists of TLR 3 and TLR 9 directly from serum and prevent the production of NF-κB, an immune system activating transcription factor, while also demonstrating low cytotoxicity. NASFs formed from poly(styrene-alt-maleic anhydride) conjugated with 1.8 kDa branched polyethylenimine (bPEI) yielded randomly aligned fibers with diameters of 486±9 nm. NASFs effectively eliminate the immune-stimulating response of the NA-based agonists CpG (TLR 9) and poly(I:C) (TLR 3) while not affecting the activation caused by the non-nucleic acid TLR agonist Pam3CSK4. Results in a more biologically relevant context, doxorubicin-induced cell death in RAW cells, demonstrate that NASFs block ~25-40% of the NF-κB response in Ramos-Blue cells treated with RAW extracellular debris (i.e., DAMPs) following doxorubicin treatment.
Together, these data demonstrate that the formation of cationic NASFs by a simple, replicable, modular technique is effective and that such NASFs are capable of modulating localized inflammatory responses.
A natural way to apply the NASF clinically is as a wound bandage. Chronic wounds are a serious clinical problem attributed to an extended period of inflammation as well as the presence of biofilms. An NASF bandage could offer two benefits in treating chronic wounds: reducing inflammation and preventing biofilm formation. NASFs can prevent biofilm formation by reducing the NA present in the wound bed, thereby removing a large component of the extracellular polymeric substance (EPS) that bacteria use to build their biofilm matrix and without which the biofilm cannot develop. The NASF described above is used to show the effect of the nucleic acid scavenging technology on in vitro and in vivo formation of P. aeruginosa, S. aureus, and S. epidermidis biofilms. The in vitro studies demonstrated that the NASFs significantly reduced biofilm formation in all three bacterial strains. In vivo studies of the NASF on biofilm-infected mouse wounds show that the NASFs retain their functionality and are able to scavenge DNA, RNA, and protein from the wound bed. The NASFs remove DNA, such as mtDNA, that maintains the inflammatory state of the open wound and contributes to the EPS, and they also remove proteins required for bacterial and biofilm formation and maintenance, such as chaperonin, ribosomal proteins, succinyl-CoA ligase, and polymerases. However, the NASFs do not decrease wound healing time, because their repeated application and removal disrupts the wound bed and removes proteins required for wound healing, such as fibronectin, vitronectin, keratin, and plasminogen. Further optimization of NASF treatment duration and potential combination treatments should be tested to reduce the unwanted side effect of increased wound healing time.