8 results for VECTOR SPACE MODEL
at Duke University
Abstract:
The problem of social diffusion has animated sociological thinking on topics ranging from the spread of an idea, an innovation or a disease, to the foundations of collective behavior and political polarization. While network diffusion has been a productive metaphor, the reality of diffusion processes is often muddier. Ideas and innovations diffuse differently from diseases, but, with a few exceptions, the diffusion of ideas and innovations has been modeled under the same assumptions as the diffusion of disease. In this dissertation, I develop two new diffusion models for "socially meaningful" contagions that address two of the most significant problems with current diffusion models: (1) that contagions can only spread along observed ties, and (2) that contagions do not change as they spread between people. I augment insights from these statistical and simulation models with an analysis of an empirical case of diffusion - the use of enterprise collaboration software in a large technology company. I focus the empirical study on when people abandon innovations, a crucial and understudied aspect of the diffusion of innovations. Using timestamped posts, I analyze in fine detail when people abandon the software.
To address the first problem, I suggest a latent space diffusion model. Rather than treating ties as stable conduits for information, the latent space diffusion model treats ties as random draws from an underlying social space, and simulates diffusion over the social space. To address the second problem, I suggest a diffusion model with schemas. Rather than treating information as though it is spread without changes, the schema diffusion model allows people to modify information they receive to fit an underlying mental model of the information before they pass the information to others. Combining the latent space models with a schema notion for actors improves our models for social diffusion both theoretically and practically. Theoretically, the social space model integrates both actor ties and attributes simultaneously in a single social plane, while incorporating schemas into diffusion processes gives an explicit form to the reciprocal influences that cognition and social environment have on each other. Practically, the latent space diffusion model produces statistically consistent diffusion estimates where using the network alone does not, and the diffusion with schemas model shows that introducing some cognitive processing into diffusion processes changes the rate and ultimate distribution of the spreading information.
The empirical case study focuses on how the changing value of an innovation, introduced by the innovation's network externalities, influences when people abandon the innovation. In it, I find that people are least likely to abandon an innovation when other people in their neighborhood currently use the software as well. The effect is particularly pronounced for supervisors' current use and for the number of supervisory team members who currently use the software. This case study not only points to an important process in the diffusion of innovation, but also suggests a new approach -- computerized collaboration systems -- to collecting and analyzing data on organizational processes.
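The latent space idea in this abstract (ties as random draws from an underlying social space, with diffusion simulated over that space) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the logistic tie probability, the two-dimensional positions, the `alpha` and `transmit` parameters, and the function names.

```python
import math
import random


def tie_probability(x, y, alpha=1.0):
    """Assumed logistic form: ties become less likely as latent distance grows."""
    distance = math.dist(x, y)
    return 1.0 / (1.0 + math.exp(-(alpha - distance)))


def simulate_latent_diffusion(positions, seed_node, steps,
                              transmit=0.5, alpha=1.0, rng=None):
    """Spread a contagion where ties are redrawn from the latent space each step.

    Returns the number of adopters after each step, starting with step 0.
    """
    if rng is None:
        rng = random.Random()
    adopted = {seed_node}
    history = [len(adopted)]
    for _ in range(steps):
        newly_adopted = set()
        for i in adopted:
            for j in range(len(positions)):
                if j in adopted:
                    continue
                # The tie i-j is a fresh random draw, not a fixed observed edge.
                p_tie = tie_probability(positions[i], positions[j], alpha)
                has_tie = rng.random() < p_tie
                if has_tie and rng.random() < transmit:
                    newly_adopted.add(j)
        adopted |= newly_adopted
        history.append(len(adopted))
    return history


# Hypothetical example: 30 actors with standard-normal latent positions.
rng = random.Random(0)
positions = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
history = simulate_latent_diffusion(positions, seed_node=0, steps=10, rng=rng)
print(history)
```

Because ties are redrawn at every step, actors who are close in the latent space can pass the contagion to each other even if no tie between them was ever observed, which is the contrast with fixed-network diffusion that the abstract draws.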
Abstract:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover the underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). By incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of the protein binding landscape and that the most accurate inference comes from modeling all available datasets.
This dissertation provides a foundation for future research by taking a step toward genome-wide inference of the protein-DNA interaction landscape through data integration.
Abstract:
This dissertation examined how a forest ecosystem exposed to long-term elevated atmospheric CO2 responded to the termination of CO2 enrichment, and investigated the responses, and their underlying mechanisms, of two important factors in the ecosystem's carbon cycle: stomatal conductance and soil respiration. Because the contribution of understory vegetation to the entire ecosystem grew with time, we first investigated the effect of elevated CO2 on understory vegetation. A potential growth-enhancing effect of elevated CO2 was not observed, and light seemed to be a limiting factor. Second, we examined the importance of aerodynamic conductance in determining canopy conductance, and found that its effect can be negligible. Responses of stomatal conductance and soil respiration were assessed using a Bayesian state-space model. In the two years after the termination of CO2 enrichment, stomatal conductance under formerly elevated CO2 returned to the ambient level, while soil respiration fell below the ambient level and did not recover within those two years.
Abstract:
Interleukin-1 beta (IL1β) is a proinflammatory cytokine that mediates arthritic pathologies. Our objectives were to evaluate pain and limb dysfunction resulting from IL1β over-expression in the rat knee and to investigate the ability of local IL1 receptor antagonist (IL1Ra) delivery to reverse the associated pathology. IL1β over-expression was induced in the right knees of 30 Wistar rats via intra-articular injection of rat fibroblasts retrovirally infected with human IL1β cDNA. A subset of animals received a 30 µl intra-articular injection of saline or human IL1Ra on day 1 after cell delivery (0.65 µg/µl hIL1Ra, n = 7 per group). Joint swelling, gait, and sensitivity were investigated over 1 week. On day 8, animals were sacrificed and joints were collected for histological evaluation. Joint inflammation and elevated levels of endogenous IL1β were observed in knees receiving IL1β-infected fibroblasts. Asymmetric gaits favoring the affected limb and heightened mechanical sensitivity (allodynia) reflected a unilateral pathology. Histopathology revealed cartilage loss on the femoral groove and condyle of affected joints. Intra-articular IL1Ra injection failed to restore gait and sensitivity to preoperative levels and did not reduce the cartilage degeneration observed in histopathology. Joint swelling and degeneration subsequent to IL1β over-expression are associated with limb hypersensitivity and gait compensation. Intra-articular IL1Ra delivery did not result in marked improvement for this model; this may be driven by rapid clearance of administered IL1Ra from the joint space. These results motivate work to further investigate the behavioral consequences of monoarticular arthritis and sustained-release drug delivery strategies for the joint space.
Abstract:
We describe a strategy for Markov chain Monte Carlo analysis of non-linear, non-Gaussian state-space models involving batch analysis for inference on dynamic, latent state variables and fixed model parameters. The key innovation is a Metropolis-Hastings method for the time series of state variables based on sequential approximation of filtering and smoothing densities using normal mixtures. These mixtures are propagated through the non-linearities using an accurate, local mixture approximation method, and we use a regenerating procedure to deal with potential degeneracy of mixture components. This provides accurate, direct approximations to sequential filtering and retrospective smoothing distributions, and hence a useful construction of global Metropolis proposal distributions for simulation of posteriors for the set of states. This analysis is embedded within a Gibbs sampler to include uncertain fixed parameters. We give an example motivated by an application in systems biology. Supplemental materials provide an example based on a stochastic volatility model as well as MATLAB code.
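The role a normal-mixture approximation can play as a global Metropolis-Hastings proposal is easiest to see in one dimension. The sketch below is a plain independence Metropolis-Hastings sampler with a fixed two-component normal-mixture proposal. It is not the paper's sequential filtering and smoothing construction, and the toy bimodal target, the mixture components, and all function names are assumptions chosen for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a normal mixture given (weight, mean, sd) triples."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)


def mixture_sample(components, rng):
    """Pick a component by its weight, then draw from that normal."""
    weights = [w for w, _, _ in components]
    _, mean, sd = rng.choices(components, weights=weights, k=1)[0]
    return rng.gauss(mean, sd)


def independence_mh(log_target, components, n_draws, x0, rng):
    """Independence Metropolis-Hastings with a fixed mixture proposal q."""
    x = x0
    draws = []
    accepted = 0
    for _ in range(n_draws):
        proposal = mixture_sample(components, rng)
        # Log of target(x') q(x) / (target(x) q(x')).
        log_ratio = (log_target(proposal) - log_target(x)
                     + math.log(mixture_pdf(x, components))
                     - math.log(mixture_pdf(proposal, components)))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
            accepted += 1
        draws.append(x)
    return draws, accepted / n_draws


def log_target(x):
    """Toy unnormalized bimodal target with modes near -2 and +2 (assumed)."""
    return math.log(0.5 * math.exp(-0.5 * (x - 2.0) ** 2)
                    + 0.5 * math.exp(-0.5 * (x + 2.0) ** 2))


# Hypothetical proposal: slightly wider than the target around each mode.
proposal_components = [(0.5, -2.0, 1.5), (0.5, 2.0, 1.5)]
draws, rate = independence_mh(log_target, proposal_components,
                              n_draws=5000, x0=0.0, rng=random.Random(1))
print(f"acceptance rate: {rate:.2f}")
```

The closer the mixture is to the target, the closer the acceptance rate gets to one, which is why an accurate mixture approximation makes a useful global proposal.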
Abstract:
Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally "validated" in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.
Abstract:
Late outgrowth endothelial progenitor cells (EPCs) derived from the peripheral blood of patients with significant coronary artery disease were sodded into the lumens of small-diameter expanded polytetrafluoroethylene (ePTFE) vascular grafts. Grafts (1 mm inner diameter) were denucleated and sodded either with native EPCs or with EPCs transfected with an adenoviral vector containing the gene for human thrombomodulin (EPC+AdTM). EPC+AdTM was shown to increase the in vitro rate of graft activated protein C (APC) production 4-fold over grafts sodded with untransfected EPCs (p < 0.05). Unsodded control, EPC-sodded, and EPC+AdTM-sodded grafts were implanted bilaterally into the femoral arteries of athymic rats for 7 or 28 days. Unsodded control grafts, both with and without denucleation treatment, each exhibited 7-day patency rates of 25%. Unsodded grafts showed extensive thrombosis and were not tested for patency over 28 days. In contrast, grafts sodded with untransfected EPCs or EPC+AdTM both had 7-day patency rates of 88-89% and 28-day patency rates of 75-88%. Intimal hyperplasia was observed near both the proximal and distal anastomoses in all sodded graft conditions but did not appear to be the primary occlusive failure event. This in vivo study suggests that autologous EPCs derived from the peripheral blood of patients with coronary artery disease may improve the performance of synthetic vascular grafts, although no differences were observed between untransfected EPCs and TM-transfected EPCs.
Abstract:
This chapter presents a model averaging approach in the M-open setting using sample re-use methods to approximate the predictive distribution of future observations. It first reviews the standard M-closed Bayesian Model Averaging approach and decision-theoretic methods for producing inferences and decisions. It then reviews model selection from the M-complete and M-open perspectives, before formulating a Bayesian solution to model averaging in the M-open perspective. It constructs optimal weights for MOMA (M-open Model Averaging) using a decision-theoretic framework, where models are treated as part of the ‘action space’ rather than unknown states of nature. Using ‘incompatible’ retrospective and prospective models for data from a case-control study, the chapter demonstrates that MOMA gives better predictive accuracy than the proxy models. It concludes with open questions and future directions.
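Treating the model weights as an action chosen to maximize expected utility, with sample re-use standing in for future data, can be sketched on a toy problem. The sketch below is not the chapter's procedure: the log predictive score as the utility, leave-one-out as the re-use scheme, the grid search, the two candidate models, and the data are all assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def loo_densities(data):
    """Leave-one-out predictive densities under two simple candidate models."""
    dens_a, dens_b = [], []
    for i, y in enumerate(data):
        rest = data[:i] + data[i + 1:]
        mean_rest = sum(rest) / len(rest)
        # Model A (assumed): normal with plug-in mean from the rest, unit sd.
        dens_a.append(normal_pdf(y, mean_rest, 1.0))
        # Model B (assumed): fixed wide normal centered at zero.
        dens_b.append(normal_pdf(y, 0.0, 3.0))
    return dens_a, dens_b


def loo_log_score(w, dens_a, dens_b):
    """Leave-one-out log score of the mixture w * A + (1 - w) * B."""
    return sum(math.log(w * a + (1.0 - w) * b) for a, b in zip(dens_a, dens_b))


def best_weight(dens_a, dens_b, grid_size=101):
    """Grid search over w in [0, 1] for the highest leave-one-out log score."""
    best_w, best_score = 0.0, -math.inf
    for k in range(grid_size):
        w = k / (grid_size - 1)
        score = loo_log_score(w, dens_a, dens_b)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


# Hypothetical data: mostly near 5, with one observation near 0.
data = [4.8, 5.1, 5.3, 4.9, 5.6, 0.2, 5.0, 4.7]
dens_a, dens_b = loo_densities(data)
w, score = best_weight(dens_a, dens_b)
print(f"weight on model A: {w:.2f}")
```

Neither toy model is assumed to be the true data-generating process, so the weight is chosen purely for predictive performance on held-out points, not as a posterior model probability.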