3 results for Molekular Dynamik, Simulation, Modellierung, Protein, Coarse Graining
at Duke University
Abstract:
Computer simulations of reaction processes in solution generally rely on the definition of a reaction coordinate and the determination of the thermodynamic changes of the system along that coordinate. The reaction coordinate is often composed of characteristic geometrical properties of the reactive solute species, while the contributions of solvent molecules are implicitly included in the thermodynamics of the solute degrees of freedom. However, solvent dynamics can provide the driving force for the reaction process, and in such cases an explicit description of the solvent contribution to the free energy of the reaction process becomes necessary. We report here a method that can be used to analyze the solvent contributions to the reaction activation free energies from combined QM/MM minimum free-energy path simulations. The method was applied to the self-exchange SN2 reaction of CH₃Cl + Cl⁻, showing the importance of solvent-solute interactions to the reaction process. The results were further discussed in the context of coupling between solvent and solute molecules in reaction processes.
Abstract:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
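As a rough illustration of the idea in this paragraph, the sketch below maximizes a Pearson correlation between a model-predicted occupancy profile and an observed profile using a Metropolis-style random walk. Everything in it is an assumption made for illustration and is not taken from the dissertation: the two-parameter independent-site occupancy model, the names `occupancy` and `mcmc_maximize`, the proposal width, and the sampling temperature.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def occupancy(energies, mu, beta):
    """Toy model (assumed): each site is bound with a two-state Boltzmann probability."""
    occ = []
    for e in energies:
        z = max(-50.0, min(50.0, beta * (e - mu)))  # clamp to avoid overflow in exp
        occ.append(1.0 / (1.0 + math.exp(z)))
    return occ


def mcmc_maximize(energies, observed, steps=2000, temperature=0.01,
                  step_size=0.1, seed=0):
    """Random-walk Metropolis search; returns (best_score, mu, beta)."""
    rng = random.Random(seed)
    mu, beta = 0.0, 1.0  # arbitrary starting point
    score = pearson(occupancy(energies, mu, beta), observed)
    best = (score, mu, beta)
    for _ in range(steps):
        new_mu = mu + rng.gauss(0.0, step_size)
        new_beta = beta + rng.gauss(0.0, step_size)
        new_score = pearson(occupancy(energies, new_mu, new_beta), observed)
        # Always accept improvements; accept worse moves with Metropolis probability.
        if new_score >= score or rng.random() < math.exp((new_score - score) / temperature):
            mu, beta, score = new_mu, new_beta, new_score
            if score > best[0]:
                best = (score, mu, beta)
    return best
```

Because the correlation is invariant to linear rescaling of the predicted profile, an objective of this kind constrains the shape of the predicted landscape rather than its absolute scale; additional information such as measured concentrations would have to enter separately, for example through priors as the paragraph describes.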
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
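A minimal sketch of how such a score could be computed is shown below. It compares a position-specific Poisson model of DNase I cuts across a 147 bp window with a flat background model of the same mean rate. The profile shape (a quadratic envelope multiplied by a cosine with an assumed period of about 10 bp), the parameter values, and the function names are hypothetical. With both hypotheses fixed as they are here, the Bayes factor reduces to a plain likelihood ratio; a full treatment would integrate over parameter priors.

```python
import math

NUC_LEN = 147  # base pairs wrapped by one nucleosome


def nucleosome_profile(mean_rate, curvature=1.0, amplitude=0.5, period=10.0):
    """Hypothetical expected cut rate per position, rescaled to average `mean_rate`."""
    half = (NUC_LEN - 1) / 2.0
    shape = []
    for i in range(NUC_LEN):
        x = (i - half) / half                 # -1 at one edge, 0 at the dyad, +1 at the other
        envelope = 0.2 + curvature * x * x    # quadratic term: fewer cuts near the dyad
        wave = 1.0 + amplitude * math.cos(2.0 * math.pi * i / period)  # oscillatory term
        shape.append(envelope * wave)
    scale = mean_rate * NUC_LEN / sum(shape)
    return [scale * s for s in shape]


def log_bayes_factor(counts, mean_rate=2.0):
    """Log ratio of Poisson likelihoods: nucleosome profile versus flat background.

    The log-factorial terms are identical under both models and cancel.
    """
    if len(counts) != NUC_LEN:
        raise ValueError("window must be exactly NUC_LEN positions long")
    profile = nucleosome_profile(mean_rate)
    score = 0.0
    for c, lam in zip(counts, profile):
        score += c * math.log(lam / mean_rate) - (lam - mean_rate)
    return score


def best_position(cuts, mean_rate=2.0):
    """Slide the window along a cut-count track; return (start, score) of the best window."""
    scores = [log_bayes_factor(cuts[s:s + NUC_LEN], mean_rate)
              for s in range(len(cuts) - NUC_LEN + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

Scoring all positions in the window jointly, rather than thresholding individual bases, is what lets a weak per-base signal accumulate into a usable score on noisy data.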
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover the underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). By incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of the protein binding landscape and that the most accurate inference comes from modeling all available datasets.
This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of the protein-DNA interaction landscape through data integration.
Abstract:
Proteins are specialized molecules that catalyze most of the reactions that can sustain life, and they become functional by folding into a specific 3D structure. Despite their importance, the question "how do proteins fold?" - first pondered in the 1930s - was still listed as one of the top unanswered scientific questions as of 2005, according to the journal Science. Answering this question would provide a foundation for understanding protein function and would enable improved drug targeting, efficient biofuel production, and stronger biomaterials. Much of what we currently know about protein folding comes from studies on small, single-domain proteins, whose folding may be quite different from that of the large, multidomain proteins that predominate in the proteomes of all organisms.
In this thesis I will discuss my work to fill this gap in understanding by studying the unfolding and refolding of large, multidomain proteins using the powerful combination of single-molecule force-spectroscopy experiments and molecular dynamics simulations.
The three model proteins studied - Luciferase, Protein S, and Streptavidin - lend insight into the inter-domain dependence for unfolding and the subdomain stabilization of binding ligands, and ultimately provide new insight into atomistic details of the intermediate states along the folding pathway.