5 results for Computational studies

at Duke University


Relevance: 60.00%

Abstract:

The central dogma of molecular biology relies on the correct Watson-Crick (WC) geometry of canonical deoxyribonucleic acid (DNA) dG•dC and dA•dT base pairs to replicate and transcribe genetic information with speed and an astonishing level of fidelity. In addition, the Watson-Crick geometry of canonical ribonucleic acid (RNA) rG•rC and rA•rU base pairs is highly conserved to ensure that proteins are translated with high fidelity. However, the nucleobases can adopt numerous other tautomeric and ionic configurations, which can give rise to entirely new pairing modes. Very early on, James Watson and Francis Crick recognized their importance: in 1953 they postulated that if a base adopted one of its energetically disfavored tautomeric forms during replication (a proposal later extended to ionic forms), it could form a mismatch with a Watson-Crick-like geometry and thereby give rise to “natural mutations.”

Since then, numerous studies have provided evidence in support of this hypothesis and have expanded upon it: computational studies have assessed the energetic feasibility of different tautomeric and ionic forms of the nucleobases in silico, and crystallographic studies have trapped different mismatches with WC-like geometries in polymerase or ribosome active sites. However, no direct evidence has shown (i) that these WC-like mismatches exist in canonical DNA duplexes, RNA duplexes, or non-coding RNAs, or (ii) which tautomeric or ionic form, if any, stabilizes the WC-like geometry. This thesis uses nuclear magnetic resonance (NMR) spectroscopy and rotating-frame relaxation dispersion (R1ρ RD), in combination with density functional theory (DFT), biochemical assays, and targeted chemical perturbations, to show that (i) dG•dT mismatches in DNA duplexes, as well as rG•rU mismatches in RNA duplexes and non-coding RNAs, transiently adopt a WC-like geometry that is stabilized by (ii) an interconnected network of rapidly interconverting rare tautomers and anionic bases. These results support Watson and Crick’s tautomer hypothesis as well as subsequent hypotheses invoking anionic mismatches, and ultimately tie them together. This dissertation shows that a common mismatch can adopt a Watson-Crick-like geometry globally, in both DNA and RNA, and that this geometry is stabilized by a kinetically linked network of rare tautomeric and anionic bases. The studies herein also provide compelling evidence that these species are involved in spontaneous replication and translation errors.
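Relaxation dispersion experiments of this kind typically report the population of a transient minor state. Under a two-state assumption, that population maps to a free-energy difference through the Boltzmann relation. The sketch below is a generic illustration of that conversion, not the thesis's own analysis pipeline; the function names and the 298.15 K default are assumptions.

```python
import math

# Molar gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def minor_state_population(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Equilibrium fraction of the minor state in a two-state system.

    delta_g_kcal is the free energy of the minor state (for example a rare
    tautomer or anionic form) relative to the major state, in kcal/mol;
    positive values mean the minor state is disfavored.
    """
    k_eq = math.exp(-delta_g_kcal / (R_KCAL * temperature_k))  # [minor] / [major]
    return k_eq / (1.0 + k_eq)


def delta_g_from_population(p_minor: float, temperature_k: float = 298.15) -> float:
    """Inverse mapping: free energy (kcal/mol) implied by a minor-state population."""
    if not 0.0 < p_minor < 1.0:
        raise ValueError("p_minor must lie strictly between 0 and 1")
    return -R_KCAL * temperature_k * math.log(p_minor / (1.0 - p_minor))


# Example: a hypothetical minor state 4 kcal/mol above the major state at 298.15 K.
print(f"{minor_state_population(4.0):.2e}")
```

Because the population falls off exponentially with the free-energy gap, even a few kcal/mol places the minor state well below one percent, which is why such states are described as transient and rare.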

Relevance: 30.00%

Abstract:

This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges that arise in fitting models of increasing heterogeneity to increasingly large datasets. One example context is biological studies that use high-throughput technologies, which generate many very large datasets and require increasingly high-dimensional mixture models with large numbers of mixture components. We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in the ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches for other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms and software design can lead to vast speed-ups and, critically, enable statistical analyses that are currently not performed because of compute-time limitations in traditional computational environments. Supplemental materials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture modeling context. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
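The article's own GPU code (in its supplemental materials) is not reproduced here. As a hedged illustration of why mixture models suit GPUs, the sketch below computes per-observation component probabilities, a step shared by Gibbs sampling and EM for mixtures, in plain Python. A one-dimensional Gaussian mixture is assumed, and the function names are hypothetical. Each output row depends only on one observation and the current parameters, so the outer loop is the part a GPU implementation would spread across threads.

```python
import math
from typing import List, Sequence


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a sequence."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def responsibilities(
    x: Sequence[float],
    weights: Sequence[float],
    means: Sequence[float],
    sds: Sequence[float],
) -> List[List[float]]:
    """Posterior component probabilities for each observation in a 1-D Gaussian mixture."""
    out = []
    for xi in x:  # independent per observation: the data-parallel dimension
        log_terms = [
            math.log(w) - math.log(s) - 0.5 * math.log(2 * math.pi)
            - 0.5 * ((xi - mu) / s) ** 2
            for w, mu, s in zip(weights, means, sds)
        ]
        norm = log_sum_exp(log_terms)  # log of the mixture density at xi
        out.append([math.exp(t - norm) for t in log_terms])
    return out


print(responsibilities([0.0, 10.0], [0.5, 0.5], [0.0, 10.0], [1.0, 1.0]))
```

Working in log space with log-sum-exp avoids underflow when many components or distant observations make individual densities tiny, which matters more as the number of mixture components grows.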

Relevance: 30.00%

Abstract:

Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TFs) and nucleosomes, and the genome. Different high-throughput techniques have been developed to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often provides only partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference of the protein-DNA interaction landscape.

We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
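The thermodynamic model itself is not specified in the abstract, so the sketch below substitutes a hypothetical toy model and shows only the optimization pattern described: a correlation-based objective (here assumed to be the Pearson correlation between predicted and observed profiles) maximized by a Metropolis-style MCMC random walk that remembers the best parameters visited. The names `mcmc_maximize` and `toy_objective`, the step size, and the temperature are all illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def mcmc_maximize(
    objective: Callable[[List[float]], float],
    theta0: List[float],
    n_iter: int = 5000,
    step: float = 0.1,
    temperature: float = 0.01,
    seed: int = 0,
) -> List[float]:
    """Metropolis random walk favouring high objective values; returns the best point seen."""
    rng = random.Random(seed)
    current, f_current = list(theta0), objective(theta0)
    best, f_best = list(current), f_current
    for _ in range(n_iter):
        proposal = [t + rng.gauss(0.0, step) for t in current]
        f_prop = objective(proposal)
        # Always accept uphill moves; accept downhill moves with Metropolis probability.
        if f_prop >= f_current or rng.random() < math.exp((f_prop - f_current) / temperature):
            current, f_current = proposal, f_prop
            if f_current > f_best:
                best, f_best = list(current), f_current
    return best


# Hypothetical toy model: the predicted profile is a quadratic in position.
observed = [0.0, 1.0, 4.0, 9.0, 16.0]


def toy_objective(theta: List[float]) -> float:
    predicted = [theta[0] * i + theta[1] * i * i for i in range(len(observed))]
    return pearson(predicted, observed)


best = mcmc_maximize(toy_objective, [1.0, 0.0])
print(best, toy_objective(best))
```

A correlation objective rewards matching the shape of the observed profile rather than its absolute scale, which is one plausible reason to prefer it when sequencing depth varies; the temperature controls how often the walk accepts worse parameters to escape local optima.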

We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has been used mainly to identify DNase I hypersensitive sites, which tend to be open-chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach uses several effective strategies to extract nucleosome positioning signals from noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
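A minimal sketch of a Bayes-factor score, assuming Poisson-distributed DNase I cleavage counts at each position of a candidate nucleosome window. The nucleosome model uses a hypothetical template combining a quadratic envelope with a cosine oscillation; the sign of the curvature (lowest at the dyad here), the ~10 bp period, and every parameter value are assumptions rather than the dissertation's fitted model. Because both models here are fully specified, the Bayes factor reduces to a likelihood ratio.

```python
import math
from typing import List, Sequence


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(K = k) for a Poisson distribution with rate lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def nucleosome_template(
    length: int = 147,
    base_rate: float = 1.0,
    curvature: float = 1.0,    # assumed >= 0 so the envelope stays positive
    amplitude: float = 0.5,    # assumed < 1 so every rate stays positive
    period: float = 10.0,      # roughly one helical turn; illustrative value
) -> List[float]:
    """Hypothetical expected cleavage rates: quadratic envelope times an oscillation."""
    center = (length - 1) / 2
    rates = []
    for i in range(length):
        u = (i - center) / center  # -1 at one edge, 0 at the dyad, +1 at the other edge
        envelope = 1.0 + curvature * u * u
        oscillation = 1.0 + amplitude * math.cos(2 * math.pi * (i - center) / period)
        rates.append(base_rate * envelope * oscillation)
    return rates


def log_bayes_factor(
    counts: Sequence[int], nucleosome_rates: Sequence[float], background_rate: float
) -> float:
    """log P(counts | nucleosome) - log P(counts | background), jointly over the window."""
    if len(counts) != len(nucleosome_rates):
        raise ValueError("counts and nucleosome_rates must have the same length")
    return sum(
        poisson_logpmf(k, lam) - poisson_logpmf(k, background_rate)
        for k, lam in zip(counts, nucleosome_rates)
    )
```

Summing log-likelihood ratios over all 147 positions is what "jointly modeling data points across the nucleosome body" buys: individually weak, noisy positions add up to a decisive score. Sliding the window along the genome and keeping local maxima of the score would give candidate nucleosome positions.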

Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover the underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). By incrementally increasing the types of genomic data in the SSM, we show that different data types contribute complementary information for inferring the protein binding landscape and that the most accurate inference comes from modeling all available datasets.
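The SSM's exact structure is not given in the abstract. A hedged sketch, assuming a discrete hidden state per genomic position (for example hypothetical labels such as nucleosome, TF, or unbound) and data types that are conditionally independent given that state: the scaled forward recursion below computes the data likelihood, the forward pass used inside the E-step of EM for such models (Baum-Welch, in the hidden Markov model case). The helper `combine_data_types` shows how per-data-type likelihoods would multiply under that independence assumption.

```python
import math
from typing import List, Sequence


def forward_log_likelihood(
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emission_lik: Sequence[Sequence[float]],
) -> float:
    """Scaled forward algorithm for a discrete-state HMM.

    init[s]: P(state_0 = s); trans[r][s]: P(state_t = s | state_{t-1} = r);
    emission_lik[t][s]: P(observations at position t | state s).
    Returns log P(all observations).
    """
    n_states = len(init)
    alpha = [init[s] * emission_lik[0][s] for s in range(n_states)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]  # rescale to avoid underflow on long sequences
    for t in range(1, len(emission_lik)):
        alpha = [
            emission_lik[t][s] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def combine_data_types(*per_type_lik: Sequence[Sequence[float]]) -> List[List[float]]:
    """Multiply emission likelihoods from several data types (e.g. DNase-seq and
    MNase-seq), assuming conditional independence given the hidden state."""
    combined = [list(row) for row in per_type_lik[0]]
    for lik in per_type_lik[1:]:
        for t, row in enumerate(lik):
            for s, v in enumerate(row):
                combined[t][s] *= v
    return combined
```

Under this assumption, adding a data type only multiplies in another likelihood term per state, which mirrors the abstract's procedure of incrementally adding data types and comparing the resulting inferences.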

This dissertation provides a foundation for future research by taking a step toward genome-wide inference of the protein-DNA interaction landscape through data integration.

Relevance: 30.00%

Abstract:

© 2014. The adoption of antisense gene silencing as a novel disinfectant for prokaryotic organisms is hindered by poor silencing efficiencies. Few studies have considered the effects of off-targets on silencing efficiencies, especially in prokaryotic organisms. In this computational study, a novel algorithm was developed that determined and sorted the number of off-targets as a function of alignment length in Escherichia coli K-12 MG1655 and Mycobacterium tuberculosis H37Rv. The mean number of off-targets per single location was calculated to be 14.1 ± 13.3 and 36.1 ± 58.5 for the genomes of E. coli K-12 MG1655 and M. tuberculosis H37Rv, respectively. Furthermore, analysis of the entire transcriptome showed that no general gene location could be targeted to minimize or maximize the number of off-targets. To determine the effects of off-targets on silencing efficiencies, data from previously published studies were used. Analyses with acpP, ino1, and marORAB revealed a statistically significant relationship between the number of short-alignment-length off-target hybrids and the efficacy of antisense gene silencing, suggesting that minimizing off-targets may be beneficial for antisense gene silencing in prokaryotic organisms.
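The study's algorithm is not described in enough detail to reproduce, so the sketch below is a hypothetical, simplified version of "off-targets as a function of alignment length". It assumes exact matches only, a single strand, and a target site given as a slice of the sequence. For each alignment length, every window of that length from the target site is searched for in the sequence, and hits beyond the intended on-target occurrence are counted. Per-site totals are then summarized as mean ± SD; the sample SD is an assumption.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def count_occurrences(text: str, pattern: str) -> int:
    """Count exact, possibly overlapping, occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def off_targets_by_length(
    sequence: str, target_start: int, target_len: int, min_len: int
) -> Dict[int, int]:
    """Hypothetical tally: alignment length -> number of off-target exact matches."""
    target = sequence[target_start:target_start + target_len]
    result = {}
    for length in range(min_len, target_len + 1):
        hits = 0
        for offset in range(target_len - length + 1):
            window = target[offset:offset + length]
            # Subtract 1 for the window's own (on-target) occurrence.
            hits += count_occurrences(sequence, window) - 1
        result[length] = hits
    return result


def format_mean_sd(values: Sequence[float]) -> str:
    """Summarize per-site off-target counts as 'mean ± SD' (sample SD)."""
    return f"{mean(values):.1f} ± {stdev(values):.1f}"


print(off_targets_by_length("ACGTACGTTT", target_start=0, target_len=4, min_len=3))
```

Short windows match by chance far more often than long ones, so most off-target hits accumulate at short alignment lengths. This is consistent with the abstract's finding that short-alignment-length off-target hybrids were the ones related to silencing efficacy.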

Relevance: 30.00%

Abstract:

In the last two decades, the field of homogeneous gold catalysis has been extremely active, growing at a rapid pace. Another rapidly growing field, computational chemistry, has often been applied to the investigation of various gold-catalyzed reaction mechanisms. Unfortunately, a number of recent mechanistic studies have used computational methods that have been shown to be inappropriate and inaccurate in their description of gold chemistry. This work presents an overview of available computational methods, with a focus on the approximations and limitations inherent in each, and offers a review of experimentally characterized gold(I) complexes and proposed mechanisms as compared with their computationally modeled counterparts. No attempt is made to identify a “recommended” computational method for investigations of gold catalysis; rather, discrepancies between experimentally and computationally obtained values are highlighted, and the systematic errors between different computational methods are discussed.