939 results for DNA data banks
Abstract:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
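The idea of scoring a model by its correlation with observed data and searching for good parameters with Markov chain Monte Carlo can be sketched as follows. This is only a toy illustration, not the dissertation's framework: the single-bump `predicted_profile`, the temperature, the proposal width, and all function names are assumptions made for the example, and the "observed" profile is synthetic.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def predicted_profile(center, width=10.0, length=100):
    """Toy stand-in for a model prediction: one Gaussian occupancy bump."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(length)]


def mcmc_maximize(observed, start, steps=2000, temperature=0.01, seed=0):
    """Metropolis sampling of exp(correlation / temperature); returns the best state seen."""
    rng = random.Random(seed)
    current = start
    current_score = pearson(predicted_profile(current), observed)
    best, best_score = current, current_score
    for _ in range(steps):
        proposal = current + rng.gauss(0.0, 2.0)
        if not 0.0 <= proposal <= len(observed) - 1:
            continue  # reject proposals that leave the modelled region
        score = pearson(predicted_profile(proposal), observed)
        delta = score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, score
            if score > best_score:
                best, best_score = proposal, score
    return best, best_score


observed = predicted_profile(60.0)  # synthetic "data" with a known answer
best_center, best_score = mcmc_maximize(observed, start=50.0)
print(best_center, best_score)
```

Because the objective is a correlation, the absolute scale of the observed read counts does not matter. A prior on a parameter could be folded in by adding its log density to the score before the accept/reject step.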
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
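A Bayes-factor score of this kind compares how well a nucleosome model and a background model explain the cut counts in a window. The sketch below is a simplified assumption, not the authors' scoring method: both hypotheses use fixed Poisson rates (so the Bayes factor reduces to a likelihood ratio), and the quadratic dip, the 10 bp ripple period, and its amplitude are hypothetical values chosen only to mimic the pattern described above.

```python
import math


def poisson_logpmf(k, rate):
    """Log of the Poisson probability of observing k events at the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def nucleosome_rates(width=147, background=2.0, period=10.0, amplitude=0.3):
    """Hypothetical expected cut counts across a nucleosome: protected centre, ~10 bp ripple."""
    half = (width - 1) / 2
    rates = []
    for i in range(width):
        x = (i - half) / half          # -1 at one edge, 0 at the dyad, +1 at the other
        quadratic = 0.2 + 0.8 * x * x  # lowest at the dyad, background level at the edges
        ripple = 1.0 + amplitude * math.cos(2 * math.pi * (i - half) / period)
        rates.append(background * quadratic * ripple)
    return rates


def log_bayes_factor(counts, background=2.0):
    """Log Bayes factor of the nucleosome model against a flat background model."""
    nuc = nucleosome_rates(len(counts), background)
    return sum(
        poisson_logpmf(c, r) - poisson_logpmf(c, background)
        for c, r in zip(counts, nuc)
    )


protected = [round(r) for r in nucleosome_rates()]  # counts shaped like the nucleosome model
flat = [2] * 147                                    # counts shaped like the background
print(log_bayes_factor(protected), log_bayes_factor(flat))
```

Summing the per-position log ratios across the whole window is what lets weak signals at individual positions add up to a usable score.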
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover the underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). By incrementally increasing the types of genomic data in the SSM, we show that different data types contribute complementary information for inferring the protein binding landscape and that the most accurate inference comes from modeling all available datasets.
This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of the protein-DNA interaction landscape through data integration.
Abstract:
Genetic data from polymorphic microsatellite loci were employed to estimate paternity and maternity in a local population of nine-banded armadillos (Dasypus novemcinctus) in northern Florida. The parentage assessments took advantage of maximum likelihood procedures developed expressly for situations when individuals of neither gender can be excluded a priori as candidate parents. The molecular data for 290 individuals, interpreted alone and in conjunction with detailed biological and spatial information for the population, demonstrate high exclusion probabilities and reasonably strong likelihoods of genetic parentage assignment in many cases; low mean probabilities of successful reproductive contribution to the local population by individual armadillo adults in a given year; and statistically significant microspatial associations of parents and their offspring. Results suggest that molecular assays of highly polymorphic genetic systems can add considerable power to assessments of biological parentage in natural populations even when neither parent is otherwise known.
Abstract:
A robust method for fitting to the results of gel electrophoresis assays of damage to plasmid DNA caused by radiation is presented. This method makes use of nonlinear regression to fit analytically derived dose response curves to observations of the supercoiled, open circular and linear plasmid forms simultaneously, allowing for more accurate results than fitting to individual forms. Comparisons with a commonly used analysis method show that while there is a relatively small difference between the methods for data sets with small errors, the parameters generated by this method remain much more closely distributed around the true value in the face of increasing measurement uncertainties. This allows for parameters to be specified with greater confidence, reflected in a reduction of errors on fitted parameters. On test data sets, fitted uncertainties were reduced by 30%, similar to the improvement that would be offered by moving from triplicate to fivefold repeats (assuming standard errors). This method has been implemented in a popular spreadsheet package and made available online to improve its accessibility. (C) 2011 by Radiation Research Society
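Fitting all three plasmid forms at once means minimising a single residual that pools the supercoiled, open circular and linear fractions. The sketch below illustrates that idea under stated assumptions: the dose response curves come from a simple Poisson break model chosen for the example (not necessarily the curves derived in the paper), the rates and doses are invented, and a crude grid search stands in for a proper nonlinear least-squares routine.

```python
import math


def plasmid_fractions(dose, ssb_rate, dsb_rate):
    """Assumed Poisson model: fractions of supercoiled, open circular and linear plasmid."""
    supercoiled = math.exp(-(ssb_rate + dsb_rate) * dose)  # no break of either kind
    linear = 1.0 - math.exp(-dsb_rate * dose)              # at least one double-strand break
    open_circular = 1.0 - supercoiled - linear             # single-strand breaks only
    return supercoiled, open_circular, linear


def joint_sse(data, ssb_rate, dsb_rate):
    """Sum of squared residuals over all three plasmid forms at every dose."""
    total = 0.0
    for dose, observed in data:
        model = plasmid_fractions(dose, ssb_rate, dsb_rate)
        total += sum((o - m) ** 2 for o, m in zip(observed, model))
    return total


def fit_grid(data, max_index=100, resolution=1000):
    """Grid search over both rates (0 to max_index / resolution); returns (sse, ssb, dsb)."""
    best = None
    for i in range(max_index + 1):
        for j in range(max_index + 1):
            ssb, dsb = i / resolution, j / resolution
            sse = joint_sse(data, ssb, dsb)
            if best is None or sse < best[0]:
                best = (sse, ssb, dsb)
    return best


# Synthetic, noise-free observations generated from hypothetical rates (per Gy).
doses = [0, 10, 20, 40, 80]
data = [(d, plasmid_fractions(d, 0.05, 0.01)) for d in doses]
print(fit_grid(data))
```

Because every dose contributes three residuals that share the same two parameters, noise in one plasmid form is constrained by the other two, which is the intuition behind the tighter parameter estimates reported above.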
Abstract:
In studies of radiation-induced DNA fragmentation and repair, analytical models may provide rapid and easy-to-use methods to test simple hypotheses regarding the breakage and rejoining mechanisms involved. The random breakage model, according to which lesions are distributed uniformly and independently of each other along the DNA, has been the model most used to describe spatial distribution of radiation-induced DNA damage. Recently several mechanistic approaches have been proposed that model clustered damage to DNA. In general, such approaches focus on the study of initial radiation-induced DNA damage and repair, without considering the effects of additional (unwanted and unavoidable) fragmentation that may take place during the experimental procedures. While most approaches, including measurement of total DNA mass below a specified value, allow for the occurrence of background experimental damage by means of simple subtractive procedures, a more detailed analysis of DNA fragmentation necessitates a more accurate treatment. We have developed a new, relatively simple model of DNA breakage and the resulting rejoining kinetics of broken fragments. Initial radiation-induced DNA damage is simulated using a clustered breakage approach, with three free parameters: the number of independently located clusters, each containing several DNA double-strand breaks (DSBs), the average number of DSBs within a cluster (multiplicity of the cluster), and the maximum allowed radius within which DSBs belonging to the same cluster are distributed. Random breakage is simulated as a special case of the DSB clustering procedure. When the model is applied to the analysis of DNA fragmentation as measured with pulsed-field gel electrophoresis (PFGE), the hypothesis that DSBs in proximity rejoin at a different rate from that of sparse isolated breaks can be tested, since the kinetics of rejoining of fragments of varying size may be followed by means of computer simulations. 
The problem of how to account for background damage from experimental handling is also carefully considered. We have shown that the conventional procedure of subtracting the background damage from the experimental data may lead to erroneous conclusions during the analysis of both initial fragmentation and DSB rejoining. Despite its relative simplicity, the method presented allows both the quantitative and qualitative description of radiation-induced DNA fragmentation and subsequent rejoining of double-stranded DNA fragments. (C) 2004 by Radiation Research Society.
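The clustered breakage procedure, with random breakage as a special case, can be sketched as a small Monte Carlo simulation. This is a minimal illustration under simplifying assumptions, not the authors' model: the cluster multiplicity is fixed rather than an average, positions are whole base pairs, no rejoining kinetics or background damage are included, and all names and parameter values are hypothetical.

```python
import random


def simulate_breaks(length, n_clusters, multiplicity, radius, rng):
    """Place clusters uniformly; scatter `multiplicity` DSBs within `radius` bp of each centre."""
    breaks = set()
    for _ in range(n_clusters):
        centre = rng.randint(1, length - 1)
        for _ in range(multiplicity):
            position = centre + rng.randint(-radius, radius)
            if 1 <= position <= length - 1:  # breaks must fall inside the molecule
                breaks.add(position)
    return sorted(breaks)


def fragment_sizes(length, breaks):
    """Sizes of the fragments produced by cutting a linear molecule at each break."""
    edges = [0] + breaks + [length]
    return [b - a for a, b in zip(edges, edges[1:])]


def mass_fraction_below(fragments, threshold):
    """Fraction of total DNA mass found in fragments shorter than `threshold`."""
    return sum(f for f in fragments if f < threshold) / sum(fragments)


rng = random.Random(1)
length = 100_000_000  # hypothetical 100 Mbp molecule

# Random breakage is the special case of one DSB per cluster and zero radius.
random_frags = fragment_sizes(length, simulate_breaks(length, 200, 1, 0, rng))
clustered_frags = fragment_sizes(length, simulate_breaks(length, 50, 4, 100_000, rng))

print(mass_fraction_below(random_frags, 1_000_000))
print(mass_fraction_below(clustered_frags, 1_000_000))
```

Comparing the two printed fractions for the same nominal number of DSBs shows how clustering shifts DNA mass between size classes, which is the quantity a PFGE measurement reports.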
Abstract:
In the study of complex genetic diseases, the identification of subgroups of patients sharing similar genetic characteristics represents a challenging task, for example to improve treatment decisions. One type of genetic lesion, frequently investigated in such disorders, is the change of the DNA copy number (CN) at specific genomic traits. Non-negative Matrix Factorization (NMF) is a standard technique to reduce the dimensionality of a data set and to cluster data samples, while keeping its most relevant information in meaningful components. Thus, it can be used to discover subgroups of patients from CN profiles. It is, however, computationally impractical for very high dimensional data, such as CN microarray data. Deciding the most suitable number of subgroups is also a challenging problem. The aim of this work is to derive a procedure to compact high dimensional data, in order to improve NMF applicability without compromising the quality of the clustering. This is particularly important for analyzing high-resolution microarray data. Many commonly used quality measures, as well as our own measures, are employed to decide the number of subgroups and to assess the quality of the results. Our measures are based on the idea of identifying robust subgroups, inspired by biological/clinical relevance instead of simply aiming at well-separated clusters. We evaluate our procedure using four real independent data sets. In these data sets, our method was able to find accurate subgroups with individual molecular and clinical features and outperformed the standard NMF in terms of accuracy in the factorization fitness function. Hence, it can be useful for the discovery of subgroups of patients with similar CN profiles in the study of heterogeneous diseases.
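For readers unfamiliar with how NMF yields patient subgroups, the following is a minimal sketch of standard NMF with multiplicative updates, followed by a cluster assignment taken from the dominant component of each sample. It does not implement the compaction procedure or the quality measures proposed in the abstract; the tiny matrix, the rank, and all function names are assumptions for illustration.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH (the factorization fitness function)."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (features x samples) as W H by multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


def assign_clusters(h):
    """Label each sample (column of H) with the index of its largest component."""
    return [max(range(len(h)), key=lambda i: h[i][j]) for j in range(len(h[0]))]


# Hypothetical copy-number-like matrix: 4 genomic regions x 6 patients.
v = [
    [5, 5, 5, 0, 0, 0],
    [4, 4, 4, 0, 0, 0],
    [0, 0, 0, 3, 3, 3],
    [0, 0, 0, 6, 6, 6],
]
w, h = nmf(v, rank=2)
print(assign_clusters(h), frobenius_error(v, w, h))
```

Each iteration multiplies matrices whose size grows with the number of genomic regions, which is why reducing the number of rows before factorization matters for high-resolution arrays.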
Abstract:
The DNA mismatch repair (MMR) pathway detects and repairs DNA replication errors. While DNA MMR-proficiency is known to play a key role in the sensitivity to a number of DNA damaging agents, its role in the cytotoxicity of ionizing radiation (IR) is less well characterized. Available literature to date is conflicting regarding the influence of MMR status on radiosensitivity, and this has arisen as a subject of controversy in the field. The aim of this paper is to provide the first comprehensive overview of the experimental data linking MMR proteins and the DNA damage response to IR. A PubMed search was conducted using the key words "DNA mismatch repair" and "ionizing radiation". Relevant articles and their references were reviewed for their association between DNA MMR and IR. Recent data suggest that radiation dose and the type of DNA damage induced may dictate the involvement of the MMR system in the cellular response to IR. In particular, the literature supports a role for the MMR system in DNA damage recognition, cell cycle arrest, DNA repair and apoptosis. In this review we discuss our current understanding of the impact of MMR status on the cellular response to radiation in mammalian cells gained from past and present studies and attempt to provide an explanation for how MMR may determine the response to radiation.
Abstract:
A Work Project, presented as part of the requirements for the Award of a Masters Degree in Finance from the NOVA – School of Business and Economics
Abstract:
Pygmy Shrews in North America have variously been considered to be one species (Sorex hoyi) or two species (S. hoyi and S. thompsoni). Currently, only S. hoyi is recognized. In this study, we examine mitochondrial DNA sequence data for the cytochrome b gene to evaluate the level of differentiation and phylogeographic relationships among eleven samples of Pygmy Shrews from across Canada. Pygmy Shrews from eastern Canada (i.e., Ontario, Quebec, New Brunswick, Nova Scotia, and Prince Edward Island) are distinct from Pygmy Shrews from western Canada (Alberta, Yukon) and Alaska. The average level of sequence divergence between these clades (3.3%) falls within the range of values for other recognized pairs of sister species of shrews. A molecular clock based on third position transversion substitutions suggests that these two lineages diverged between 0.44 and 1.67 million years ago. These molecular phylogenetic data, combined with a reinterpretation of previously published morphological data, are suggestive of separate species status for S. hoyi and S. thompsoni, as has been previously argued by others. Further analysis of specimens from geographically intermediate areas (e.g., Manitoba, northern Ontario) is required to determine if there is secondary contact and/or introgression between these two putative species.
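The quantities mentioned here (pairwise sequence divergence, third position transversions, and a clock-based divergence time) are simple to compute for aligned sequences. The sketch below is illustrative only: the two short sequences are invented, the strict-clock formula t = d / (2r) is a textbook simplification, and the rate passed to it is a hypothetical number, not the calibration used in the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def is_transversion(a, b):
    """True when one base is a purine and the other a pyrimidine."""
    return (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)


def third_position_transversions(seq_a, seq_b):
    """Count transversions at codon third positions (indices 2, 5, 8, ...)."""
    return sum(
        1
        for i in range(2, min(len(seq_a), len(seq_b)), 3)
        if is_transversion(seq_a[i], seq_b[i])
    )


def divergence_time(distance, rate_per_lineage_per_myr):
    """Strict-clock estimate in millions of years: t = d / (2 * rate)."""
    return distance / (2 * rate_per_lineage_per_myr)


# Invented in-frame fragments, three codons each.
seq_east = "ATGGCATTC"
seq_west = "ATGGCCTTT"
print(p_distance(seq_east, seq_west))
print(third_position_transversions(seq_east, seq_west))
print(divergence_time(0.04, 0.02))  # hypothetical distance and rate
```

The factor of two in the clock formula reflects that substitutions accumulate independently along both lineages after they split.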
Abstract:
Background: Affymetrix GeneChip arrays are widely used for transcriptomic studies in a diverse range of species. Each gene is represented on a GeneChip array by a probe-set, consisting of up to 16 probe-pairs. Signal intensities across probe-pairs within a probe-set vary in part due to different physical hybridisation characteristics of individual probes with their target labelled transcripts. We have previously developed a technique to study the transcriptomes of heterologous species based on hybridising genomic DNA (gDNA) to a GeneChip array designed for a different species, and subsequently using only those probes with good homology. Results: Here we have investigated the effects of hybridising homologous species gDNA to study the transcriptomes of species for which the arrays have been designed. Genomic DNA from Arabidopsis thaliana and rice (Oryza sativa) was hybridised to the Affymetrix Arabidopsis ATH1 and Rice Genome GeneChip arrays respectively. Probe selection based on gDNA hybridisation intensity increased the number of genes identified as significantly differentially expressed in two published studies of Arabidopsis development, and optimised the analysis of technical replicates obtained from pooled samples of RNA from rice. Conclusion: This mixed physical and bioinformatics approach can be used to optimise estimates of gene expression when using GeneChip arrays.
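Probe selection by gDNA hybridisation intensity amounts to masking out weakly hybridising probes before summarising each probe-set. The sketch below shows that filtering step under stated assumptions: the probe identifiers, intensities and threshold are invented, and a plain mean stands in for whatever summarisation method a real analysis pipeline would apply.

```python
def select_probes(gdna_intensity, probe_sets, threshold):
    """Keep probes whose gDNA hybridisation intensity exceeds the threshold."""
    selected = {}
    for probe_set, probes in probe_sets.items():
        kept = [p for p in probes if gdna_intensity[p] > threshold]
        if kept:  # drop probe-sets left with no usable probes
            selected[probe_set] = kept
    return selected


def probe_set_signal(rna_intensity, probes):
    """Summarise a probe-set as the mean RNA signal of its retained probes."""
    return sum(rna_intensity[p] for p in probes) / len(probes)


# Hypothetical probe-level data for two probe-sets.
probe_sets = {"gene_a": ["a1", "a2", "a3"], "gene_b": ["b1", "b2"]}
gdna = {"a1": 900, "a2": 150, "a3": 700, "b1": 80, "b2": 60}
rna = {"a1": 1200, "a2": 300, "a3": 1000, "b1": 400, "b2": 500}

mask = select_probes(gdna, probe_sets, threshold=200)
signals = {name: probe_set_signal(rna, probes) for name, probes in mask.items()}
print(mask)
print(signals)
```

Raising the threshold trades the number of probe-sets retained against the reliability of the probes that remain, so in practice a range of thresholds would be compared.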
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)