926 results for genomic fingerprinting
Abstract:
BACKGROUND: The evolutionary relationships of modern birds are among the most challenging to understand in systematic biology and have been debated for centuries. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders, and used the genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomics analyses (Jarvis et al. in press; Zhang et al. in press). Here we release assemblies and datasets associated with the comparative genome analyses, which include 38 newly sequenced avian genomes plus previously released or simultaneously released genomes of Chicken, Zebra finch, Turkey, Pigeon, Peregrine falcon, Duck, Budgerigar, Adelie penguin, Emperor penguin and the Medium Ground Finch. We hope that this resource will serve future efforts in phylogenomics and comparative genomics.
FINDINGS: The 38 bird genomes were sequenced using the Illumina HiSeq 2000 platform and assembled using a whole genome shotgun strategy. The 48 genomes were categorized into two groups according to the N50 scaffold size of the assemblies: a high depth group comprising 23 species sequenced at high coverage (>50X) with multiple insert size libraries, resulting in N50 scaffold sizes greater than 1 Mb (except the White-throated Tinamou and Bald Eagle); and a low depth group comprising 25 species sequenced at low coverage (~30X) with two insert size libraries, resulting in an average N50 scaffold size of about 50 kb. Repetitive elements comprised 4%-22% of the bird genomes. The assembled scaffolds allowed the homology-based annotation of 13,000 to 17,000 protein-coding genes in each avian genome relative to chicken, zebra finch and human, as well as comparative and sequence conservation analyses.
CONCLUSIONS: Here we release full genome assemblies of 38 newly sequenced avian species, link to genome assembly downloads for 7 of the remaining 10 species, and provide a guide to the genomic data that have been generated and used in our Avian Phylogenomics Project. To the best of our knowledge, the Avian Phylogenomics Project is the biggest vertebrate comparative genomics project to date. The genomic data presented here are expected to accelerate further analyses in many fields, including phylogenetics, comparative genomics, evolution, neurobiology, developmental biology, and other related areas.
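The abstract above groups assemblies by N50 scaffold size. As a minimal sketch of that statistic (the function name and the toy lengths are illustrative assumptions, not data from the project), N50 is the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly size:

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)   # longest scaffold first
    half_total = sum(ordered) / 2             # half of the assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly, not data from the Avian Phylogenomics Project.
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))
```

A larger N50 means that half of the assembled bases sit in longer scaffolds, which is why the high-coverage group (N50 above 1 Mb) is considered more contiguous than the low-coverage group (N50 around 50 kb).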
Abstract:
Centromeres are chromosomal loci essential for genome stability. Their malfunction can cause chromosome instability associated with cancer, infertility, and birth defects. This study focused on an intriguing centromere on human chromosome 17, which displays normal functional variation. Centromere identity can be found on either of two large arrays of repetitive DNA. We investigated inter-individual sequence variation on these two arrays and found association between array size, array variation, and centromere function. Our data suggest a functional influence of DNA sequence at this critical epigenetic locus.
Abstract:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
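The idea of using a Markov chain Monte Carlo method to maximize a correlation-based objective can be sketched on a toy problem. Everything below is an assumption made for illustration: the Gaussian "occupancy" profile stands in for a real thermodynamic model, and the proposal scales, temperature, and bounds are arbitrary. It is not the dissertation's model or code. A Metropolis chain samples parameters with probability proportional to exp(correlation / temperature) and keeps the best state visited.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def bump(center, width, n):
    """Toy predicted profile: a Gaussian bump over n positions (hypothetical model)."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(n)]


def metropolis_maximize(observed, n_steps=10000, temperature=0.01, seed=0):
    """Metropolis search for (center, width) maximizing correlation with the data."""
    rng = random.Random(seed)
    n = len(observed)
    center, width = n / 2, 5.0
    score = pearson(bump(center, width, n), observed)
    best = (score, center, width)
    for _ in range(n_steps):
        new_center = center + rng.gauss(0, 2.0)   # symmetric random-walk proposal
        new_width = width + rng.gauss(0, 0.5)
        # Proposals outside the allowed box are rejected (the chain stays put).
        if not (0 <= new_center <= n - 1 and 1.0 <= new_width <= 20.0):
            continue
        new_score = pearson(bump(new_center, new_width, n), observed)
        delta = new_score - score
        # Always accept improvements; accept a worse state with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            center, width, score = new_center, new_width, new_score
            if score > best[0]:
                best = (score, center, width)
    return best


# Synthetic data: a bump at position 40 plus Gaussian noise (not real MNase-seq data).
noise = random.Random(1)
observed = [v + noise.gauss(0, 0.05) for v in bump(40, 4.0, 100)]
best_score, best_center, best_width = metropolis_maximize(observed)
print(round(best_score, 3), round(best_center, 1), round(best_width, 1))
```

A low temperature makes the chain behave like a stochastic hill climber, while a higher one lets it cross low-correlation regions more easily; the trade-off would need tuning on real data.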
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
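To illustrate what a Bayes-factor-based score for a candidate nucleosome position could look like, here is a deliberately simplified sketch. It is an assumption-laden toy, not the method described above: it compares a model in which cut counts inside a 147 bp body and in the flanks have separate Poisson rates against a model with a single shared rate, with a Gamma(1, 1) prior on each rate chosen only for convenience. It ignores the quadratic and oscillatory digestion pattern that the dissertation models explicitly.

```python
import math


def log_marginal(counts, shape=1.0, rate=1.0):
    """Log marginal likelihood of Poisson counts that share one rate drawn from
    a Gamma(shape, rate) prior. The sum of log(x_i!) terms is omitted because
    it cancels when two models of the same data are compared."""
    n, total = len(counts), sum(counts)
    return (shape * math.log(rate) - math.lgamma(shape)
            + math.lgamma(shape + total)
            - (shape + total) * math.log(rate + n))


def log_bayes_factor(counts, body_start, body_len=147):
    """Log Bayes factor of 'body and flanks have different cut rates' versus
    'one rate everywhere' for a window of per-base cut counts."""
    body = counts[body_start:body_start + body_len]
    flank = counts[:body_start] + counts[body_start + body_len:]
    return log_marginal(body) + log_marginal(flank) - log_marginal(counts)


# Hypothetical windows: 50 bp flank, 147 bp body, 50 bp flank.
protected = [5] * 50 + [0] * 147 + [5] * 50   # cuts depleted inside the body
uniform = [2] * 247                           # no positioning signal
print(log_bayes_factor(protected, 50) > 0)
print(log_bayes_factor(uniform, 50) < 0)
```

Note that this two-rate comparison also rewards windows where the body is cut more than the flanks, so a practical score would need to constrain the direction of the difference.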
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover the true underlying protein binding configurations. We then apply the SSM to real genomic data (both DNase-seq and MNase-seq data). By incrementally increasing the number of genomic data types in the SSM, we show that different data types contribute complementary information for inferring the protein binding landscape and that the most accurate inference comes from modeling all available datasets.
This dissertation provides a foundation for future research by taking a step toward genome-wide inference of the protein-DNA interaction landscape through data integration.
Abstract:
Copyright © Taylor & Francis Group, LLC 2015. Type 2 diabetes is a major health burden in the United States, and population trends suggest this burden will increase. High interest in, and increased availability of, testing for genetic risk of type 2 diabetes presents a new opportunity for reducing type 2 diabetes risk for many patients; however, to date, there is little evidence that genetic testing positively affects type 2 diabetes prevention. Genetic information may not fit patients' illness representations, which may reduce the chances of risk-reducing behavior changes. The present study aimed to examine illness representations in a clinical sample of patients who are at risk for type 2 diabetes and interested in genetic testing. The authors used the Common Sense Model to analyze survey responses of 409 patients with type 2 diabetes risk factors. Patients were interested in genetic testing for type 2 diabetes risk and believed in its importance. Most patients believed that genetic factors are important to developing type 2 diabetes (67%), that diet and exercise are effective in preventing type 2 diabetes (95%), and that lifestyle changes are more effective than drugs (86%). Belief in genetic causality was not related to poorer self-reported health behaviors. These results suggest that patients' interest in genetic testing for type 2 diabetes might produce a teachable moment that clinicians can use to counsel behavior change.
Abstract:
BACKGROUND: Patients, clinicians, researchers and payers are seeking to understand the value of using genomic information (as reflected by genotyping, sequencing, family history or other data) to inform clinical decision-making. However, challenges exist to widespread clinical implementation of genomic medicine, a prerequisite for developing evidence of its real-world utility.
METHODS: To address these challenges, the National Institutes of Health-funded IGNITE (Implementing GeNomics In pracTicE; www.ignite-genomics.org) Network, comprising six projects and a coordinating center, was established in 2013 to support the development, investigation and dissemination of genomic medicine practice models that seamlessly integrate genomic data into the electronic health record and that deploy tools for point-of-care decision making. IGNITE site projects are aligned in their purpose of testing these models, but individual projects vary in scope and design, including exploring genetic markers for disease risk prediction and prevention, developing tools for using family history data, incorporating pharmacogenomic data into clinical care, refining disease diagnosis using sequence-based mutation discovery, and creating novel educational approaches.
RESULTS: This paper describes the IGNITE Network and member projects, including network structure, collaborative initiatives, clinical decision support strategies, methods for return of genomic test results, and educational initiatives for patients and providers. Clinical and outcomes data from individual sites and network-wide projects are expected to be published over the next few years.
CONCLUSIONS: The IGNITE Network is an innovative series of projects and pilot demonstrations that aims to enhance the translation of validated, actionable genomic information into clinical settings and to develop and use outcome measures for genome-based clinical interventions. It uses a pragmatic framework to provide early data and proofs of concept on the utility of these interventions. Through these efforts and collaboration with other stakeholders, IGNITE is poised to significantly accelerate the integration of genomic information into medical practice.
Abstract:
The E1AF protein belongs to the family of Ets transcription factors and is involved in the regulation of metastasis gene expression. It has recently been reported that, in an undifferentiated childhood sarcoma, part of this gene could be fused by translocation to the ews gene. We show here that the human e1af gene, which is located in the q21 region of chromosome 17, is organized in 13 exons distributed along 19 kb of genomic DNA. Its two main functional domains, the acidic domain and the DNA-binding ETS domain, are each encoded by three different exons. The 3'-untranslated region of e1af is 0.7 kb. The 5'-untranslated region is about 0.3 kb and is composed of a first exon upstream from the exon containing the first methionine. These data may help accelerate understanding of the molecular basis of putative inherited diseases linked to E1AF. (C) 1999 Elsevier Science B.V. All rights reserved.
Abstract:
Coccolithoviruses are giant dsDNA viruses that infect Emiliania huxleyi, the most ubiquitous marine microalga. Here, we present the genome of the latest coccolithovirus strain to be sequenced, EhV-99B1, and compare it with two other coccolithovirus genomes (EhV-86 and EhV-163). EhV-99B1 shares a pairwise nucleotide identity of 98% with EhV-163 (the two strains were isolated from the same Norwegian fjord but in different years), and just 96.5% with EhV-86 (isolated in the same spring as EhV-99B1 but in the English Channel). We confirmed and extended the list of relevant genomic differences between the EhVs from the Norwegian fjord and those from the English Channel, namely the deletion or insertion of a phosphate permease, an endonuclease, a transposase, and two specific tRNAs. As a whole, this study provides new clues and insights into the diversity of these large oceanic viruses and the mechanisms driving their evolution, in particular those processes involving selfish genetic elements.
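The pairwise nucleotide identities quoted above can be illustrated with a small sketch. The abstract does not say how identity was computed, so the convention below (matches divided by aligned columns in which neither sequence has a gap) is an assumption, and the function name and example strings are hypothetical:

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are skipped (one common
    convention among several; an assumption here)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy alignment, not coccolithovirus sequence: 7 matches over 8 gap-free columns.
print(percent_identity("ACGT-ACGT", "ACGTTACGA"))
```

Other conventions count gap columns as mismatches or normalize by the shorter sequence length, so identities from different tools are only comparable when the denominator is the same.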