13 results for Genomic data
at Duke University
Abstract:
BACKGROUND: The evolutionary relationships of modern birds are among the most challenging to understand in systematic biology and have been debated for centuries. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders, and used the genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomics analyses (Jarvis et al. in press; Zhang et al. in press). Here we release assemblies and datasets associated with the comparative genome analyses, which include 38 newly sequenced avian genomes plus previously released or simultaneously released genomes of Chicken, Zebra finch, Turkey, Pigeon, Peregrine falcon, Duck, Budgerigar, Adelie penguin, Emperor penguin and the Medium Ground Finch. We hope that this resource will serve future efforts in phylogenomics and comparative genomics. FINDINGS: The 38 bird genomes were sequenced using the Illumina HiSeq 2000 platform and assembled using a whole genome shotgun strategy. The 48 genomes were categorized into two groups according to the N50 scaffold size of the assemblies: a high depth group comprising 23 species sequenced at high coverage (>50X) with multiple insert size libraries resulting in N50 scaffold sizes greater than 1 Mb (except the White-throated Tinamou and Bald Eagle); and a low depth group comprising 25 species sequenced at a low coverage (~30X) with two insert size libraries resulting in an average N50 scaffold size of about 50 kb. Repetitive elements comprised 4%-22% of the bird genomes. The assembled scaffolds allowed the homology-based annotation of 13,000 to 17,000 protein-coding genes in each avian genome relative to chicken, zebra finch and human, as well as comparative and sequence conservation analyses.
CONCLUSIONS: Here we release full genome assemblies of 38 newly sequenced avian species, link to genome assembly downloads for 7 of the remaining 10 species, and provide a guideline of genomic data that has been generated and used in our Avian Phylogenomics Project. To the best of our knowledge, the Avian Phylogenomics Project is the biggest vertebrate comparative genomics project to date. The genomic data presented here is expected to accelerate further analyses in many fields, including phylogenetics, comparative genomics, evolution, neurobiology, developmental biology, and other related areas.
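The abstract above groups assemblies by their N50 scaffold size. As an illustration of what that statistic measures, here is a minimal sketch using the common definition of N50 (the scaffold length at which the cumulative length of scaffolds, sorted longest first, reaches half of the total assembly length). The function name and the toy lengths are assumptions made for this example; they are not taken from the released assemblies or from the project's own tooling.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are sorted from longest to shortest; N50 is the length of the
    scaffold at which the running total first reaches half of the assembly.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths in base pairs (illustrative only).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70 (80 + 70 = 150 reaches half of 300)
```

Under this definition, a threshold such as the 1 Mb figure quoted in the abstract would simply be compared against the value returned for each assembly.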
Abstract:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
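The idea of maximizing a correlation-based objective with a Markov chain Monte Carlo method can be sketched on a toy problem. The example below is not the dissertation's thermodynamic model: it assumes a hypothetical one-parameter "predicted occupancy" profile (a single Gaussian bump), uses Pearson correlation with a synthetic observed profile as the objective, and runs a Metropolis-style random walk that keeps the best state visited. All names, constants, and data here are illustrative assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def predicted_profile(center, width=10.0, length=200):
    # Toy stand-in for a model-predicted occupancy profile: one Gaussian bump.
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(length)]


def metropolis_maximize(observed, start, steps=2000, temperature=0.01, seed=0):
    """Random-walk Metropolis search over the bump center, tracking the best state."""
    rng = random.Random(seed)
    current = start
    current_score = pearson(predicted_profile(current), observed)
    best, best_score = current, current_score
    for _ in range(steps):
        proposal = current + rng.gauss(0.0, 5.0)
        score = pearson(predicted_profile(proposal), observed)
        # Accept improvements always; accept worse moves with Boltzmann probability.
        accept = score >= current_score or (
            rng.random() < math.exp((score - current_score) / temperature)
        )
        if accept:
            current, current_score = proposal, score
            if score > best_score:
                best, best_score = proposal, score
    return best, best_score


# Synthetic "observed" coverage: a bump at position 120 plus small noise.
noise = random.Random(1)
observed = [v + noise.gauss(0.0, 0.05) for v in predicted_profile(120.0)]
best_center, best_corr = metropolis_maximize(observed, start=60.0)
print(best_center, best_corr)
```

The temperature controls how readily the chain accepts moves that lower the correlation; a real application would have many more parameters and a model-specific proposal scheme.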
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
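To make the scoring idea concrete, the sketch below compares two fixed-parameter Poisson models of DNase I cut counts over a 147 bp window: a "nucleosome" model whose per-position rates follow a quadratic envelope multiplied by a roughly 10 bp oscillation, and a flat background model. With fixed parameters on both sides, the Bayes factor reduces to a likelihood ratio, so this is a simplified stand-in and not the method developed in the dissertation. The template shape, its constants, and all function names are assumptions chosen for illustration.

```python
import math


def nucleosome_rate_template(length=147, period=10.0, base=20.0):
    # Hypothetical cut-rate template: quadratic envelope (low at the dyad,
    # higher toward the edges) times a ~10 bp oscillation. Illustrative only.
    center = (length - 1) / 2
    rates = []
    for i in range(length):
        quadratic = 0.2 + ((i - center) / center) ** 2
        oscillation = 1.0 + 0.5 * math.cos(2 * math.pi * (i - center) / period)
        rates.append(base * quadratic * oscillation)
    return rates


def poisson_log_pmf(k, rate):
    """Log probability of observing count k under a Poisson with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def log_bayes_factor(counts, nucleosome_rates, background_rate):
    # Joint score over every position in the window (nucleosome vs. background).
    log_nuc = sum(poisson_log_pmf(k, r) for k, r in zip(counts, nucleosome_rates))
    log_bg = sum(poisson_log_pmf(k, background_rate) for k in counts)
    return log_nuc - log_bg


template = nucleosome_rate_template()
background = sum(template) / len(template)

# Two hypothetical windows: one shaped like the template, one flat.
patterned_counts = [round(r) for r in template]
flat_counts = [round(background)] * len(template)

print(log_bayes_factor(patterned_counts, template, background))
print(log_bayes_factor(flat_counts, template, background))
```

A positive score favors the nucleosome model for that window and a negative score favors the background; sliding the window along the genome would give a per-position score track.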
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.
This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.
Abstract:
BACKGROUND: Patients, clinicians, researchers and payers are seeking to understand the value of using genomic information (as reflected by genotyping, sequencing, family history or other data) to inform clinical decision-making. However, challenges exist to widespread clinical implementation of genomic medicine, a prerequisite for developing evidence of its real-world utility. METHODS: To address these challenges, the National Institutes of Health-funded IGNITE (Implementing GeNomics In pracTicE; www.ignite-genomics.org) Network, comprising six projects and a coordinating center, was established in 2013 to support the development, investigation and dissemination of genomic medicine practice models that seamlessly integrate genomic data into the electronic health record and that deploy tools for point-of-care decision making. IGNITE site projects are aligned in their purpose of testing these models, but individual projects vary in scope and design, including exploring genetic markers for disease risk prediction and prevention, developing tools for using family history data, incorporating pharmacogenomic data into clinical care, refining disease diagnosis using sequence-based mutation discovery, and creating novel educational approaches. RESULTS: This paper describes the IGNITE Network and member projects, including network structure, collaborative initiatives, clinical decision support strategies, methods for return of genomic test results, and educational initiatives for patients and providers. Clinical and outcomes data from individual sites and network-wide projects are anticipated to begin being published over the next few years.
CONCLUSIONS: The IGNITE Network is an innovative series of projects and pilot demonstrations aiming to enhance translation of validated actionable genomic information into clinical settings and develop and use measures of outcome in response to genome-based clinical interventions using a pragmatic framework to provide early data and proofs of concept on the utility of these interventions. Through these efforts and collaboration with other stakeholders, IGNITE is poised to have a significant impact on the acceleration of genomic information into medical practice.
Abstract:
Phytochromes are red/far-red photoreceptors that play essential roles in diverse plant morphogenetic and physiological responses to light. Despite their functional significance, phytochrome diversity and evolution across photosynthetic eukaryotes remain poorly understood. Using newly available transcriptomic and genomic data we show that canonical plant phytochromes originated in a common ancestor of streptophytes (charophyte algae and land plants). Phytochromes in charophyte algae are structurally diverse, including canonical and non-canonical forms, whereas in land plants, phytochrome structure is highly conserved. Liverworts, hornworts and Selaginella apparently possess a single phytochrome, whereas independent gene duplications occurred within mosses, lycopods, ferns and seed plants, leading to diverse phytochrome families in these clades. Surprisingly, the phytochrome portions of algal and land plant neochromes, a chimera of phytochrome and phototropin, appear to share a common origin. Our results reveal novel phytochrome clades and establish the basis for understanding phytochrome functional evolution in land plants and their algal relatives.
Abstract:
Plant phototropism, the ability to bend toward or away from light, is predominantly controlled by blue-light photoreceptors, the phototropins. Although phototropins have been well-characterized in Arabidopsis thaliana, their evolutionary history is largely unknown. In this study, we complete an in-depth survey of phototropin homologs across land plants and algae using newly available transcriptomic and genomic data. We show that phototropins originated in an ancestor of Viridiplantae (land plants + green algae). Phototropins repeatedly underwent independent duplications in most major land-plant lineages (mosses, lycophytes, ferns, and seed plants), but remained single-copy genes in liverworts and hornworts, an evolutionary pattern shared with another family of photoreceptors, the phytochromes. Following each major duplication event, the phototropins differentiated in parallel, resulting in two specialized, yet partially overlapping, functional forms that primarily mediate either low- or high-light responses. Our detailed phylogeny enables us to not only uncover new phototropin lineages, but also link our understanding of phototropin function in Arabidopsis with what is known in Adiantum and Physcomitrella (the major model organisms outside of flowering plants). We propose that the convergent functional divergences of phototropin paralogs likely contributed to the success of plants through time in adapting to habitats with diverse and heterogeneous light conditions.
Abstract:
In this review, we discuss recent work by the ENIGMA Consortium (http://enigma.ini.usc.edu) - a global alliance of over 500 scientists spread across 200 institutions in 35 countries collectively analyzing brain imaging, clinical, and genetic data. Initially formed to detect genetic influences on brain measures, ENIGMA has grown to over 30 working groups studying 12 major brain diseases by pooling and comparing brain data. In some of the largest neuroimaging studies to date - of schizophrenia and major depression - ENIGMA has found replicable disease effects on the brain that are consistent worldwide, as well as factors that modulate disease effects. In partnership with other consortia including ADNI, CHARGE, IMAGEN and others, ENIGMA's genomic screens - now numbering over 30,000 MRI scans - have revealed at least 8 genetic loci that affect brain volumes. Downstream of gene findings, ENIGMA has revealed how these individual variants - and genetic variants in general - may affect both the brain and risk for a range of diseases. The ENIGMA consortium is discovering factors that consistently affect brain structure and function that will serve as future predictors linking individual brain scans and genomic data. It is generating vast pools of normative data on brain measures - from tens of thousands of people - that may help detect deviations from normal development or aging in specific groups of subjects. We discuss challenges and opportunities in applying these predictors to individual subjects and new cohorts, as well as lessons we have learned in ENIGMA's efforts so far.
Abstract:
The ABL family of non-receptor tyrosine kinases, ABL1 (also known as c-ABL) and ABL2 (also known as Arg), links diverse extracellular stimuli to signaling pathways that control cell growth, survival, adhesion, migration and invasion. ABL tyrosine kinases play an oncogenic role in human leukemias. However, the role of ABL kinases in solid tumors including breast cancer progression and metastasis is just emerging.
To evaluate whether ABL family kinases are involved in breast cancer development and metastasis, we first analyzed genomic data from a large-scale screen of breast cancer patients. We found that ABL kinases are up-regulated in invasive breast cancer patients and that high expression of ABL kinases correlates with poor prognosis and early metastasis. Using xenograft mouse models combined with genetic and pharmacological approaches, we demonstrated that ABL kinases are required for regulating breast cancer progression and metastasis to the bone. Using next generation sequencing and bioinformatics analysis, we uncovered a critical role for ABL kinases in promoting multiple oncogenic pathways including TAZ and STAT5 signaling networks and the epithelial to mesenchymal transition (EMT). These findings revealed a role for ABL kinases in regulating breast cancer tumorigenesis and bone metastasis and provide a rationale for targeting breast tumors with ABL-specific inhibitors.
Abstract:
BACKGROUND: Microsporidia are obligate intracellular, eukaryotic pathogens that infect a wide range of animals from nematodes to humans, and in some cases, protists. The preponderance of evidence as to the origin of the microsporidia reveals a close relationship with the fungi, either within the kingdom or as a sister group to it. Recent phylogenetic studies and gene order analysis suggest that microsporidia share a particularly close evolutionary relationship with the zygomycetes. METHODOLOGY/PRINCIPAL FINDINGS: Here we expanded this analysis and also examined a putative sex-locus for variability between microsporidian populations. Whole genome inspection reveals a unique syntenic gene pair (RPS9-RPL21) present in the vast majority of fungi and the microsporidians but not in other eukaryotic lineages. Two other unique gene fusions (glutamyl-prolyl tRNA synthetase and ubiquitin-ribosomal subunit S30) that are present in metazoans, choanoflagellates, and filasterean opisthokonts are unfused in the fungi and microsporidians. One locus previously found to be conserved in many microsporidian genomes is similar to the sex locus of zygomycetes in gene order and architecture. Both sex-related and sex loci harbor TPT, HMG, and RNA helicase genes forming a syntenic gene cluster. We sequenced and analyzed the sex-related locus in 11 different Encephalitozoon cuniculi isolates and the sibling species E. intestinalis (3 isolates) and E. hellem (1 isolate). There was no evidence for an idiomorphic sex-related locus in this Encephalitozoon species sample. According to sequence-based phylogenetic analyses, the TPT and RNA helicase genes flanking the HMG genes are paralogous rather than orthologous between zygomycetes and microsporidians. 
CONCLUSION/SIGNIFICANCE: The unique genomic hallmarks between microsporidia and fungi are independent of sequence-based phylogenetic comparisons and further contribute to define the borders of the fungal kingdom and support the classification of microsporidia as unusual derived fungi. In addition, the sex/sex-related loci appear to have been subject to frequent gene conversion and translocations in microsporidia and zygomycetes.
Abstract:
From primates to bees, social status regulates reproduction. In the cichlid fish Astatotilapia (Haplochromis) burtoni, subordinate males have reduced fertility and must become dominant to reproduce. This increase in sexual capacity is orchestrated by neurons in the preoptic area, which enlarge in response to dominance and increase expression of gonadotropin-releasing hormone 1 (GnRH1), a peptide critical for reproduction. Using a novel behavioral paradigm, we show for the first time that subordinate males can become dominant within minutes of an opportunity to do so, displaying dramatic changes in body coloration and behavior. We also found that social opportunity induced expression of the immediate-early gene egr-1 in the anterior preoptic area, peaking in regions with high densities of GnRH1 neurons, and not in brain regions that express the related peptides GnRH2 and GnRH3. This genomic response did not occur in stable subordinate or stable dominant males even though stable dominants, like ascending males, displayed dominance behaviors. Moreover, egr-1 in the optic tectum and the cerebellum was similarly induced in all experimental groups, showing that egr-1 induction in the anterior preoptic area of ascending males was specific to this brain region. Because egr-1 codes for a transcription factor important in neural plasticity, induction of egr-1 in the anterior preoptic area by social opportunity could be an early trigger in the molecular cascade that culminates in enhanced fertility and other long-term physiological changes associated with dominance.
Abstract:
Building on the planning efforts of the RCN4GSC project, a workshop was convened in San Diego to bring together experts from genomics and metagenomics, biodiversity, ecology, and bioinformatics with the charge to identify potential for positive interactions and progress, especially building on successes at establishing data standards by the GSC and by the biodiversity and ecological communities. Until recently, the contribution of microbial life to the biomass and biodiversity of the biosphere was largely overlooked (because it was resistant to systematic study). Now, emerging genomic and metagenomic tools are making investigation possible. Initial research findings suggest that major advances are in the offing. Although different research communities share some overlapping concepts and traditions, they differ significantly in sampling approaches, vocabularies and workflows. Likewise, their definitions of 'fitness for use' for data differ significantly, as this concept stems from the specific research questions of most importance in the different fields. Nevertheless, there is little doubt that there is much to be gained from greater coordination and integration. As a first step toward interoperability of the information systems used by the different communities, participants agreed to conduct a case study on two of the leading data standards from the two formerly disparate fields: (a) GSC's standard checklists for genomics and metagenomics and (b) TDWG's Darwin Core standard, used primarily in taxonomy and systematic biology.
Abstract:
Centromeres are chromosomal loci essential for genome stability. Their malfunction can cause chromosome instability associated with cancer, infertility, and birth defects. This study focused on an intriguing centromere on human chromosome 17, which displays normal functional variation. Centromere identity can be found on either of two large arrays of repetitive DNA. We investigated inter-individual sequence variation on these two arrays and found association between array size, array variation, and centromere function. Our data suggest a functional influence of DNA sequence at this critical epigenetic locus.
Abstract:
BACKGROUND: Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. FINDINGS: Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. CONCLUSIONS: The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.
Abstract:
We examined facilitators and barriers to adoption of genomic services for colorectal care, one of the first genomic medicine applications, within the Veterans Health Administration to shed light on areas for practice change. We conducted semi-structured interviews with 58 clinicians to understand use of the following genomic services for colorectal care: family health history documentation, molecular and genetic testing, and genetic counseling. Data collection and analysis were informed by two conceptual frameworks, the Greenhalgh Diffusion of Innovation and Andersen Behavioral Model, to allow for concurrent examination of both access and innovation factors. Specialists were more likely than primary care clinicians to obtain family history to investigate hereditary colorectal cancer (CRC), but with limited detail; clinicians suggested templates to facilitate retrieval and documentation of family history according to guidelines. Clinicians identified an advantage of molecular tumor analysis prior to genetic testing, but tumor testing was infrequently used due to perceived low disease burden. Support from genetic counselors was regarded as facilitative for considering a hereditary basis of CRC diagnosis, but there was variability in awareness of and access to this expertise. Our data suggest the need for tools and policies to establish and disseminate well-defined processes for accessing services and adhering to guidelines.