24 results for Whole genome mapping
Abstract:
The mechanisms involved in the recognition of microbial pathogens and activation of the immune system have been extensively studied. However, the mechanisms involved in the recovery phase of an infection are incompletely characterized at both the cellular and physiological levels. Here, we establish a Caenorhabditis elegans-Salmonella enterica model of acute infection and antibiotic treatment for studying biological changes during the resolution phase of an infection. Using whole genome expression profiles of acutely infected animals, we found that genes that are markers of innate immunity are down-regulated upon recovery, while genes involved in xenobiotic detoxification, redox regulation, and cellular homeostasis are up-regulated. In silico analyses demonstrated that genes altered during recovery from infection were transcriptionally regulated by conserved transcription factors, including GATA/ELT-2, FOXO/DAF-16, and Nrf/SKN-1. Finally, we found that recovery from an acute bacterial infection is dependent on ELT-2 activity.
Abstract:
BACKGROUND: The evolutionary relationships of modern birds are among the most challenging to understand in systematic biology and have been debated for centuries. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders, and used the genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomics analyses (Jarvis et al. in press; Zhang et al. in press). Here we release assemblies and datasets associated with the comparative genome analyses, which include 38 newly sequenced avian genomes plus previously released or simultaneously released genomes of Chicken, Zebra finch, Turkey, Pigeon, Peregrine falcon, Duck, Budgerigar, Adelie penguin, Emperor penguin and the Medium Ground Finch. We hope that this resource will serve future efforts in phylogenomics and comparative genomics. FINDINGS: The 38 bird genomes were sequenced using the Illumina HiSeq 2000 platform and assembled using a whole genome shotgun strategy. The 48 genomes were categorized into two groups according to the N50 scaffold size of the assemblies: a high depth group comprising 23 species sequenced at high coverage (>50X) with multiple insert size libraries, resulting in N50 scaffold sizes greater than 1 Mb (except the White-throated Tinamou and Bald Eagle); and a low depth group comprising 25 species sequenced at low coverage (~30X) with two insert size libraries, resulting in an average N50 scaffold size of about 50 kb. Repetitive elements comprised 4%-22% of the bird genomes. The assembled scaffolds allowed the homology-based annotation of 13,000 to 17,000 protein coding genes in each avian genome relative to chicken, zebra finch and human, as well as comparative and sequence conservation analyses.
CONCLUSIONS: Here we release full genome assemblies of 38 newly sequenced avian species, link to genome assembly downloads for 7 of the remaining 10 species, and provide a guide to the genomic data that have been generated and used in our Avian Phylogenomics Project. To the best of our knowledge, the Avian Phylogenomics Project is the biggest vertebrate comparative genomics project to date. The genomic data presented here are expected to accelerate further analyses in many fields, including phylogenetics, comparative genomics, evolution, neurobiology, developmental biology, and other related areas.
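The N50 scaffold size used above to group the assemblies can be illustrated with a minimal sketch. It uses the common definition (the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly length). The function name `n50` and the toy scaffold lengths are hypothetical and are not taken from the project's pipeline.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths.

    Scaffolds are sorted from longest to shortest; the N50 is the length of
    the scaffold at which the running total first reaches half of the total
    assembly length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

Under this definition, an assembly with an N50 above 1 Mb has at least half of its assembled bases in scaffolds of 1 Mb or longer.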
Abstract:
Dopamine is an important central nervous system transmitter that functions through two classes of receptors (D1 and D2) to influence a diverse range of biological processes in vertebrates. With roles in regulating neural activity, behavior, and gene expression, there has been great interest in understanding the function and evolution of dopamine and its receptors. In this study, we use a combination of sequence analyses, microsynteny analyses, and phylogenetic relationships to identify and characterize both the D1 (DRD1A, DRD1B, DRD1C, and DRD1E) and D2 (DRD2, DRD3, and DRD4) dopamine receptor gene families in 43 recently sequenced bird genomes representing the major ordinal lineages across the avian family tree. We show that the common ancestor of all birds possessed at least seven D1 and D2 receptors, followed by subsequent independent losses in some lineages of modern birds. Through comparisons with other vertebrate and invertebrate species we show that two of the D1 receptors, DRD1A and DRD1B, and two of the D2 receptors, DRD2 and DRD3, originated from a whole genome duplication event early in the vertebrate lineage, providing the first conclusive evidence of the origin of these highly conserved receptors. Our findings provide insight into the evolutionary development of an important modulatory component of the central nervous system in vertebrates, and will help further unravel the complex evolutionary and functional relationships among dopamine receptors.
Abstract:
Human genetics has been experiencing a wave of genetic discoveries thanks to the development of several technologies, such as genome-wide association studies (GWAS), whole-exome sequencing, and whole genome sequencing. Despite the massive discovery of new variants associated with human diseases, several key challenges emerge following a genetic discovery. GWAS is known to be good at identifying the locus associated with the patient phenotype. However, the actual causal variants responsible for the phenotype are often elusive. Another challenge in human genetics is that even when the causal mutations are already known, the underlying biological effect might remain largely ambiguous. Functional evaluation plays a key role in solving these challenges, both to identify causal variants responsible for the phenotype and to further develop biological insights from the disease-causing mutations.
We adopted various methods to characterize the effects of variants identified in human genetic studies, including patient genetic and phenotypic data, RNA chemistry, molecular biology, virology, and multi-electrode array and primary neuronal culture systems. Chapter 1 is a broader introduction to the motivation for and challenges of functional evaluation in human genetic studies, and to the background of several genetic discoveries, such as hepatitis C treatment response, for which we performed functional characterization.
Chapter 2 focuses on the characterization of causal variants following the GWAS for hepatitis C treatment response. We characterized a non-coding SNP (rs4803217) of IL28B (IFNL3) in high linkage disequilibrium (LD) with the discovery SNP identified in the GWAS. In this chapter, we used interdisciplinary approaches to characterize the effects of rs4803217 on RNA structure, disease association, and protein translation.
Chapter 3 describes another avenue of functional characterization following GWAS, focusing on the novel transcripts and proteins identified near the IL28B (IFNL3) locus. It has recently been speculated that this novel protein, which was named IFNL4, may affect HCV treatment response and clearance. In this chapter, we used molecular biology, virology, and patient genetic and phenotypic data to further characterize and understand the biology of IFNL4. The efforts in Chapters 2 and 3 provided new insights into the candidate causal variant(s) responsible for the GWAS signal for HCV treatment response. However, more evidence is still required to make claims about the exact causal roles of these variants in the GWAS association.
Chapter 4 aims to characterize a mutation already known to cause a disease (seizure) in a mouse model. We demonstrate the potential use of a multi-electrode array (MEA) system for functional characterization and drug testing of mutations found in neurological diseases, such as seizure. Functional characterization in neurological diseases is relatively challenging, and the available systematic tools are limited. This chapter presents exploratory research toward establishing a system for broader use in functional characterization and in translational opportunities for mutations found in neurological diseases.
Overall, this dissertation spans a range of challenges in functional evaluation in human genetics. Functional characterization of human mutations is expected to become more central in human genetics, because many biological questions remain to be answered after the explosion of human genetic discoveries. Recent advances in several technologies, including genome editing and pluripotent stem cells, are also expected to make new tools available for functional studies of human diseases.
Abstract:
Endopolyploid cells (hereafter, polyploid cells), which contain whole genome duplications in an otherwise diploid organism, play vital roles in the development and physiology of diverse organs such as our heart and liver. Polyploidy is also observed with high frequency in many tumors, and division of such cells frequently creates aneuploidy (chromosomal imbalances), a hallmark of cancer. Despite its frequent occurrence and association with aneuploidy, little is known about the specific role that polyploidy plays in diverse contexts. Using a new model tissue, the Drosophila rectal papilla, we sought to uncover connections between polyploidy and aneuploidy during organ development. Our lab previously discovered that the papillar cells of the Drosophila hindgut undergo developmentally programmed polyploid cell divisions, and that these polyploid cell divisions are highly error-prone. Time-lapse studies of polyploid mitosis revealed that the papillar cells undergo a high percentage of tripolar anaphase, which causes extreme aneuploidy. Despite this massive chromosome imbalance, we found the tripolar daughter cells are viable and support normal organ development and function, suggesting that acquiring extra genome sets enables a cell to tolerate the genomic alterations incurred by aneuploidy. We further extended these findings by seeking mechanisms by which the papillar cells tolerated this resultant aneuploidy.
Abstract:
The advent of next-generation sequencing, now nearing a decade in age, has enabled, among other capabilities, measurement of genome-wide sequence features at unprecedented scale and resolution.
In this dissertation, I describe work to understand the genetic underpinnings of non-Hodgkin’s lymphoma through exploration of the epigenetics of its cell of origin, initial characterization and interpretation of driver mutations, and finally, a larger-scale, population-level study that incorporates mutation interpretation with clinical outcome.
In the first research chapter, I describe genomic characteristics of lymphomas through the lens of their cells of origin. Just as many other cancers, such as breast cancer or lung cancer, are categorized based on their cell of origin, lymphoma subtypes can be examined through the context of their normal B Cells of origin, Naïve, Germinal Center, and post-Germinal Center. By applying integrative analysis of the epigenetics of normal B Cells of origin through chromatin-immunoprecipitation sequencing, we find that differences in normal B Cell subtypes are reflected in the mutational landscapes of the cancers that arise from them, namely Mantle Cell, Burkitt, and Diffuse Large B-Cell Lymphoma.
In the next research chapter, I describe our first endeavor into understanding the genetic heterogeneity of Diffuse Large B Cell Lymphoma, the most common form of non-Hodgkin’s lymphoma, which affects 100,000 patients in the world. Through whole-genome sequencing of 1 case as well as whole-exome sequencing of 94 cases, we characterize the most recurrent genetic features of DLBCL and lay the groundwork for a larger study.
In the last research chapter, I describe work to characterize and interpret the whole exomes of 1001 cases of DLBCL in the largest single-cancer study to date. This highly powered study enabled sub-gene, gene-level, and gene-network level understanding of driver mutations within DLBCL. Moreover, matched genomic and clinical data enabled the connection of these driver mutations to clinical features such as treatment response or overall survival. As sequencing costs continue to drop, whole-exome sequencing will become a routine clinical assay, and another diagnostic dimension in addition to existing methods such as histology. However, to unlock the full utility of sequencing data, we must be able to interpret it. This study undertakes a first step in developing the understanding necessary to uncover the genomic signals of DLBCL hidden within its exomes. Moreover, beyond the scope of this one disease, the experimental and analytical methods can be readily applied to other cancer sequencing studies.
Thus, this dissertation leverages next-generation sequencing analysis to understand the genetic underpinnings of lymphoma, both by examining its normal cells of origin and through a large-scale study to sensitively identify recurrently mutated genes and their relationship to clinical outcome.
Abstract:
Recent genomic analyses suggest the importance of combinatorial regulation by broadly expressed transcription factors rather than expression domains characterized by highly specific factors.
Abstract:
The recent emergence of human connectome imaging has led to a high demand for angular and spatial resolution in diffusion magnetic resonance imaging (MRI). While there has been significant growth in high angular resolution diffusion imaging, the improvement in spatial resolution is still limited by a number of technical challenges, such as the low signal-to-noise ratio and high motion artifacts. As a result, the benefit of a high spatial resolution in whole-brain connectome imaging has not been fully evaluated in vivo. In this brief report, the impact of spatial resolution was assessed in a newly acquired whole-brain three-dimensional diffusion tensor imaging data set with an isotropic spatial resolution of 0.85 mm. It was found that the delineation of short cortical association fibers is drastically improved, as is the definition of fiber pathway endings at the gray/white matter boundary, both of which will help construct a more accurate structural map of the human brain connectome.
Abstract:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
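The idea of scoring a model by its correlation with observed data and searching parameter space with Markov chain Monte Carlo can be sketched in miniature. The example below is only an illustration under stated assumptions and is not the dissertation's framework: the "model" is a hypothetical single Gaussian occupancy bump with one free parameter (its center), the objective is the Pearson correlation with a synthetic observed profile, and the search is a plain Metropolis sampler at a fixed temperature. All names (`predicted_profile`, `metropolis_maximize`) and all numeric settings are invented for the sketch.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def predicted_profile(center, length, width=15.0):
    """Toy stand-in for a model prediction: one Gaussian occupancy bump."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(length)]


def metropolis_maximize(observed, steps=2000, temperature=0.02, seed=0):
    """Random-walk Metropolis over the bump center, tracking the best score seen.

    The chain targets a density proportional to exp(score / temperature), so it
    spends most of its time near high-correlation centers.
    """
    rng = random.Random(seed)
    length = len(observed)
    center = rng.uniform(0, length - 1)
    score = pearson(predicted_profile(center, length), observed)
    start_score = score
    best_center, best_score = center, score
    for _ in range(steps):
        proposal = center + rng.gauss(0, 5.0)
        if not 0 <= proposal <= length - 1:
            continue  # reject proposals outside the profile
        new_score = pearson(predicted_profile(proposal, length), observed)
        accept = new_score >= score or rng.random() < math.exp(
            (new_score - score) / temperature
        )
        if accept:
            center, score = proposal, new_score
            if score > best_score:
                best_center, best_score = center, score
    return best_center, best_score, start_score


# Synthetic "observed" coverage: a bump at position 120 plus Gaussian noise.
noise = random.Random(1)
observed = [v + noise.gauss(0, 0.05) for v in predicted_profile(120, 200)]
best_center, best_score, start_score = metropolis_maximize(observed)
print(best_center, best_score, start_score)
```

A real thermodynamic model would have many parameters (binding energies, protein concentrations) in place of the single center, but the accept/reject logic driven by a correlation score would keep the same shape.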
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). By incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of the protein binding landscape and that the most accurate inference comes from modeling all available datasets.
This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of the protein-DNA interaction landscape through data integration.