10 results for High-throughput
at Duke University
Abstract:
The advent of next-generation sequencing, now nearing a decade in age, has enabled, among other capabilities, measurement of genome-wide sequence features at unprecedented scale and resolution.
In this dissertation, I describe work to understand the genetic underpinnings of non-Hodgkin’s lymphoma through exploration of the epigenetics of its cell of origin, initial characterization and interpretation of driver mutations, and finally, a larger-scale, population-level study that incorporates mutation interpretation with clinical outcome.
In the first research chapter, I describe genomic characteristics of lymphomas through the lens of their cells of origin. Just as many other cancers, such as breast cancer or lung cancer, are categorized based on their cell of origin, lymphoma subtypes can be examined in the context of their normal B cells of origin: naïve, germinal center, and post-germinal center. By applying integrative analysis of the epigenetics of normal B cells of origin through chromatin immunoprecipitation sequencing, we find that differences in normal B cell subtypes are reflected in the mutational landscapes of the cancers that arise from them, namely Mantle Cell, Burkitt, and Diffuse Large B-Cell Lymphoma.
In the next research chapter, I describe our first endeavor into understanding the genetic heterogeneity of Diffuse Large B-Cell Lymphoma (DLBCL), the most common form of non-Hodgkin’s lymphoma, which affects 100,000 patients worldwide. Through whole-genome sequencing of one case as well as whole-exome sequencing of 94 cases, we characterize the most recurrent genetic features of DLBCL and lay the groundwork for a larger study.
In the last research chapter, I describe work to characterize and interpret the whole exomes of 1001 cases of DLBCL in the largest single-cancer study to date. This highly powered study enabled sub-gene, gene-level, and gene-network-level understanding of driver mutations within DLBCL. Moreover, matched genomic and clinical data enabled the connection of these driver mutations to clinical features such as treatment response or overall survival. As sequencing costs continue to drop, whole-exome sequencing will become a routine clinical assay, and another diagnostic dimension in addition to existing methods such as histology. However, to unlock the full utility of sequencing data, we must be able to interpret it. This study undertakes a first step in developing the understanding necessary to uncover the genomic signals of DLBCL hidden within its exomes. Beyond the scope of this one disease, the experimental and analytical methods can be readily applied to other cancer sequencing studies.
Thus, this dissertation leverages next-generation sequencing analysis to understand the genetic underpinnings of lymphoma, both by examining its normal cells of origin and through a large-scale study to sensitively identify recurrently mutated genes and their relationship to clinical outcome.
Abstract:
Human activities represent a significant burden on the global water cycle, with large and increasing demands placed on limited water resources by manufacturing, energy production and domestic water use. In addition to changing the quantity of available water resources, human activities lead to changes in water quality by introducing a large and often poorly-characterized array of chemical pollutants, which may negatively impact biodiversity in aquatic ecosystems, leading to impairment of valuable ecosystem functions and services. Domestic and industrial wastewaters represent a significant source of pollution to the aquatic environment due to inadequate or incomplete removal of chemicals introduced into waters by human activities. Currently, incomplete chemical characterization of treated wastewaters limits comprehensive risk assessment of this ubiquitous impact to water. In particular, a significant fraction of the organic chemical composition of treated industrial and domestic wastewaters remains uncharacterized at the molecular level. Efforts aimed at reducing the impacts of water pollution on aquatic ecosystems critically require knowledge of the composition of wastewaters to develop interventions capable of protecting our precious natural water resources.
The goal of this dissertation was to develop a robust, extensible and high-throughput framework for the comprehensive characterization of organic micropollutants in wastewaters by high-resolution accurate-mass mass spectrometry. High-resolution mass spectrometry provides the most powerful analytical technique available for assessing the occurrence and fate of organic pollutants in the water cycle. However, significant limitations in data processing, analysis and interpretation have limited this technique in achieving comprehensive characterization of organic pollutants occurring in natural and built environments. My work aimed to address these challenges by development of automated workflows for the structural characterization of organic pollutants in wastewater and wastewater impacted environments by high-resolution mass spectrometry, and to apply these methods in combination with novel data handling routines to conduct detailed fate studies of wastewater-derived organic micropollutants in the aquatic environment.
In Chapter 2, chemoinformatic tools were implemented along with novel non-targeted mass spectrometric analytical methods to characterize, map, and explore an environmentally-relevant “chemical space” in municipal wastewater. This was accomplished by characterizing the molecular composition of known wastewater-derived organic pollutants and substances that are prioritized as potential wastewater contaminants, using these databases to evaluate the pollutant-likeness of structures postulated for unknown organic compounds that I detected in wastewater extracts using high-resolution mass spectrometry approaches. Results showed that application of multiple computational mass spectrometric tools to structural elucidation of unknown organic pollutants arising in wastewaters improved the efficiency and veracity of screening approaches based on high-resolution mass spectrometry. Furthermore, structural similarity searching was essential for prioritizing substances sharing structural features with known organic pollutants or industrial and consumer chemicals that could enter the environment through use or disposal.
I then applied this comprehensive methodological and computational non-targeted analysis workflow to micropollutant fate analysis in domestic wastewaters (Chapter 3), surface waters impacted by water reuse activities (Chapter 4) and effluents of wastewater treatment facilities receiving wastewater from oil and gas extraction activities (Chapter 5). In Chapter 3, I showed that application of chemometric tools aided in the prioritization of non-targeted compounds arising at various stages of conventional wastewater treatment by partitioning high-dimensional data into rational chemical categories based on knowledge of organic chemical fate processes, resulting in the classification of organic micropollutants based on their occurrence and/or removal during treatment. Similarly, in Chapter 4, high-resolution sampling and broad-spectrum targeted and non-targeted chemical analysis were applied to assess the occurrence and fate of organic micropollutants in a water reuse application, wherein reclaimed wastewater was applied for irrigation of turf grass. Results showed that surface waters receiving runoff from wastewater-irrigated areas appeared to be minimally impacted by wastewater-derived organic micropollutants. Finally, Chapter 5 presents the comprehensive organic chemical composition of oil and gas wastewaters treated for surface water discharge. Concurrent analysis of effluent samples by complementary, broad-spectrum analytical techniques revealed low levels of hydrophobic organic contaminants but elevated concentrations of polymeric surfactants, which may affect the fate and analysis of contaminants of concern in oil and gas wastewaters.
Taken together, my work represents significant progress in the characterization of polar organic chemical pollutants associated with wastewater-impacted environments by high-resolution mass spectrometry. Application of these comprehensive methods to examine micropollutant fate processes in wastewater treatment systems, water reuse environments, and water applications in oil/gas exploration yielded new insights into the factors that influence transport, transformation, and persistence of organic micropollutants in these systems across an unprecedented breadth of chemical space.
Abstract:
A large proportion of the variation in traits between individuals can be attributed to variation in the nucleotide sequence of the genome. The most commonly studied traits in human genetics are related to disease and disease susceptibility. Although scientists have identified genetic causes for over 4,000 monogenic diseases, the underlying mechanisms of many highly prevalent multifactorial inheritance disorders such as diabetes, obesity, and cardiovascular disease remain largely unknown. Identifying genetic mechanisms for complex traits has been challenging because most of the variants are located outside of protein-coding regions, and determining the effects of such non-coding variants remains difficult. In this dissertation, I evaluate the hypothesis that such non-coding variants contribute to human traits and diseases by altering the regulation of genes rather than the sequence of those genes. I will specifically focus on studies to determine the functional impacts of genetic variation associated with two related complex traits: gestational hyperglycemia and fetal adiposity. At the genomic locus associated with maternal hyperglycemia, we found that genetic variation in regulatory elements altered the expression of the HKDC1 gene. Furthermore, we demonstrated that HKDC1 phosphorylates glucose in vitro and in vivo, thus demonstrating that HKDC1 is a fifth human hexokinase gene. At the fetal-adiposity associated locus, we identified variants that likely alter VEPH1 expression in preadipocytes during differentiation. To make such studies of regulatory variation high-throughput and routine, we developed POP-STARR, a novel high throughput reporter assay that can empirically measure the effects of regulatory variants directly from patient DNA. By combining targeted genome capture technologies with STARR-seq, we assayed thousands of haplotypes from 760 individuals in a single experiment. 
We subsequently used POP-STARR to identify three key features of regulatory variants: that regulatory variants typically have weak effects on gene expression; that the effects of regulatory variants are often coordinated with respect to disease risk, suggesting a general mechanism by which the weak effects can together have phenotypic impact; and that nucleotide transversions have larger impacts on enhancer activity than transitions. Together, the findings presented here demonstrate successful strategies for determining the regulatory mechanisms underlying genetic associations with human traits and diseases, and the value of doing so for driving novel biological discovery.
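The distinction between transitions and transversions mentioned above follows from base chemistry: a transition swaps a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T), while a transversion swaps one class for the other. A minimal illustrative sketch follows; the function name `classify_substitution` is hypothetical and is not part of POP-STARR or of the dissertation's analysis code.

```python
# Illustrative only: classify a single-nucleotide substitution.
# The function name is a hypothetical helper, not taken from the dissertation.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a ref -> alt base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError(f"expected bases from ACGT, got {ref!r} and {alt!r}")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # Same chemical class (both purines or both pyrimidines) means transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

For example, `classify_substitution("A", "G")` is a transition, while `classify_substitution("A", "C")` is a transversion.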
Abstract:
Constant technological advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true when analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, can be continuous numbers or counts. With the advancement of high-throughput technology, such data have become unprecedentedly abundant. Efficient statistical approaches are therefore crucial in this big data era.
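To make the categorical view of DNA concrete, one common representation is one-hot encoding, in which each nucleotide becomes a length-four indicator vector. The sketch below is illustrative only; the A, C, G, T column order and the function name `one_hot_encode` are assumptions and are not taken from the dissertation.

```python
from typing import List

# Assumed column order for the four nucleotide categories.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Encode a DNA string as one indicator row per nucleotide."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in sequence.upper():
        if base not in index:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        row = [0] * len(NUCLEOTIDES)
        row[index[base]] = 1  # exactly one category is active per position
        encoded.append(row)
    return encoded
```

Under this assumed ordering, `one_hot_encode("GA")` yields `[[0, 0, 1, 0], [1, 0, 0, 0]]`.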
Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, in a factor analysis model a latent Gaussian-distributed multivariate vector is assumed. With this assumption, a factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, in which the mixture proportions of topics are assumed to be represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions to these statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimating population structure from genotype data.
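As a reminder of why a latent Gaussian factor yields a low-rank covariance structure, the textbook factor analysis model can be written as follows. The notation is an assumption for illustration and is not taken from the dissertation: $x$ is a $p$-dimensional observation, $z$ a $k$-dimensional latent factor with $k < p$, $\Lambda$ a $p \times k$ loading matrix, and $\Psi$ a diagonal noise covariance.

```latex
\begin{aligned}
x &= \Lambda z + \epsilon, \qquad z \sim \mathcal{N}(0, I_k), \qquad \epsilon \sim \mathcal{N}(0, \Psi), \\
\operatorname{Cov}(x) &= \Lambda \Lambda^{\top} + \Psi .
\end{aligned}
```

The term $\Lambda \Lambda^{\top}$ has rank at most $k$, so the covariance of the observed variables is described by a low-rank component plus diagonal noise.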
Abstract:
Cancer comprises a collection of diseases, all of which begin with abnormal tissue growth from various stimuli, including (but not limited to) heredity, genetic mutation, exposure to harmful substances, radiation, poor diet and lack of exercise. The early detection of cancer is vital to providing life-saving therapeutic intervention. However, current methods for detection (e.g., tissue biopsy, endoscopy and medical imaging) often suffer from low patient compliance and an elevated risk of complications in elderly patients. As such, many are looking to “liquid biopsies” for clues to the presence and status of cancer, owing to their minimal invasiveness and ability to provide rich information about the native tumor. In such liquid biopsies, peripheral blood is drawn from patients and is screened for key biomarkers, chiefly circulating tumor cells (CTCs). Capturing, enumerating and analyzing the genetic and metabolomic characteristics of these CTCs may hold the key to helping doctors better understand the source of cancer at an earlier stage for more efficacious disease management.
The isolation of CTCs from whole blood, however, remains a significant challenge due to their (i) low abundance, (ii) lack of a universal surface marker and (iii) epithelial-mesenchymal transition that down-regulates common surface markers (e.g., EpCAM), reducing their likelihood of detection via positive selection assays. These factors underscore the need for an improved cell isolation strategy that can collect CTCs via both positive and negative selection modalities so as to avoid the reliance on a single marker, or set of markers, for more accurate enumeration and diagnosis.
The technologies proposed herein offer a unique set of strategies to focus, sort and template cells in three independent microfluidic modules. The first module exploits ultrasonic standing waves and a class of elastomeric particles for the rapid and discriminate sequestration of cells. This type of cell handling holds promise not only in sorting, but also in the isolation of soluble markers from biofluids. The second module contains components to focus (i.e., arrange) cells via forces from acoustic standing waves and separate cells in a high throughput fashion via free-flow magnetophoresis. The third module uses a printed array of micromagnets to capture magnetically labeled cells into well-defined compartments, enabling on-chip staining and single cell analysis. These technologies can operate in standalone formats, or can be adapted to operate with established analytical technologies, such as flow cytometry. A key advantage of these innovations is their ability to process erythrocyte-lysed blood in a rapid (and thus high throughput) fashion. They can process fluids at a variety of concentrations and flow rates, target cells with various immunophenotypes and sort cells via positive (and potentially negative) selection. These technologies are chip-based, fabricated using standard clean room equipment, towards a disposable clinical tool. With further optimization in design and performance, these technologies might aid in the early detection, and potentially treatment, of cancer and various other physical ailments.
Abstract:
All organisms live in complex habitats that shape the course of their evolution by altering the phenotype expressed by a given genotype (a phenomenon known as phenotypic plasticity) and simultaneously by determining the evolutionary fitness of that phenotype. In some cases, phenotypic evolution may alter the environment experienced by future generations. This dissertation describes how genetic and environmental variation act synergistically to affect the evolution of glucosinolate defensive chemistry and flowering time in Boechera stricta, a wild perennial herb. I focus particularly on plant-associated microbes as a part of the plant’s environment that may alter trait evolution and in turn be affected by the evolution of those traits. In the first chapter I measure glucosinolate production and reproductive fitness of over 1,500 plants grown in common gardens in four diverse natural habitats, to describe how patterns of plasticity and natural selection intersect and may influence glucosinolate evolution. I detected extensive genetic variation for glucosinolate plasticity and determined that plasticity may aid colonization of new habitats by moving phenotypes in the same direction as natural selection. In the second chapter I conduct a greenhouse experiment to test whether naturally-occurring soil microbial communities contributed to the differences in phenotype and selection that I observed in the field experiment. I found that soil microbes cause plasticity of flowering time but not glucosinolate production, and that they may contribute to natural selection on both traits; thus, non-pathogenic plant-associated microbes are an environmental feature that could shape plant evolution. In the third chapter, I combine a multi-year, multi-habitat field experiment with high-throughput amplicon sequencing to determine whether B. stricta-associated microbial communities are shaped by plant genetic variation. 
I found that plant genotype predicts the diversity and composition of leaf-dwelling bacterial communities, but not root-associated bacterial communities. Furthermore, patterns of host genetic control over associated bacteria were largely site-dependent, indicating an important role for genotype-by-environment interactions in microbiome assembly. Together, my results suggest that soil microbes influence the evolution of plant functional traits and, because they are sensitive to plant genetic variation, this trait evolution may alter the microbial neighborhood of future B. stricta generations. Complex patterns of plasticity, selection, and symbiosis in natural habitats may impact the evolution of glucosinolate profiles in Boechera stricta.
Abstract:
Transcription factors (TFs) control the temporal and spatial expression of target genes by interacting with DNA in a sequence-specific manner. Recent advances in high throughput experiments that measure TF-DNA interactions in vitro and in vivo have facilitated the identification of DNA binding sites for thousands of TFs. However, it remains unclear how each individual TF achieves its specificity, especially in the case of paralogous TFs that recognize distinct target genomic sites despite sharing very similar DNA binding motifs. In my work, I used a combination of high throughput in vitro protein-DNA binding assays and machine-learning algorithms to characterize and model the binding specificity of 11 paralogous TFs from 4 distinct structural families. My work proves that even very closely related paralogous TFs, with indistinguishable DNA binding motifs, oftentimes exhibit differential binding specificity for their genomic target sites, especially for sites with moderate binding affinity. Importantly, the differences I identify in vitro and through computational modeling help explain, at least in part, the differential in vivo genomic targeting by paralogous TFs. Future work will focus on in vivo factors that might also be important for specificity differences between paralogous TFs, such as DNA methylation, interactions with protein cofactors, or the chromatin environment. In this larger context, my work emphasizes the importance of intrinsic DNA binding specificity in targeting of paralogous TFs to the genome.
Abstract:
The Arabidopsis root apical meristem (RAM) is a complex tissue capable of generating all the cell types that ultimately make up the root. The work presented in this thesis takes advantage of the versatility of high-throughput sequencing to address two independent questions about the root meristem. Although much is known about the cell fate decisions that occur at the RAM, cortex specification and differentiation remain poorly understood. In the first part of this thesis, I used an ethyl methanesulfonate (EMS) mutagenized marker line to perform a forward genetics screen. The goal of this screen was to identify novel genes involved in the specification and differentiation of the cortex tissue. Mapping analysis of the results obtained in this screen revealed a new allele of BRASSINOSTEROID4 with abnormal marker expression in the cortex tissue. Although this allele proved not to be cortex-specific, this project highlights new technology that allows mapping of EMS-generated mutations without the need to map-cross or back-cross. In the second part of this thesis, using fluorescence-activated cell sorting (FACS) coupled with high-throughput sequencing, my collaborators and I generated single-base-resolution whole-genome DNA methylomes, mRNA transcriptomes, and small RNA transcriptomes for six different populations of cell types in the Arabidopsis root meristem. We discovered that the columella is hypermethylated in the CHH context within transposable elements. This hypermethylation is accompanied by upregulation of the RNA-directed DNA methylation (RdDM) pathway, including higher levels of 24-nt silencing RNAs (siRNAs). In summary, our studies demonstrate the versatility of high-throughput sequencing as a method for identifying single mutations or performing complex comparative genomic analyses.
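For readers unfamiliar with the term, cytosine methylation in plants is conventionally grouped into three sequence contexts, CG, CHG and CHH, where H stands for A, C or T. The sketch below shows how a forward-strand cytosine could be assigned to a context from the two bases that follow it. It is illustrative only: the function name `cytosine_context` is hypothetical, only the forward strand is handled, and it is not the pipeline used in the thesis.

```python
# Illustrative only: assign a forward-strand cytosine to CG, CHG or CHH.
# H denotes any base other than G (that is, A, C or T).
VALID_BASES = set("ACGT")


def cytosine_context(sequence: str, pos: int) -> str:
    """Return the methylation context of the cytosine at 0-based index pos."""
    seq = sequence.upper()
    if not 0 <= pos < len(seq):
        raise IndexError("pos is outside the sequence")
    if seq[pos] != "C":
        raise ValueError("base at pos is not a cytosine")
    downstream = seq[pos + 1:pos + 3]
    if not set(downstream) <= VALID_BASES:
        raise ValueError("ambiguous base downstream of the cytosine")
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2:
        # Too close to the sequence end to tell CHG from CHH.
        raise ValueError("not enough downstream sequence to assign a context")
    return "CHG" if downstream[1] == "G" else "CHH"
```

For instance, the cytosine at index 0 of `"CTAG"` is followed by T and A, so it falls in the CHH context.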
Abstract:
Immunity is broadly defined as a mechanism of protection against non-self entities, a process which must be sufficiently robust to both eliminate the initial foreign body and then be maintained over the life of the host. Life-long immunity is impossible without the development of immunological memory, of which a central component is the cellular immune system, or T cells. Cellular immunity hinges upon a naïve T cell pool of sufficient size and breadth to enable Darwinian selection of clones responsive to foreign antigens during an initial encounter. Further, the generation and maintenance of memory T cells is required for rapid clearance responses against repeated insult, and so this small memory pool must be actively maintained by pro-survival cytokine signals over the life of the host.
T cell development, function, and maintenance are regulated on a number of molecular levels through complex regulatory networks. Recently, small non-coding RNAs, miRNAs, have been observed to have profound impacts on diverse aspects of T cell biology by impeding the translation of RNA transcripts to protein. While many miRNAs have been described that alter T cell development or functional differentiation, little is known regarding the role that miRNAs have in T cell maintenance in the periphery at homeostasis.
In Chapter 3 of this dissertation, tools to study miRNA biology and function were developed. First, to understand the effect that miRNA overexpression had on T cell responses, a novel overexpression system was developed to enhance the processing efficiency and ultimate expression of a given miRNA by placing it within an alternative miRNA backbone. Next, a conditional knockout mouse system was devised to specifically delete miR-191 in a cell population expressing recombinase. This strategy was expanded to permit the selective deletion of single miRNAs from within a cluster to discern the effects of specific miRNAs that were previously inaccessible in isolation. Last, to enable the identification of potentially therapeutically viable miRNA function and/or expression modulators, a high-throughput flow cytometry-based screening system utilizing miRNA activity reporters was tested and validated. Thus, several novel and useful tools were developed to assist in the studies described in Chapter 4 and in future miRNA studies.
In Chapter 4 of this dissertation, the role of miR-191 in T cell biology was evaluated. Using tools developed in Chapter 3, miR-191 was observed to be critical for T cell survival following activation-induced cell death, while proliferation was unaffected by alterations in miR-191 expression. Loss of miR-191 led to significant decreases in the numbers of CD4+ and CD8+ T cells in the peripheral lymph nodes, but this loss had no impact on the homeostatic activation of either CD4+ or CD8+ cells. These peripheral changes were not caused by gross defects in thymic development, but rather by impaired STAT5 phosphorylation downstream of pro-survival cytokine signals. miR-191 does not specifically inhibit STAT5, but rather directly targets the scaffolding protein, IRS1, which in turn alters cytokine-dependent signaling. The defect in peripheral T cell maintenance was exacerbated by the presence of a Bcl-2YFP transgene, which led to even greater peripheral T cell losses in addition to developmental defects. These studies collectively demonstrate that miR-191 controls peripheral T cell maintenance by modulating homeostatic cytokine signaling through the regulation of IRS1 expression and downstream STAT5 phosphorylation.
The studies described in this dissertation collectively demonstrate that miR-191 has a profound role in the maintenance of T cells at homeostasis in the periphery. Importantly, the manipulation of miR-191 altered immune homeostasis without leading to severe immunodeficiency or autoimmunity. As much data exists on the causative agents disrupting active immune responses and the formation of immunological memory, the basic processes underlying the continued maintenance of a functioning immune system must be fully characterized to facilitate the development of methods for promoting healthy immune function throughout the life of the individual. These findings also have powerful implications for the ability of patients with modest perturbations in T cell homeostasis to effectively fight disease and respond to vaccination and may provide valuable targets for therapeutic intervention.