33 results for Genome-specific Sequence
at Duke University
Abstract:
BACKGROUND: Mammalian genomes commonly harbor endogenous viral elements. Due to a lack of comparable genome-scale sequence data, far less is known about endogenous viral elements in avian species, even though their small genomes may enable important insights into the patterns and processes of endogenous viral element evolution. RESULTS: Through a systematic screening of the genomes of 48 species sampled across the avian phylogeny we reveal that birds harbor a limited number of endogenous viral elements compared to mammals, with only five viral families observed: Retroviridae, Hepadnaviridae, Bornaviridae, Circoviridae, and Parvoviridae. All nonretroviral endogenous viral elements are present at low copy numbers and in few species, with only endogenous hepadnaviruses widely distributed, although these have been purged in some cases. We also provide the first evidence for endogenous bornaviruses and circoviruses in avian genomes, although at very low copy numbers. A comparative analysis of vertebrate genomes revealed a simple linear relationship between endogenous viral element abundance and host genome size, such that the occurrence of endogenous viral elements in bird genomes is 6- to 13-fold less frequent than in mammals. CONCLUSIONS: These results reveal that avian genomes harbor relatively small numbers of endogenous viruses, particularly those derived from RNA viruses, and hence are either less susceptible to viral invasions or purge them more effectively.
Abstract:
To provide context for the diversification of archosaurs--the group that includes crocodilians, dinosaurs, and birds--we generated draft genomes of three crocodilians: Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the comparatively rapid evolution is derived in birds. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs, thereby providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs.
Abstract:
The advent of next-generation sequencing, now nearing a decade in age, has enabled, among other capabilities, measurement of genome-wide sequence features at unprecedented scale and resolution.
In this dissertation, I describe work to understand the genetic underpinnings of non-Hodgkin’s lymphoma through exploration of the epigenetics of its cell of origin, initial characterization and interpretation of driver mutations, and finally, a larger-scale, population-level study that incorporates mutation interpretation with clinical outcome.
In the first research chapter, I describe genomic characteristics of lymphomas through the lens of their cells of origin. Just as many other cancers, such as breast cancer or lung cancer, are categorized based on their cell of origin, lymphoma subtypes can be examined in the context of their normal B cells of origin: naïve, germinal center, and post-germinal center. By applying integrative analysis of the epigenetics of normal B cells of origin through chromatin-immunoprecipitation sequencing, we find that differences in normal B cell subtypes are reflected in the mutational landscapes of the cancers that arise from them, namely Mantle Cell, Burkitt, and Diffuse Large B-Cell Lymphoma.
In the next research chapter, I describe our first effort to understand the genetic heterogeneity of Diffuse Large B-Cell Lymphoma (DLBCL), the most common form of non-Hodgkin’s lymphoma, which affects 100,000 patients in the world. Through whole-genome sequencing of 1 case as well as whole-exome sequencing of 94 cases, we characterize the most recurrent genetic features of DLBCL and lay the groundwork for a larger study.
In the last research chapter, I describe work to characterize and interpret the whole exomes of 1001 cases of DLBCL in the largest single-cancer study to date. This highly-powered study enabled sub-gene, gene-level, and gene-network level understanding of driver mutations within DLBCL. Moreover, matched genomic and clinical data enabled the connection of these driver mutations to clinical features such as treatment response or overall survival. As sequencing costs continue to drop, whole-exome sequencing will become a routine clinical assay, and another diagnostic dimension in addition to existing methods such as histology. However, to unlock the full utility of sequencing data, we must be able to interpret it. This study undertakes a first step in developing the understanding necessary to uncover the genomic signals of DLBCL hidden within its exomes. However, beyond the scope of this one disease, the experimental and analytical methods can be readily applied to other cancer sequencing studies.
Thus, this dissertation leverages next-generation sequencing analysis to understand the genetic underpinnings of lymphoma, both by examining its normal cells of origin as well as through a large-scale study to sensitively identify recurrently mutated genes and their relationship to clinical outcome.
Abstract:
The use of DNA as a polymeric building material transcends its function in biology and is exciting in bionanotechnology for applications ranging from biosensing, to diagnostics, and to targeted drug delivery. These applications are enabled by DNA’s unique structural and chemical properties, embodied as a directional polyanion that exhibits molecular recognition capabilities. Hence, the efficient and precise synthesis of high molecular weight DNA materials has become key to advance DNA bionanotechnology. Current synthesis methods largely rely on either solid phase chemical synthesis or template-dependent polymerase amplification. The inherent step-by-step fashion of solid phase synthesis limits the length of the resulting DNA to typically less than 150 nucleotides. In contrast, polymerase based enzymatic synthesis methods (e.g., polymerase chain reaction) are not limited by product length, but require a DNA template to guide the synthesis. Furthermore, advanced DNA bionanotechnology requires tailorable structural and self-assembly properties. Current synthesis methods, however, often involve multiple conjugating reactions and extensive purification steps.
The research described in this dissertation aims to develop a facile method to synthesize high molecular weight, single-stranded DNA (or polynucleotide) with versatile functionalities. We exploit the ability of a template-independent DNA polymerase, terminal deoxynucleotidyl transferase (TdT), to catalyze the polymerization of 2’-deoxyribonucleoside 5’-triphosphates (dNTP, monomer) from the 3’-hydroxyl group of an oligodeoxyribonucleotide (initiator). We termed this enzymatic synthesis method TdT-catalyzed enzymatic polymerization, or TcEP.
This dissertation is structured to address three specific research aims. With the objective of generating high molecular weight polynucleotides, Specific Aim 1 studies the reaction kinetics of TcEP by investigating the polymerization of 2’-deoxythymidine 5’-triphosphates (monomer) from the 3’-hydroxyl group of oligodeoxyribothymidine (initiator) using in situ 1H NMR and fluorescent gel electrophoresis. We found that TcEP kinetics follows the “living” chain-growth polycondensation mechanism and that, as in “living” polymerizations, the molecular weight of the final product is determined by the starting molar ratio of monomer to initiator. The distribution of the molecular weight is crucially influenced by the molar ratio of initiator to TdT. We developed a reaction kinetics model that allows us to quantitatively describe the reaction and predict the molecular weight of the reaction products.
Specific Aim 2 further explores TcEP’s ability to transcend homo-polynucleotide synthesis by varying the choices of initiators and monomers. We investigated the effects of initiator length and sequence on TcEP, and found that the minimum length of an effective initiator should be 10 nucleotides and that the formation of secondary structures close to the 3’-hydroxyl group can impede the polymerization reaction. We also demonstrated TcEP’s capacity to incorporate a wide range of unnatural dNTPs into the growing chain, such as hydrophobic fluorescent dNTPs and fluoro-modified dNTPs. By harnessing the encoded nucleotide sequence of an initiator and the chemical diversity of monomers, TcEP enables us to introduce molecular recognition capabilities and chemical functionalities on the 5’-terminus and 3’-terminus, respectively.
Building on TcEP’s synthesis capacities, in Specific Aim 3 we invented a two-step strategy to synthesize diblock amphiphilic polynucleotides, in which the first, hydrophilic block serves as a macro-initiator for the growth of the second block, comprised of natural and/or unnatural nucleotides. By tuning the hydrophilic length, we synthesized the amphiphilic diblock polynucleotides that can self-assemble into micellar structures ranging from star-like to crew-cut morphologies. The observed self-assembly behaviors agree with predictions from dissipative particle dynamics simulations as well as scaling law for polyelectrolyte block copolymers.
In summary, we developed an enzymatic synthesis method (i.e., TcEP) that enables the facile synthesis of high molecular weight polynucleotides with low polydispersity. Although we can control the nucleotide sequence only to a limited extent, TcEP offers a method to integrate an oligodeoxyribonucleotide with a specific sequence at the 5’-terminus and to incorporate functional groups along the growing chains simultaneously. Additionally, we used TcEP to synthesize amphiphilic polynucleotides that display self-assembly ability. We anticipate that our facile synthesis method will not only advance molecular biology, but also invigorate materials science and bionanotechnology.
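The statement that product molecular weight is set by the starting monomer-to-initiator ratio can be illustrated with the textbook relations for an ideal living polymerization. The sketch below is not the dissertation's kinetics model: it assumes every initiator grows, that added units per chain are Poisson distributed, and that all nucleotides have equal mass. The function names and example numbers are hypothetical.

```python
def product_length(initiator_len, monomer_to_initiator, conversion=1.0):
    """Number-average product length in nucleotides.

    Assumes an ideal living polymerization: every initiator grows and the
    consumed monomer is shared evenly across chains on average.
    """
    if initiator_len < 1 or monomer_to_initiator < 0:
        raise ValueError("initiator_len must be >= 1 and the ratio >= 0")
    if not 0.0 <= conversion <= 1.0:
        raise ValueError("conversion must lie in [0, 1]")
    return initiator_len + conversion * monomer_to_initiator


def poisson_dispersity(initiator_len, monomer_to_initiator, conversion=1.0):
    """Length dispersity (weight-average / number-average).

    Assumes the number of added units per chain is Poisson distributed with
    mean nu, so the variance of the chain length is also nu.
    """
    nu = conversion * monomer_to_initiator
    mean_len = initiator_len + nu
    return 1.0 + nu / mean_len ** 2


if __name__ == "__main__":
    # Hypothetical reaction: 10-nt initiator, 1000 monomers per initiator.
    print(product_length(10, 1000))
    print(poisson_dispersity(10, 1000))
```

Under these assumptions the dispersity approaches 1 as chains grow, which is consistent with the low polydispersity reported above, although the real reaction also depends on the initiator-to-TdT ratio that this sketch ignores.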
Abstract:
Eukaryotic genomes are mostly composed of noncoding DNA whose role is still poorly understood. Studies in several organisms have shown correlations between the length of the intergenic and genic sequences of a gene and the expression of its corresponding mRNA transcript. Some studies have found a positive relationship between intergenic sequence length and expression diversity between tissues, and concluded that genes under greater regulatory control require more regulatory information in their intergenic sequences. Other reports found a negative relationship between expression level and gene length and the interpretation was that there is selection pressure for highly expressed genes to remain small. However, a correlation between gene sequence length and expression diversity, opposite to that observed for intergenic sequences, has also been reported, and to date there is no testable explanation for this observation. To shed light on these varied and sometimes conflicting results, we performed a thorough study of the relationships between sequence length and gene expression using cell-type (tissue) specific microarray data in Arabidopsis thaliana. We measured median gene expression across tissues (expression level), expression variability between tissues (expression pattern uniformity), and expression variability between replicates (expression noise). We found that intergenic (upstream and downstream) and genic (coding and noncoding) sequences have generally opposite relationships with respect to expression, whether it is tissue variability, median, or expression noise. To explain these results we propose a model, in which the lengths of the intergenic and genic sequences have opposite effects on the ability of the transcribed region of the gene to be epigenetically regulated for differential expression. These findings could shed light on the role and influence of noncoding sequences on gene expression.
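Relationships of the kind described above, between a sequence length and an expression measure, are often summarized with a rank correlation. The sketch below computes Spearman's coefficient in plain Python. The abstract does not state which statistic the authors used, so this is an illustrative choice, and the function names and toy values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy, made-up values: upstream intergenic length (bp) and
    # between-tissue expression variability for five genes.
    lengths = [350, 1200, 800, 2500, 600]
    variability = [0.10, 0.42, 0.30, 0.55, 0.18]
    print(spearman(lengths, variability))
```

A rank-based measure is a reasonable default here because sequence lengths are strongly skewed, but the same comparison could be made with other statistics.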
Abstract:
Cellular stresses activate the tumor suppressor p53 protein leading to selective binding to DNA response elements (REs) and gene transactivation from a large pool of potential p53 REs (p53REs). To elucidate how p53RE sequences and local chromatin context interact to affect p53 binding and gene transactivation, we mapped genome-wide binding localizations of p53 and H3K4me3 in untreated and doxorubicin (DXR)-treated human lymphoblastoid cells. We examined the relationships among p53 occupancy, gene expression, H3K4me3, chromatin accessibility (DNase 1 hypersensitivity, DHS), ENCODE chromatin states, p53RE sequence, and evolutionary conservation. We observed that the inducible expression of p53-regulated genes was associated with the steady-state chromatin status of the cell. Most highly inducible p53-regulated genes were suppressed at baseline and marked by repressive histone modifications or displayed CTCF binding. Comparison of p53RE sequences residing in different chromatin contexts demonstrated that weaker p53REs resided in open promoters, while stronger p53REs were located within enhancers and repressed chromatin. p53 occupancy was strongly correlated with similarity of the target DNA sequences to the p53RE consensus, but surprisingly, inversely correlated with pre-existing nucleosome accessibility (DHS) and evolutionary conservation at the p53RE. Occupancy by p53 of REs that overlapped transposable element (TE) repeats was significantly higher (p < 10^-7) and correlated with stronger p53RE sequences (p < 10^-110) relative to nonTE-associated p53REs, particularly for MLT1H, LTR10B, and Mer61 TEs. However, binding at these elements was generally not associated with transactivation of adjacent genes. Occupied p53REs located in L2-like TEs were unique in displaying highly negative PhyloP scores (predicted fast-evolving) and being associated with altered H3K4me3 and DHS levels.
These results underscore the systematic interaction between chromatin status and p53RE context in the induced transactivation response. This p53 regulated response appears to have been tuned via evolutionary processes that may have led to repression and/or utilization of p53REs originating from primate-specific transposon elements.
Abstract:
DNaseI footprinting is an established assay for identifying transcription factor (TF)-DNA interactions with single base pair resolution. High-throughput DNase-seq assays have recently been used to detect in vivo DNase footprints across the genome. Multiple computational approaches have been developed to identify DNase-seq footprints as predictors of TF binding. However, recent studies have pointed to a substantial cleavage bias of DNase and its negative impact on predictive performance of footprinting. To assess the potential for using DNase-seq to identify individual binding sites, we performed DNase-seq on deproteinized genomic DNA and determined sequence cleavage bias. This allowed us to build bias corrected and TF-specific footprint models. The predictive performance of these models demonstrated that predicted footprints corresponded to high-confidence TF-DNA interactions. DNase-seq footprints were absent under a fraction of ChIP-seq peaks, which we show to be indicative of weaker binding, indirect TF-DNA interactions or possible ChIP artifacts. The modeling approach was also able to detect variation in the consensus motifs that TFs bind to. Finally, cell type specific footprints were detected within DNase hypersensitive sites that are present in multiple cell types, further supporting that footprints can identify changes in TF binding that are not detectable using other strategies.
Abstract:
Limited data are available regarding the molecular epidemiology of Mycobacterium tuberculosis (Mtb) strains circulating in Guatemala. Beijing-lineage Mtb strains have gained prevalence worldwide and are associated with increased virulence and drug resistance, but there have been only a few cases reported in Central America. Here we report the first whole genome sequencing of Central American Beijing-lineage strains of Mtb. We find that multiple Beijing-lineage strains, derived from independent founding events, are currently circulating in Guatemala, but overall still represent a relatively small proportion of disease burden. Finally, we identify a specific Beijing-lineage outbreak centered on a poor neighborhood in Guatemala City.
Abstract:
Transcription factors (TFs) control the temporal and spatial expression of target genes by interacting with DNA in a sequence-specific manner. Recent advances in high throughput experiments that measure TF-DNA interactions in vitro and in vivo have facilitated the identification of DNA binding sites for thousands of TFs. However, it remains unclear how each individual TF achieves its specificity, especially in the case of paralogous TFs that recognize distinct target genomic sites despite sharing very similar DNA binding motifs. In my work, I used a combination of high throughput in vitro protein-DNA binding assays and machine-learning algorithms to characterize and model the binding specificity of 11 paralogous TFs from 4 distinct structural families. My work proves that even very closely related paralogous TFs, with indistinguishable DNA binding motifs, oftentimes exhibit differential binding specificity for their genomic target sites, especially for sites with moderate binding affinity. Importantly, the differences I identify in vitro and through computational modeling help explain, at least in part, the differential in vivo genomic targeting by paralogous TFs. Future work will focus on in vivo factors that might also be important for specificity differences between paralogous TFs, such as DNA methylation, interactions with protein cofactors, or the chromatin environment. In this larger context, my work emphasizes the importance of intrinsic DNA binding specificity in targeting of paralogous TFs to the genome.
Abstract:
Somatostatin receptor 2 (SSTR2) is expressed by most medulloblastomas (MEDs). We isolated monoclonal antibodies (MAbs) to the 12-mer (33)QTEPYYDLTSNA(44), which resides in the extracellular domain of the SSTR2 amino terminus, screened the peptide-bound MAbs by fluorescence microassay on D341 and D283 MED cells, and demonstrated homogeneous cell-surface binding, indicating that all cells expressed cell surface-detectable epitopes. Five radiolabeled MAbs were tested for immunoreactive fraction (IRF), affinity (KA) (Scatchard analysis vs. D341 MED cells), and internalization by MED cells. One IgG3 MAb exhibited a 50-100% IRF, but low KA. Four IgG2a MAbs had 46-94% IRFs and modest KAs versus intact cells (0.21-1.2 × 10^8 M^-1). Following binding of radiolabeled MAbs to D341 MED at 4 °C, no significant internalization was observed, which is consistent with results obtained in the absence of ligand. However, all MAbs exhibited long-term association with the cells; binding at 37 °C after 2 h was 65-66%, and after 24 h, 52-64%. In tests with MAbs C10 and H5, the number of cell surface receptors per cell, estimated by Scatchard and quantitative FACS analyses, was 3.9 × 10^4 for the "glial" phenotype DAOY MED cell line and 0.6-8.8 × 10^5 for four neuronal phenotype MED cell lines. Our results indicate a potential immunotherapeutic application for these MAbs.
Abstract:
The BUZ/Znf-UBP domain is a protein module found in the cytoplasmic deacetylase HDAC6, E3 ubiquitin ligase BRAP2/IMP, and a subfamily of ubiquitin-specific proteases. Although several BUZ domains have been shown to bind ubiquitin with high affinity by recognizing its C-terminal sequence (RLRGG-COOH), it is currently unknown whether the interaction is sequence-specific or whether the BUZ domains are capable of binding to proteins other than ubiquitin. In this work, the BUZ domains of HDAC6 and Ubp-M were subjected to screening against a one-bead-one-compound (OBOC) peptide library that exhibited random peptide sequences with free C-termini. Sequence analysis of the selected binding peptides as well as alanine scanning studies revealed that the BUZ domains require a C-terminal Gly-Gly motif for binding. At the more N-terminal positions, the two BUZ domains have distinct sequence specificities, allowing them to bind to different peptides and/or proteins. A database search of the human proteome on the basis of the BUZ domain specificities identified 11 and 24 potential partner proteins for Ubp-M and HDAC6 BUZ domains, respectively. Peptides corresponding to the C-terminal sequences of four of the predicted binding partners (FBXO11, histone H4, PTOV1, and FAT10) were synthesized and tested for binding to the BUZ domains by fluorescence polarization. All four peptides bound to the HDAC6 BUZ domain with low micromolar KD values and less tightly to the Ubp-M BUZ domain. Finally, in vitro pull-down assays showed that the Ubp-M BUZ domain was capable of binding to the histone H3-histone H4 tetramer protein complex. Our results suggest that BUZ domains are sequence-specific protein-binding modules, with each BUZ domain potentially binding to a different subset of proteins.
Abstract:
We have used analytical ultracentrifugation to characterize the binding of the methionine repressor protein, MetJ, to synthetic oligonucleotides containing zero to five specific recognition sites, called metboxes. For all lengths of DNA studied, MetJ binds more tightly to repeats of the consensus sequence than to naturally occurring metboxes, which exhibit a variable number of deviations from the consensus. Strong cooperative binding occurs only in the presence of two or more tandem metboxes, which facilitate protein-protein contacts between adjacent MetJ dimers, but weak affinity is detected even with DNA containing zero or one metbox. The affinity of MetJ for all of the DNA sequences studied is enhanced by the addition of SAM, the known cofactor for MetJ in the cell. This effect extends to oligos containing zero or one metbox, both of which bind two MetJ dimers. In the presence of a large excess concentration of metbox DNA, the effect of cooperativity is to favor populations of DNA oligos bound by two or more MetJ dimers rather than a stochastic redistribution of the repressor onto all available metboxes. These results illustrate the dynamic range of binding affinity and repressor assembly that MetJ can exhibit with DNA and the effect of the corepressor SAM on binding to both specific and nonspecific DNA.
Abstract:
"Push-pull" chromophores based on extended pi-electron systems have been designed to exhibit exceptionally large molecular hyperpolarizabilities. We have engineered an amphiphilic four-helix bundle peptide to vectorially incorporate such hyperpolarizable chromophores having a metalloporphyrin moiety, with high specificity into the interior core of the bundle. The amphiphilic exterior of the bundle facilitates the formation of densely packed monolayer ensembles of the vectorially oriented peptide-chromophore complexes at the liquid-gas interface. Chemical specificity designed into the ends of the bundle facilitates the subsequent covalent attachment of these monolayer ensembles onto the surface of an inorganic substrate. In this article, we describe the structural characterization of these monolayer ensembles at each stage of their fabrication for one such peptide-chromophore complex designated as AP0-RuPZn. In the accompanying article, we describe the characterization of their macroscopic nonlinear optical properties.
Abstract:
BACKGROUND: The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery of host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage. RESULTS: We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes, and single type-II interferon genes in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating transcription of some of the inferred genes by known interferon-inducing stimuli. CONCLUSION: The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible.
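To make concrete how base-calling quality scores can feed a probabilistic error model, the sketch below converts Phred scores to error probabilities (p = 10^(-Q/10)) and combines overlapping reads at one position into a posterior over the true base. This is not the authors' trace assembler: it assumes independent reads, a uniform prior, and errors spread evenly over the three other bases, and the function names are hypothetical.

```python
import math

BASES = "ACGT"


def phred_to_error(quality):
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def base_posterior(observations):
    """Posterior over the true base at one position.

    observations: list of (called_base, phred_quality) pairs.
    Assumes independent reads, a uniform prior over A/C/G/T, and that a
    miscall is equally likely to be any of the three other bases.
    """
    log_like = {true_base: 0.0 for true_base in BASES}
    for called, quality in observations:
        # Cap the error at 0.75: beyond that a call carries no information.
        err = min(phred_to_error(quality), 0.75)
        for true_base in BASES:
            prob = (1.0 - err) if called == true_base else err / 3.0
            log_like[true_base] += math.log(prob)
    # Normalize in a numerically stable way.
    peak = max(log_like.values())
    weights = {b: math.exp(v - peak) for b, v in log_like.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


if __name__ == "__main__":
    # Three overlapping reads; the low-quality 'C' call is outvoted.
    posterior = base_posterior([("A", 30), ("A", 20), ("C", 10)])
    for base in BASES:
        print(base, round(posterior[base], 6))
```

Weighting each call by its quality is what lets low-coverage data remain usable: a single high-quality read can outweigh several poor ones.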
Abstract:
BACKGROUND: There is considerable interest in the development of methods to efficiently identify all coding variants present in large sample sets of humans. There are three approaches possible: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. While whole-genome sequencing is the most complete, it remains sufficiently expensive that cost-effective alternatives are important. RESULTS: Here we provide a systematic exploration of how well RNA-Seq can identify human coding variants by comparing variants identified through high coverage whole-genome sequencing to those identified by high coverage RNA-Seq in the same individual. This comparison allowed us to directly evaluate the sensitivity and specificity of RNA-Seq in identifying coding variants, and to evaluate how key parameters such as the degree of coverage and the expression levels of genes interact to influence performance. We find that although only 40% of exonic variants identified by whole-genome sequencing were captured using RNA-Seq, this number rose to 81% when concentrating on genes known to be well-expressed in the source tissue. We also find that a high false positive rate can be problematic when working with RNA-Seq data, especially at higher levels of coverage. CONCLUSIONS: We conclude that as long as a tissue relevant to the trait under study is available and suitable quality control screens are implemented, RNA-Seq is a fast and inexpensive alternative approach for finding coding variants in genes with sufficiently high expression levels.
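The comparison described above amounts to set arithmetic on two variant call sets from the same individual. The sketch below treats the whole-genome calls as the reference set and reports the fraction RNA-Seq recovers, along with the fraction of RNA-Seq calls that lack whole-genome support. The variant representation, function name, and toy calls are hypothetical, and the study's actual pipeline and filters are not described in the abstract.

```python
def compare_call_sets(wgs_variants, rnaseq_variants):
    """Compare RNA-Seq variant calls against whole-genome calls.

    Each variant is a hashable tuple such as (chrom, pos, ref, alt).
    The whole-genome calls are treated as the reference set, which is an
    assumption: whole-genome calls can themselves contain errors.
    """
    wgs = set(wgs_variants)
    rna = set(rnaseq_variants)
    shared = wgs & rna
    return {
        # Fraction of whole-genome variants recovered by RNA-Seq.
        "sensitivity": len(shared) / len(wgs) if wgs else float("nan"),
        # Fraction of RNA-Seq calls with no whole-genome support.
        "unsupported_fraction": len(rna - wgs) / len(rna) if rna else float("nan"),
        "shared": len(shared),
    }


if __name__ == "__main__":
    # Toy, made-up calls for illustration only.
    wgs_calls = [
        ("chr1", 1000, "A", "G"),
        ("chr1", 2000, "C", "T"),
        ("chr2", 500, "G", "A"),
        ("chr3", 750, "T", "C"),
        ("chr4", 120, "G", "C"),
    ]
    rna_calls = [
        ("chr1", 1000, "A", "G"),
        ("chr2", 500, "G", "A"),
        ("chr5", 900, "A", "T"),
    ]
    print(compare_call_sets(wgs_calls, rna_calls))
```

Restricting both sets to genes that are well expressed in the source tissue before calling the function would mirror the stratified comparison in the abstract.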