7 results for nucleotide-sequence
at Duke University
Abstract:
BACKGROUND: There is considerable interest in the development of methods to efficiently identify all coding variants present in large sample sets of humans. There are three possible approaches: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. While whole-genome sequencing is the most complete, it remains sufficiently expensive that cost-effective alternatives are important. RESULTS: Here we provide a systematic exploration of how well RNA-Seq can identify human coding variants by comparing variants identified through high-coverage whole-genome sequencing to those identified by high-coverage RNA-Seq in the same individual. This comparison allowed us to directly evaluate the sensitivity and specificity of RNA-Seq in identifying coding variants, and to evaluate how key parameters such as the degree of coverage and the expression levels of genes interact to influence performance. We find that although only 40% of exonic variants identified by whole-genome sequencing were captured using RNA-Seq, this number rose to 81% when concentrating on genes known to be well-expressed in the source tissue. We also find that a high false positive rate can be problematic when working with RNA-Seq data, especially at higher levels of coverage. CONCLUSIONS: We conclude that as long as a tissue relevant to the trait under study is available and suitable quality control screens are implemented, RNA-Seq is a fast and inexpensive alternative approach for finding coding variants in genes with sufficiently high expression levels.
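The comparison described in this abstract, treating whole-genome calls as the reference and asking what fraction RNA-Seq recovers, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Variant` type, the `compare_calls` helper, and the toy call sets are all hypothetical, and the calculation assumes both call sets are already restricted to the same exonic regions.

```python
from typing import Dict, NamedTuple, Set


class Variant(NamedTuple):
    """A single-nucleotide variant keyed by position and alleles (hypothetical type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_calls(reference: Set[Variant], test: Set[Variant]) -> Dict[str, float]:
    """Compare a test call set (e.g. RNA-Seq) against a reference call set (e.g. WGS).

    sensitivity: fraction of reference variants also found in the test set.
    false_discovery_proportion: fraction of test variants absent from the reference.
    """
    shared = reference & test
    sensitivity = len(shared) / len(reference) if reference else float("nan")
    fdp = len(test - reference) / len(test) if test else float("nan")
    return {"sensitivity": sensitivity, "false_discovery_proportion": fdp}


# Toy data, invented for illustration only.
wgs_calls = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr1", 2000, "C", "T"),
    Variant("chr2", 1500, "G", "A"),
    Variant("chr2", 3000, "T", "C"),
    Variant("chr3", 500, "G", "T"),
}
rna_calls = {
    Variant("chr1", 1000, "A", "G"),   # shared with WGS
    Variant("chr2", 1500, "G", "A"),   # shared with WGS
    Variant("chr4", 700, "A", "C"),    # RNA-Seq only (candidate false positive)
}

metrics = compare_calls(wgs_calls, rna_calls)
print(metrics)
```

Restricting `wgs_calls` to variants in well-expressed genes before calling `compare_calls` would correspond to the stratified comparison the abstract reports, under the same assumptions.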
Abstract:
We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten "case" genomes from individuals with severe hemophilia A and ten "control" genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly-penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop loss variants in genes representing a diverse set of pathways.
Abstract:
A large proportion of the variation in traits between individuals can be attributed to variation in the nucleotide sequence of the genome. The most commonly studied traits in human genetics are related to disease and disease susceptibility. Although scientists have identified genetic causes for over 4,000 monogenic diseases, the underlying mechanisms of many highly prevalent multifactorial inheritance disorders such as diabetes, obesity, and cardiovascular disease remain largely unknown. Identifying genetic mechanisms for complex traits has been challenging because most of the variants are located outside of protein-coding regions, and determining the effects of such non-coding variants remains difficult. In this dissertation, I evaluate the hypothesis that such non-coding variants contribute to human traits and diseases by altering the regulation of genes rather than the sequence of those genes. I will specifically focus on studies to determine the functional impacts of genetic variation associated with two related complex traits: gestational hyperglycemia and fetal adiposity. At the genomic locus associated with maternal hyperglycemia, we found that genetic variation in regulatory elements altered the expression of the HKDC1 gene. Furthermore, we demonstrated that HKDC1 phosphorylates glucose in vitro and in vivo, thus demonstrating that HKDC1 is a fifth human hexokinase gene. At the fetal-adiposity associated locus, we identified variants that likely alter VEPH1 expression in preadipocytes during differentiation. To make such studies of regulatory variation high-throughput and routine, we developed POP-STARR, a novel high throughput reporter assay that can empirically measure the effects of regulatory variants directly from patient DNA. By combining targeted genome capture technologies with STARR-seq, we assayed thousands of haplotypes from 760 individuals in a single experiment. 
We subsequently used POP-STARR to identify three key features of regulatory variants: that regulatory variants typically have weak effects on gene expression; that the effects of regulatory variants are often coordinated with respect to disease risk, suggesting a general mechanism by which the weak effects can together have phenotypic impact; and that nucleotide transversions have larger impacts on enhancer activity than transitions. Together, the findings presented here demonstrate successful strategies for determining the regulatory mechanisms underlying genetic associations with human traits and diseases, and the value of doing so for driving novel biological discovery.
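The transition/transversion distinction used in this abstract follows the standard definition: a transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T), and any other single-base change is a transversion. A minimal sketch of that classification is below; the function name `classify_substitution` is hypothetical and is not part of POP-STARR.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-nucleotide substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError(f"expected single bases from ACGT, got {ref!r} and {alt!r}")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(ref, ">", alt, classify_substitution(ref, alt))
```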
Abstract:
The use of DNA as a polymeric building material transcends its function in biology and is exciting in bionanotechnology for applications ranging from biosensing to diagnostics to targeted drug delivery. These applications are enabled by DNA’s unique structural and chemical properties, embodied as a directional polyanion that exhibits molecular recognition capabilities. Hence, the efficient and precise synthesis of high molecular weight DNA materials has become key to advancing DNA bionanotechnology. Current synthesis methods largely rely on either solid-phase chemical synthesis or template-dependent polymerase amplification. The inherent step-by-step fashion of solid-phase synthesis limits the length of the resulting DNA to typically less than 150 nucleotides. In contrast, polymerase-based enzymatic synthesis methods (e.g., polymerase chain reaction) are not limited by product length, but require a DNA template to guide the synthesis. Furthermore, advanced DNA bionanotechnology requires tailorable structural and self-assembly properties. Current synthesis methods, however, often involve multiple conjugating reactions and extensive purification steps.
The research described in this dissertation aims to develop a facile method to synthesize high molecular weight, single-stranded DNA (or polynucleotide) with versatile functionalities. We exploit the ability of a template-independent DNA polymerase, terminal deoxynucleotidyl transferase (TdT), to catalyze the polymerization of 2’-deoxyribonucleoside 5’-triphosphates (dNTP, monomer) from the 3’-hydroxyl group of an oligodeoxyribonucleotide (initiator). We termed this enzymatic synthesis method TdT-catalyzed enzymatic polymerization, or TcEP.
This dissertation is structured to address three specific research aims. With the objective of generating high molecular weight polynucleotides, Specific Aim 1 studies the reaction kinetics of TcEP by investigating the polymerization of 2’-deoxythymidine 5’-triphosphates (monomer) from the 3’-hydroxyl group of oligodeoxyribothymidine (initiator) using in situ 1H NMR and fluorescent gel electrophoresis. We found that TcEP kinetics follows the “living” chain-growth polycondensation mechanism and that, as in “living” polymerizations, the molecular weight of the final product is determined by the starting molar ratio of monomer to initiator. The distribution of the molecular weight is crucially influenced by the molar ratio of initiator to TdT. We developed a reaction kinetics model that allows us to quantitatively describe the reaction and predict the molecular weight of the reaction products.
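The dependence of product size on the monomer-to-initiator ratio can be illustrated with the textbook relation for an ideal living polymerization, in which every initiator grows one chain and the number-average degree of polymerization equals conversion times [M]0/[I]0. The sketch below applies that idealized relation only; it is not the dissertation's kinetic model, and the function name and example concentrations are hypothetical.

```python
def expected_chain_length(
    initiator_length: int,
    monomer_conc: float,
    initiator_conc: float,
    conversion: float = 1.0,
) -> float:
    """Number-average product length (nucleotides) for an ideal living polymerization.

    Assumes every initiator is extended and no chains terminate, so the average
    number of monomers added per chain is conversion * [M]0 / [I]0.
    Concentrations may be in any unit as long as both use the same one.
    """
    if initiator_conc <= 0 or monomer_conc < 0:
        raise ValueError("concentrations must be positive (monomer may be zero)")
    if not 0.0 <= conversion <= 1.0:
        raise ValueError("conversion must be between 0 and 1")
    added = conversion * monomer_conc / initiator_conc
    return initiator_length + added


# Hypothetical example: a 10-nt initiator at 1 uM with 1000 uM monomer.
for conv in (0.25, 0.5, 1.0):
    print(conv, expected_chain_length(10, 1000.0, 1.0, conversion=conv))
```

Doubling the initiator concentration at fixed monomer halves the number of nucleotides added per chain in this idealized picture, which is the qualitative behavior the abstract attributes to TcEP.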
Specific Aim 2 further explores TcEP’s ability to transcend homo-polynucleotide synthesis by varying the choices of initiators and monomers. We investigated the effects of initiator length and sequence on TcEP, and found that the minimum length of an effective initiator should be 10 nucleotides and that the formation of secondary structures close to the 3’-hydroxyl group can impede the polymerization reaction. We also demonstrated TcEP’s capacity to incorporate a wide range of unnatural dNTPs into the growing chain, such as hydrophobic fluorescent dNTPs and fluoro-modified dNTPs. By harnessing the encoded nucleotide sequence of an initiator and the chemical diversity of monomers, TcEP enables us to introduce molecular recognition capabilities and chemical functionalities on the 5’-terminus and 3’-terminus, respectively.
Building on TcEP’s synthesis capacities, in Specific Aim 3 we invented a two-step strategy to synthesize diblock amphiphilic polynucleotides, in which the first, hydrophilic block serves as a macro-initiator for the growth of the second block, composed of natural and/or unnatural nucleotides. By tuning the length of the hydrophilic block, we synthesized amphiphilic diblock polynucleotides that can self-assemble into micellar structures ranging from star-like to crew-cut morphologies. The observed self-assembly behaviors agree with predictions from dissipative particle dynamics simulations as well as scaling laws for polyelectrolyte block copolymers.
In summary, we developed an enzymatic synthesis method (i.e., TcEP) that enables the facile synthesis of high molecular weight polynucleotides with low polydispersity. Although we can control the nucleotide sequence only to a limited extent, TcEP offers a method to integrate an oligodeoxyribonucleotide with a specific sequence at the 5’-terminus and to incorporate functional groups along the growing chains simultaneously. Additionally, we used TcEP to synthesize amphiphilic polynucleotides that can self-assemble. We anticipate that our facile synthesis method will not only advance molecular biology, but also invigorate materials science and bionanotechnology.
Abstract:
Determination of copy number variants (CNVs) inferred from genome-wide single nucleotide polymorphism arrays has shown increasing utility in genetic variant disease associations. Several CNV detection methods are available, but differences in CNV call thresholds and characteristics exist. We evaluated the relative performance of seven methods: circular binary segmentation, CNVFinder, cnvPartition, gain and loss of DNA, Nexus algorithms, PennCNV and QuantiSNP. Tested data included real and simulated Illumina HumHap 550 data from the Singapore cohort study of the risk factors for Myopia (SCORM) and simulated data from Affymetrix 6.0 and platform-independent distributions. The normalized singleton ratio (NSR) is proposed as a metric for parameter optimization before running the full analysis. We used 10 SCORM samples for optimizing parameter settings for each method and then evaluated method performance at optimal parameters using 100 SCORM samples. The statistical power, false positive rates, and receiver operating characteristic (ROC) curve residuals were evaluated by simulation studies. Optimal parameters, as determined by NSR and ROC curve residuals, were consistent across datasets. QuantiSNP outperformed other methods based on ROC curve residuals over most datasets. Nexus Rank and SNPRank have low specificity and high power. Nexus Rank calls oversized CNVs. PennCNV detects among the fewest CNVs of the methods tested.
Abstract:
DNaseI footprinting is an established assay for identifying transcription factor (TF)-DNA interactions with single base pair resolution. High-throughput DNase-seq assays have recently been used to detect in vivo DNase footprints across the genome. Multiple computational approaches have been developed to identify DNase-seq footprints as predictors of TF binding. However, recent studies have pointed to a substantial cleavage bias of DNase and its negative impact on predictive performance of footprinting. To assess the potential for using DNase-seq to identify individual binding sites, we performed DNase-seq on deproteinized genomic DNA and determined sequence cleavage bias. This allowed us to build bias corrected and TF-specific footprint models. The predictive performance of these models demonstrated that predicted footprints corresponded to high-confidence TF-DNA interactions. DNase-seq footprints were absent under a fraction of ChIP-seq peaks, which we show to be indicative of weaker binding, indirect TF-DNA interactions or possible ChIP artifacts. The modeling approach was also able to detect variation in the consensus motifs that TFs bind to. Finally, cell type specific footprints were detected within DNase hypersensitive sites that are present in multiple cell types, further supporting that footprints can identify changes in TF binding that are not detectable using other strategies.
Abstract:
A previous genome-wide association study (GWAS) of more than 100,000 individuals identified molecular-genetic predictors of educational attainment. We undertook in-depth life-course investigation of the polygenic score derived from this GWAS using the four-decade Dunedin Study (N = 918). There were five main findings. First, polygenic scores predicted adult economic outcomes even after accounting for educational attainments. Second, genes and environments were correlated: Children with higher polygenic scores were born into better-off homes. Third, children's polygenic scores predicted their adult outcomes even when analyses accounted for their social-class origins; social-mobility analysis showed that children with higher polygenic scores were more upwardly mobile than children with lower scores. Fourth, polygenic scores predicted behavior across the life course, from early acquisition of speech and reading skills through geographic mobility and mate choice and on to financial planning for retirement. Fifth, polygenic-score associations were mediated by psychological characteristics, including intelligence, self-control, and interpersonal skill. Effect sizes were small. Factors connecting DNA sequence with life outcomes may provide targets for interventions to promote population-wide positive development.
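A polygenic score of the kind discussed in this abstract is conventionally computed as a weighted sum: each variant's allele count (0, 1, or 2 copies of the effect allele) is multiplied by its GWAS effect-size estimate and the products are added up. The sketch below shows that arithmetic only; the function name, the weights, and the genotypes are invented for illustration and do not come from the study.

```python
from typing import Sequence


def polygenic_score(allele_counts: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele counts across variants.

    allele_counts[i] is the number of effect alleles (0, 1, or 2) at variant i;
    weights[i] is the per-allele effect size estimated by a GWAS for variant i.
    """
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have the same length")
    if any(count not in (0, 1, 2) for count in allele_counts):
        raise ValueError("allele counts must be 0, 1, or 2")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


# Hypothetical three-variant example.
example_weights = [0.10, -0.20, 0.05]
example_genotype = [0, 1, 2]
print(polygenic_score(example_genotype, example_weights))
```

In practice such scores are usually standardized within the study sample before analysis, which this sketch leaves out.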