885 results for Deep sequencing
Abstract:
A major bottleneck in protein structure prediction is the selection of correct models from a pool of decoys. Relative activities of ~1,200 individual single-site mutants in a saturation library of the bacterial toxin CcdB were estimated by determining their relative populations using deep sequencing. This phenotypic information was used to define an empirical score for each residue (RankScore), which correlated with the residue depth, and to identify active-site residues. Using these correlations, ~98% of correct models of CcdB (RMSD ≤ 4 Å) were identified from a large set of decoys. The model-discrimination methodology was further validated on eleven different monomeric proteins using simulated RankScore values. The methodology is also a rapid, accurate way to obtain relative activities of each mutant in a large pool and derive sequence-structure-function relationships without protein isolation or characterization. It can be applied to any system in which mutational effects can be monitored by a phenotypic readout.
Abstract:
Multiple methods currently exist for rapid construction and screening of single-site saturation mutagenesis (SSM) libraries in which every codon or nucleotide in a DNA fragment is individually randomized. Nucleotide sequences of each library member before and after screening or selection can be obtained through deep sequencing. The relative enrichment of each mutant at each position provides information on its contribution to protein activity or ligand-binding under the conditions of the screen. Such saturation scans have been applied to diverse proteins to delineate hot-spot residues, stability determinants, and for comprehensive fitness estimates. The data have been used to design proteins with enhanced stability, activity and altered specificity relative to wild-type, to test computational predictions of binding affinity, and for protein model discrimination. Future improvements in deep sequencing read lengths and accuracy should allow comprehensive studies of epistatic effects, of combinational variation at multiple sites, and identification of spatially proximate residues.
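The enrichment calculation described above can be sketched as follows. This is a minimal illustration, not the method of any cited study: the function name `log2_enrichment`, the pseudocount default, and the mutant labels are all assumptions introduced here. It compares each mutant's read frequency after selection with its frequency before selection.

```python
import math


def log2_enrichment(pre_counts, post_counts, pseudocount=0.5):
    """Return log2(post-selection frequency / pre-selection frequency) per mutant.

    pre_counts and post_counts map a mutant label to its deep-sequencing
    read count. The pseudocount (an assumed smoothing choice) keeps mutants
    with zero reads from producing log(0); it must be positive whenever a
    count can be zero.
    """
    mutants = set(pre_counts) | set(post_counts)
    n = len(mutants)
    pre_total = sum(pre_counts.values()) + pseudocount * n
    post_total = sum(post_counts.values()) + pseudocount * n

    scores = {}
    for mutant in mutants:
        pre_freq = (pre_counts.get(mutant, 0) + pseudocount) / pre_total
        post_freq = (post_counts.get(mutant, 0) + pseudocount) / post_total
        scores[mutant] = math.log2(post_freq / pre_freq)
    return scores


# Hypothetical read counts for three mutants before and after selection.
before = {"A10G": 100, "A10P": 100, "A10W": 100}
after = {"A10G": 300, "A10P": 100}
for mutant, score in sorted(log2_enrichment(before, after).items()):
    print(mutant, round(score, 2))
```

Positive scores indicate mutants enriched by the screen and negative scores indicate depleted ones. Real analyses typically also normalize to the wild-type sequence and filter low-coverage mutants, which this sketch omits.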
Abstract:
We used ultra-deep sequencing to obtain tens of thousands of HIV-1 sequences from regions targeted by CD8+ T lymphocytes from longitudinal samples from three acutely infected subjects, and modeled viral evolution during the critical first weeks of infection. Previous studies suggested that a single virus established productive infection, but these conclusions were tempered because of limited sampling; now, we have greatly increased our confidence in this observation through modeling the observed earliest sample diversity based on vastly more extensive sampling. Conventional sequencing of HIV-1 from acute/early infection has shown different patterns of escape at different epitopes; we investigated the earliest escapes in exquisite detail. Over 3-6 weeks, ultra-deep sequencing revealed that the virus explored an extraordinary array of potential escape routes in the process of evading the earliest CD8 T-lymphocyte responses. Using 454 sequencing, we identified over 50 variant forms of each targeted epitope during early immune escape, while only 2-7 variants were detected in the same samples via conventional sequencing. In contrast to the diversity seen within epitopes, non-epitope regions, including the Envelope V3 region, which was sequenced as a control in each subject, displayed very low levels of variation. In early infection, in the regions sequenced, the consensus forms did not have a fitness advantage large enough to trigger reversion to consensus amino acids in the absence of immune pressure. In one subject, a genetic bottleneck was observed, with extensive diversity at the second time point narrowing to two dominant escape forms by the third time point, all within two months of infection.
Traces of immune escape were observed in the earliest samples, suggesting that immune pressure is present and effective earlier than previously reported; quantifying the loss rate of the founder virus suggests a direct role for CD8 T-lymphocyte responses in viral containment after peak viremia. Dramatic shifts in the frequencies of epitope variants during the first weeks of infection revealed a complex interplay between viral fitness and immune escape.
Abstract:
The retinal vascular endothelium is essential for angiogenesis and is involved in maintaining barrier selectivity and vascular tone. The aim of this study was to identify and quantify microRNAs and other small regulatory non-coding RNAs (ncRNAs) which may regulate these crucial functions. Primary bovine retinal microvascular endothelial cells (RMECs) provide a well-characterized in vitro system for studying angiogenesis. RNA extracted from RMECs was used to prepare a small RNA library for deep sequencing (Illumina Genome Analyzer). A total of 6.8 million reads were mapped to 250 known microRNAs in miRBase (release 16). In many cases, the most frequent isomiR differed from the sequence reported in miRBase. In addition, five novel microRNAs, 13 novel bovine orthologs of known human microRNAs and multiple new members of the miR-2284/2285 family were detected. Several ~30-nucleotide sno-miRNAs were identified, with the most highly expressed being derived from snoRNA U78. Highly expressed microRNAs previously associated with endothelial cells included miR-126 and miR-378, but the most highly expressed was miR-21, comprising more than one-third of all mapped reads. Inhibition of miR-21 with an LNA inhibitor significantly reduced proliferation, migration, and tube-forming capacity of RMECs. The independence from prior sequence knowledge provided by deep sequencing facilitates analysis of novel microRNAs and other small RNAs. This approach also enables quantitative evaluation of microRNA expression, which has highlighted the predominance of a small number of microRNAs in RMECs. Knockdown of miR-21 suggests a role for this microRNA in regulation of angiogenesis in the retinal microvasculature. J. Cell. Biochem. 113: 2098-2111, 2012. © 2012 Wiley Periodicals, Inc.
Abstract:
Introduction: Amplicon deep-sequencing using second-generation sequencing technology is an innovative molecular diagnostic technique that enables highly sensitive detection of mutations. As an international consortium, we previously investigated the robustness, precision, and reproducibility of 454 amplicon next-generation sequencing (NGS) across 10 laboratories from 8 countries (Leukemia, 2011;25:1840-8).
Aims: In Phase II of the study, we established distinct working groups for various hematological malignancies, i.e. acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), chronic lymphocytic leukemia (CLL), chronic myelogenous leukemia (CML), myelodysplastic syndromes (MDS), myeloproliferative neoplasms (MPN), and multiple myeloma. Currently, 27 laboratories from 13 countries are part of this research consortium. In total, 74 gene targets were selected by the working groups and amplicons were developed for an NGS deep-sequencing assay (454 Life Sciences, Branford, CT). A data analysis pipeline was developed to standardize mutation interpretation, covering both access to raw data (Roche Amplicon Variant Analyzer, 454 Life Sciences) and variant interpretation (Sequence Pilot, JSI Medical Systems, Kippenheim, Germany).
Results: We will report on the design, standardization, quality control aspects, landscape of mutations, as well as the prognostic and predictive utility of this assay in a cohort of 8,867 cases. Overall, 1,146 primer sequences were designed and tested. For example, in AML, 924 cases were screened for CEBPA mutations. RUNX1 mutations were analyzed in 1,888 cases, applying the deep-sequencing read counts to study the stability of such mutations at relapse and their utility as a biomarker to detect residual disease. Analyses of DNMT3A (n=1,041) focused on landscape investigations and on prognostic relevance. Additionally, this working group is focusing on TET2, ASXL1, and TP53 analyses. A novel prognostic model is being developed allowing stratification of AML into prognostic subgroups based on molecular markers only. In ALL, 1,124 pediatric and adult cases have been screened, including 763 assays for TP53 mutations both at diagnosis and relapse of ALL. Pediatric and adult leukemia expert labs developed additional content to study the mutation incidence of other B and T lineage markers such as IKZF1, JAK2, IL7R, PAX5, EP300, LEF1, CRLF2, PHF6, WT1, JAK1, PTEN, AKT1, NOTCH1, CREBBP, or FBXW7. Further, the molecular landscape of CLL is changing rapidly. As such, a separate working group focused on analyses including NOTCH1, SF3B1, MYD88, XPO1, FBXW7 and BIRC3. To date, 922 cases have been screened to investigate the range of mutational burden of NOTCH1 mutations for their prognostic relevance. In MDS, RUNX1 mutation analyses were performed in 977 cases. The prognostic relevance of TP53 mutations in MDS was assessed in an additional 327 cases, including isolated deletions of chromosome 5q. Next, content was developed targeting genes of the cellular splicing component, e.g. SF3B1, SRSF2, U2AF1, and ZRSR2.
In BCR-ABL1-negative MPN, nine genes of interest (JAK2, MPL, TET2, CBL, KRAS, EZH2, IDH1, IDH2, ASXL1) have been analyzed in a cohort of 155 primary myelofibrosis cases searching for novel somatic mutations and addressing their relevance for disease progression and leukemia transformation. Moreover, an assay was developed and applied to CMML cases allowing the simultaneous analysis of 25 leukemia-associated target genes in a single sequencing run using just 20 ng of starting DNA. Finally, nine laboratories are studying CML, applying ultra-deep sequencing of the BCR-ABL1 tyrosine kinase domain. Analyses were performed on 615 cases investigating the dynamics of expansion of mutated clones under various tyrosine kinase inhibitor therapies.
Conclusion: Molecular characterization of hematological malignancies today requires high diagnostic sensitivity and specificity. As part of the IRON-II study, a network of laboratories analyzed a variety of disease entities applying amplicon-based NGS assays. Importantly, the consortium not only standardized assay design for disease-specific panels, but also achieved consensus on a common data analysis pipeline for mutation interpretation. Distinct working groups have been forged to address scientific tasks, and in total 8,867 cases have been analyzed thus far.
Abstract:
The current epidemic of Hepatitis C infection in HIV-positive men who have sex with men is associated with increasing use of recreational drugs. Multiple HCV infections have been reported in haemophiliacs and intravenous drug users. Using ultra-deep sequencing analysis, we present the case of an HIV-positive MSM with evidence of three sequential HCV infections, each occurring during the acute phase of the preceding infection, following risk exposures. We observed rapid replacement of the original strain by the incoming genotype at subsequent time points. The impact of HCV super-infection remains unclear and UDS may provide new insights.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The cytidine deaminase AID hypermutates immunoglobulin genes but can also target oncogenes, leading to tumorigenesis. The extent of AID's promiscuity and its predilection for immunoglobulin genes are unknown. We report here that AID interacted broadly with promoter-proximal sequences associated with stalled polymerases and chromatin-activating marks. In contrast, genomic occupancy of replication protein A (RPA), an AID cofactor, was restricted to immunoglobulin genes. The recruitment of RPA to the immunoglobulin loci was facilitated by phosphorylation of AID at Ser38 and Thr140. We propose that stalled polymerases recruit AID, thereby resulting in low frequencies of hypermutation across the B cell genome. Efficient hypermutation and switch recombination required AID phosphorylation and correlated with recruitment of RPA. Our findings provide a rationale for the oncogenic role of AID in B cell malignancy.
Abstract:
Purpose: Mounting evidence supports the clinical significance of gene mutations and immunogenetic features in common mature B-cell malignancies.
Experimental Design: We undertook a detailed characterization of the genetic background of splenic marginal zone lymphoma (SMZL), using targeted resequencing and explored potential clinical implications in a multinational cohort of 175 patients with SMZL.
Results: We identified recurrent mutations in TP53 (16%), KLF2 (12%), NOTCH2 (10%), TNFAIP3 (7%), MLL2 (11%), MYD88 (7%), and ARID1A (6%), all genes known to be targeted by somatic mutation in SMZL. KLF2 mutations were early, clonal events, enriched in patients with del(7q) and IGHV1-2*04 B-cell receptor immunoglobulins, and were associated with a short median time to first treatment (0.12 vs. 1.11 years; P = 0.01). In multivariate analysis, mutations in NOTCH2 [HR, 2.12; 95% confidence interval (CI), 1.02–4.4; P = 0.044] and 100% germline IGHV gene identity (HR, 2.19; 95% CI, 1.05–4.55; P = 0.036) were independent markers of short time to first treatment, whereas TP53 mutations were an independent marker of short overall survival (HR, 2.36; 95% CI, 1.08–5.2; P = 0.03).
Conclusions: We identify key associations between gene mutations and clinical outcome, demonstrating for the first time that NOTCH2 and TP53 gene mutations are independent markers of reduced treatment-free and overall survival, respectively.
Dinoflagellate Genomic Organization and Phylogenetic Marker Discovery Utilizing Deep Sequencing Data
Abstract:
Dinoflagellates possess large genomes in which most genes are present in many copies. This has made studies of their genomic organization and phylogenetics challenging. Recent advances in sequencing technology have made deep sequencing of dinoflagellate transcriptomes feasible. This dissertation investigates the genomic organization of dinoflagellates to better understand the challenges of assembling dinoflagellate transcriptomic and genomic data from short read sequencing methods, and develops new techniques that utilize deep sequencing data to identify orthologous genes across a diverse set of taxa. To better understand the genomic organization of dinoflagellates, a genomic cosmid clone of the tandemly repeated gene Alcohol Dehydrogenase (AHD) was sequenced and analyzed. The organization of this clone was found to be counter to prevailing hypotheses of genomic organization in dinoflagellates. Further, a new non-canonical splicing motif was described that could greatly improve the automated modeling and annotation of genomic data. A custom phylogenetic marker discovery pipeline, incorporating methods that leverage the statistical power of large data sets, was written. A case study on Stramenopiles was undertaken to test its utility in resolving relationships between known groups as well as the phylogenetic affinity of seven unknown taxa. The pipeline generated a set of 373 genes useful as phylogenetic markers that successfully resolved relationships among the major groups of Stramenopiles, and placed all unknown taxa on the tree with strong bootstrap support. This pipeline was then used to discover 668 genes useful as phylogenetic markers in dinoflagellates. Phylogenetic analysis of 58 dinoflagellates, using this set of markers, produced a phylogeny with good support of all branches. The Suessiales were found to be sister to the Peridiniales. The Prorocentrales formed a monophyletic group with the Dinophysiales that was sister to the Gonyaulacales.
The Gymnodiniales were found to be paraphyletic, forming three monophyletic groups. While this pipeline was used to find phylogenetic markers, it will likely also be useful for finding orthologs of interest for other purposes, for the discovery of horizontally transferred genes, and for the separation of sequences in metagenomic data sets.
Abstract:
miRDeep and its variants are widely used to quantify known and novel microRNA (miRNA) from small RNA sequencing (RNAseq). This article describes miRDeep*, our integrated miRNA identification tool, which is modeled on miRDeep but improves the precision of detecting novel miRNAs by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphic interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or the binary equivalent (BAM) format. Known and novel miRNA expression levels, as measured by the number of reads, are displayed in an interface, which shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are shown and kept in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to the confidence score. miRDeep* is an integrated standalone application in which sequence alignment, pre-miRNA secondary structure calculation and graphical display are implemented purely in Java. The application can be run on a normal personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperformed existing miRNA prediction tools using our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star
Abstract:
Forward genetic screens have identified numerous genes involved in development and metabolism, and remain a cornerstone of biological research. However, to locate a causal mutation, the practice of crossing to a polymorphic background to generate a mapping population can be problematic if the mutant phenotype is difficult to recognize in the hybrid F2 progeny, or dependent on parent-specific traits. Here, in a screen for leaf hyponasty mutants, we have performed a single backcross of an ethyl methanesulfonate (EMS)-generated hyponastic mutant to its parent. Whole genome deep sequencing of a bulked homozygous F2 population and analysis via the Next Generation EMS mutation mapping pipeline (NGM) unambiguously determined the causal mutation to be a single nucleotide polymorphism (SNP) residing in HASTY, a previously characterized gene involved in microRNA biogenesis. We have evaluated the feasibility of this backcross approach using three additional SNP mapping pipelines: SHOREmap, the GATK pipeline, and the samtools pipeline. Although there was variance in the identification of EMS SNPs, all returned the same outcome in clearly identifying the causal mutation in HASTY. The simplicity of performing a single parental backcross and genome sequencing a small pool of segregating mutants has great promise for identifying mutations that may be difficult to map using conventional approaches.
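The core filtering idea behind this kind of backcross mapping can be sketched as follows. This is a minimal illustration under stated assumptions and does not reproduce NGM, SHOREmap, GATK, or samtools: the function name `candidate_ems_snps`, the record layout, and both thresholds are hypothetical. It relies on two points: EMS predominantly induces G-to-A and C-to-T transitions, and in a bulked pool of homozygous mutant F2 plants the causal SNP should be carried by nearly all reads.

```python
# Base changes typical of EMS mutagenesis (G/C to A/T transitions).
EMS_TRANSITIONS = {("G", "A"), ("C", "T")}


def candidate_ems_snps(variants, min_depth=10, min_alt_fraction=0.9):
    """Return (chrom, pos, alt_fraction) for SNPs compatible with a causal EMS lesion.

    Each variant is a dict with keys chrom, pos, ref, alt, ref_reads and
    alt_reads (an assumed layout). Thresholds are illustrative defaults.
    """
    candidates = []
    for v in variants:
        depth = v["ref_reads"] + v["alt_reads"]
        # Skip sites with too little coverage to estimate allele frequency.
        if depth == 0 or depth < min_depth:
            continue
        # Keep only base changes characteristic of EMS.
        if (v["ref"], v["alt"]) not in EMS_TRANSITIONS:
            continue
        alt_fraction = v["alt_reads"] / depth
        # The causal SNP should be near fixation in the mutant pool.
        if alt_fraction >= min_alt_fraction:
            candidates.append((v["chrom"], v["pos"], round(alt_fraction, 3)))
    return candidates


# Hypothetical variant calls from a bulked F2 pool.
calls = [
    {"chrom": "1", "pos": 100, "ref": "G", "alt": "A", "ref_reads": 0, "alt_reads": 40},
    {"chrom": "1", "pos": 250, "ref": "A", "alt": "T", "ref_reads": 0, "alt_reads": 40},
    {"chrom": "2", "pos": 900, "ref": "G", "alt": "A", "ref_reads": 20, "alt_reads": 20},
]
print(candidate_ems_snps(calls))
```

Sites linked to the causal mutation also approach fixation, so in practice the surviving candidates are narrowed further by gene annotation and predicted effect on the protein.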