820 results for Next generation genome sequencing
Abstract:
Desmoid-type fibromatoses are locally aggressive and frequently recurrent tumours, and an accurate diagnosis is essential for patient management. The majority of sporadic lesions harbour beta-catenin (CTNNB1) mutations. We used next-generation sequencing to detect CTNNB1 mutations and to compare the sensitivity and specificity of next-generation sequencing with currently employed mutation detection techniques: mutation-specific restriction enzyme digestion and polymerase chain reaction amplification. DNA was extracted from formalin-fixed paraffin-embedded needle biopsy or resection tissue sections from 144 patients with sporadic desmoid-type fibromatoses, four patients with syndrome-related desmoid-type fibromatoses and 11 morphological mimics. Two primer pairs were designed for CTNNB1 mutation hotspots. Using ≥10 ng of DNA, libraries were generated by Fluidigm and sequenced on the Ion Torrent Personal Genome Machine. Next-generation sequencing had a sensitivity of 92.36 % (133/144, 95 % CIs: 86.74 to 96.12 %) and a specificity of 100 % for the detection of CTNNB1 mutations in desmoid-type fibromatoses-like spindle cell lesions. All mutations detected by mutation-specific restriction enzyme digestion were identified by next-generation sequencing. Next-generation sequencing identified additional mutations in 11 tumours that were not detected by mutation-specific restriction enzyme digestion, two of which have not been previously described. Next-generation sequencing is highly sensitive for the detection of CTNNB1 mutations. This multiplex assay has the advantage of detecting additional mutations compared to those detected by mutation-specific restriction enzyme digestion (sensitivity 82.41 %). The technology requires minimal DNA and is time- and cost-efficient.
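The sensitivity figure quoted above (133 of 144 mutation-positive cases) and its confidence interval can be reproduced approximately with a short calculation. The following is a minimal sketch that assumes an exact (Clopper-Pearson) binomial interval; the abstract does not say which interval method the authors used, and the function names are illustrative only.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a proportion k/n.

    Assumption: the exact method is used here for illustration; the
    source abstract does not name its interval method.
    """

    def solve(f, target):
        # f is monotonically decreasing in p on [0, 1]; bisect for f(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha/2, i.e. cdf(k - 1; p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


sensitivity = 133 / 144
ci_low, ci_high = clopper_pearson(133, 144)
print(f"sensitivity = {100 * sensitivity:.2f} %")
print(f"95 % CI = {100 * ci_low:.2f} to {100 * ci_high:.2f} %")
```

Bisection over the binomial CDF keeps the sketch free of third-party dependencies; a statistics library would normally be used instead.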
Abstract:
Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one Holstein, and one Hereford) and one indicine (Nelore) cattle. Within mapped chromosomal sequence, we identified 1265 CNV regions comprising ~55.6 Mbp of sequence, 476 of which (~38%) have not previously been reported. We validated this sequence-based CNV call set with array comparative genomic hybridization (aCGH), quantitative PCR (qPCR), and fluorescent in situ hybridization (FISH), achieving a validation rate of 82% and a false positive rate of 8%. We further estimated absolute copy numbers for genomic segments and annotated genes in each individual. Surveys of the top 25 most variable genes revealed that the Nelore individual had the lowest copy numbers in 13 cases (~52%, χ² test; P-value <0.05). In contrast, genes related to pathogen- and parasite-resistance, such as CATHL4 and ULBP17, were highly duplicated in the Nelore individual relative to the taurine cattle, while genes involved in lipid transport and metabolism, including APOL3 and FABP2, were highly duplicated in the beef breeds. These CNV regions also harbor genes like BPIFA2A (BSP30A) and WC1, suggesting that some CNVs may be associated with breed-specific differences in adaptation, health, and production traits. By providing the first individualized cattle CNV and segmental duplication maps and genome-wide gene copy number estimates, we enable future CNV studies into highly duplicated regions in the cattle genome.
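The read depth idea behind this abstract can be illustrated with a toy calculation: the depth in a window is compared with the average depth of regions assumed to be present in two copies. This is only a sketch under that assumption; the function name and inputs are hypothetical, and the abstract does not describe the authors' actual pipeline, which would also need steps such as GC-bias correction and repeat handling.

```python
def estimate_copy_number(window_depths, control_depths, ploidy=2):
    """Estimate copy number per window from read depth.

    window_depths  -- mean read depth in each window of interest
    control_depths -- mean read depth in windows assumed to be diploid
                      (an assumption of this sketch, not from the source)
    """
    if not control_depths:
        raise ValueError("need at least one control window")
    baseline = sum(control_depths) / len(control_depths)
    if baseline <= 0:
        raise ValueError("control depth must be positive")
    # Depth scales linearly with copy number under this simple model.
    return [ploidy * depth / baseline for depth in window_depths]


# Toy depths: a normal window, a duplicated window, a single-copy window.
print(estimate_copy_number([30, 60, 15], [30, 30, 30]))
```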
Abstract:
Pediatric acute myeloid leukemia (AML) is a molecularly heterogeneous disease that arises from genetic alterations in pathways that regulate self-renewal and myeloid differentiation. While the majority of patients carry recurrent chromosomal translocations, almost 20% of childhood AML cases do not show any recognizable cytogenetic alteration and are defined as cytogenetically normal (CN)-AML. CN-AML patients have always shown great variability in response to therapy and overall outcome, underlining the presence of unknown genetic changes that are not detectable by conventional analyses but are relevant for the pathogenesis and outcome of AML. The development of novel genome-wide techniques such as next-generation sequencing has tremendously improved our ability to interrogate the cancer genome. Against this background, the aim of this research study was to investigate, through whole-transcriptome sequencing (RNA-seq), the mutational landscape of pediatric CN-AML patients negative for all the currently known somatic mutations reported in AML. RNA-seq performed on diagnostic leukemic blasts from 19 pediatric CN-AML cases revealed a considerable incidence of cryptic chromosomal rearrangements, with the identification of 21 putative fusion genes. Several of the fusion genes identified in this study are recurrent and might have prognostic and/or therapeutic relevance. A paradigm of this is the CBFA2T3-GLIS2 fusion, which has been demonstrated to be a common alteration in pediatric CN-AML, predicting poor outcome. Important findings have also been obtained in the identification of novel therapeutic targets. On the one hand, the identification of the NUP98-JARID1A fusion suggests the use of disulfiram; on the other, we describe alterations activating tyrosine kinases, providing functional data supporting the use of tyrosine kinase inhibitors to specifically inhibit leukemia cells.
This study provides new insights into the genetic alterations underlying pediatric AML, defines novel prognostic markers and putative therapeutic targets, and prospectively ensures correct risk stratification and risk-adapted therapy also for the “all-neg” AML subgroup.
Abstract:
Next-generation sequencing (NGS) is a valuable tool for the detection and quantification of HIV-1 variants in vivo. However, these technologies require detailed characterization and control of artificially induced errors to be applicable for accurate haplotype reconstruction. To investigate the occurrence of substitutions, insertions, and deletions at the individual steps of RT-PCR and NGS, 454 pyrosequencing was performed on amplified and non-amplified HIV-1 genomes. Artificial recombination was explored by mixing five different HIV-1 clonal strains (5-virus-mix) and applying different RT-PCR conditions followed by 454 pyrosequencing. Error rates ranged from 0.04% to 0.66% and were similar in amplified and non-amplified samples. Discrepancies were observed between forward and reverse reads, indicating that most errors were introduced during the pyrosequencing step. Using the 5-virus-mix, non-optimized, standard RT-PCR conditions introduced artificial recombinants in at least 30% of the reads, which subsequently led to an underestimation of true haplotype frequencies. We reduced the fraction of recombinants to 0.9-2.6% with optimized, artifact-reducing RT-PCR conditions. This approach enabled correct haplotype reconstruction and frequency estimations consistent with reference data obtained by single genome amplification. RT-PCR conditions are crucial for correct frequency estimation and analysis of haplotypes in heterogeneous virus populations. We developed an RT-PCR procedure to generate NGS data useful for reliable haplotype reconstruction and quantification.
Abstract:
Objective: In Southern European countries up to one-third of the patients with hereditary hemochromatosis (HH) do not present the common HFE risk genotype. In order to investigate the molecular basis of these cases we have designed a gene panel for rapid and simultaneous analysis of 6 HH-related genes (HFE, TFR2, HJV, HAMP, SLC40A1 and FTL) by next-generation sequencing (NGS). Materials and Methods: Eighty-eight Portuguese patients with iron overload, negative for the common HFE mutations, were analysed. A TruSeq Custom Amplicon kit (TSCA, by Illumina) was designed to generate 97 amplicons covering exons, intron/exon junctions and UTRs of the mentioned genes, with a cumulative target sequence of 12,115 bp. Amplicons were sequenced on the MiSeq instrument (Illumina) using 250 bp paired-end reads. Sequences were aligned against human genome reference hg19 using the alignment and variant caller algorithms in the MiSeq Reporter software. Novel variants were validated by Sanger sequencing and their pathogenic significance was assessed by in silico studies. Results: We found a total of 55 different genetic variants. These include novel pathogenic missense and splicing variants (in HFE and TFR2), a very rare variant in the IRE of FTL, and a variant that creates a novel translation initiation codon in the HAMP gene, among others. Conclusion: The combination of TSCA methodology and NGS technology appears to be an appropriate tool for simultaneous and fast analysis of HH-related genes in a large number of samples. However, establishing the clinical relevance of NGS-detected variants for HH development remains a laborious task, requiring further functional studies.
Abstract:
Of the ~1.7 million SINE elements in the human genome, only a tiny number are estimated to be active in transcription by RNA polymerase (Pol) III. Tracing the individual loci from which SINE transcripts originate is complicated by their highly repetitive nature. By exploiting RNA-Seq datasets and unique SINE DNA sequences, we devised a bioinformatic pipeline allowing us to identify Pol III-dependent transcripts of individual SINE elements. When applied to ENCODE transcriptomes of seven human cell lines, this search strategy identified ~1300 Alu loci and ~1100 MIR loci corresponding to detectable transcripts, with ~120 Alu and ~60 MIR loci, respectively, expressed in at least three cell lines. In vitro transcription of selected SINEs did not reflect their in vivo expression properties, and required the native 5’-flanking region in addition to the internal promoter. We also identified a cluster of expressed AluYa5-derived transcription units, juxtaposed to snaR genes on chromosome 19, formed by a promoter-containing left monomer fused to an Alu-unrelated downstream moiety. Autonomous Pol III transcription was also revealed for SINEs nested within Pol II-transcribed genes, raising the possibility of an underlying mechanism for Pol II gene regulation by SINE transcriptional units. Moreover, applying our bioinformatic pipeline both to RNA-seq data from cells subjected to an in vitro pro-oncogenic stimulus and to in vivo matched tumor and non-tumor samples allowed us to detect increased Alu RNA expression as well as the source loci of such deregulation. The ability to investigate SINE transcriptomes at single-locus resolution will facilitate both the identification of novel biologically relevant SINE RNAs and the assessment of SINE expression alteration under pathological conditions.
Abstract:
High throughput next generation sequencing, together with advanced molecular methods, has considerably enhanced the field of food microbiology. By overcoming biases associated with culture-dependent approaches, it has become possible to achieve novel insights into the nature of food-borne microbial communities. In this thesis, several different sequencing-based approaches were applied with a view to better understanding microbe-associated quality defects in cheese. Initially, a literature review provides an overview of microbe-associated cheese quality defects as well as molecular methods for profiling complex microbial communities. Following this, 16S rRNA sequencing revealed temporal and spatial differences in microbial composition depending on the time during the production day at which specific commercial cheeses were manufactured. A novel Ion PGM sequencing approach, focusing on decarboxylase genes rather than 16S rRNA genes, was then successfully employed to profile the biogenic amine-producing cohort of a series of artisanal cheeses. Investigations into the phenomenon of cheese pinking formed the basis of a joint 16S rRNA and whole genome shotgun sequencing approach, leading to the identification of Thermus species and, more specifically, the pathway involved in production of lycopene, a red-coloured carotenoid. Finally, using a more traditional approach, the effect of adding a facultatively heterofermentative Lactobacillus (Lactobacillus casei) to a Swiss-type cheese in which starter activity was compromised was investigated from the perspective of its ability to promote gas defects and irregular eye formation. X-ray computed tomography, a non-destructive method, was used to visualise the consequences of the undesirable gas formation that resulted.
Ultimately this thesis has demonstrated that the application of molecular techniques, such as next generation sequencing, can provide a detailed insight into defect-causing microbial populations present and thereby may underpin approaches to optimise the quality and consistency of a wide variety of cheeses.
Abstract:
The non-standard decoding of the CUG codon in Candida cylindracea raises a number of questions about the evolution of this organism and of other species of the Candida clade for which the codon is ambiguous. To address these questions, we studied the transcriptome of C. cylindracea, comparing its behavior with that of Saccharomyces cerevisiae (standard decoder) and Candida albicans (ambiguous decoder). The transcriptome was characterized using RNA-seq, an approach that has several advantages over microarrays and is increasingly widely applied. TopHat and Cufflinks were used to build the protocol for gene quantification. About 95% of the reads were mapped to the genome. In total, 3693 genes were analyzed, of which 1338 had a non-standard start codon (TTG/CTG), and 99.4% of genes were expressed. Most genes have intermediate levels of expression, some have little or no expression, and a minority are highly expressed. The distribution of CUG codons differs among the three species, but it is significantly associated with gene expression levels: genes with fewer CUGs are the most highly expressed. However, CUG content is not related to conservation level: more and less conserved genes have, on average, an equal number of CUGs. The most conserved genes are the most expressed. The lipase genes corroborate the results obtained for most genes of C. cylindracea, since they are very rich in CUGs and poorly conserved. The reduced number of CUG codons observed in highly expressed genes may be due to an insufficient number of tRNA genes to cope with more CUGs without compromising translational efficiency. The enrichment analysis confirmed that the most conserved genes are associated with basic functions such as translation, pathogenesis and metabolism. Within this set, genes with more or fewer CUGs seem to have different functions. The key questions about the evolutionary phenomenon remain unresolved.
However, the results are consistent with previous observations and yield a variety of conclusions that should be taken into consideration in future analyses, since this is the first time that such a study has been conducted.
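The per-gene CUG content discussed in this abstract comes down to counting in-frame codons in a coding sequence. The following is a minimal sketch with hypothetical function names; it assumes the input is a coding sequence already in reading frame and is not taken from the study's own protocol.

```python
def count_codon(cds, codon="CTG"):
    """Count in-frame occurrences of a codon in a coding sequence.

    Accepts DNA or RNA letters; CUG in mRNA corresponds to CTG in DNA.
    Assumes `cds` starts in frame (an assumption of this sketch).
    """
    cds = cds.upper().replace("U", "T")
    codon = codon.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return sum(1 for i in range(0, usable, 3) if cds[i:i + 3] == codon)


def cug_per_thousand_codons(cds):
    """CUG codons per 1000 codons, so genes of different length compare."""
    n_codons = len(cds) // 3
    return 1000 * count_codon(cds, "CUG") / n_codons if n_codons else 0.0


print(count_codon("ATGCTGCTGTAA"))
print(cug_per_thousand_codons("ATGCTGCTGTAA"))
```

Normalising by gene length matters when relating CUG content to expression level, since longer genes carry more codons of every kind.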
Abstract:
Gastric (GC) and breast (BrC) cancer are two of the most common and deadly tumours. Different lines of evidence suggest a possible causative role of viral infections in both GC and BrC. Whole genome sequencing (WGS) technologies allow searching for viral agents in tissues of patients with cancer. These technologies have already contributed to establishing virus-cancer associations as well as to discovering new tumour viruses. The objective of this study was to document possible associations of viral infection with GC and BrC in Mexican patients. To gain an idea of cost-effective conditions for experimental sequencing, we first carried out an in silico simulation of WGS. The next-generation platform Illumina GAIIx was then used to sequence GC and BrC tumour samples. While we did not find viral sequences in tissues from BrC patients, multiple reads matching Epstein-Barr virus (EBV) sequences were found in GC tissues. An end-point polymerase chain reaction confirmed an enrichment of EBV sequences in one of the GC samples sequenced, validating the next-generation sequencing-bioinformatics pipeline.
Abstract:
Background: Human papillomavirus (HPV) is the aetiological agent for cervical cancer and genital warts. Concurrent HPV and HIV infection in the South African population is high. HIV positive (+) women are often infected with multiple, rare and undetermined HPV types. Data on HPV incidence and genotype distribution are based on commercial HPV detection kits, but these kits may not detect all HPV types in HIV + women. The objectives of this study were to (i) identify the HPV types not detected by commercial genotyping kits present in a cervical specimen from an HIV positive South African woman using next generation sequencing, and (ii) determine if these types were prevalent in a cohort of HIV-infected South African women. Methods: Total DNA was isolated from 109 cervical specimens from South African HIV + women. A specimen within this cohort representing a complex multiple HPV infection, with 12 HPV genotypes detected by the Roche Linear Array HPV genotyping (LA) kit, was selected for next generation sequencing analysis. All HPV types present in this cervical specimen were identified by Illumina sequencing of the extracted DNA following rolling circle amplification. The prevalence of the HPV types identified by sequencing, but not included in the Roche LA, was then determined in the 109 HIV positive South African women by type-specific PCR. Results: Illumina sequencing identified a total of 16 HPV genotypes in the selected specimen, with four genotypes (HPV-30, 74, 86 and 90) not included in the commercial kit. The prevalences of HPV-30, 74, 86 and 90 in 109 HIV positive South African women were found to be 14.6 %, 12.8 %, 4.6 % and 8.3 % respectively. Conclusions: Our results indicate that there are HPV types, with substantial prevalence, in HIV positive women not being detected in molecular epidemiology studies using commercial kits. The significance of these types in relation to cervical disease remains to be investigated.
Abstract:
This item provides supplementary materials for the paper mentioned in the title, specifically a range of organisms used in the study. The full abstract for the main paper is as follows: Next Generation Sequencing (NGS) technologies have revolutionised molecular biology, allowing clinical sequencing to become a matter of routine. NGS data sets consist of short sequence reads obtained from the machine, given context and meaning through downstream assembly and annotation. For these techniques to operate successfully, the collected reads must be consistent with the assumed species or species group, and not corrupted in some way. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans, with some strains exhibiting antibiotic resistance. In this paper, we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from alternative pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.
Abstract:
Next Generation Sequencing (NGS) has revolutionised molecular biology, allowing routine clinical sequencing. NGS data consist of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans, with some strains exhibiting antibiotic resistance. Here we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from other pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.
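A sequence k-mer representation of the kind mentioned above turns a set of reads into a fixed-length vector of k-mer frequencies, which a classifier such as an SVM can then take as input. The following is a minimal sketch of the feature extraction step only; the function name, the choice of k and the handling of ambiguous bases are assumptions, not details from the paper.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_profile(reads, k=3):
    """Return normalised k-mer frequencies in a fixed lexicographic order.

    The vector has 4**k entries (AAA, AAC, ..., TTT for k=3). k-mers that
    contain anything other than A, C, G or T (for example N) are skipped,
    which is a simplifying assumption of this sketch.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set(ALPHABET):
                counts[kmer] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in kmers]


profile = kmer_profile(["ACGTAC", "NNACG"], k=2)
print(len(profile), sum(profile))
```

Each sequencing project would yield one such vector, so projects of very different size end up as points in the same feature space.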
Abstract:
We isolated and characterized 21 microsatellite loci in the vulnerable and iconic Australian lungfish, Neoceratodus forsteri. Loci were screened across eight individuals from the Burnett River and 40 individuals from the Pine River. Genetic diversity was low with between one and six alleles per locus within populations and a maximum expected heterozygosity of 0.774. These loci will now be available to assess effective population sizes and genetic structure in N. forsteri across its natural range in South East Queensland, Australia.
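Expected heterozygosity, the diversity measure quoted above, is computed per locus from allele frequencies as one minus the sum of squared frequencies. The following is a minimal sketch with a hypothetical function name; the small-sample correction is offered as an option because the abstract does not say whether the reported values are corrected.

```python
def expected_heterozygosity(allele_counts, unbiased=False):
    """Expected heterozygosity at one locus from allele counts.

    allele_counts -- number of observed copies of each allele
                     (two copies per diploid individual)
    unbiased      -- apply the n / (n - 1) small-sample correction,
                     where n is the total number of allele copies
    """
    n = sum(allele_counts)
    if n == 0:
        raise ValueError("no alleles observed")
    he = 1.0 - sum((c / n) ** 2 for c in allele_counts)
    if unbiased and n > 1:
        he *= n / (n - 1)
    return he


# Toy example: two equally frequent alleles in five diploid individuals.
print(expected_heterozygosity([5, 5]))
print(expected_heterozygosity([5, 5], unbiased=True))
```

A locus with a single allele gives zero, which matches the monomorphic loci implied by the lower end of the "one to six alleles per locus" range.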
Abstract:
Forward genetic screens have identified numerous genes involved in development and metabolism, and remain a cornerstone of biological research. However, to locate a causal mutation, the practice of crossing to a polymorphic background to generate a mapping population can be problematic if the mutant phenotype is difficult to recognize in the hybrid F2 progeny, or dependent on parent-specific traits. Here, in a screen for leaf hyponasty mutants, we have performed a single backcross of an ethyl methanesulfonate (EMS)-generated hyponastic mutant to its parent. Whole genome deep sequencing of a bulked homozygous F2 population and analysis via the Next Generation EMS mutation mapping pipeline (NGM) unambiguously determined the causal mutation to be a single nucleotide polymorphism (SNP) residing in HASTY, a previously characterized gene involved in microRNA biogenesis. We have evaluated the feasibility of this backcross approach using three additional SNP mapping pipelines: SHOREmap, the GATK pipeline, and the samtools pipeline. Although there was variance in the identification of EMS SNPs, all returned the same outcome in clearly identifying the causal mutation in HASTY. The simplicity of performing a single parental backcross and genome sequencing a small pool of segregating mutants has great promise for identifying mutations that may be difficult to map using conventional approaches.
Abstract:
Next Generation Sequencing (NGS) has revolutionised molecular biology, resulting in an explosion of data sets and an increasing role in clinical practice. Such applications necessarily require rapid identification of the organism as a prelude to annotation and further analysis. NGS data consist of a substantial number of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. Highly accurate results have been obtained for restricted sets using SVM classifiers, but such methods are difficult to parallelise and success depends on careful attention to feature selection. This work examines the problem at very large scale, using a mix of synthetic and real data with a view to determining the overall structure of the problem and the effectiveness of parallel ensembles of simpler classifiers (principally random forests) in addressing the challenges of large scale genomics.