915 resultados para RNA-Seq
Resumo:
Dissertação de mestrado em Plant Molecular Biology, Biotechnology and Bioentrepreneurship
Resumo:
We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.
Resumo:
BACKGROUND: Finding genes that are differentially expressed between conditions is an integral part of understanding the molecular basis of phenotypic variation. In the past decades, DNA microarrays have been used extensively to quantify the abundance of mRNA corresponding to different genes, and more recently high-throughput sequencing of cDNA (RNA-seq) has emerged as a powerful competitor. As the cost of sequencing decreases, it is conceivable that the use of RNA-seq for differential expression analysis will increase rapidly. To exploit the possibilities and address the challenges posed by this relatively new type of data, a number of software packages have been developed especially for differential expression analysis of RNA-seq data. RESULTS: We conducted an extensive comparison of eleven methods for differential expression analysis of RNA-seq data. All methods are freely available within the R framework and take as input a matrix of counts, i.e. the number of reads mapping to each genomic feature of interest in each of a number of samples. We evaluate the methods based on both simulated data and real RNA-seq data. CONCLUSIONS: Very small sample sizes, which are still common in RNA-seq experiments, impose problems for all evaluated methods and any results obtained under such conditions should be interpreted with caution. For larger sample sizes, the methods combining a variance-stabilizing transformation with the 'limma' method for differential expression analysis perform well under many different conditions, as does the nonparametric SAMseq method.
Resumo:
Gene set enrichment (GSE) analysis is a popular framework for condensing information from gene expression profiles into a pathway or signature summary. The strengths of this approach over single gene analysis include noise and dimension reduction, as well as greater biological interpretability. As molecular profiling experiments move beyond simple case-control studies, robust and flexible GSE methodologies are needed that can model pathway activity within highly heterogeneous data sets. To address this challenge, we introduce Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner. We demonstrate the robustness of GSVA in a comparison with current state of the art sample-wise enrichment methods. Further, we provide examples of its utility in differential pathway activity and survival analysis. Lastly, we show how GSVA works analogously with data from both microarray and RNA-seq experiments. GSVA provides increased power to detect subtle pathway activity changes over a sample population in comparison to corresponding methods. While GSE methods are generally regarded as end points of a bioinformatic analysis, GSVA constitutes a starting point to build pathway-centric models of biology. Moreover, GSVA contributes to the current need of GSE methods for RNA-seq data. GSVA is an open source software package for R which forms part of the Bioconductor project and can be downloaded at http://www.bioconductor.org.
Resumo:
Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically, experimentally validated. Predicted exon-exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we called RT-PCR-seq. Seventy-nine percent of all assessed junctions are confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq was also efficient to screen gene models predicted using the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is extremely sensitive, far more than unbiased transcriptome profiling through RNA sequencing, which is becoming the norm. For example, exon-exon junctions unique to GENCODE annotated transcripts are five times more likely to be corroborated with our targeted approach than with extensive large human transcriptome profiling. Data sets such as the HBM and ENCODE RNA-seq data fail sampling of low-expressed transcripts. Our RT-PCR-seq targeted approach also has the advantage of identifying novel exons of known genes, as we discovered unannotated exons in ~11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that the cataloging of all of the genic elements encoded in the human genome will necessitate a coordinated effort between unbiased and targeted approaches, like RNA-seq and RT-PCR-seq.
Resumo:
Breast cancer is the most common diagnosed cancer and the leading cause of cancer death among females worldwide. It is considered a highly heterogeneous disease and it must be classified into more homogeneous groups. Hence, the purpose of this study was to classify breast tumors based on variations in gene expression patterns derived from RNA sequencing by using different class discovery methods. 42 breast tumors paired-samples were sequenced by Illumine Genome Analyzer and the data was analyzed and prepared by TopHat2 and htseq-count. As reported previously, breast cancer could be grouped into five main groups known as basal epithelial-like group, HER2 group, normal breast-like group and two Luminal groups with a distinctive expression profile. Classifying breast tumor samples by using PAM50 method, the most common subtype was Luminal B and was significantly associated with ESR1 and ERBB2 high expression. Luminal A subtype had ESR1 and SLC39A6 significant high expression, whereas HER2 subtype had a high expression of ERBB2 and CNNE1 genes and low luminal epithelial gene expression. Basal-like and normal-like subtypes were associated with low expression of ESR1, PgR and HER2, and had significant high expression of cytokeratins 5 and 17. Our results were similar compared with TGCA breast cancer data results and with known studies related with breast cancer classification. Classifying breast tumors could add significant prognostic and predictive information to standard parameters, and moreover, identify marker genes for each subtype to find a better therapy for patients with breast cancer.
Resumo:
The transcriptome of the developing starchy endosperm of hexaploid wheat (Triticum aestivum) was determined using RNA-Seq isolated at five stages during grain fill. This resource represents an excellent way to identify candidate genes responsible for the starchy endosperm cell wall, which is dominated by arabinoxylan (AX), accounting for 70% of the cell wall polysaccharides, with 20% (1,3; 1,4)-beta-D-glucan, 7% glucomannan, and 4% cellulose. A complete inventory of transcripts of 124 glycosyltransferase (GT) and 72 glycosylhydrolase (GH) genes associated with cell walls is presented. The most highly expressed GT transcript (excluding those known to be involved in starch synthesis) was a GT47 family transcript similar to Arabidopsis (Arabidopsis thaliana) IRX10 involved in xylan extension, and the second most abundant was a GT61. Profiles for GT43 IRX9 and IRX14 putative orthologs were consistent with roles in AX synthesis. Low abundances were found for transcripts from genes in the acyl-coA transferase BAHD family, for which a role in AX feruloylation has been postulated. The relative expression of these was much greater in whole grain compared with starchy endosperm, correlating with the levels of bound ferulate. Transcripts associated with callose (GSL), cellulose (CESA), pectin (GAUT), and glucomannan (CSLA) synthesis were also abundant in starchy endosperm, while the corresponding cell wall polysaccharides were confirmed as low abundance (glucomannan and callose) or undetectable (pectin) in these samples. Abundant transcripts from GH families associated with the hydrolysis of these polysaccharides were also present, suggesting that they may be rapidly turned over. Abundant transcripts in the GT31 family may be responsible for the addition of Gal residues to arabinogalactan peptide.
Resumo:
Pós-graduação em Biociências e Biotecnologia Aplicadas à Farmácia - FCFAR
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
Schistosoma mansoni is one of the agents of schistosomiasis, a chronic and debilitating disease. Here we, present a transcriptome-wide characterization of adult S. mansoni males by high-throughput RNA-sequencing. We obtained 1,620,432 high-quality ESTs from a directional strand-specific cDNA library, resulting in a 26% higher coverage of genome bases than that of the public ESTs available at NCBI. With a 15 x-deep coverage of transcribed genomic regions, our data were able to (i) confirm for the first time 990 predictions without previous evidence of transcription; (ii) correct gene predictions; (iii) discover 989 and 1196 RNA-seq contigs that map to intergenic and intronic genomic regions, respectively, where no gene had been predicted before. These contigs could represent new protein-coding genes or non-coding RNAs (ncRNAs). Interestingly, we identified 11 novel Micro-exon genes (MEGs). These data reveal new features of the S. mansoni transcriptional landscape and significantly advance our understanding of the parasite transcriptome. (c) 2011 Elsevier Inc. All rights reserved.
Resumo:
The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade’s worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement in part due to claims of reduced variability in comparison to microarrays. However, we show RNA-seq data demonstrates unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find GC-content has a strong sample specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here we describe statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization (CQN) algorithm combines robust generalized regression to remove systematic bias introduced by deterministic features such as GC-content, and quantile normalization to correct for global distortions.
Resumo:
BACKGROUND Moraxella catarrhalis, a major nasopharyngeal pathogen of the human respiratory tract, is exposed to rapid downshifts of environmental temperature when humans breathe cold air. The prevalence of pharyngeal colonization and respiratory tract infections caused by M. catarrhalis is greatest in winter. We investigated how M. catarrhalis uses the physiologic exposure to cold air to regulate pivotal survival systems that may contribute to M. catarrhalis virulence. RESULTS In this study we used the RNA-seq techniques to quantitatively catalogue the transcriptome of M. catarrhalis exposed to a 26 °C cold shock or to continuous growth at 37 °C. Validation of RNA-seq data using quantitative RT-PCR analysis demonstrated the RNA-seq results to be highly reliable. We observed that a 26 °C cold shock induces the expression of genes that in other bacteria have been related to virulence a strong induction was observed for genes involved in high affinity phosphate transport and iron acquisition, indicating that M. catarrhalis makes a better use of both phosphate and iron resources after exposure to cold shock. We detected the induction of genes involved in nitrogen metabolism, as well as several outer membrane proteins, including ompA, m35-like porin and multidrug efflux pump (acrAB) indicating that M. catarrhalis remodels its membrane components in response to downshift of temperature. Furthermore, we demonstrate that a 26 °C cold shock enhances the induction of genes encoding the type IV pili that are essential for natural transformation, and increases the genetic competence of M. catarrhalis, which may facilitate the rapid spread and acquisition of novel virulence-associated genes. CONCLUSION Cold shock at a physiologically relevant temperature of 26 °C induces in M. catarrhalis a complex of adaptive mechanisms that could convey novel pathogenic functions and may contribute to enhanced colonization and virulence.