917 results for RNA-seq data
Abstract:
The advent of next generation sequencing (NGS) technologies has expanded the area of genomic research, offering high coverage and increased sensitivity over older microarray platforms. Although the current cost of next generation sequencing still exceeds that of microarray approaches, the rapid advances in NGS will likely make it the platform of choice for future research in differential gene expression. Connectivity mapping is a procedure for examining the connections among diseases, genes and drugs through differential gene expression. It was initially based on microarray technology, with which a large collection of compound-induced reference gene expression profiles has been accumulated. In this work, we aim to test the feasibility of incorporating NGS RNA-Seq data into the current connectivity mapping framework by using the microarray-based reference profiles together with a differentially expressed gene signature constructed from an NGS dataset. This would allow connections to be established between the NGS gene signature and the microarray reference profiles, avoiding the cost of re-creating drug profiles with NGS technology. We examined the connectivity mapping approach on a publicly available NGS dataset with androgen stimulation of LNCaP cells in order to extract candidate compounds that could inhibit the proliferative phenotype of LNCaP cells and to elucidate their potential in a laboratory setting. In addition, we analyzed an independent microarray dataset with similar experimental settings. We found a high level of concordance between the top compounds identified using the gene signatures from the two datasets. The nicotine derivative cotinine was returned as the top candidate among the overlapping compounds with potential to suppress this proliferative phenotype. Subsequent lab experiments validated this connectivity mapping hit, showing that cotinine inhibits cell proliferation in an androgen-dependent manner. Thus the results of this study suggest a promising prospect of integrating NGS data with connectivity mapping. © 2013 McArt et al.
Abstract:
Master's thesis in Bioinformatics and Computational Biology (Bioinformatics), presented to the Universidade de Lisboa, through the Faculdade de Ciências, 2014
Abstract:
The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade’s worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement, in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data exhibit unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find that GC-content has a strong sample-specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here we describe statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization (CQN) algorithm combines robust generalized regression, to remove systematic bias introduced by deterministic features such as GC-content, with quantile normalization, to correct for global distortions.
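As an illustration of the quantile normalization step named in the abstract above, the following is a minimal sketch and not the authors' CQN implementation: it omits the GC-content regression entirely and only maps every sample onto a shared reference distribution. The function name `quantile_normalize` and the list-of-samples input layout are assumptions made for this example.

```python
def quantile_normalize(samples):
    """Plain quantile normalization of equal-length expression vectors.

    `samples` is a list of samples, each a list of per-gene values
    (hypothetical layout chosen for this sketch).
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must cover the same genes")

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_samples = [sorted(s) for s in samples]
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples)
        for k in range(n_genes)
    ]

    normalized = []
    for s in samples:
        # Rank genes within the sample; ties are broken by original order.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized


result = quantile_normalize([[5, 2, 3], [4, 1, 6]])
```

After normalization every sample contains the same set of values, so global distortions between samples are removed while the within-sample gene ranking is preserved.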
Abstract:
BACKGROUND Moraxella catarrhalis, a major nasopharyngeal pathogen of the human respiratory tract, is exposed to rapid downshifts of environmental temperature when humans breathe cold air. The prevalence of pharyngeal colonization and respiratory tract infections caused by M. catarrhalis is greatest in winter. We investigated how M. catarrhalis uses the physiologic exposure to cold air to regulate pivotal survival systems that may contribute to M. catarrhalis virulence. RESULTS In this study we used RNA-seq to quantitatively catalogue the transcriptome of M. catarrhalis exposed to a 26 °C cold shock or to continuous growth at 37 °C. Validation of the RNA-seq data using quantitative RT-PCR analysis demonstrated the RNA-seq results to be highly reliable. We observed that a 26 °C cold shock induces the expression of genes that in other bacteria have been related to virulence. A strong induction was observed for genes involved in high affinity phosphate transport and iron acquisition, indicating that M. catarrhalis makes better use of both phosphate and iron resources after exposure to cold shock. We detected the induction of genes involved in nitrogen metabolism, as well as several outer membrane proteins, including ompA, m35-like porin and multidrug efflux pump (acrAB), indicating that M. catarrhalis remodels its membrane components in response to a downshift of temperature. Furthermore, we demonstrate that a 26 °C cold shock enhances the induction of genes encoding the type IV pili that are essential for natural transformation, and increases the genetic competence of M. catarrhalis, which may facilitate the rapid spread and acquisition of novel virulence-associated genes. CONCLUSION Cold shock at a physiologically relevant temperature of 26 °C induces in M. catarrhalis a complex of adaptive mechanisms that could convey novel pathogenic functions and may contribute to enhanced colonization and virulence.
Abstract:
MOTIVATION: Data from RNA-seq experiments provide us with many new possibilities to gain insights into biological and disease mechanisms of cellular functioning. However, the reproducibility and robustness of RNA-seq data analysis results are often unclear. This is in part attributed to the two counteracting goals of (a) a cost-efficient and (b) an optimal experimental design, leading to a compromise, e.g., in the sequencing depth of experiments.
RESULTS: We introduce an R package called samExploreR that allows the subsampling (m out of n bootstrapping) of short reads based on SAM files, facilitating the investigation of sequencing-depth-related questions for the experimental design. Overall, this provides a systematic way of exploring the reproducibility and robustness of general RNA-seq studies. We exemplify the usage of samExploreR by studying the influence of the sequencing depth and the annotation on the identification of differentially expressed genes.
AVAILABILITY: samExploreR is available as an R package from Bioconductor (after acceptance of the paper, download link: http://www.bio-complexity.com/samExploreR_1.0.0.tar.gz).
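The subsampling idea described above can be sketched in a few lines. This is not the samExploreR API (which is an R package); the function name `subsample_sam_lines`, the `fraction` and `seed` parameters, and the choice to draw m of the n alignment records without replacement are all assumptions made for illustration.

```python
import random


def subsample_sam_lines(lines, fraction, seed=None):
    """Keep all SAM header lines and a random subset of alignment records.

    Hypothetical helper: draws m = round(fraction * n) of the n records
    without replacement and preserves their original order.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)

    # SAM header lines start with '@'; read names cannot contain '@'.
    headers = [ln for ln in lines if ln.startswith("@")]
    records = [ln for ln in lines if ln and not ln.startswith("@")]

    m = round(fraction * len(records))
    keep = sorted(rng.sample(range(len(records)), m))
    return headers + [records[i] for i in keep]


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tchr1\t400\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
half = subsample_sam_lines(sam, 0.5, seed=1)
```

Repeating the draw with different seeds and rerunning the downstream differential expression analysis on each subset gives a rough view of how stable the results are at a reduced sequencing depth.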
Abstract:
Background Small RNA sequencing is commonly used to identify novel miRNAs and to determine their expression levels in plants. There are several miRNA identification tools for animals such as miRDeep, miRDeep2 and miRDeep*. miRDeep-P was developed to identify plant miRNA using miRDeep’s probabilistic model of miRNA biogenesis, but it depends on several third party tools and lacks a user-friendly interface. The objective of our miRPlant program is to predict novel plant miRNA, while providing a user-friendly interface with improved accuracy of prediction. Result We have developed a user-friendly plant miRNA prediction tool called miRPlant. We show using 16 plant miRNA datasets from four different plant species that miRPlant has at least a 10% improvement in accuracy compared to miRDeep-P, which is the most popular plant miRNA prediction tool. Furthermore, miRPlant uses a Graphical User Interface for data input and output, and identified miRNA are shown with all RNAseq reads in a hairpin diagram. Conclusions We have developed miRPlant which extends miRDeep* to various plant species by adopting suitable strategies to identify hairpin excision regions and hairpin structure filtering for plants. miRPlant does not require any third party tools such as mapping or RNA secondary structure prediction tools. miRPlant is also the first plant miRNA prediction tool that dynamically plots miRNA hairpin structure with small reads for identified novel miRNAs. This feature will enable biologists to visualize novel pre-miRNA structure and the location of small RNA reads relative to the hairpin. Moreover, miRPlant can be easily used by biologists with limited bioinformatics skills.
Abstract:
Increasing salinity levels in freshwater and coastal environments, caused by sea level rise linked to climate change, are now recognized as a major factor that can impact fish growth negatively, especially for freshwater teleost species. Striped catfish (Pangasianodon hypophthalmus) is an important freshwater teleost that is now widely farmed across the Mekong River Delta in Vietnam. Understanding the basis for tolerance and adaptation to raised environmental salinity conditions can assist the regional culture industry to mitigate predicted impacts of climate change across this region. Next generation sequencing on the Ion Proton platform yielded more than 174 million raw reads from three tissue libraries (gill, kidney and intestine). Reads were filtered and de novo assembled using a variety of assemblers and then clustered together to generate a combined reference transcriptome. Downstream analysis resulted in a final reference transcriptome that contained 60,585 transcripts with an N50 of 683 bp. This resource was further annotated using a variety of bioinformatics databases, followed by differential gene expression analysis that identified 3062 transcripts that were differentially expressed in catfish samples raised under two experimental conditions (0 and 15 ppt). A number of transcripts with a potential role in salinity tolerance were then classified into six different functional gene categories based on their gene ontology assignments. These included energy metabolism, ion transportation, detoxification, signal transduction, structural organization and detoxification. Finally, we combined the data on functional salinity tolerance genes into a hypothetical schematic model that attempted to describe potential relationships and interactions among target genes to explain the molecular pathways that control adaptive salinity responses in P. hypophthalmus. Our results indicate that P. hypophthalmus exhibits predictable plastic regulatory responses to elevated salinity by means of characteristic gene expression patterns, providing numerous candidate genes for future investigations.
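For readers unfamiliar with the N50 statistic quoted above, a minimal sketch of the standard definition follows: N50 is the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of the total assembled bases. The function name `n50` and the example lengths are made up for illustration and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or transcript lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


example = n50([2, 3, 4, 5, 6])
```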
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Schistosoma mansoni is one of the agents of schistosomiasis, a chronic and debilitating disease. Here we present a transcriptome-wide characterization of adult S. mansoni males by high-throughput RNA-sequencing. We obtained 1,620,432 high-quality ESTs from a directional strand-specific cDNA library, resulting in a 26% higher coverage of genome bases than that of the public ESTs available at NCBI. With a 15x-deep coverage of transcribed genomic regions, our data were able to (i) confirm for the first time 990 predictions without previous evidence of transcription; (ii) correct gene predictions; (iii) discover 989 and 1196 RNA-seq contigs that map to intergenic and intronic genomic regions, respectively, where no gene had been predicted before. These contigs could represent new protein-coding genes or non-coding RNAs (ncRNAs). Interestingly, we identified 11 novel micro-exon genes (MEGs). These data reveal new features of the S. mansoni transcriptional landscape and significantly advance our understanding of the parasite transcriptome. (c) 2011 Elsevier Inc. All rights reserved.
Abstract:
Human neurodegenerative diseases, such as Parkinson’s disease (PD) and the neuromuscular disorders called dystroglycanopathies (DGPs), cause retinal impairments. We have used RNA-Seq technology to catalog all known genes linked to PD and DGPs expressed in the human retina and quantitate their mRNA levels in terms of FPKM. We have also characterized their expression profiles in the retina by determining their exonic, intronic and exon-intron junction expression levels, as well as the alternative splicing pattern of particular genes. We believe these data could pave the way toward understanding the molecular bases of sight deficiencies associated with neurodegenerative disorders.
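The FPKM unit used above (fragments per kilobase of transcript per million mapped fragments) follows a simple formula, sketched here under the standard definition. The function name `fpkm` and the example numbers are illustrative assumptions, not values from the study.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments on a 2 kb transcript, 10 million mapped fragments.
value = fpkm(500, 2000, 10_000_000)
```

Dividing by transcript length and by library size is what makes values comparable across genes of different lengths and across samples of different sequencing depth.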
Abstract:
miRDeep and its varieties are widely used to quantify known and novel micro RNA (miRNA) from small RNA sequencing (RNAseq). This article describes miRDeep*, our integrated miRNA identification tool, which is modeled off miRDeep, but the precision of detecting novel miRNAs is improved by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphic interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or the binary equivalent (BAM) format. Known and novel miRNA expression levels, as measured by the number of reads, are displayed in an interface, which shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are shown and kept in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to the confidence score. miRDeep* is an integrated standalone application where sequence alignment, pre-miRNA secondary structure calculation and graphical display are purely Java coded. This application tool can be executed using a normal personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperformed existing miRNA prediction tools using our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star
Abstract:
The project investigated the molecular response of Tra catfish (Pangasianodon hypophthalmus) to elevated salinity conditions. We employed a next generation sequencing platform to evaluate differential expression profiles of key genes under two salinity conditions. The results of the current project can form the basis for further studies to confirm the functional roles of specific genes that influence salinity tolerance in the target species and, more broadly, in other freshwater teleost fishes. Ultimately, the approach can contribute to developing superior culture stocks of the target species.
Abstract:
Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize and cause disease in various hosts, are not well understood, and new approaches are required to understand its basic biology. High-throughput sequencing technologies provide unprecedented opportunities for functional genomic research. Recent studies have shown that direct Illumina sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes of C. jejuni (NCTC11168) and its rpoN mutant. This has allowed the identification of hitherto unknown transcriptional units, and further defines the regulon that is dependent on rpoN for expression. The analysis of the NCTC11168 transcriptome was supplemented by additional proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets represent an important resource for the Campylobacter research community. © 2011 SGM.
Abstract:
BACKGROUND:
We have recently identified a number of Quantitative Trait Loci (QTL) contributing to the 2-fold muscle weight difference between the LG/J and SM/J mouse strains and refined their confidence intervals. To facilitate nomination of the candidate genes responsible for these differences, we examined the transcriptome of the tibialis anterior (TA) muscle of each strain by RNA-Seq.
RESULTS: 13,726 genes were expressed in mouse skeletal muscle. Intersection of a set of 1061 differentially expressed transcripts with a mouse muscle Bayesian Network identified a coherent set of differentially expressed genes that we term the LG/J and SM/J Regulatory Network (LSRN). The integration of the QTL, transcriptome and network analyses identified eight key drivers of the LSRN (Kdr, Plbd1, Mgp, Fah, Prss23, 2310014F06Rik, Grtp1, Stk10) residing within five QTL regions, which were either polymorphic or differentially expressed between the two strains and are strong candidates for quantitative trait genes (QTGs) underlying muscle mass. The insight gained from network analysis, including the ability to make testable predictions, is illustrated by annotating the LSRN with knowledge-based signatures and showing that the SM/J state of the network corresponds to a more oxidative state. We validated this prediction by NADH tetrazolium reductase staining in the TA muscle, revealing higher oxidative potential of the SM/J compared to the LG/J strain (p<0.03).
CONCLUSION: Thus, integration of fine resolution QTL mapping, RNA-Seq transcriptome information and mouse muscle Bayesian Network analysis provides a novel and unbiased strategy for nomination of muscle QTGs.