963 results for RNA sequencing


Relevance:

100.00%

Abstract:

miRDeep and its variants are widely used to quantify known and novel microRNAs (miRNAs) from small RNA sequencing (RNAseq) data. This article describes miRDeep*, our integrated miRNA identification tool. It is modeled on miRDeep but detects novel miRNAs more precisely by introducing new strategies to identify precursor miRNAs (pre-miRNAs). miRDeep* has a user-friendly graphical interface and accepts raw data in FastQ format and in Sequence Alignment Map (SAM) or its binary equivalent (BAM) format. Known and novel miRNA expression levels, measured as read counts, are displayed in an interface that shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are displayed and saved to a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm and ranked by confidence score. miRDeep* is an integrated standalone application in which sequence alignment, pre-miRNA secondary structure calculation and graphical display are implemented entirely in Java. It can be run on an ordinary personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperforms existing miRNA prediction tools on our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star

Relevance:

100.00%

Abstract:

Potato leafroll virus (PLRV) is a positive-strand RNA virus that generates subgenomic RNAs (sgRNAs) for expression of its 3'-proximal genes. Small RNA (sRNA) sequencing and mapping of the PLRV-derived sRNAs revealed coverage of the entire viral genome except for four distinctive gaps. Remarkably, these gaps mapped to regions of the PLRV genome with extensive secondary structure, such as the internal ribosome entry site and the 5' transcriptional start sites of sgRNA1 and sgRNA2. The last gap mapped to ~500 nt from the 3' terminus of the PLRV genome, suggesting the possible presence of an additional sgRNA. Quantitative real-time PCR and northern blot analysis confirmed the expression of this sgRNA3, and subsequent analyses placed its 5' transcriptional start site at position 5347 of the PLRV genome. A regulatory role is proposed for PLRV sgRNA3, as it encodes an RNA-binding protein with specificity for the 5' end of PLRV genomic RNA. © 2013.

Relevance:

100.00%

Abstract:

Background: Small RNA sequencing is commonly used to identify novel miRNAs and to determine their expression levels in plants. Several miRNA identification tools exist for animals, such as miRDeep, miRDeep2 and miRDeep*. miRDeep-P was developed to identify plant miRNAs using miRDeep's probabilistic model of miRNA biogenesis, but it depends on several third-party tools and lacks a user-friendly interface. Our miRPlant program aims to predict novel plant miRNAs with improved accuracy while providing a user-friendly interface. Results: We have developed a user-friendly plant miRNA prediction tool called miRPlant. Using 16 plant miRNA datasets from four plant species, we show that miRPlant achieves at least a 10% improvement in accuracy over miRDeep-P, the most popular plant miRNA prediction tool. Furthermore, miRPlant uses a graphical user interface for data input and output, and identified miRNAs are shown with all RNAseq reads in a hairpin diagram. Conclusions: miRPlant extends miRDeep* to various plant species by adopting suitable strategies for identifying hairpin excision regions and for filtering hairpin structures in plants. miRPlant does not require any third-party tools, such as mapping or RNA secondary structure prediction tools. miRPlant is also the first plant miRNA prediction tool that dynamically plots the miRNA hairpin structure with small RNA reads for identified novel miRNAs. This feature enables biologists to visualize novel pre-miRNA structures and the locations of small RNA reads relative to the hairpin. Moreover, miRPlant can easily be used by biologists with limited bioinformatics skills.

Relevance:

100.00%

Abstract:

Cell lines derived from tumor tissues have been a valuable system for studying gene regulation and cancer development. Comprehensive characterization of the genetic background of cell lines could provide clues to novel genes responsible for carcinogenesis and help in choosing cell lines for particular studies. Here, we carried out whole-exome and RNA sequencing of commonly used glioblastoma (GBM) cell lines (U87, T98G, LN229, U343, U373 and LN18) to uncover single nucleotide variations (SNVs), indels, differential gene expression, gene fusions and RNA editing events. We obtained an average of 41,071 SNVs, of which 1,594 (3.88%) were potentially cancer-specific. The cell lines showed frequent SNVs and indels in several genes known to be altered in GBM: EGFR, TP53, PTEN, SPTA1 and NF1. Chromatin-modifying genes (ATRX, MLL3, MLL4, SETD2 and SRCAP) also showed alterations. No cell line carried IDH1 mutations, but five cell lines showed activating hTERT promoter mutations with a concomitant increase in hTERT transcript levels. Five significant gene fusions were found, of which NUP93-CYB5B was validated. An average of 18,949 RNA editing events was also obtained. Thus, we have generated a comprehensive catalogue of genetic alterations for six GBM cell lines.

Relevance:

100.00%

Abstract:

Understanding the dynamics of the eukaryotic transcriptome is essential for studying the complexity of transcriptional regulation and its impact on phenotype. However, comprehensive studies of transcriptomes at single-base resolution are rare, even for modern organisms, and are lacking for rice. Here, we present the first transcriptome atlas for eight organs of cultivated rice. Using high-throughput paired-end RNA-seq, we unambiguously detected transcripts expressed at extremely low levels, as well as a substantial number of novel transcripts, exons, and untranslated regions. An analysis of alternative splicing in the rice transcriptome revealed that alternative cis-splicing occurs in ~33% of all rice genes, far more than previously reported. In addition, we identified 234 putative chimeric transcripts that appear to be produced by trans-splicing, indicating that transcript fusion events are more common than expected. In-depth analysis revealed a multitude of fusion transcripts that might be by-products of alternative splicing. Validation and structural analysis of the chimeric transcripts provided evidence that some of them are likely to be functional in the cell. Taken together, our data provide extensive evidence that transcriptional regulation in rice is vastly more complex than previously believed.

Relevance:

100.00%

Abstract:

Brain structure and function undergo dramatic changes from embryonic to postnatal development. Microarray analyses have detected differential gene expression at different stages and in disease models, but information on gene expression during early brain development is limited. We generated >27 million reads to identify mRNAs for >16,000 genes from the mouse cortex at either embryonic day 18 (E18) or postnatal day 7 (P7), a period of significant synaptogenesis for neural circuit formation. In addition, we devised strategies to detect alternative splice forms and uncovered additional splice variants. We observed differential expression of 3,758 genes between the two stages. Many of these genes have known functions in neural development or are predicted to be important for it. Neurogenesis-related genes, such as those encoding Sox4, Sox11 and zinc-finger proteins, were more highly expressed at E18 than at P7. In contrast, genes encoding synaptic proteins such as synaptotagmin, complexin 2 and syntaxin were up-regulated from E18 to P7. We also found that several neurological disorder-related genes were highly expressed at E18. Our transcriptome analysis may serve as a blueprint for gene expression patterns and provide functional clues to previously unknown genes and disease-related genes during early brain development.

Relevance:

100.00%

Abstract:

With the advent of high-throughput sequencing (HTS), the emerging science of metagenomics is transforming our understanding of the relationships between microbial communities and their environments. Metagenomics aims to catalogue the genes present in a sample. Metatranscriptomics, by assessing which genes are actively expressed, can provide a mechanistic understanding of community inter-relationships. To achieve these goals, several challenges need to be addressed, from sample preparation to sequence processing, statistical analysis and functional annotation. Here we use an inbred non-obese diabetic (NOD) mouse model in which germ-free animals were colonized with a defined mixture of eight commensal bacteria. With this model we explore methods of RNA extraction and develop a pipeline for the generation and analysis of metatranscriptomic data. Using the Illumina HTS platform, we sequenced 12 NOD cecal samples prepared with multiple RNA-extraction protocols. The absence of a complete set of reference genomes necessitated a peptide-based search strategy. Up to 16% of sequence reads could be matched to a known bacterial gene. Phylogenetic analysis of the mapped ORFs revealed a distribution consistent with ribosomal RNA, with the majority from Bacteroides or Clostridium species. To place these HTS data in a systems context, we mapped the relative abundance of the corresponding Escherichia coli homologs onto metabolic and protein-protein interaction networks. These maps identified bacterial processes whose components were well represented in the datasets. In summary, this study highlights the potential of exploiting the economy of HTS platforms for metatranscriptomics.

Relevance:

100.00%

Abstract:

My dissertation focuses on two aspects of RNA sequencing technology. The first is the methodology for modeling the overdispersion inherent in RNA-seq data for differential expression analysis, which is addressed in three sections. The second is the application of RNA-seq data to identify the CpG island methylator phenotype (CIMP) by integrating mRNA expression and DNA methylation datasets.
Section 1: The cost of DNA sequencing has fallen dramatically in the past decade, and genomic research consequently depends increasingly on sequencing technology. However, it remains unclear how sequencing capacity influences the accuracy of mRNA expression measurement. We observe that accuracy improves as sequencing depth increases. To model the overdispersion, we use the beta-binomial distribution with a new parameter that captures the dependency between overdispersion and sequencing depth. Our modified beta-binomial model outperforms both the binomial and the pure beta-binomial models, achieving a lower false discovery rate.
Section 2: A number of methods have been proposed to analyze differential RNA expression accurately at the gene level, but modeling at the base-pair level is also required. Here, we find that at the base-pair level the overdispersion rate decreases as sequencing depth increases. We propose four models and compare them. As expected, our beta-binomial model with a dynamic overdispersion rate performs best.
Section 3: We investigate biases in RNA-seq by examining measurements of an external control, spike-in RNA, using two datasets with spike-in controls from a recent study. We observe a previously undescribed bias in the measurement of the spike-in transcripts that arises from the influence of the sample transcripts. This influence is related to the local sequence of the random hexamers used for priming. We propose a model of the inequality between samples to correct this type of bias.
Section 4: A gene's expression can be turned off when its promoter is highly methylated. Several studies have reported a clear threshold effect in gene silencing mediated by DNA methylation, and it is reasonable to assume that these thresholds are gene-specific. Genes whose expression is largely controlled by DNA methylation, called “L-shaped” genes, are of particular interest. We develop a method to determine the DNA methylation threshold and identify a new CIMP of breast cancer (BRCA).
In conclusion, we provide a detailed understanding of the relationship between the overdispersion rate and sequencing depth. We also reveal a new bias in RNA-seq and characterize its relationship with the local sequence. Finally, we develop a powerful method to dichotomize methylation status. With it, we identify a new CIMP of breast cancer that has distinct molecular characteristics and clinical features.
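The abstract does not state the exact form of the new depth-dependence parameter, so the following is only a minimal sketch under a hypothetical assumption. It assumes that the overdispersion ρ decays as a power of sequencing depth n, ρ(n) = ρ0·n^(−γ). Setting γ = 0 recovers the pure beta-binomial model, and letting ρ → 0 approaches the binomial. The function names and the parameters `rho0` and `gamma` are illustrative only and are not taken from the dissertation.

```python
import math


def log_beta(x: float, y: float) -> float:
    """Natural log of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def overdispersion(depth: int, rho0: float, gamma: float) -> float:
    """Hypothetical depth-dependent overdispersion rho(n) = rho0 * n**(-gamma).

    gamma = 0 keeps rho constant (the pure beta-binomial model);
    gamma > 0 makes rho shrink as sequencing depth n grows.
    """
    return rho0 * depth ** (-gamma)


def beta_binomial_logpmf(k: int, n: int, p: float, rho: float) -> float:
    """log P(K = k) for a beta-binomial with mean proportion p and
    overdispersion (intra-class correlation) rho, where 0 < rho < 1."""
    # Map (p, rho) to the Beta shape parameters a, b so that rho = 1 / (a + b + 1).
    scale = (1.0 - rho) / rho
    a, b = p * scale, (1.0 - p) * scale
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def depth_aware_logpmf(k: int, n: int, p: float, rho0: float, gamma: float) -> float:
    """Beta-binomial log-likelihood whose overdispersion depends on depth n."""
    return beta_binomial_logpmf(k, n, p, overdispersion(n, rho0, gamma))


if __name__ == "__main__":
    p, rho0, gamma = 0.3, 0.2, 0.5
    for n in (10, 100, 1000):
        rho = overdispersion(n, rho0, gamma)
        pmf = [math.exp(depth_aware_logpmf(k, n, p, rho0, gamma)) for k in range(n + 1)]
        mean = sum(k * q for k, q in enumerate(pmf))
        var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
        # Ratio of the beta-binomial variance to the binomial variance n*p*(1-p).
        print(f"n={n:5d}  rho={rho:.4f}  variance ratio={var / (n * p * (1 - p)):.2f}")
```

Under this parameterization the variance is n·p·(1−p)·[1 + (n−1)·ρ]. A fitted γ > 0 therefore encodes the observation that the overdispersion rate falls as depth increases. Estimating ρ0 and γ by maximum likelihood across genes or positions would be one way to fit such a model, but that step is not shown here.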

Relevance:

80.00%

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevance:

70.00%

Abstract:

Milk is considered one of the world’s most ‘complete’ foods. To characterise milk composition, Amit investigated the RNA present in milk from 8 different species, ranging from platypus to human. By applying the latest RNA sequencing and bioinformatic techniques, his work uncovered hundreds of novel milk RNAs.

Relevance:

70.00%

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

70.00%

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)