24 results for RNA-seq data
Abstract:
The advent of next generation sequencing (NGS) technologies has expanded the area of genomic research, offering high coverage and increased sensitivity over older microarray platforms. Although the current cost of next generation sequencing still exceeds that of microarray approaches, the rapid advances in NGS will likely make it the platform of choice for future research in differential gene expression. Connectivity mapping is a procedure for examining the connections among diseases, genes and drugs by differential gene expression. It was initially based on microarray technology, with which a large collection of compound-induced reference gene expression profiles has been accumulated. In this work, we aim to test the feasibility of incorporating NGS RNA-Seq data into the current connectivity mapping framework by utilizing the microarray-based reference profiles and constructing a differentially expressed gene signature from an NGS dataset. This would allow connections to be established between the NGS gene signature and the microarray reference profiles, avoiding the cost of re-creating drug profiles with NGS technology. We examined the connectivity mapping approach on a publicly available NGS dataset with androgen stimulation of LNCaP cells in order to extract candidate compounds that could inhibit the proliferative phenotype of LNCaP cells and to elucidate their potential in a laboratory setting. In addition, we also analyzed an independent microarray dataset of similar experimental settings. We found a high level of concordance between the top compounds identified using the gene signatures from the two datasets. The nicotine derivative cotinine was returned as the top candidate among the overlapping compounds with potential to suppress this proliferative phenotype. Subsequent lab experiments validated this connectivity mapping hit, showing that cotinine inhibits cell proliferation in an androgen-dependent manner. Thus the results in this study suggest a promising prospect of integrating NGS data with connectivity mapping. © 2013 McArt et al.
Abstract:
MOTIVATION: Data from RNA-seq experiments provide us with many new possibilities to gain insights into biological and disease mechanisms of cellular functioning. However, the reproducibility and robustness of RNA-seq data analysis results are often unclear. This is in part attributed to the two counteracting goals of (a) a cost-efficient and (b) an optimal experimental design, which lead to a compromise, e.g., in the sequencing depth of experiments.
RESULTS: We introduce an R package called samExploreR that allows the subsampling (m out of n bootstrapping) of short reads based on SAM files, facilitating the investigation of sequencing-depth-related questions for the experimental design. Overall, this provides a systematic way for exploring the reproducibility and robustness of general RNA-seq studies. We exemplify the usage of samExploreR by studying the influence of the sequencing depth and the annotation on the identification of differentially expressed genes.
AVAILABILITY: samExploreR is available as an R package from Bioconductor (after acceptance of the paper, download link: http://www.bio-complexity.com/samExploreR_1.0.0.tar.gz).
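The subsampling idea in this abstract can be illustrated with a minimal Python sketch. It is not the samExploreR implementation or its API: the function name `subsample_sam_lines` and the `fraction` and `seed` parameters are hypothetical, and the sketch assumes that m of the n alignment records are drawn without replacement while all SAM header lines (those starting with `@`) are kept.

```python
import random


def subsample_sam_lines(lines, fraction, seed=None):
    """Return the SAM header plus a random subset of alignment records.

    Hypothetical helper: draws m = round(fraction * n) of the n alignment
    lines without replacement (an assumption, not samExploreR's method).
    """
    rng = random.Random(seed)
    # Header lines in the SAM format start with '@'; keep them all.
    header = [ln for ln in lines if ln.startswith("@")]
    reads = [ln for ln in lines if not ln.startswith("@")]
    m = round(fraction * len(reads))
    # Sort the sampled indices so the kept reads stay in file order.
    keep = sorted(rng.sample(range(len(reads)), m))
    return header + [reads[i] for i in keep]


if __name__ == "__main__":
    toy = ["@HD\tVN:1.0"] + [f"read{i}\t0\tchr1\t{100 + i}" for i in range(10)]
    for line in subsample_sam_lines(toy, fraction=0.5, seed=1):
        print(line)
```

Repeating the draw with different seeds and fractions, then rerunning the downstream analysis on each subset, is one way to probe how results depend on sequencing depth.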
Abstract:
BACKGROUND:
We have recently identified a number of Quantitative Trait Loci (QTL) contributing to the 2-fold muscle weight difference between the LG/J and SM/J mouse strains and refined their confidence intervals. To facilitate nomination of the candidate genes responsible for these differences we examined the transcriptome of the tibialis anterior (TA) muscle of each strain by RNA-Seq.
RESULTS: 13,726 genes were expressed in mouse skeletal muscle. Intersection of a set of 1061 differentially expressed transcripts with a mouse muscle Bayesian Network identified a coherent set of differentially expressed genes that we term the LG/J and SM/J Regulatory Network (LSRN). The integration of the QTL, transcriptome and the network analyses identified eight key drivers of the LSRN (Kdr, Plbd1, Mgp, Fah, Prss23, 2310014F06Rik, Grtp1, Stk10) residing within five QTL regions, which were either polymorphic or differentially expressed between the two strains and are strong candidates for quantitative trait genes (QTGs) underlying muscle mass. The insight gained from network analysis including the ability to make testable predictions is illustrated by annotating the LSRN with knowledge-based signatures and showing that the SM/J state of the network corresponds to a more oxidative state. We validated this prediction by NADH tetrazolium reductase staining in the TA muscle revealing higher oxidative potential of the SM/J compared to the LG/J strain (p<0.03).
CONCLUSION: Thus, integration of fine resolution QTL mapping, RNA-Seq transcriptome information and mouse muscle Bayesian Network analysis provides a novel and unbiased strategy for nomination of muscle QTGs.
Abstract:
BACKGROUND: Tumorigenesis is characterised by changes in transcriptional control. Extensive transcript expression data have been acquired over the last decade and used to classify prostate cancers. Prostate cancer is, however, a heterogeneous multifocal cancer and this poses challenges in identifying robust transcript biomarkers.
METHODS: In this study, we have undertaken a meta-analysis of publicly available transcriptomic data spanning datasets and technologies from the last decade and encompassing laser capture microdissected and macrodissected sample sets.
RESULTS: We identified a 33-gene signature that can discriminate between benign tissue controls and localised prostate cancers irrespective of detection platform or dissection status. These genes were significantly overexpressed in localised prostate cancer versus benign tissue in at least three datasets within the Oncomine Compendium of Expression Array Data. In addition, they were also overexpressed in a recent exon-array dataset as well as a prostate cancer RNA-seq dataset generated as part of The Cancer Genome Atlas (TCGA) initiative. Biologically, glycosylation was the single enriched process associated with this 33-gene signature, encompassing four glycosylating enzymes. We went on to evaluate the performance of this signature against three individual markers of prostate cancer, v-ets avian erythroblastosis virus E26 oncogene homolog (ERG) expression, prostate specific antigen (PSA) expression and androgen receptor (AR) expression, in an additional independent dataset. Our signature had greater discriminatory power than these markers both for localised cancer and metastatic disease relative to benign tissue, or in the case of metastasis, also localised prostate cancer.
CONCLUSION: Robust transcript biomarkers are present within datasets assembled over many years and cohorts, and our study provides both examples and a strategy for refining and comparing datasets to obtain additional markers as more data are generated.
Abstract:
Genetic risk factors for chronic kidney disease (CKD) are being identified through international collaborations. By comparison, epigenetic risk factors for CKD have only recently been considered using population-based approaches. DNA methylation is a major epigenetic modification that is associated with complex diseases, so we investigated methylome-wide loci for association with CKD. A total of 485,577 unique features were evaluated in 255 individuals with CKD (cases) and 152 individuals without evidence of renal disease (controls). Following stringent quality control, raw data were quantile normalized and β values calculated to reflect the methylation status at each site. The difference in methylation status was evaluated between cases and controls with resultant P values adjusted for multiple testing. Genes with significantly increased and decreased levels of DNA methylation were considered for biological relevance by functional enrichment analysis using KEGG pathways in Partek Genomics Suite. Twenty-three genes, in which more than one CpG per locus was identified with adjusted P < 10⁻⁸, demonstrated significant methylation changes associated with CKD, and additional support for these associated loci was sought from published literature. Strong biological candidates for CKD that showed statistically significant differential methylation include the CUX1, ELMO1, FKBP5, INHBA-AS1, PTPRN2, and PRKAG2 genes; several genes are differentially methylated in kidney tissue, and RNA-seq supports a functional role for differential methylation in the ELMO1 and PRKAG2 genes. This study reports the largest, most comprehensive, genome-wide quantitative evaluation of DNA methylation for association with CKD. Evidence confirming that methylation sites influence development of CKD would stimulate research to identify epigenetic therapies that might be clinically useful for CKD.
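The β value and the multiple-testing adjustment mentioned in this abstract can be illustrated with a short Python sketch. It rests on assumptions the abstract does not state: the β value is computed from methylated and unmethylated probe intensities with the commonly used stabilising offset of 100, and the adjustment shown is Benjamini-Hochberg, which may differ from the correction the authors applied. The function names are hypothetical.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Methylation level in [0, 1): M / (M + U + offset).

    The offset (100 by common convention, an assumption here) stabilises
    the ratio when both intensities are low.
    """
    return methylated / (methylated + unmethylated + offset)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Shown only as one widely used adjustment; the abstract does not say
    which method was applied.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(beta_value(900.0, 0.0))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```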
Abstract:
Patterns of glycosylation are important in cancer, but the molecular mechanisms that drive changes are often poorly understood. The androgen receptor drives prostate cancer (PCa) development and progression to lethal metastatic castration-resistant disease. Here we used RNA-Seq coupled with bioinformatic analyses of androgen receptor (AR) binding sites and clinical PCa expression array data to identify ST6GalNAc1 as a direct and rapidly activated target gene of the AR in PCa cells. ST6GalNAc1 encodes a sialyltransferase that catalyses formation of the cancer-associated sialyl-Tn antigen (sTn), which we find is also induced by androgen exposure. Androgens induce expression of a novel splice variant of the ST6GalNAc1 protein in PCa cells. This splice variant encodes a shorter protein isoform that is still fully functional as a sialyltransferase and able to induce expression of the sTn antigen. Surprisingly, given its high expression in tumours, stable expression of ST6GalNAc1 in PCa cells reduced formation of stable tumours in mice, reduced cell adhesion and induced a switch towards a more mesenchymal-like cell phenotype in vitro. ST6GalNAc1 has a dynamic expression pattern in clinical datasets, being significantly up-regulated in primary prostate carcinoma but relatively down-regulated in established metastatic tissue. ST6GalNAc1 is frequently up-regulated concurrently with another important glycosylation enzyme, GCNT1, previously associated with prostate cancer progression and implicated in Sialyl Lewis X antigen synthesis. Together, our data establish an androgen-dependent mechanism for sTn antigen expression in PCa, and are consistent with a general role for the androgen receptor in driving important coordinate changes to the glycoproteome during PCa progression.
Abstract:
Berlin high (BEH) and Berlin low (BEL) strains selected for divergent growth differ 3-fold in body weight. We aimed to examine muscle mass, which is a major contributor to body weight, by exploring anatomical characteristics of the soleus muscle, its fiber numbers and their cross-sectional area (CSA), by analysing the transcriptome of the gastrocnemius and by initiating quantitative trait locus (QTL) mapping. BEH muscles were 4-to-8 times larger than those of the BEL strain. In sub-strain BEH+/+, mutant myostatin was replaced with a wild type allele; however, BEH+/+ muscles were still 2-to-4 times larger than those of the BEL strain. BEH soleus contained twice as many fibers (P<0.0001), with twice the CSA (P<0.0001), compared to the BEL strain. In addition, soleus femoral attachment anomaly (SFAA) was observed in all BEL mice. One significant (chromosome 1) and four suggestive (chromosomes 3, 4, 6 and 9) muscle weight QTLs were mapped in a 21-day-old F2 intercross (n=296) between BEH and BEL strains. The frequency of SFAA incidence in the F2 and in the backcross to BEL strain (BCL) suggested the presence of more than one causative gene. Two suggestive SFAA QTLs were mapped in BCL; however, their peak markers were not associated with the phenotype in F2. RNA-Seq analysis revealed 2,148 differentially expressed (P<0.1) genes, 45,673 SNPs and >2,000 indels between BEH+/+ and BEL males. In conclusion, contrasting muscle traits, genomic and gene expression differences between BEH and BEL strains provide a promising model for the search of genes involved in muscle growth and musculoskeletal morphogenesis.
Abstract:
Background
The human microbiome plays a significant role in maintaining normal physiology. Changes in its composition have been associated with bowel disease, metabolic disorders and atherosclerosis. Sequences of microbial origin have been observed within small RNA sequencing data obtained from blood samples. The aim of this study was to characterise the microbiome from which these sequences are derived.
Results
Abundant non-human small RNA sequences were identified in plasma and plasma exosomal samples. Assembly of these short sequences into longer contigs was the pivotal novel step in ascertaining their origin by BLAST searches. Most reads mapped to rRNA sequences. The taxonomic profiles of the microbes detected were very consistent between individuals but distinct from microbiomes reported at other sites. The majority of bacterial reads were from the phylum Proteobacteria, whilst for 5 of 6 individuals over 90% of the more abundant fungal reads were from the phylum Ascomycota; of these over 90% were from the order Hypocreales. Many contigs were from plants, presumably of dietary origin. In addition, extremely abundant small RNAs derived from human Y RNAs were detected.
Conclusions
A characteristic profile of a subset of the human microbiome can be obtained by sequencing small RNAs present in the blood. The source and functions of these molecules remain to be determined, but the specific profiles are likely to reflect health status. The potential to provide biomarkers of diet and for the diagnosis and prognosis of human disease is immense.
Abstract:
BACKGROUND: Prostate cancer (PCa) is the most common cancer in men. PCa is strongly age associated; low death rates in surveillance cohorts call into question the widespread use of surgery, which leads to overtreatment and a reduction in quality of life. There is a great need to increase the understanding of tumor characteristics in the context of disease progression.
OBJECTIVE: To perform the first multigenome investigation of PCa through analysis of both autosomal and mitochondrial DNA, and to integrate exome sequencing data, and RNA sequencing and copy-number alteration (CNA) data to investigate how various different tumor characteristics, commonly analyzed separately, are interconnected.
DESIGN, SETTING, AND PARTICIPANTS: Exome sequencing was applied to 64 tumor samples from 55 PCa patients with varying stage and grade. Integrated analysis was performed on a core set of 50 tumors from which exome sequencing, CNA, and RNA sequencing data were available.
OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS: Genes, mutated at a significantly higher rate relative to a genomic background, were identified. In addition, mitochondrial and autosomal mutation rates were correlated to CNAs and proliferation, assessed as a cell cycle gene expression signature.
RESULTS AND LIMITATIONS: Genes not previously reported to be significantly mutated in PCa, such as cell division cycle 27 homolog (Saccharomyces cerevisiae) (CDC27), myeloid/lymphoid or mixed-lineage leukemia 3 (MLL3), lysine (K)-specific demethylase 6A (KDM6A), and kinesin family member 5A (KIF5A), were identified. The mutation rate in the mitochondrial genome was 55 times higher than that of the autosomes. Multilevel analysis demonstrated a tight correlation between high reactive-oxygen exposure, chromosomal damage, high proliferation, and in parallel, a transition from multiclonal indolent primary PCa to monoclonal aggressive disease. As we only performed targeted sequence analysis, copy-number neutral rearrangements recently described for PCa were not accounted for.
CONCLUSIONS: The mitochondrial genome displays an elevated mutation rate compared to the autosomal chromosomes. By integrated analysis, we demonstrated that different tumor characteristics are interconnected, providing an increased understanding of PCa etiology.
Abstract:
Purpose: To investigate how potentially functional genetic variants are coinherited on each of four common complement factor H (CFH) and CFH-related gene haplotypes and to measure expression of these genes in eye and liver tissues.
Methods: We sequenced the CFH region in four individuals (one homozygote for each of four common CFH region haplotypes) to identify all genetic variants. We studied associations between the haplotypes and AMD phenotypes in 2157 cases and 1150 controls. We examined RNA-seq profiles in macular and peripheral retina and retinal pigment epithelium/choroid/sclera (RCS) from eight eye donors and three liver samples.
Results: The haplotypic coinheritance of potentially functional variants (including missense variants, novel splice sites, and the CFHR3–CFHR1 deletion) was described for the four common haplotypes. Expression of the short and long CFH transcripts differed markedly between the retina and liver. We found no expression of any of the five CFH-related genes in the retina or RCS, in contrast to the liver, which is the main source of the circulating proteins.
Conclusions: We identified all genetic variants on common CFH region haplotypes and described their coinheritance. Understanding their functional effects will be key to developing and stratifying AMD therapies. The small scale of our expression study prevented us from investigating the relationships between CFH region haplotypes and their expression, and it will take time and collaboration to develop epidemiologic-scale studies. However, the striking difference between systemic and ocular expression of complement regulators shown in this study suggests important implications for the development of intraocular and systemic treatments.
Abstract:
The splicing factor SF3B1 is the most frequently mutated gene in myelodysplastic syndromes (MDS), and is strongly associated with the presence of ring sideroblasts (RS). We have performed a systematic analysis of cryptic splicing abnormalities from RNA sequencing data on hematopoietic stem cells (HSCs) of SF3B1-mutant MDS cases with RS. Aberrant splicing events in many downstream target genes were identified and cryptic 3' splice site usage was a frequent event in SF3B1-mutant MDS. The iron transporter ABCB7 is a well-recognized candidate gene showing marked downregulation in MDS with RS. Our analysis unveiled aberrant ABCB7 splicing, due to usage of an alternative 3' splice site in MDS patient samples, giving rise to a premature termination codon in the ABCB7 mRNA. Treatment of cultured SF3B1-mutant MDS erythroblasts and a CRISPR/Cas9-generated SF3B1-mutant cell line with the nonsense-mediated decay (NMD) inhibitor cycloheximide showed that the aberrantly spliced ABCB7 transcript is targeted by NMD. We describe cryptic splicing events in the HSCs of SF3B1-mutant MDS, and our data support a model in which NMD-induced downregulation of the iron exporter ABCB7 mRNA transcript resulting from aberrant splicing caused by mutant SF3B1 underlies the increased mitochondrial iron accumulation found in MDS patients with RS. Leukemia advance online publication, 17 June 2016; doi:10.1038/leu.2016.149.
Abstract:
The human coronavirus 229E replicase gene encodes a protein, p66HEL, that contains a putative zinc finger structure linked to a putative superfamily (SF) 1 helicase. A histidine-tagged form of this protein, HEL, was expressed using baculovirus vectors in insect cells. The purified recombinant protein had in vitro ATPase activity that was strongly stimulated by poly(U), poly(dT), poly(C), and poly(dA), but not by poly(G). The recombinant protein also had both RNA and DNA duplex-unwinding activities with 5'-to-3' polarity. The DNA helicase activity of the enzyme preferentially unwound 5'-oligopyrimidine-tailed, partial-duplex substrates and required a tail length of at least 10 nucleotides for effective unwinding. The combined data suggest that the coronaviral SF1 helicase functionally differs from the previously characterized RNA virus SF2 helicases.