934 results for Reference transcriptome
Abstract:
High-throughput sequencing has made it easier to assemble a transcriptome, whether or not a reference genome is available. However, a transcriptome assembly must be of sufficient quality to capture the most comprehensive catalog of transcripts and their variations, and to support further transcriptomic experiments. There is currently no consensus on which of the many sequencing technologies and assembly tools are the most effective. Many non-model organisms lack a reference genome to guide the transcriptome assembly. One question, therefore, is whether a reference-based assembly gives better results than a de novo assembly. The blood-sucking insect Rhodnius prolixus, a vector for Chagas disease, has a reference genome. It is therefore a good model on which to compare reference-based and de novo transcriptome assemblies. In this study, we compared de novo and reference-based assembly strategies using three datasets (454, Illumina, 454 combined with Illumina) and various assembly software. We developed criteria to compare the resulting assemblies: the size distribution and number of transcripts, the proportion of potentially chimeric transcripts, and the completeness of the assembly (evaluated both with the CEGMA software and by the fraction of the R. prolixus proteome retrieved). Moreover, we looked for the presence of two chemosensory gene families (Odorant-Binding Proteins and Chemosensory Proteins) to validate the assembly quality. The reference-based assemblies after genome annotation were clearly better than those generated using de novo strategies alone. Reference-based strategies revealed new transcripts, including new isoforms not predicted by automatic genome annotation. However, a combination of both de novo and reference-based strategies gave the best result, and allowed us to assemble fragmented transcripts.
Abstract:
Gene expression is one of the most critical factors influencing the phenotype of a cell. As a result of several technological advances, measuring gene expression levels has become one of the most common molecular biological measurements for studying the behaviour of cells. The scientific community has produced an enormous and constantly increasing collection of gene expression data from various human cells, from both healthy and pathological conditions. However, while each of these studies is informative and enlightening in its own context and research setup, diverging methods and terminologies make it very challenging to integrate existing gene expression data into a more comprehensive view of human transcriptome function. On the other hand, bioinformatic science advances only through data integration and synthesis. The aim of this study was to develop biological and mathematical methods to overcome these challenges, to construct an integrated database of the human transcriptome, and to demonstrate its usage. The methods developed in this study can be divided into two distinct parts. First, the biological and medical annotation of the existing gene expression measurements needed to be encoded with systematic vocabularies. There was no single existing biomedical ontology or vocabulary suitable for this purpose. Thus, a new annotation terminology was developed as a part of this work. The second part was to develop mathematical methods for correcting the noise and the systematic differences and errors in the data caused by various array generations. Additionally, there was a need to develop suitable computational methods for sample collection and archiving, unique sample identification, database structures, data retrieval and visualization. Bioinformatic methods were developed to analyze gene expression levels and putative functional associations of human genes by using the integrated gene expression data. 
Also a method to interpret individual gene expression profiles across all the healthy and pathological tissues of the reference database was developed. As a result of this work 9783 human gene expression samples measured by Affymetrix microarrays were integrated to form a unique human transcriptome resource GeneSapiens. This makes it possible to analyse expression levels of 17330 genes across 175 types of healthy and pathological human tissues. Application of this resource to interpret individual gene expression measurements allowed identification of tissue of origin with 92.0% accuracy among 44 healthy tissue types. Systematic analysis of transcriptional activity levels of 459 kinase genes was performed across 44 healthy and 55 pathological tissue types and a genome wide analysis of kinase gene co-expression networks was done. This analysis revealed biologically and medically interesting data on putative kinase gene functions in health and disease. Finally, we developed a method for alignment of gene expression profiles (AGEP) to perform analysis for individual patient samples to pinpoint gene- and pathway-specific changes in the test sample in relation to the reference transcriptome database. We also showed how large-scale gene expression data resources can be used to quantitatively characterize changes in the transcriptomic program of differentiating stem cells. Taken together, these studies indicate the power of systematic bioinformatic analyses to infer biological and medical insights from existing published datasets as well as to facilitate the interpretation of new molecular profiling data from individual patients.
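The idea of interpreting an individual expression profile against a reference database can be illustrated with a minimal sketch. It assumes a simple nearest-profile rule based on Pearson correlation, which is not necessarily the method used in the study; the tissue names, expression values, and the function names `pearson` and `predict_tissue` are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors.

    Raises ZeroDivisionError if either vector is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def predict_tissue(sample, reference_profiles):
    """Return the reference tissue whose profile correlates best with the sample."""
    return max(reference_profiles,
               key=lambda tissue: pearson(sample, reference_profiles[tissue]))

# Hypothetical reference profiles: one mean expression vector per tissue,
# with the same gene order in every vector.
reference_profiles = {
    "liver":  [9.0, 1.0, 2.0, 8.0],
    "brain":  [1.0, 9.0, 8.0, 2.0],
    "muscle": [5.0, 5.0, 9.0, 1.0],
}
query = [8.5, 1.5, 2.5, 7.0]  # hypothetical new sample
predicted = predict_tissue(query, reference_profiles)
print(predicted)
```

A real resource would compare thousands of genes per sample and would first need the cross-platform normalization described in the abstract; the sketch only shows the comparison step.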
Abstract:
The aim of this study was to characterize the transcriptome of a balanced polymorphism, under the regulation of a single gene, for phosphate fertilizer responsiveness/arsenate tolerance in wild grass Holcus lanatus genotypes screened from the same habitat.
De novo transcriptome sequencing, RNAseq (RNA sequencing) and single nucleotide polymorphism (SNP) calling were conducted on RNA extracted from H. lanatus. Roche 454 sequencing data were assembled into c. 22 000 isotigs, and paired-end Illumina reads for phosphorus-starved (P) and phosphorus-treated (P+) genovars of tolerant (T) and nontolerant (N) phenotypes were mapped to this reference transcriptome.
Heatmaps of the gene expression data showed strong clustering of each P+/P treated genovar, as well as clustering by N/T phenotype. Statistical analysis identified 87 isotigs to be significantly differentially expressed between N and T phenotypes and 258 between P+ and P treated plants. SNPs and transcript expression that systematically differed between N and T phenotypes had regulatory function, namely proteases, kinases and ribonuclear RNA-binding protein and transposable elements.
A single gene for arsenate tolerance led to distinct phenotype transcriptomes and SNP profiles, with large differences in upstream post-translational and post-transcriptional regulatory genes rather than in genes directly involved in P nutrition transport and metabolism per se.
Abstract:
High throughput sequencing (HTS) provides new research opportunities for work on non-model organisms, such as differential expression studies between populations exposed to different environmental conditions. However, such transcriptomic studies first require the production of a reference assembly. The choice of sampling procedure, sequencing strategy and assembly workflow is crucial. To develop a reliable reference transcriptome for Triatoma brasiliensis, the major Chagas disease vector in Northeastern Brazil, different de novo assembly protocols were tested using various datasets and software. Both 454 and Illumina sequencing technologies were applied to RNA extracted from antennae and mouthparts from single or pooled individuals. The 454 library yielded 278 Mb. Fifteen Illumina libraries were constructed and yielded nearly 360 million RNA-seq single reads and 46 million RNA-seq paired-end reads, for nearly 45 Gb. For the 454 reads, we used three assemblers (Newbler, CAP3 and/or MIRA), and for the Illumina reads, the Trinity assembler. Ten assembly workflows were compared using these programs separately or in combination. To compare the assemblies obtained, quantitative and qualitative criteria were used, including contig length, N50, contig number and the percentage of chimeric contigs. Completeness of the assemblies was estimated using the CEGMA pipeline. The best assembly (57,657 contigs, completeness of 80%, <1% chimeric contigs) was a hybrid assembly, leading us to recommend (1) using a single individual with a large representation of biological tissues, (2) merging long reads with short paired-end Illumina reads, and (3) using several assemblers in order to combine the specific advantages of each.
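Of the comparison criteria listed above, N50 is the one most easily misread, so a short sketch may help. It uses the standard definition: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The contig lengths in the example are hypothetical and the function name `n50` is chosen for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Hypothetical toy assembly: total length is 1500 bp, so the threshold is 750 bp.
contigs = [100, 200, 300, 400, 500]
result = n50(contigs)
print(result)  # 500 + 400 = 900 >= 750, so the N50 is 400
```

Because N50 rewards long contigs regardless of correctness, it is reasonable that the study pairs it with the chimeric-contig percentage and a completeness estimate.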
Abstract:
Increasing salinity in freshwater and coastal environments, caused by sea level rise linked to climate change, is now recognized as a major factor that can negatively affect fish growth, especially in freshwater teleost species. Striped catfish (Pangasianodon hypophthalmus) is an important freshwater teleost that is now widely farmed across the Mekong River Delta in Vietnam. Understanding the basis for tolerance and adaptation to raised environmental salinity can help the regional culture industry to mitigate the predicted impacts of climate change across this region. Next-generation sequencing on the Ion Proton platform yielded more than 174 million raw reads from three tissue libraries (gill, kidney and intestine). Reads were filtered and de novo assembled using a variety of assemblers and then clustered together to generate a combined reference transcriptome. Downstream analysis resulted in a final reference transcriptome that contained 60,585 transcripts with an N50 of 683 bp. This resource was further annotated using a variety of bioinformatics databases, followed by differential gene expression analysis that identified 3062 transcripts that were differentially expressed in catfish samples raised under two experimental conditions (0 and 15 ppt). A number of transcripts with a potential role in salinity tolerance were then classified into six different functional gene categories based on their gene ontology assignments. These included: energy metabolism, ion transportation, detoxification, signal transduction, structural organization and detoxification. Finally, we combined the data on functional salinity tolerance genes into a hypothetical schematic model that attempted to describe potential relationships and interactions among target genes to explain the molecular pathways that control adaptive salinity responses in P. hypophthalmus. Our results indicate that P. hypophthalmus exhibits predictable plastic regulatory responses to elevated salinity by means of characteristic gene expression patterns, providing numerous candidate genes for future investigations.
Abstract:
The male gametophyte of the semi-aquatic fern, Marsilea vestita, produces multiciliated spermatozoids in a rapid developmental sequence that is controlled post-transcriptionally when dry microspores are placed in water. Development can be divided into two phases, mitosis and differentiation. During the mitotic phase, a series of nine successive division cycles produce 7 sterile cells and 32 spermatids in 4.5-5 hours. During the next 5-6 hours, each spermatid differentiates into a corkscrew-shaped motile spermatozoid with ~140 cilia. This document focuses on the role of motor proteins in the regulation of male gametophyte development and during ciliogenesis. In order to study the mechanisms that regulate spermatogenesis, RNAseq was used to generate a reference transcriptome that allowed us to assess the abundance of transcripts at different stages of development. Over 120 kinesin-like sequences were identified in the transcriptome that represent 56 unique kinesin transcripts. Members of the kinesin-2, -4, -5, -7, -8, -9, -12, -13, and -14 families, in addition to several plant specific and ‘orphan’ kinesins are present. Most (91%) of these kinesin transcripts change in abundance throughout gametophyte development, with 52% of kinesin mRNAs enriched during the mitotic phase and 39% enriched during differentiation. Functional analyses show that the temporal regulation of kinesin transcripts during gametogenesis directly correlates with kinesin protein function. Specifically, Marsilea makes one kinesin-2 (MvKinesin-2) and two kinesin-9 (MvKinesin-9A and MvKinesin-9B) transcripts, which are present during spermatid differentiation and ciliogenesis. Silencing experiments showed that MvKinesin-2 and MvKinesin-9A are required for ciliogenesis and motility in the Marsilea male gametophyte; however, these kinesins display atypical roles during these processes. In contrast, spermatozoids produced after the silencing of MvKinesin-9B exhibit normal morphology. 
MvKinesin-2 is necessary for cytokinesis as well as for regulating ciliary length and MvKinesin-9A is needed for the correct orientation of basal bodies, events not typically associated with these proteins. In addition, Marsilea makes motile, ciliated gametophytes without the help of IFT dynein, outer arm dynein, or the BBsome. These results are the first to investigate the kinesin-linked mechanisms that regulate ciliogenesis in a land plant.
Abstract:
Urinary tract infections (UTI) are among the most common infections in humans. Uropathogenic Escherichia coli (UPEC) can invade and replicate within bladder epithelial cells, and some UPEC strains can also survive within macrophages. To understand the UPEC transcriptional program associated with intramacrophage survival, we performed host–pathogen co-transcriptome analyses using RNA sequencing. Mouse bone marrow-derived macrophages (BMMs) were challenged over a 24 h time course with two UPEC reference strains that possess contrasting intramacrophage phenotypes: UTI89, which survives in BMMs, and 83972, which is killed by BMMs. Neither of these strains caused significant BMM cell death at the low multiplicity of infection that was used in this study. We developed an effective computational framework that simultaneously separated, annotated, and quantified the mammalian and bacterial transcriptomes. BMMs responded to the two UPEC strains with a broadly similar gene expression program. In contrast, the transcriptional responses of the UPEC strains diverged markedly from each other. We identified UTI89 genes upregulated at 24 h post-infection, and hypothesized that some may contribute to intramacrophage survival. Indeed, we showed that deletion of one such gene (pspA) significantly reduced UTI89 survival within BMMs. Our study provides a technological framework for simultaneously capturing global changes at the transcriptional level in co-cultures, and has generated new insights into the mechanisms that UPEC use to persist within the intramacrophage environment.
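The separation step of such a co-transcriptome analysis can be sketched as follows. This is an illustration under stated assumptions, not the framework developed in the study: it assumes reads have already been aligned to a combined host-plus-pathogen reference and assigned to a gene, and it simply tallies them by the organism that owns the reference sequence. The reference names, read records, host gene names, and the function name `split_by_organism` are all hypothetical; only pspA is taken from the abstract.

```python
from collections import Counter

def split_by_organism(alignments, host_refs, pathogen_refs):
    """Tally reads per gene separately for host and pathogen.

    alignments: iterable of (read_id, reference_name, gene) tuples.
    Reads on a reference belonging to neither organism are counted as unassigned.
    """
    counts = {"host": Counter(), "pathogen": Counter()}
    unassigned = 0
    for _read_id, ref_name, gene in alignments:
        if ref_name in host_refs:
            counts["host"][gene] += 1
        elif ref_name in pathogen_refs:
            counts["pathogen"][gene] += 1
        else:
            unassigned += 1
    return counts, unassigned

# Hypothetical reference sequence names for mouse and UPEC.
host_refs = {"chr1", "chr2"}
pathogen_refs = {"UTI89_chr"}

# Hypothetical aligned reads.
alignments = [
    ("r1", "chr1", "Tnf"),
    ("r2", "UTI89_chr", "pspA"),
    ("r3", "UTI89_chr", "pspA"),
    ("r4", "chr2", "Il1b"),
    ("r5", "unplaced", "unknown"),
]
counts, unassigned = split_by_organism(alignments, host_refs, pathogen_refs)
print(counts["pathogen"]["pspA"], counts["host"]["Tnf"], unassigned)
```

In practice bacterial reads are a small fraction of the total in infected macrophages, so per-organism normalization would be needed before the two count tables are compared across time points.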
Abstract:
A long-standing, yet-to-be-accomplished task in understanding behavior is to dissect the function of each gene involved in the development and function of a neuron. The C. elegans ALA neuron was chosen in this study for its known function in sleep, an ancient but poorly understood animal behavior. Single-cell transcriptome profiling identified 8,133 protein-coding genes in the ALA neuron, of which 57 are neuropeptide-coding genes. The most enriched genes are also neuropeptides. In combination with gain-of-function and loss-of-function assays, here I showed that the ALA-enriched FMRFamide neuropeptides, FLP-7, FLP-13, and FLP-24, are sufficient and necessary for inducing C. elegans sleep. These neuropeptides act as neuromodulators through GPCRs, NPR-7, and NPR-22. Further investigation in zebrafish indicates that FMRFamide neuropeptides are sleep-promoting molecules in animals. To correlate the behavioral outputs with genomic context, I constructed a gene regulatory network of the relevant genes controlling C. elegans sleep behavior through EGFR signaling in the ALA neuron. First, I identified an ALA cell-specific motif to conduct a genome-wide search for possible ALA-expressed genes. I then filtered out genes not expressed in ALA by comparing the genes found by the motif search with ALA transcriptomes from single-cell profiling. By corroborating these results with ChIP-seq data from modENCODE, I identified direct interactions between ALA-expressed transcription factors and differentiation genes in the EGFR sleep regulation pathway. This approach provides a network reference for the molecular regulation of C. elegans sleep behavior, and serves as an entry point for the understanding of functional genomics in animal behaviors.
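The filtering step, in which genes from the genome-wide motif search are kept only if they also appear in the ALA single-cell transcriptome, amounts to a set intersection. A minimal sketch follows; the candidate list and the function name `filter_by_expression` are hypothetical, and the lowercase gene names flp-7, flp-13 and flp-24 are assumed to correspond to the neuropeptides named in the abstract.

```python
def filter_by_expression(motif_hits, expressed_genes):
    """Keep motif-search candidates that are also detected in the transcriptome.

    The order of the motif-search list is preserved.
    """
    expressed = set(expressed_genes)
    return [gene for gene in motif_hits if gene in expressed]

# Hypothetical motif-search output and ALA-expressed gene set.
motif_hits = ["flp-7", "gene_a", "flp-13", "gene_b", "flp-24"]
ala_expressed = {"flp-7", "flp-13", "flp-24", "gene_c"}

kept = filter_by_expression(motif_hits, ala_expressed)
print(kept)
```

Genes that are expressed but lack the motif (here `gene_c`) are dropped as well, which is the intended behavior when the goal is a list of candidate direct targets.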
Abstract:
The growing accessibility of genomic resources through next-generation sequencing (NGS) technologies has revolutionized the application of molecular genetic tools to ecology and evolutionary studies in non-model organisms. Here we present the case study of the European hake (Merluccius merluccius), one of the most important demersal resources of European fisheries. Two sequencing platforms, the Roche 454 FLX (454) and the Illumina Genome Analyzer (GAII), were used for Single Nucleotide Polymorphism (SNP) discovery in the hake muscle transcriptome. De novo transcriptome assembly into unique contigs, annotation, and in silico SNP detection were carried out in parallel for 454 and GAII sequence data. High-throughput genotyping using the Illumina GoldenGate assay was performed to validate 1,536 putative SNPs. Validation results were analysed to compare the performance of the 454 and GAII methods and to evaluate the role of several variables (e.g. sequencing depth, intron-exon structure, sequence quality and annotation). Despite well-known differences in sequence length and throughput, the two approaches showed similar assay conversion rates (approximately 43%) and percentages of polymorphic loci (67.5% and 63.3% for GAII and 454, respectively). Both NGS platforms therefore proved suitable for large-scale identification of SNPs in transcribed regions of non-model species, although the lack of a reference genome profoundly affects the genotyping success rate. The overall efficiency, however, can be improved using strict quality and filtering criteria for SNP selection (sequence quality, intron-exon structure, target region score).
Abstract:
Background: Somatic embryogenesis (SE) in plants is a process by which embryos are generated directly from somatic cells, rather than from the fused products of male and female gametes. Despite the detailed expression analysis of several somatic-to-embryonic marker genes, a comprehensive understanding of SE at a molecular level is still lacking. The present study was designed to generate high-resolution transcriptome datasets for early SE, paving the way for future research to understand the underlying molecular mechanisms that regulate this process. We sequenced Arabidopsis thaliana somatic embryos collected from three distinct developmental time-points (5, 10 and 15 d after in vitro culture) using the Illumina HiSeq 2000 platform. Results: This study yielded a total of 426,001,826 sequence reads mapped to 26,520 genes in the A. thaliana reference genome. Analysis of embryonic cultures after 5 and 10 d showed differential expression of 1,195 genes; these included 778 genes that were more highly expressed after 5 d as compared to 10 d. Moreover, 1,718 genes were differentially expressed in embryonic cultures between 10 and 15 d. Our data also showed at least eight different expression patterns during early SE; the majority of genes are transcriptionally more active in embryos after 5 d. Comparison of transcriptomes derived from somatic embryos and leaf tissues revealed that at least 4,951 genes are transcriptionally more active in embryos than in the leaf; increased expression of genes involved in DNA cytosine methylation and histone deacetylation was noted in embryogenic tissues. In silico expression analysis based on microarray data found that approximately 5% of these genes are transcriptionally more active in somatic embryos than in actively dividing callus and non-dividing leaf tissues. Moreover, this identified 49 genes expressed at a higher level in somatic embryos than in other tissues. These included several genes with unknown function, as well as others related to oxidative and osmotic stress, and auxin signalling. Conclusions: The transcriptome information provided here will form the foundation for future research on genetic and epigenetic control of plant embryogenesis at a molecular level. In follow-up studies, these data could be used to construct a regulatory network for SE; the genes more highly expressed in somatic embryos than in vegetative tissues can be considered as potential candidates to validate these networks.
Abstract:
The impact of ocean acidification (OA) on coral calcification, a subject of intense current interest, is poorly understood in part because of the presence of symbionts in adult corals. Early life history stages of Acropora spp. provide an opportunity to study the effects of elevated CO2 on coral calcification without the complication of symbiont metabolism. Therefore, we used the Illumina RNAseq approach to study the effects of acute exposure to elevated CO2 on gene expression in primary polyps of Acropora millepora, using as reference a novel comprehensive transcriptome assembly developed for this study. Gene ontology analysis of this whole transcriptome data set indicated that CO2-driven acidification strongly suppressed metabolism but enhanced extracellular organic matrix synthesis, whereas targeted analyses revealed complex effects on genes implicated in calcification. Unexpectedly, expression of most ion transport proteins was unaffected, while many membrane-associated or secreted carbonic anhydrases were expressed at lower levels. The most dramatic effect of CO2-driven acidification, however, was on genes encoding candidate and known components of the skeletal organic matrix that controls CaCO3 deposition. The skeletal organic matrix effects included elevated expression of adult-type galaxins and some secreted acidic proteins, but down-regulation of other galaxins, secreted acidic proteins, SCRiPs and other coral-specific genes, suggesting specialized roles for the members of these protein families and complex impacts of OA on mineral deposition. This study is the first exhaustive exploration of the transcriptomic response of a scleractinian coral to acidification and provides an unbiased perspective on its effects during the early stages of calcification.
Phylum-wide transcriptome analysis of oogenesis and early embryogenesis in selected nematode species
Abstract:
Oogenesis is a prerequisite for embryogenesis in Metazoa. During both biological processes important decisions must be made to form the embryo and hence ensure the next generation: (1) Maternal gene products (mRNAs, proteins and nutrients) must be supplied to the embryo. (2) Polarity must be established and axes must be specified. While incorporation of maternal gene products occurs during oogenesis, the time point of polarity establishment and axis specification varies among species, as it is accomplished either prior to, during, or after fertilisation. Not only the timing of these events but also the underlying mechanisms that trigger them vary among species. For the nematode model Caenorhabditis elegans the underlying pathways and gene regulatory networks (GRNs) are well understood. In this species, the sperm entry point is known to initiate a primary polarity in the 1-celled egg and, with it, the establishment of the anteroposterior axis. However, studies of other nematodes demonstrated that polarity establishment can be independent of sperm entry (Goldstein et al., 1998; Lahl et al., 2006) and that cleavage patterns, symmetry formation and cell specification also differ from C. elegans. In contrast to the studied Chromadorea (more derived nematodes including C. elegans), embryos of some marine Enoplea (more basal representatives) even show no discernible early polarity and blastomeres can adopt variable cell fates (Voronov and Panchin 1998). The underlying pathways controlling the obviously variant embryonic processes in non-Caenorhabditis nematodes are essentially unknown. In this thesis I addressed this issue by performing a detailed unbiased comparative transcriptome analysis based on microarrays and RNA sequencing of selected developmental stages in a variety of nematodes from different phylogenetic branches with C. elegans as a reference system and a nematomorph as an outgroup representative. 
In addition, I made use of available genomic data to determine the presence or absence of genes for which no expression had been detected. In particular, I focussed on components of selected pathways or GRNs which are known to play essential roles during C. elegans development and/or other invertebrate or vertebrate model systems. Oogenesis must be regulated differently in non-Caenorhabditis nematodes, as crucial controlling components of Wnt and sex determination signaling are absent in these species. In this respect, I identified female-specific expression of potential polarity associated genes during gonad development and oogenesis in the Enoplean nematode Romanomermis culicivorax. I could show that known downstream components of the polarity complexes PAR-3/-6/PKC-3 and PAR-1/-2 are absent in non-Caenorhabditis species. Even PAR-2 as part of the polarity complex does not exist in these nematodes. Instead, transcriptomes of nematodes (including C. elegans), show expression of other polarity-associated complexes such as the Lgl (Lethal giant larvae) complex. This result could pose an alternative route for nematodes and nematomorphs to initiate polarity during early embryogenesis. I could show that crucial pathways of axis specification, such as Wnt and BMP are very different in C. elegans compared to other nematodes. In the former, Wnt signaling, for instance, is mediated by four paralogous beta-catenins, while other Chromadorea have fewer and Enoplea only one beta-catenin. The transcriptomes of R. culicivorax and the nematomorph show that regulators of BMP (e.g. Chordin), are specifically expressed during early embryogenesis only in Enoplea and the close outgroup of nematomorphs. In conclusion, my results demonstrate that the molecular machinery controlling oogenesis and embryogenesis in nematodes is unexpectedly variable and C. elegans cannot be taken as a general model for nematode development. 
From this perspective, Enoplean nematodes show more similarities with outgroups than with C. elegans. It appears that certain pathway components were lost or gained during evolution and others adopted new functions. Based on my findings I can conjecture which pathway components may be ancestral and which were newly acquired in the course of nematode evolution.
Abstract:
The development of molecular markers for genomic studies in Mangifera indica (mango) will allow marker-assisted selection and identification of genetically diverse germplasm, greatly aiding mango breeding programs. We report here our identification of thousands of unambiguous molecular markers that can be easily assayed across genotypes of the species. With origin centered in Southeast Asia, mangos are grown throughout the tropics and subtropics as a nutritious fruit that exhibits remarkable intraspecific phenotypic diversity. With the goal of building a high density genetic map, we have undertaken discovery of sequence variation in expressed genes across a broad range of mango cultivars. A transcriptome sequence reference was built de novo from extensive sequencing and assembly of RNA from cultivar 'Tommy Atkins'. Single nucleotide polymorphisms (SNPs) in protein coding transcripts were determined from alignment of RNA reads from 24 mango cultivars of diverse origins: 'Amin Abrahimpur' (India), 'Aroemanis' (Indonesia), 'Burma' (Burma), 'CAC' (Hawaii), 'Duncan' (Florida), 'Edward' (Florida), 'Everbearing' (Florida), 'Gary' (Florida), 'Hodson' (Florida), 'Itamaraca' (Brazil), 'Jakarata' (Florida), 'Long' (Jamaica), 'M. Casturi Purple' (Borneo), 'Malindi' (Kenya), 'Mulgoba' (India), 'Neelum' (India), 'Peach' (unknown), 'Prieto' (Cuba), 'Sandersha' (India), 'Tete Nene' (Puerto Rico), 'Thai Everbearing' (Thailand), 'Toledo' (Cuba), 'Tommy Atkins' (Florida) and 'Turpentine' (West Indies). SNPs in a selected subset of protein coding transcripts are currently being converted into Fluidigm assays for genotyping of mapping populations and germplasm collections. Using an alternate approach, SNPs (144) discovered by sequencing of candidate genes in 'Kensington Pride' have already been converted and used for genotyping.