Biblioteca Digital

986 results for De novo assembly

Unusual RNA plant virus integration in the soybean genome leads to the production of small RNAs

Relevance:

60.00%

Publisher:

Abstract:

Horizontal gene transfer (HGT) is known to be a major force in genome evolution. The acquisition of genes from viruses by eukaryotic genomes is a well-studied example of HGT, including rare cases of non-retroviral RNA virus integration. The present study describes the integration of cucumber mosaic virus RNA-1 into the soybean genome. After an initial metatranscriptomic analysis of small RNAs derived from soybean, de novo assembly resulted in a 3029-nt contig homologous to RNA-1. The integration of this sequence in the soybean genome was confirmed by DNA deep sequencing. The locus where the integration occurred harbors the full RNA-1 sequence, followed by the partial sequence of an endogenous mRNA and another sequence of RNA-1 as an inverted repeat, allowing the formation of a hairpin structure. This region recombined into a retrotransposon located inside an exon of a soybean gene. The nucleotide similarity of the integrated sequence to other Cucumber mosaic virus sequences indicates that the integration event occurred recently. We describe a rare event of non-retroviral RNA virus integration in soybean that leads to the production of a double-stranded RNA, in a similar fashion to virus-resistant RNAi plants.

Sequencing, annotation and comparative analysis of nine BACs of giant panda (Ailuropoda melanoleuca)

Relevance:

60.00%

Publisher:

Abstract:

A 10-fold BAC library for giant panda was constructed and nine BACs were selected to generate finished sequences. These BACs could be used as a validation resource for the de novo assembly accuracy of the whole genome shotgun sequencing reads of giant panda newly generated by the Illumina GA sequencing technology. Complete Sanger sequencing, assembly, annotation and comparative analysis were carried out on the selected BACs, which have a combined length of 878 kb. Homology search and de novo prediction methods were used to annotate genes and repeats. Twelve protein-coding genes were predicted, seven of which could be functionally annotated. The seven genes have an average gene size of about 41 kb, an average coding size of about 1.2 kb and an average of 6 exons per gene. In addition, seven tRNA genes were found. About 27 percent of the BAC sequence is composed of repeats. A phylogenetic tree was constructed using the neighbor-joining algorithm across five species (giant panda, human, dog, cat and mouse), which reconfirms dog as the species most closely related to giant panda among those compared. Our results provide detailed sequence and structure information for new genes and repeats of giant panda, which will be helpful for further studies on the giant panda.

High-coverage sequencing and annotated assemblies of the budgerigar genome.

Relevance:

60.00%

Publisher:

Abstract:

BACKGROUND: Parrots belong to a group of behaviorally advanced vertebrates and have an advanced ability of vocal learning relative to other vocal-learning birds. They can imitate human speech, synchronize their body movements to a rhythmic beat, and understand complex concepts of referential meaning to sounds. However, little is known about the genetics of these traits. Elucidating the genetic bases would require whole genome sequencing and a robust assembly of a parrot genome. FINDINGS: We present a genomic resource for the budgerigar, an Australian Parakeet (Melopsittacus undulatus) -- the most widely studied parrot species in neuroscience and behavior. We present genomic sequence data that includes over 300× raw read coverage from multiple sequencing technologies and chromosome optical maps from a single male animal. The reads and optical maps were used to create three hybrid assemblies representing some of the largest genomic scaffolds to date for a bird; two of which were annotated based on similarities to reference sets of non-redundant human, zebra finch and chicken proteins, and budgerigar transcriptome sequence assemblies. The sequence reads for this project were in part generated and used for both the Assemblathon 2 competition and the first de novo assembly of a giga-scale vertebrate genome utilizing PacBio single-molecule sequencing. CONCLUSIONS: Across several quality metrics, these budgerigar assemblies are comparable to or better than the chicken and zebra finch genome assemblies built from traditional Sanger sequencing reads, and are sufficient to analyze regions that are difficult to sequence and assemble, including those not yet assembled in prior bird genomes, and promoter regions of genes differentially regulated in vocal learning brain regions. This work provides valuable data and material for genome technology development and for investigating the genomics of complex behavioral traits.

IMP: Imperial Metagenomics Pipeline for high-throughput sequence data

Relevance:

60.00%

Publisher:

Abstract:

We have developed an in-house pipeline for the processing and analyses of sequence data generated during Illumina technology-based metagenomic studies of the human gut microbiota. Each component of the pipeline has been selected following comparative analysis of available tools; however, the modular nature of software facilitates replacement of any individual component with an alternative should a better tool become available in due course. The pipeline consists of quality analysis and trimming followed by taxonomic filtering of sequence data allowing reads associated with samples to be binned according to whether they represent human, prokaryotic (bacterial/archaeal), viral, parasite, fungal or plant DNA. Viral, parasite, fungal and plant DNA can be assigned to species level on a presence/absence basis, allowing – for example – identification of dietary intake of plant-based foodstuffs and their derivatives. Prokaryotic DNA is subject to taxonomic and functional analyses, with assignment to taxonomic hierarchies (kingdom, class, order, family, genus, species, strain/subspecies) and abundance determination. After de novo assembly of sequence reads, genes within samples are predicted and used to build a non-redundant catalogue of genes. From this catalogue, per-sample gene abundance can be determined after normalization of data based on gene length. Functional annotation of genes is achieved through mapping of gene clusters against KEGG proteins, and InterProScan. The pipeline is undergoing validation using the human faecal metagenomic data of Qin et al. (2014, Nature 513, 59–64). Outputs from the pipeline allow development of tools for the integration of metagenomic and metabolomic data, moving metagenomic studies beyond determination of gene richness and representation towards microbial-metabolite mapping. There is scope to improve the outputs from viral, parasite, fungal and plant DNA analyses, depending on the depth of sequencing associated with samples. 
The pipeline can easily be adapted for the analyses of environmental and non-human animal samples, and for use with data generated via non-Illumina sequencing platforms.
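
The length-based normalization step mentioned in the abstract can be illustrated with a minimal sketch. This is not the IMP code: the function name, the input format (one gene ID per mapped read) and the choice to scale abundances so they sum to 1 within a sample are all assumptions made for illustration.

```python
from collections import Counter

def gene_abundance(read_hits, gene_lengths):
    """Return per-gene relative abundance normalized by gene length.

    read_hits: iterable of gene IDs, one per mapped read (hypothetical input).
    gene_lengths: dict mapping gene ID -> gene length in bp.
    """
    counts = Counter(read_hits)
    # Longer genes attract more reads, so convert counts to reads per base.
    per_base = {gene: counts[gene] / gene_lengths[gene] for gene in counts}
    total = sum(per_base.values())
    # Scale so that abundances within one sample sum to 1 (an assumption).
    return {gene: value / total for gene, value in per_base.items()}

# Two genes with equal read counts but different lengths.
sample_hits = ["geneA"] * 10 + ["geneB"] * 10
lengths = {"geneA": 1000, "geneB": 500}
abundance = gene_abundance(sample_hits, lengths)
```

With equal read counts, the shorter gene ends up with the higher abundance, which is the effect the normalization is meant to capture.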

Regulation of gene expression in the dinoflagellate Lingulodinium polyedrum

Relevance:

60.00%

Publisher:

Abstract:

Dinoflagellates are unicellular eukaryotes found in both freshwater and marine environments. They are best known for causing toxic algal blooms called 'red tides', for their symbiosis with corals, and for their major contribution to carbon fixation in the oceans. At the molecular level, they are also known for their unique nuclear characteristics: their chromosomes generally contain an immense quantity of DNA, which is packaged and condensed in a liquid-crystalline form rather than in nucleosomes. Nuclear-encoded genes are often present in multiple copies arranged in tandem, and no transcriptional regulatory element, including the TATA box, has yet been observed. The unique organization of dinoflagellate chromatin suggests that different strategies are needed to control gene expression in these organisms. In this study, I addressed this problem using the photosynthetic dinoflagellate Lingulodinium polyedrum as a model. L. polyedrum is of particular interest because it has several circadian (daily) rhythms. To date, all studies of gene expression during circadian changes have shown regulation at the translational level. In my research, I used transcriptomic, proteomic and phosphoproteomic approaches, together with biochemical studies, to gain insight into the mechanics of gene regulation in dinoflagellates, with emphasis on the importance of phosphorylation in the circadian system of L. polyedrum. The absence of histone proteins and nucleosomes is a distinctive feature of dinoflagellates. Using RNA-Seq technology, I found complete sequences encoding histones and histone-modifying enzymes.
L. polyedrum therefore expresses conserved histone-coding sequences, but the level of protein expression is below the detection limit of Western immunoblotting. The RNA-Seq data were also used to generate a transcriptome, which is a list of the genes expressed by L. polyedrum. A sequence homology search was first performed to classify the transcripts into various categories (Gene Ontology; GO). This analysis revealed a low abundance of transcription factors and, among them, a surprising predominance of sequences with a Cold Shock domain. In L. polyedrum, many genes are repeated in tandem. The RNA-Seq sequences were aligned with genomic copies of tandemly arranged genes to test for the presence of polycistronic transcripts, a hypothesis proposed to explain the lack of promoter elements in the intergenic regions of these genes. This analysis also showed very high conservation of the coding sequences of tandemly arranged genes. The transcriptome was also used to help identify proteins after sequencing by mass spectrometry, and a phosphoprotein-enriched fraction was found to be particularly well suited to high-throughput analysis. Comparison of phosphoproteomes from two different times of day revealed that a large proportion of the proteins whose phosphorylation state varies over time belong to the RNA-binding and translation categories. The transcriptome was also used to define the spectrum of kinases present in L. polyedrum, which was then used to classify the phosphorylated peptides that are potential targets of these kinases.
Several peptides identified as phosphorylated by Casein Kinase 2 (CK2), a kinase known to be involved in the eukaryotic circadian clock, come from various RNA-binding proteins. To assess whether some of the many Cold Shock domain proteins identified in the transcriptome might modulate gene expression in L. polyedrum, as observed in several other prokaryotic and eukaryotic systems, the response of the cells to cold temperatures was examined. Cold temperatures rapidly induced encystment, a state in which the cells become metabolically inactive in order to withstand unfavorable environmental conditions. Changes in the phosphoprotein profile appear to be the major factor causing cyst formation. Phosphosites predicted to be phosphorylated by CK2 are the most strongly reduced class in cysts, an interesting finding because the bioluminescence rhythm confirms that the clock has stopped in the cyst.

Transcriptome analysis of sugarcane leaves subjected to prolonged water limitation using RNA-Seq [Análise do transcriptoma de folhas de cana-de-açúcar submetidas à prolongada limitação hídrica usando RNA-Seq]

Relevance:

60.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Activation of the oxygen sensor FNR of Escherichia coli by glutathione in vivo and in vitro [Aktivierung des Sauerstoffsensors FNR von Escherichia coli durch Glutathion in vivo und in vitro]

Relevance:

60.00%

Publisher:

Abstract:

The oxygen sensor FNR (fumarate nitrate reductase regulator) of Escherichia coli plays an important role in switching from aerobic to anaerobic metabolism. FNR is a transcriptional regulator that contains a [4Fe4S] cluster in its active state. On contact with oxygen, the [4Fe4S] cluster decays to a [2Fe2S] cluster, and FNR loses its activity. The reactions leading to assembly of the [4Fe4S] cluster and to reductive activation of aerobically and anaerobically isolated apoFNR were studied in vivo and in vitro. The in vivo influence of glutathione on FNR function, and the role of glutathione in assembling the [4Fe4S] cluster in purified apoFNR, show the importance of glutathione in the de novo assembly of [4Fe4S]FNR and in the reductive activation of oxygen-inactivated FNR. The energetic parameters of E. coli and their changes during the transition from aerobic to anaerobic metabolism were investigated. The electrochemical proton potential delta-p across the cytoplasmic membrane was determined at steady state during aerobic respiration and during anaerobic nitrate, fumarate and dimethyl sulfoxide respiration. Delta-p was -160 mV during aerobic respiration; contrary to earlier assumptions, it fell by only 20 mV during anaerobic respiration. These small changes in delta-p therefore presumably cannot serve as a regulatory signal for the switch from aerobic to anaerobic metabolism.

DEVELOPMENT OF NOVEL METHODS TO MINIMIZE THE IMPACT OF SEQUENCING ERRORS IN THE NEXT-GENERATION SEQUENCING DATA ANALYSIS

Relevance:

60.00%

Publisher:

Abstract:

Next-generation sequencing (NGS) technology has become a prominent tool in biological and biomedical research. However, NGS data analysis, such as de novo assembly, mapping and variant detection, is far from mature, and the high sequencing error rate is one of the major problems. To minimize the impact of sequencing errors, we developed a highly robust and efficient method, MTM, to correct the errors in NGS reads. We demonstrated the effectiveness of MTM on both single-cell data with highly non-uniform coverage and normal data with uniformly high coverage, showing that MTM's performance does not rely on the coverage of the sequencing reads. MTM was also compared with Hammer and Quake, the best methods for correcting non-uniform and uniform data respectively. For non-uniform data, MTM outperformed both Hammer and Quake. For uniform data, MTM showed better performance than Quake and comparable results to Hammer. Better error correction with MTM improved the quality of downstream analysis, such as mapping and SNP detection. SNP calling is a major application of NGS technologies. However, the existence of sequencing errors complicates this process, especially for the low coverage (

Molecular tools to improve chestnut management: El Bierzo as a case study

Relevance:

60.00%

Publisher:

Abstract:

The European chestnut (Castanea sativa Mill.) is a multipurpose species that has been widely cultivated around the Mediterranean basin since ancient times. New varieties were brought to the Iberian Peninsula during the Roman Empire and have coexisted since then with native populations that survived the last glaciation. The relevance of chestnut cultivation grew steadily from the Middle Ages until the rural decline of the past century put a stop to this trend. Forest fires and diseases were also major factors. Chestnut cultivation is gaining momentum again due to its economic (wood, fruits) and ecological relevance, and currently represents an important asset in many rural areas of Europe. In this Thesis we apply different molecular tools to help improve current management strategies. For this study we have chosen El Bierzo (Castile and Leon, NW Spain), which has a centuries-old tradition of chestnut cultivation and management, and also presents several unique features from a genetic perspective (see below). Moreover, its nuts are widely appreciated in Spain and abroad for their organoleptic properties. We have focused our experimental work on two major problems faced by breeders and the industry: the lack of a fine-grained genetic characterization and the need for new strategies to control blight disease. To characterize the genetic diversity and structure of El Bierzo orchards in sufficient detail, we analyzed DNA from 169 trees grafted for nut production covering the entire region. We also analyzed 62 nuts from all traditional varieties. El Bierzo constitutes an outstanding scenario to study chestnut genetics and the influence of human management because: (i) it is located at one extreme of the distribution area; (ii) it is a major glacial refuge for the native species; (iii) it has a long tradition of human management (since Roman times, at least); and (iv) its geographical setting ensures an unusual degree of genetic isolation.
Thirteen microsatellite markers provided enough informativeness and discrimination power to genotype at the individual level. Together with an unexpected level of genetic variability, we found evidence of genetic structure, with three major gene pools giving rise to the current population. High levels of genetic differentiation between groups supported this organization. Interestingly, genetic structure does not match spatial boundaries, suggesting that the exchange of material and cultivation practices have strongly influenced natural gene flow. The microsatellite markers selected for this study were also used to classify a set of 62 samples belonging to all traditional varieties. We identified several cases of synonymies and homonymies, evidencing the need to substitute traditional classification systems with new tools for genetic profiling. Management and conservation strategies should also benefit from these tools. The advent of high-throughput sequencing technologies, combined with the development of bioinformatics tools, has paved the way to study transcriptomes without the need for a reference genome. We took advantage of RNA sequencing and de novo assembly tools to determine the transcriptional landscape of chestnut in response to blight disease. In addition, we have selected a set of candidate genes with high potential for developing resistant varieties via genetic engineering. Our results evidenced a deep transcriptional reprogramming upon fungal infection. The plant hormones ET and JA appear to orchestrate the defensive response. Interestingly, our results also suggest a role for auxins in modulating this response. Many transcription factors were identified in this work that interact with promoters of genes involved in disease resistance. Among these genes, we have conducted a functional characterization of two major thaumatin-like proteins (TLPs) that belong to the PR5 family.
Two genes encoding chestnut cotyledon TLPs, termed CsTL1 and CsTL2, have been previously characterized. We substantiate here their protective role against blight disease for the first time, including in silico, in vitro and in vivo evidence. The synergy between TLPs and other antifungal proteins, particularly endo-β-1,3-glucanases, bolsters their interest for future control strategies based on biotechnological approaches.

The transcriptome of metamorphosing flatfish

Relevance:

60.00%

Publisher:

Abstract:

Background Flatfish metamorphosis denotes the extraordinary transformation of a symmetric pelagic larva into an asymmetric benthic juvenile. Metamorphosis in vertebrates is driven by thyroid hormones (THs), but how they orchestrate the cellular, morphological and functional modifications associated with maturation to juvenile/adult states in flatfish is an enigma. Since THs act via thyroid receptors that are ligand activated transcription factors, we hypothesized that the maturation of tissues during metamorphosis should be preceded by significant modifications in the transcriptome. Targeting the unique metamorphosis of flatfish and taking advantage of the large size of Atlantic halibut (Hippoglossus hippoglossus) larvae, we determined the molecular basis of TH action using RNA sequencing. Results De novo assembly of sequences for larval head, skin and gastrointestinal tract (GI-tract) yielded 90,676, 65,530 and 38,426 contigs, respectively. More than 57 % of the assembled sequences were successfully annotated using a multi-step Blast approach. A unique set of biological processes and candidate genes were identified specifically associated with changes in morphology and function of the head, skin and GI-tract. Transcriptome dynamics during metamorphosis were mapped with SOLiD sequencing of whole larvae and revealed greater than 8,000 differentially expressed (DE) genes significantly (p < 0.05) up- or down-regulated in comparison with the juvenile stage. Candidate transcripts quantified by SOLiD and qPCR analysis were significantly (r = 0.843; p < 0.05) correlated. The majority (98 %) of DE genes during metamorphosis were not TH-responsive. TH-responsive transcripts clustered into 6 groups based on their expression pattern during metamorphosis and the majority of the 145 DE TH-responsive genes were down-regulated. Conclusions A transcriptome resource has been generated for metamorphosing Atlantic halibut and over 8,000 DE transcripts per stage were identified. 
Unique sets of biological processes and candidate genes were associated with changes in the head, skin and GI-tract during metamorphosis. A small proportion of DE transcripts were TH-responsive, suggesting that they trigger gene networks, signalling cascades and transcription factors, leading to the overt changes in tissue occurring during metamorphosis.

De Novo Transcriptome Sequence Assembly and Analysis of RNA Silencing Genes of Nicotiana benthamiana

Relevance:

40.00%

Publisher:

Abstract:

Background: Nicotiana benthamiana has been widely used for transient gene expression assays and as a model plant in the study of plant-microbe interactions, lipid engineering and RNA silencing pathways. Assembling the sequence of its transcriptome provides information that, in conjunction with the genome sequence, will facilitate gaining insight into the plant's capacity for high-level transient transgene expression, generation of mobile gene silencing signals, and hyper-susceptibility to viral infection. Methodology/Results: RNA-seq libraries from 9 different tissues were deep sequenced and assembled, de novo, into a representation of the transcriptome. The assembly, of 16 GB of sequence, yielded 237,340 contigs, clustering into 119,014 transcripts (unigenes). Between 80 and 85% of reads from all tissues could be mapped back to the full transcriptome. Approximately 63% of the unigenes exhibited a match to the Solgenomics tomato predicted proteins database. Approximately 94% of the Solgenomics N. benthamiana unigene set (16,024 sequences) matched our unigene set (119,014 sequences). Using homology searches we identified 31 homologues that are involved in RNAi-associated pathways in Arabidopsis thaliana, and show that they possess the domains characteristic of these proteins. Of these genes, the RNA dependent RNA polymerase gene, Rdr1, is transcribed but has a 72 nt insertion in exon 1 that would cause premature termination of translation. Dicer-like 3 (DCL3) appears to lack both the DEAD helicase motif and second dsRNA binding motif, and DCL2 and AGO4b have unexpectedly high levels of transcription. Conclusions: The assembled and annotated representation of the transcriptome and list of RNAi-associated sequences are accessible at www.benthgenome.com alongside a draft genome assembly. These genomic resources will be very useful for further study of the developmental, metabolic and defense pathways of N. benthamiana and in understanding the mechanisms behind the features which have made it such a well-used model plant. © 2013 Nakasugi et al.

De novo design and molecular assembly of a transmembrane diporphyrin-binding protein complex.

Relevance:

40.00%

Publisher:

Abstract:

The de novo design of membrane proteins remains difficult despite recent advances in understanding the factors that drive membrane protein folding and association. We have designed a membrane protein PRIME (PoRphyrins In MEmbrane) that positions two non-natural iron diphenylporphyrins (Fe(III)DPP's) sufficiently close to provide a multicentered pathway for transmembrane electron transfer. Computational methods previously used for the design of multiporphyrin water-soluble helical proteins were extended to this membrane target. Four helices were arranged in a D(2)-symmetrical bundle to bind two Fe(II/III) diphenylporphyrins in a bis-His geometry further stabilized by second-shell hydrogen bonds. UV-vis absorbance, CD spectroscopy, analytical ultracentrifugation, redox potentiometry, and EPR demonstrate that PRIME binds the cofactor with high affinity and specificity in the expected geometry.

New Contig Creation Algorithm for the de novo DNA Assembly Problem

Relevance:

40.00%

Publisher:

Abstract:

DNA assembly is among the most fundamental and difficult problems in bioinformatics. Near-optimal assembly solutions are available for bacterial and other small genomes. However, assembling large and complex genomes, especially the human genome, using Next-Generation Sequencing (NGS) technologies has been shown to be very difficult because of the highly repetitive and complex nature of the human genome, short read lengths, uneven data coverage and tools that are not specifically built for human genomes. Moreover, many algorithms do not scale to human genome datasets containing hundreds of millions of short reads. The DNA assembly problem is usually divided into several subproblems, including DNA data error detection and correction, contig creation, scaffolding and contig orientation; each can be seen as a distinct research area. This thesis specifically focuses on creating contigs from the short reads and combining them with outputs from other tools in order to obtain better results. Three different assemblers, SOAPdenovo [Li09], Velvet [ZB08] and Meraculous [CHS+11], are selected for comparative purposes in this thesis. The results show that this thesis' work is comparable to other assemblers, and that combining our contigs with outputs from other tools produces the best results, outperforming all other investigated assemblers.
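
The abstract does not describe the contig creation algorithm itself. As a general illustration of what contig creation from short reads involves, the sketch below uses a small de Bruijn graph, a common textbook approach that is assumed here and is not the thesis' method. The function names and the toy reads are hypothetical, and the sketch ignores sequencing errors, reverse complements and isolated cycles.

```python
from collections import defaultdict

def build_graph(reads, k):
    """Link each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    succ, pred = defaultdict(set), defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            left, right = kmer[:-1], kmer[1:]
            succ[left].add(right)
            pred[right].add(left)
    return succ, pred

def contigs(reads, k):
    """Return maximal unbranched paths of the de Bruijn graph as contigs."""
    succ, pred = build_graph(reads, k)
    nodes = set(succ) | set(pred)

    def is_simple(node):
        # Exactly one way in and one way out: safe to extend through.
        return len(pred[node]) == 1 and len(succ[node]) == 1

    result = []
    for node in sorted(nodes):
        if is_simple(node):
            continue  # contigs start only at branch points or path ends
        for nxt in sorted(succ[node]):
            path = node + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(succ[nxt]))
                path += nxt[-1]
            result.append(path)
    return result

# Three overlapping toy reads drawn from one short sequence.
toy_reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
toy_contigs = contigs(toy_reads, 4)
```

On real data the graph branches at repeats and errors, which is where the difficulties listed in the abstract (repetitive sequence, short reads, uneven coverage) show up as fragmented contigs.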

Dissection of De Novo Membrane Insertion Activities of Internal Transmembrane Segments of ATP-Binding-Cassette Transporters: Toward Understanding Topological Rules for Membrane Assembly of Polytopic Membrane Proteins

Relevance:

40.00%

Publisher:

Abstract:

The membrane assembly of polytopic membrane proteins is a complicated process. Using Chinese hamster P-glycoprotein (Pgp) as a model protein, we investigated this process previously and found that Pgp expresses more than one topology. One of the variations occurs at the transmembrane (TM) domain including TM3 and TM4: TM4 inserts into membranes in an Nin-Cout rather than the predicted Nout-Cin orientation, and TM3 is in cytoplasm rather than the predicted Nin-Cout orientation in the membrane. It is possible that TM4 has a strong activity to initiate the Nin-Cout membrane insertion, leaving TM3 out of the membrane. Here, we tested this hypothesis by expressing TM3 and TM4 in isolated conditions. Our results show that TM3 of Pgp does not have de novo Nin-Cout membrane insertion activity whereas TM4 initiates the Nin-Cout membrane insertion regardless of the presence of TM3. In contrast, TM3 and TM4 of another polytopic membrane protein, cystic fibrosis transmembrane conductance regulator (CFTR), have a similar level of de novo Nin-Cout membrane insertion activity and TM4 of CFTR functions only as a stop-transfer sequence in the presence of TM3. Based on these findings, we propose that 1) the membrane insertion of TM3 and TM4 of Pgp does not follow the sequential model, which predicts that TM3 initiates Nin-Cout membrane insertion whereas TM4 stops the insertion event; and 2) “leaving one TM segment out of the membrane” may be an important folding mechanism for polytopic membrane proteins, and it is regulated by the Nin-Cout membrane insertion activities of the TM segments.

Combining transcriptome assemblies from multiple de novo assemblers in the allo-tetraploid plant Nicotiana benthamiana

Relevance:

30.00%

Publisher:

Abstract:

Background Nicotiana benthamiana is an allo-tetraploid plant, which can be challenging for de novo transcriptome assemblies due to homeologous and duplicated gene copies. Transcripts generated from such genes can be distinct yet highly similar in sequence, with markedly differing expression levels. This can lead to unassembled, partially assembled or mis-assembled contigs. Due to the different properties of de novo assemblers, no one assembler with any one given parameter space can re-assemble all possible transcripts from a transcriptome. Results In an effort to maximise the diversity and completeness of de novo assembled transcripts, we utilised four de novo transcriptome assemblers, TransAbyss, Trinity, SOAPdenovo-Trans, and Oases, using a range of k-mer sizes and different input RNA-seq read counts. We complemented the parameter space biologically by using RNA from 10 plant tissues. We then combined the output of all assemblies into a large super-set of sequences. Using a method from the EvidentialGene pipeline, the combined assembly was reduced from 9.9 million de novo assembled transcripts to about 235,000, of which about 50,000 were classified as primary. Metrics such as average bit-scores, feature response curves and the ability to distinguish paralogous or homeologous transcripts indicated that the EvidentialGene-processed assembly was of high quality. Of 35 RNA silencing gene transcripts, 34 were identified as assembled to full length, whereas in a previous assembly using only one assembler, 9 of these were partially assembled. Conclusions To achieve a high quality transcriptome, it is advantageous to implement and combine the output from as many different de novo assemblers as possible. We have in essence taken the 'best' output from each assembler while minimising sequence redundancy. We have also shown that simultaneous assessment of a variety of metrics, not just focused on contig length, is necessary to gauge the quality of assemblies.
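
The idea of pooling several assemblies into a super-set and then reducing redundancy can be sketched in a few lines. This is a deliberately naive stand-in, not the EvidentialGene method used in the study: it only drops transcripts that are identical to, or fully contained in, a longer transcript. The function name and toy sequences are hypothetical, and reverse complements and near-identical variants are not handled.

```python
def merge_assemblies(assemblies):
    """Pool transcripts from several assemblies and drop redundant ones.

    assemblies: list of lists of transcript sequences (hypothetical input).
    A transcript is dropped if it is identical to, or a substring of,
    a transcript that has already been kept.
    """
    # A set removes exact duplicates produced by different assemblers.
    pooled = {seq for assembly in assemblies for seq in assembly}
    kept = []
    # Longest first, so fragments are compared against fuller transcripts.
    for seq in sorted(pooled, key=lambda s: (-len(s), s)):
        if not any(seq in longer for longer in kept):
            kept.append(seq)
    return kept

# Toy outputs from three hypothetical assembler runs.
runs = [
    ["ATGGCCTTAA", "GGCC"],
    ["ATGGCCTTAA", "CCGGTTAACC"],
    ["TTAACC"],
]
merged = merge_assemblies(runs)
```

A containment test like this would wrongly collapse or keep homeologous copies that differ by a few bases, which is one reason the study relies on a dedicated pipeline and on several quality metrics rather than on sequence length alone.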
