72 results for Reference transcriptome
in Repositório Institucional UNESP - Universidade Estadual Paulista "Julio de Mesquita Filho"
Abstract:
High-throughput sequencing capabilities have made the process of assembling a transcriptome easier, whether or not there is a reference genome. But the quality of a transcriptome assembly must be good enough to capture the most comprehensive catalog of transcripts and their variations, and to support further transcriptomics experiments. There is currently no consensus on which of the many sequencing technologies and assembly tools are the most effective. Many non-model organisms lack a reference genome to guide the transcriptome assembly. One question, therefore, is whether a reference-based assembly gives better results than a de novo assembly. The blood-sucking insect Rhodnius prolixus, a vector for Chagas disease, has a reference genome. It is therefore a good model on which to compare reference-based and de novo transcriptome assemblies. In this study, we compared de novo and reference-based assembly strategies using three datasets (454, Illumina, and 454 combined with Illumina) and various assembly software. We developed criteria to compare the resulting assemblies: the size distribution and number of transcripts, the proportion of potentially chimeric transcripts, and the completeness of the assembly (evaluated both with the CEGMA software and by the fraction of the R. prolixus proteome retrieved). Moreover, we looked for the presence of two chemosensory gene families (Odorant-Binding Proteins and Chemosensory Proteins) to validate the assembly quality. The reference-based assemblies after genome annotation were clearly better than those generated using de novo strategies alone. Reference-based strategies revealed new transcripts, including new isoforms not predicted by automatic genome annotation. However, a combination of both de novo and reference-based strategies gave the best result, and allowed us to assemble fragmented transcripts.
Abstract:
High-throughput sequencing (HTS) provides new research opportunities for work on non-model organisms, such as differential expression studies between populations exposed to different environmental conditions. However, such transcriptomic studies first require the production of a reference assembly. The choice of sampling procedure, sequencing strategy and assembly workflow is crucial. To develop a reliable reference transcriptome for Triatoma brasiliensis, the major Chagas disease vector in Northeastern Brazil, different de novo assembly protocols were generated using various datasets and software. Both 454 and Illumina sequencing technologies were applied to RNA extracted from antennae and mouthparts from single or pooled individuals. The 454 library yielded 278 Mb. Fifteen Illumina libraries were constructed and yielded nearly 360 million RNA-seq single reads and 46 million RNA-seq paired-end reads, for nearly 45 Gb. For the 454 reads, we used three assemblers (Newbler, CAP3 and/or MIRA), and for the Illumina reads, the Trinity assembler. Ten assembly workflows were compared using these programs separately or in combination. To compare the assemblies obtained, quantitative and qualitative criteria were used, including contig length, N50, contig number and the percentage of chimeric contigs. Completeness of the assemblies was estimated using the CEGMA pipeline. The best assembly (57,657 contigs, completeness of 80%, < 1% chimeric contigs) was a hybrid assembly, which leads us to recommend (1) using a single individual with a large representation of biological tissues, (2) merging both long reads and short paired-end Illumina reads, and (3) using several assemblers in order to combine the specific advantages of each.
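The quantitative criteria named above (contig number, contig length, N50 and percentage of chimeric contigs) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `n50` and `summarize`, and the toy contig lengths, are assumptions made for the example, and the count of chimeric contigs is taken as an input rather than detected.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def summarize(contig_lengths, chimeric_count):
    """Collect simple assembly statistics in one dictionary.

    chimeric_count is supplied by the caller (hypothetical input);
    this sketch does not try to detect chimeras itself.
    """
    return {
        "contigs": len(contig_lengths),
        "total_bases": sum(contig_lengths),
        "longest": max(contig_lengths),
        "n50": n50(contig_lengths),
        "chimeric_pct": 100 * chimeric_count / len(contig_lengths),
    }


# Hypothetical toy assembly, for illustration only.
toy_lengths = [100, 200, 300, 400, 500]
toy_summary = summarize(toy_lengths, chimeric_count=0)
print(toy_summary)
```

Running `summarize` on each candidate assembly gives comparable rows of statistics. A higher N50 alone does not indicate a better assembly, which is presumably why the abstract pairs it with chimera rate and completeness.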
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
The reference intervals for biochemical variables and red blood cell indices of healthy, intensively bred channel catfish Ictalurus punctatus were determined. The blood variables were determined using standardized clinical methods. The reference intervals (25th and 75th percentiles) were established using a non-parametric method. Reference intervals for plasma glucose, serum total protein, sodium, potassium, calcium, magnesium, chloride concentration, and primary and secondary red blood cell indices were established. The haematological and biochemical reference intervals established may support important clinical decisions about channel catfish. (C) 2007 The Authors. Journal compilation (C) 2007 The Fisheries Society of the British Isles.
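A non-parametric interval bounded by the 25th and 75th percentiles, as described above, can be sketched with the Python standard library. The abstract does not say which percentile interpolation rule was used, so the `"inclusive"` method below is an assumption, and the measurement values are hypothetical.

```python
import statistics


def reference_interval(values):
    """Return (25th percentile, 75th percentile) of the measurements.

    No distribution is assumed; the quartiles come straight from the
    sorted data. The "inclusive" interpolation rule is an assumption.
    """
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3


# Hypothetical measurements for one blood variable (arbitrary units).
measurements = [40, 45, 50, 55, 60, 65, 70, 75, 80]
low, high = reference_interval(measurements)
print(low, high)
```

Different interpolation rules can give slightly different bounds on small samples, so the rule should be reported alongside the interval.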
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
New analyses have been performed in order to enhance the data set on the independent ages of four glasses that have been proposed as reference materials for fission-track dating. The results are as follows. Moldavite - repeated (40)Ar/(39)Ar age determinations on samples from deposits in Bohemia and Moravia yielded an average of 14.34 +/- 0.08 Ma. This datum agrees with other recent determinations and is significantly younger than the (40)Ar/(39)Ar age of 15.21 +/- 0.15 Ma determined in the early 1980s. Macusanite (Peru) - four K-Ar ages ranging from 5.44 +/- 0.06 to 5.72 +/- 0.12 Ma have been published previously. New (40)Ar/(39)Ar ages gave an average of 5.12 +/- 0.04 Ma. Plateau fission-track ages determined using the IRMM-540 certified glass and U and Th thin films for neutron fluence measurements agree better with these new (40)Ar/(39)Ar ages than with the previously published ages. Roccastrada glass (Italy) - a new (40)Ar/(39)Ar age, 2.45 +/- 0.04 Ma, is consistent with previous determinations. The Quiron obsidian (Argentina) is a recently discovered glass that has been proposed as an additional reference material because of its high spontaneous track density (around 100 000 cm(-2)). Defects that might produce spurious tracks are virtually absent. An independent (40)Ar/(39)Ar age of 8.77 +/- 0.09 Ma was determined and is recommended for this glass. We believe that these materials, which will be distributed upon request to fission-track groups, will be very useful for testing system calibrations and experimental procedures.
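The statement that the new Moldavite age is significantly younger than the early-1980s value can be checked with a short calculation. This sketch assumes that the quoted uncertainties are independent one-standard-deviation errors, which the abstract does not state; the helper name `sigma_difference` is illustrative.

```python
import math


def sigma_difference(age_a, err_a, age_b, err_b):
    """Express the gap between two ages in units of their combined error.

    Assumes independent 1-sigma uncertainties, combined in quadrature.
    """
    combined_error = math.hypot(err_a, err_b)
    return abs(age_a - age_b) / combined_error


# Moldavite ages in Ma, as quoted in the abstract above.
z = sigma_difference(15.21, 0.15, 14.34, 0.08)
print(round(z, 2))
```

Under these assumptions the two ages differ by 0.87 Ma against a combined uncertainty of 0.17 Ma, about five combined standard deviations.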
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
A detailed genome mapping analysis of 213,636 expressed sequence tags (ESTs) derived from nontumor and tumor tissues of the oral cavity, larynx, pharynx, and thyroid was done. Transcripts matching known human genes were identified; potential new splice variants were flagged and subjected to manual curation, pointing to 788 putatively new alternative splicing isoforms, the majority (75%) being insertion events. A subset of 34 new splicing isoforms (5% of 788 events) was selected and 23 (68%) were confirmed by reverse transcription-PCR and DNA sequencing. Putative new genes were revealed, including six transcripts mapped to well-studied chromosomes such as 22, as well as transcripts that mapped to 253 intergenic regions. In addition, 2,251 noncoding intronic RNAs, possibly involved in transcriptional regulation, were found. A set of 250 candidate markers for loss of heterozygosity or gene amplification was selected by identifying transcripts that mapped to genomic regions previously known to be frequently amplified or deleted in head, neck, and thyroid tumors. Three of these markers were evaluated by quantitative reverse transcription-PCR in an independent set of individual samples. Along with detailed clinical data about tumor origin, the information reported here is now publicly available on a dedicated Web site as a resource for further biological investigation. This first in silico reconstruction of the head, neck, and thyroid transcriptomes points to a wealth of new candidate markers that can be used for future studies on the molecular basis of these tumors. Similar analysis is warranted for a number of other tumors for which large EST data sets are available.
Abstract:
Open reading frame expressed sequence tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that corresponds to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may be a useful alternative to full-length cDNA cloning.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Yeasts are becoming a common cause of nosocomial fungal infections that affect immunocompromised patients. Such infections can evolve into sepsis, whose mortality rate is high. This study aimed to evaluate the viability of Candida species identification by the automated system Vitek-Biomerieux (Durham, USA). Ninety-eight medical charts referencing the Candida spp. samples available for the study were retrospectively analyzed. The system Vitek-Biomerieux with Candida identification card is recommended for laboratory routine use and presents 80.6% agreement with the reference method. By separate analysis of species, 13.5% of C. parapsilosis samples differed from the reference method; the Vitek system wrongly identified them as C. tropicalis, C. lusitaniae or Candida albicans. C. glabrata presented a discrepancy in only one sample (25%), which was identified by Vitek as C. parapsilosis. C. guilliermondii also differed in only one sample (33.3%), which was identified as Candida spp. All C. albicans, C. tropicalis and C. lusitaniae samples were identified correctly.
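The overall agreement figure and the per-species discrepancies reported above amount to a paired comparison of two identification lists. A minimal sketch is shown below; the function names and the five-sample data are hypothetical and are not taken from the study.

```python
from collections import Counter


def percent_agreement(reference_ids, test_ids):
    """Percentage of samples where the test method matches the reference."""
    if len(reference_ids) != len(test_ids):
        raise ValueError("both methods must cover the same samples")
    if not reference_ids:
        raise ValueError("need at least one sample")
    matches = sum(r == t for r, t in zip(reference_ids, test_ids))
    return 100 * matches / len(reference_ids)


def discrepancies(reference_ids, test_ids):
    """Count each (reference species, test species) mismatch."""
    return Counter(
        (r, t) for r, t in zip(reference_ids, test_ids) if r != t
    )


# Hypothetical paired identifications, for illustration only.
reference = ["C. albicans", "C. albicans", "C. parapsilosis",
             "C. parapsilosis", "C. glabrata"]
automated = ["C. albicans", "C. albicans", "C. parapsilosis",
             "C. tropicalis", "C. parapsilosis"]

overall = percent_agreement(reference, automated)
mismatches = discrepancies(reference, automated)
print(overall, dict(mismatches))
```

Tabulating mismatches by species pair shows which species the automated system confuses, which an overall percentage alone would hide.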
Abstract:
The objective of this study was to compare the outcomes of intracytoplasmic morphologically selected sperm injection performed in couples with male factor infertility, as defined by the World Health Organization guidelines from 1999 and 2010. Our results suggest that sperm selection under high magnification results in improved treatment outcomes in patients with oligoasthenoteratozoospermia, according to the new World Health Organization guidelines. (Fertil Steril (R) 2011;95:2711-4. (C)2011 by American Society for Reproductive Medicine.)
Abstract:
Snake venom glands are a rich source of bioactive molecules such as peptides, proteins and enzymes that show important pharmacological activity, leading to local and systemic effects such as pain, edema, bleeding and muscle necrosis. Most studies on pharmacologically active peptides and proteins from snake venoms have been concerned with isolation and structure elucidation through methods of classical biochemistry. In an attempt to examine the transcripts expressed in the venom gland of Bothrops jararacussu and to unveil the toxicological and pharmacological potential of its products at the molecular level, we generated 549 expressed sequence tags (ESTs) from a directional cDNA library. Sequences obtained from single-pass sequencing of randomly selected cDNA clones could be identified by similarity searches on existing databases, resulting in 197 sequences with significant similarity to phospholipase A(2) (PLA(2)), of which 83.2% were Lys49-PLA(2) homologs (BOJU-I), 0.1% were basic Asp49-PLA(2)s (BOJU-II) and 0.6% were acidic Asp49-PLA(2)s (BOJU-III). In addition to this very abundant class of proteins, we found 88 transcripts coding for putative sequences of metalloproteases, which after clustering and assembling resulted in three full-length sequences: BOJUMET-I, BOJUMET-II and BOJUMET-III; as well as 25 transcripts related to C-type lectin-like proteins, including a full-length cDNA of a putative galactose-binding C-type lectin, and a cluster of eight serine protease transcripts, including a full-length cDNA of a putative serine protease. Among the full-length sequenced clones we identified a nerve growth factor (Bj-NGF) with 92% identity with a human NGF (NGHUBM) and an acidic phospholipase A2 (BthA-I-PLA(2)) displaying 85-93% identity with other snake venom toxins. Genetic distances among PLA(2)s from Bothrops species were evaluated by phylogenetic analysis. Furthermore, analysis of full-length putative Lys49-PLA(2) through molecular modeling showed conserved structural domains, allowing the characterization of those proteins as group II PLA(2)s. The constructed cDNA library provides molecular clones harboring sequences that can be used to probe directly the genetic material from venom glands of other snake species. Expression of complete cDNAs or their modified derivatives will be useful for elucidation of the structure-function relationships of these toxins and peptides of biotechnological interest. (C) 2004 Elsevier SAS. All rights reserved.
Abstract:
Over 40,000 sugarcane (Saccharum officinarum) consensus sequences assembled from 237,954 expressed sequence tags were compared with the protein and DNA sequences from other angiosperms, including the genomes of Arabidopsis and rice (Oryza sativa). Approximately two-thirds of the sugarcane transcriptome have similar sequences in Arabidopsis. These sequences may represent a core set of proteins or protein domains that are conserved among monocots and eudicots and probably encode essential angiosperm functions. The remaining sequences represent putative monocot-specific genetic material, one-half of which were found only in sugarcane. These monocot-specific cDNAs represent either novelties or, in many cases, fast-evolving sequences that diverged substantially from their eudicot homologs. The wide comparative genome analysis presented here provides information on the evolutionary changes that underlie the divergence of monocots and eudicots. Our comparative analysis also led to the identification of several not yet annotated putative genes and possible gene loss events in Arabidopsis.
Abstract:
We report the results of a transcript finishing initiative, undertaken for the purpose of identifying and characterizing novel human transcripts, in which RT-PCR was used to bridge gaps between paired EST clusters mapped against the genomic sequence. Each pair of EST clusters selected for experimental validation was designated a transcript finishing unit (TFU). A total of 489 TFUs were selected for validation, and an overall efficiency of 43.1% was achieved. We generated a total of 59,975 bp of transcribed sequences organized into 432 exons, contributing to the definition of the structure of 211 human transcripts. The structure of several transcripts reported here was confirmed during the course of this project, through the generation of their corresponding full-length cDNA sequences. Nevertheless, for 21% of the validated TFUs, a full-length cDNA sequence is not yet available in public databases, and the structure of 69.2% of these TFUs was not correctly predicted by computer programs. The TF strategy provides a significant contribution to the definition of the complete catalog of human genes and transcripts, because it appears to be particularly useful for identification of low abundance transcripts expressed in a restricted set of tissues as well as for the delineation of gene boundaries and alternatively spliced isoforms.