8 results for Genome Annotation Assessment
in Repositório Institucional UNESP - Universidade Estadual Paulista "Julio de Mesquita Filho"
Abstract:
Most of the tasks in genome annotation can be at least partially automated. Since this annotation is time-consuming, facilitating some parts of the process, and thus freeing the specialist to carry out more valuable tasks, has been the motivation for many tools and annotation environments. In particular, annotation of protein function can benefit from knowledge about enzymatic processes. The use of sequence homology alone is not a good approach to derive this knowledge when there are only a few homologues of the sequence to be annotated. The alternative is to use motifs. This paper uses a symbolic machine learning approach to derive rules for the classification of enzymes according to the Enzyme Commission (EC). Our results show that, for the top class, the average global classification error is 3.13%. Our technique also produces a set of rules relating structural to functional information, which is important for understanding the three-dimensional structure of a protein and determining its biological function. © 2009 Springer Berlin Heidelberg.
Abstract:
High Throughput Sequencing capabilities have made the process of assembling a transcriptome easier, whether or not there is a reference genome. But the quality of a transcriptome assembly must be good enough to capture the most comprehensive catalog of transcripts and their variations, and to carry out further experiments on transcriptomics. There is currently no consensus on which of the many sequencing technologies and assembly tools are the most effective. Many non-model organisms lack a reference genome to guide the transcriptome assembly. One question, therefore, is whether or not a reference-based genome assembly gives better results than de novo assembly. The blood-sucking insect Rhodnius prolixus, a vector for Chagas disease, has a reference genome. It is therefore a good model on which to compare reference-based and de novo transcriptome assemblies. In this study, we compared de novo and reference-based genome assembly strategies using three datasets (454, Illumina, 454 combined with Illumina) and various assembly software. We developed criteria to compare the resulting assemblies: the size distribution and number of transcripts, the proportion of potentially chimeric transcripts, and the completeness of the assembly (evaluated both with the CEGMA software and by the fraction of the R. prolixus proteome retrieved). Moreover, we looked for the presence of two chemosensory gene families (Odorant-Binding Proteins and Chemosensory Proteins) to validate the assembly quality. The reference-based assemblies after genome annotation were clearly better than those generated using de novo strategies alone. Reference-based strategies revealed new transcripts, including new isoforms unpredicted by automatic genome annotation. However, a combination of both de novo and reference-based strategies gave the best result, and allowed us to assemble fragmented transcripts.
Abstract:
To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembly. This indicated that possibly 33,620 unique genes had been identified and that >90% of the sugarcane expressed genes were tagged.
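The percentages quoted in this abstract can be cross-checked against its own counts. The short Python sketch below uses only figures stated in the abstract; the variable names are illustrative, and the calculation is a simple consistency check rather than the authors' procedure.

```python
# Counts taken directly from the abstract above.
total_transcripts = 43_141   # putative transcripts in the first assembly
full_length = 14_409         # transcripts with at least one full-length cDNA clone
unique_genes = 33_620        # estimated unique genes after reassembly

# Share of transcripts containing a full-length insert (abstract: 33%).
full_length_share = full_length / total_transcripts

# Redundancy implied by the drop from transcripts to unique genes (abstract: 22%).
implied_redundancy = 1 - unique_genes / total_transcripts

print(f"full-length share:  {full_length_share:.1%}")
print(f"implied redundancy: {implied_redundancy:.1%}")
```

Both values round to the percentages reported in the abstract, so the stated 22% appears to be a rounded figure.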
Abstract:
Although many Brazilian sugar mills initiate the fermentation process by inoculating selected commercial Saccharomyces cerevisiae strains, the unsterile conditions of the industrial sugar cane ethanol fermentation process permit the constant entry of native yeast strains. Certain of those native strains are better adapted and tend to predominate over the initial strain, which may cause problems during fermentation. In the industrial fermentation process, yeast cells are often exposed to stressful environmental conditions, including prolonged cell recycling, ethanol toxicity and osmotic, oxidative or temperature stress. Little is known about these S. cerevisiae strains, although recent studies have demonstrated that heterogeneous genome architecture is exhibited by some selected well-adapted Brazilian indigenous yeast strains that display high performance in bioethanol fermentation. In this study, 11 microsatellite markers were used to assess the genetic diversity and population structure of the native autochthonous S. cerevisiae strains in various Brazilian sugar mills. The resulting multilocus data were used to build a similarity-based phenetic tree and to perform a Bayesian population structure analysis. The tree revealed the presence of great genetic diversity among the strains, which were arranged according to the place of origin and the collection year. The population structure analysis revealed genotypic differences among populations; in certain populations, these genotypic differences are combined to yield notably genotypically diverse individuals. The high yeast diversity observed among native S. cerevisiae strains provides new insights on the use of autochthonous high-fitness strains with industrial characteristics as starter cultures at bioethanol plants. © 2013 John Wiley & Sons, Ltd.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The length of the post-partum anoestrous interval affects reproductive efficiency in many tropical beef cattle herds. In this study, results from genome-wide association studies (Experiment 1: GWAS) and gene expression (Experiment 2: microarray) were combined in a systems approach to reveal genetic markers, genes and pathways underlying the physiology of post-partum anoestrus in tropically adapted cattle. The microarray study measured the expression of 13,964 genes in the hypothalamus of Brahman cows. A total of 366 genes were differentially expressed (DE) in the post-partum period, when acyclic cows were compared to cows that had resumed ovarian cycles. Associated markers (P < 0.05) from a high-density GWAS pointed to 2829 genes that were associated with post-partum anoestrous interval (PPAI) in two populations of beef cattle: Brahman and Tropical composite. Together the experiments provided evidence for 63 genes that are likely to influence the resumption of ovulation post-partum in tropically adapted beef cattle. Functional annotation analysis revealed that some of the 63 genes have known roles in hormonal activity, energy balance and neuronal synapse plasticity. Polymorphisms within candidate genes identified by this systems approach could have biological significance in post-partum anoestrus and help select Zebu (Bos indicus) influenced cattle with genetic potential for shorter post-partum anoestrus. Crown Copyright © 2014 Published by Elsevier B.V. All rights reserved.
Abstract:
The use of relatively low numbers of sires in cattle breeding programs, particularly those for carcass and weight traits in Nellore beef cattle (Bos indicus) in Brazil, has always raised concerns about inbreeding, which affects conservation of genetic resources and sustainability of this breed. Here, we investigated the distribution of autozygosity levels based on runs of homozygosity (ROH) in a sample of 1,278 Nellore cows, genotyped for over 777,000 SNPs. We found ROH segments larger than 10 Mb in over 70% of the samples, representing signatures most likely related to the recent massive use of few sires. However, the average genome coverage by ROH (>1 Mb) was lower than previously reported for other cattle breeds (4.58%). In spite of 99.98% of the SNPs being included within a ROH in at least one individual, only 19.37% of the markers were encompassed by common ROH, suggesting that the ongoing selection for weight, carcass and reproductive traits in this population is too recent to have produced selection signatures in the form of ROH. Three short-range highly prevalent ROH autosomal hotspots (occurring in over 50% of the samples) were observed, indicating candidate regions most likely under selection since before the foundation of Brazilian Nellore cattle. The putative signatures of selection on chromosomes 4, 7, and 12 may be involved in resistance to infectious diseases and fertility, and should be the subject of future investigation.
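Genome coverage by ROH, as mentioned in this abstract, is commonly computed as the summed length of qualifying ROH segments divided by the length of the genome covered by markers. The following Python sketch illustrates that ratio under stated assumptions: the function name, the toy segment coordinates, and the 100 Mb genome length are all hypothetical, segments are assumed not to overlap, and the abstract does not describe the authors' exact pipeline.

```python
def roh_coverage(segments_bp, genome_length_bp, min_length_bp=1_000_000):
    """Fraction of the genome covered by ROH segments longer than min_length_bp.

    segments_bp: list of (start, end) positions in base pairs, assumed non-overlapping.
    genome_length_bp: total genome length covered by markers (hypothetical value here).
    """
    covered = sum(
        end - start
        for start, end in segments_bp
        if end - start > min_length_bp
    )
    return covered / genome_length_bp


# Toy individual: one 2 Mb segment, one 0.5 Mb segment (below the 1 Mb
# threshold, so ignored), and one 12 Mb segment.
toy_segments = [
    (0, 2_000_000),
    (5_000_000, 5_500_000),
    (10_000_000, 22_000_000),
]
toy_genome_length = 100_000_000  # hypothetical, for illustration only

coverage = roh_coverage(toy_segments, toy_genome_length)

# Flag whether the individual carries any ROH longer than 10 Mb, the size
# class the abstract associates with the recent heavy use of few sires.
has_long_roh = any(end - start > 10_000_000 for start, end in toy_segments)

print(f"ROH coverage: {coverage:.2%}, long ROH present: {has_long_roh}")
```

Averaging this per-individual fraction across all genotyped animals would give a population-level figure comparable in kind to the coverage value reported in the abstract.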