38 results for "Genome annotation"

in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo


Relevance: 80.00%

Abstract:

Background: Even before its genome sequence was published in 2004, Kluyveromyces lactis had long been considered a model organism for studies in genetics and physiology. Research on K. lactis is well advanced, and it is one of the few yeast species amenable to formal genetic analysis. Nevertheless, until now no complete metabolic functional annotation had been performed on the proteins encoded in the K. lactis genome. Results: In this work, a new genome-wide metabolic functional re-annotation of the proteins encoded in the K. lactis genome was performed, resulting in the annotation of 1759 genes with metabolic functions and in the development of a methodology supported by merlin (software developed in-house). The new annotation includes novelties such as the assignment of transporter superfamily numbers to genes identified as encoding transporter proteins. Of the genes annotated with metabolic functions, 1410 are exclusively enzymatic, 301 encode transporter proteins, and 48 have both metabolic activities. The new annotation substantially surpasses the currently available K. lactis annotations. A comparison with KEGG's annotation revealed a match with 844 (~90%) of the genes annotated by KEGG, while adding 850 new gene annotations; in addition, 32 genes have annotations different from KEGG's. Conclusions: The methodology developed in this work can be used to re-annotate any yeast or, with minor adjustment of the reference organism, the proteins encoded in any sequenced genome. Because new functions have been identified for the so-called metabolic genes, the new annotation provides basic knowledge that may be useful to the scientific community working on this model yeast. Furthermore, it served as the basis for the reconstruction of a compartmentalized, genome-scale metabolic model of K. lactis, which is currently being finished.
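The KEGG comparison above (matches, new annotations, and conflicting annotations) amounts to set operations over two gene-to-function maps. A minimal sketch, assuming each annotation is a plain dict from gene identifier to a single EC number; the gene IDs and EC numbers are hypothetical placeholders, and the abstract does not describe how merlin performs this comparison.

```python
from typing import Dict, List, Tuple


def compare_annotations(
    new: Dict[str, str], reference: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two gene -> function (e.g. EC number) annotation maps.

    Returns three sorted gene lists:
      matches   - annotated in both maps with the same function
      added     - annotated only in `new`
      different - annotated in both maps but with different functions
    """
    shared = new.keys() & reference.keys()
    matches = sorted(g for g in shared if new[g] == reference[g])
    different = sorted(g for g in shared if new[g] != reference[g])
    added = sorted(new.keys() - reference.keys())
    return matches, added, different


# Hypothetical gene IDs and EC numbers, for illustration only.
reference = {"geneA": "2.7.1.1", "geneB": "1.1.1.1", "geneC": "4.1.2.13"}
new = {"geneA": "2.7.1.1", "geneB": "1.1.1.2", "geneD": "5.3.1.9"}

matches, added, different = compare_annotations(new, reference)
# Share of reference-annotated genes that the new annotation reproduces.
match_rate = len(matches) / len(reference)
print(matches, added, different, f"{match_rate:.0%}")
```

Expressing the match rate relative to the reference follows the abstract's "844 (~90%) of the genes annotated by KEGG". Real annotations often give a gene several EC numbers, in which case comparing sets of EC numbers per gene would be more appropriate than comparing single strings.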

Relevance: 70.00%

Abstract:

The Xylella fastidiosa comparative genomic database is a scientific resource that aims to provide a user-friendly interface for accessing high-quality, manually curated genomic annotation and comparative sequence analysis, and for identifying and mapping prophage-like elements, a marked feature of Xylella genomes. Here we describe the database and tools for exploring the biology of this important plant pathogen. The hallmarks of the database are its high-quality genomic annotation, its functional and comparative genomic analyses, and the identification and mapping of prophage-like elements. It is available at http://www.xylella.lncc.br.

Relevance: 60.00%

Abstract:

Schistosoma mansoni is one of the agents of schistosomiasis, a chronic and debilitating disease. Here we present a transcriptome-wide characterization of adult S. mansoni males by high-throughput RNA sequencing. We obtained 1,620,432 high-quality ESTs from a directional, strand-specific cDNA library, yielding 26% higher coverage of genome bases than the public ESTs available at NCBI. With 15x-deep coverage of transcribed genomic regions, our data were able to (i) confirm, for the first time, 990 gene predictions that previously lacked evidence of transcription; (ii) correct gene predictions; and (iii) discover 989 and 1196 RNA-seq contigs that map to intergenic and intronic genomic regions, respectively, where no gene had previously been predicted. These contigs could represent new protein-coding genes or non-coding RNAs (ncRNAs). Interestingly, we identified 11 novel micro-exon genes (MEGs). These data reveal new features of the S. mansoni transcriptional landscape and significantly advance our understanding of the parasite transcriptome. © 2011 Elsevier Inc. All rights reserved.

Relevance: 60.00%

Abstract:

Background: The ongoing efforts to sequence the honey bee genome require additional initiatives to define its transcriptome. Towards this end, we employed the Open Reading Frame ESTs (ORESTES) strategy to generate profiles for the life cycle of Apis mellifera workers. Results: Of the 5,021 ORESTES, 35.2% matched previously deposited Apis ESTs. Analysis of the remaining sequences defined a set of putative orthologs, the majority of which had their best matches with Anopheles and Drosophila genes. CAP3 assembly of the Apis ORESTES together with the 15,500 existing Apis ESTs generated 3,408 contigs. BLASTX comparison of these contigs with the protein sets of organisms representing distinct phylogenetic clades revealed a total of 1,629 contigs that Apis mellifera shares with different taxa. The largest group (41%) represents genes common to all taxa, another 21% are shared among metazoans (Bilateria), and 16% are shared only within the Insecta clade. A set of 23 putative genes had their best match with human genes, many of which encode factors related to cell signaling/signal transduction. A total of 1,779 contigs (52%) did not match any known sequence. Applying a correction factor derived from a parallel analysis of Drosophila melanogaster ORESTES, we estimate that approximately half of these no-match EST contigs (22%) represent Apis-specific genes. Conclusions: The versatile and cost-efficient ORESTES approach produced minilibraries for honey bee life cycle stages. Such information on central gene regions contributes to genome annotation and also lends itself to cross-transcriptome comparisons that reveal evolutionary trends in insect genomes.

Relevance: 30.00%

Abstract:

Vibrio campbellii PEL22A was isolated from open-ocean water in the Abrolhos Bank. The PEL22A genome consists of 6,788,038 bp, with a GC content of 45%. It contains 6,359 coding sequences (CDS), as determined by the Rapid Annotation using Subsystem Technology (RAST) server, and 80 RNA genes, of which 68 are tRNA and 12 are rRNA genes. V. campbellii PEL22A contains genes related to virulence and fitness, including a complete proteorhodopsin cluster, complete type II and III secretion systems, incomplete type I, IV, and VI secretion systems, a hemolysin, and CTX Phi.
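GC content is simply the proportion of guanine and cytosine among the bases of the sequence (45% of the 6,788,038 bp reported for PEL22A). A minimal sketch, assuming the assembly is available as a list of contig strings and that ambiguous symbols such as N are excluded from the denominator; that convention is an assumption and not necessarily the one behind the reported figure.

```python
from typing import Iterable


def gc_content(contigs: Iterable[str]) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases across all contigs.

    Ambiguous symbols (e.g. N) are excluded from the denominator; this is an
    assumed convention.
    """
    gc = at = 0
    for seq in contigs:
        seq = seq.upper()  # soft-masked (lowercase) bases count too
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no A/C/G/T bases found")
    return gc / (gc + at)


# Toy contigs, not PEL22A data.
print(f"{gc_content(['ATGCGCATNN', 'gcgc']):.2f}")
```

Pooling counts over all contigs, rather than averaging per-contig percentages, keeps short contigs from being over-weighted.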

Relevance: 30.00%

Abstract:

Exiguobacterium antarcticum is a psychrotrophic bacterium first isolated from microbial mats of Lake Fryxell in Antarctica. Many organisms of the genus Exiguobacterium are extremophiles with properties of biotechnological interest, e.g., the capacity to adapt to cold, which makes the genus a target for discovering new enzymes, such as lipases and proteases, and for improving our understanding of the mechanisms of adaptation and survival at low temperatures. This study presents the genome of E. antarcticum B7, isolated from a biofilm sample from Ginger Lake on King George Island, Antarctic Peninsula.

Relevance: 30.00%

Abstract:

Genome-wide association studies have failed to establish common-variant risk for the majority of common human diseases. Recent studies help explain this failure: the resequencing and comparison of over 1,200 human genomes and 10,000 exomes, together with the delineation of DNA methylation patterns (epigenome) and the full characterization of transcribed coding and noncoding RNAs (transcriptome). These studies have provided the most comprehensive catalogues of functional elements and genetic variants, now available for global integrative analysis and for experimental validation in prospective cohort studies. With these datasets, researchers will have unparalleled opportunities to align, mine, and test hypotheses about the roles of specific genetic variants, including copy number variations, single nucleotide polymorphisms, and indels, as causes of specific phenotypes and diseases. By using next-generation sequencing technologies for genotyping, together with standardized ontological annotation, to systematically analyze the effects of genomic variation on human and model-organism phenotypes, we will be able to find candidate genes and new clues to disease etiology and treatment. This article describes essential concepts in genetics and genomic technologies, as well as an emerging computational framework for comprehensively searching the websites and platforms available for the analysis and interpretation of genomic data.

Relevance: 20.00%

Abstract:

Background: Malaria caused by Plasmodium vivax is an experimentally neglected, severe disease with a substantial burden on human health. Because of technical limitations, little is known about the biology of this important human pathogen. Whole-genome analysis of patient-derived material is therefore likely to have a substantial impact on our understanding of P. vivax pathogenesis and epidemiology. For example, it would allow study of the parasite's evolution and population biology and characterization of its transmission patterns, and it may facilitate the identification of new drug resistance genes. Because parasitemias are typically low and the parasite cannot be readily cultured, on-site leukocyte depletion of blood samples is usually needed to remove human DNA, which may be 1000-fold more abundant than parasite DNA. These constraints have precluded the analysis of archived blood samples and require laboratories close to field collection sites for optimal sample preparation before cryopreservation. Results: Here we show that in-solution hybridization capture can be used in the laboratory to separate P. vivax DNA from contaminating human DNA without the need for on-site leukocyte filtration. Using a whole-genome capture method, we enriched P. vivax DNA in bulk genomic DNA from less than 0.5% to a median of 55% (range 20%-80%). This level of enrichment allows efficient analysis of the samples by whole-genome sequencing and does not introduce any gross biases into the data. With this method, we obtained greater than 5X coverage across 93% of the P. vivax genome for four P. vivax strains from Iquitos, Peru, similar to our results using leukocyte filtration (greater than 5X coverage across 96% of the genome). Conclusion: The whole-genome capture technique will enable more efficient whole-genome analysis of P. vivax from a larger geographic region and from valuable archived sample collections.
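Two figures in this abstract are simple ratios: the fold enrichment of parasite DNA (from below 0.5% to a median of 55%) and the breadth of coverage (the share of the genome covered at 5X or more). A minimal sketch of both, assuming per-base read depths are already available as a list of integers; the depth values below are hypothetical, and treating the 5X threshold as inclusive is an assumption since the abstract says "greater than 5X".

```python
from typing import Sequence


def fold_enrichment(fraction_before: float, fraction_after: float) -> float:
    """How many times the target (parasite) DNA fraction increased."""
    if fraction_before <= 0:
        raise ValueError("fraction_before must be positive")
    return fraction_after / fraction_before


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 5) -> float:
    """Fraction of genome positions covered by at least `min_depth` reads."""
    if not depths:
        raise ValueError("empty depth profile")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# From the abstract: parasite DNA went from <0.5% to a median of 55%,
# so the median enrichment is at least about this many fold.
print(round(fold_enrichment(0.005, 0.55)))  # 110

# Hypothetical per-base depths for a tiny 10-bp "genome".
depths = [0, 3, 5, 6, 8, 12, 7, 5, 2, 9]
print(breadth_of_coverage(depths, min_depth=5))  # 0.7
```

Because the starting fraction was below 0.5%, 110-fold is a lower bound on the median enrichment rather than an exact value.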

Relevance: 20.00%

Abstract:

The Saccharomyces cerevisiae strains widely used for industrial fuel-ethanol production have been developed by selection, but the beneficial genetic polymorphisms underlying their performance remain unknown. Here, we report the draft whole-genome sequence of S. cerevisiae strain CAT-1, a dominant fuel-ethanol fermentative strain from the Brazilian sugarcane industry. Our results indicate that CAT-1 is a highly heterozygous diploid yeast strain. Compared with the reference S288c genome, the ~12-Mb CAT-1 genome contains ~36,000 homozygous and ~30,000 heterozygous single nucleotide polymorphisms, unevenly distributed among chromosomes because of large genomic regions of loss of heterozygosity (LOH). In total, 58% of the 6,652 predicted protein-coding genes of the CAT-1 genome are alleles that differ from the corresponding genes in the reference S288c genome. The CAT-1 genome contains a reduced number of transposable elements, as well as several gene deletions and duplications, especially at telomeric regions, some of which correlate with physiological characteristics of this industrial fuel-ethanol strain. Phylogenetic analyses revealed that some genes are likely associated with traits important for bioethanol production. Identifying and characterizing the allelic variations that control traits relevant to industrial fermentation should provide the basis for a forward genetics approach to developing better fermenting yeast strains.

Relevance: 20.00%

Abstract:

Transposons are abundant components of eukaryotic genomes and play an important role in genome evolution. Knowledge about these elements should contribute to understanding their impact on host genomes. The hAT transposon superfamily is one of the best-characterized superfamilies across diverse organisms; nevertheless, a detailed study of these elements had never been carried out in sugarcane. To address this gap, we analyzed 32 cDNAs, previously identified in the sugarcane transcriptome, that are similar to hAT-superfamily transposases. Our results revealed that these hAT-like transposases cluster into one highly homogeneous lineage and another, more heterogeneous one. We present evidence supporting the hypothesis that the highly homogeneous group is a domesticated transposase, whereas the remaining lineages are composed of transposon units. The homogeneous group is common to grasses, clusters significantly with domesticated transposases from Arabidopsis, rice, and sorghum, and is expressed in different tissues of the two sugarcane cultivars analyzed. In contrast, the more heterogeneous group represents at least two transposon lineages. We recovered five genomic versions of one lineage, characterizing a novel transposon family with a conserved DDE motif, named SChAT. These results indicate the presence of at least three distinct lineages of hAT-like transposase paralogues in the sugarcane genome, including a novel transposon family described in Saccharum and a domesticated transposase. Taken together, these findings allow us to follow the diversification of some hAT transposase paralogues in sugarcane and add to knowledge about the co-evolution of transposons and their host genomes.

Relevance: 20.00%

Abstract:

Protozoan parasites cause thousands of deaths each year in developing countries. The genome projects of these parasites opened a new era in the identification of therapeutic targets. However, a putative function could be predicted for fewer than half of the protein-coding genes. In this work, all Trypanosoma cruzi proteins containing predicted transmembrane spans were processed through an automated computational routine and further analyzed to assign their most probable function. The analysis consisted of dissecting each whole predicted protein into different regions. More than 5,000 sequences were processed, and the predicted biological functions were grouped into 19 categories according to the hits obtained. Because little information is available on this topic in trypanosomatids, one focus of interest is the proteins involved in signal-transduction processes; we identified 54 proteins belonging to this group, which were analyzed individually. The results show that a simple pipeline made it possible to attribute probable functions to sequences annotated as coding for "hypothetical proteins." We also successfully identified the majority of candidates participating in the signal-transduction pathways of T. cruzi.
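The abstract does not say which tool predicted the transmembrane spans. As an illustration only, the classic Kyte-Doolittle hydropathy scan (a sliding 19-residue window whose mean hydropathy above about 1.6 flags a candidate membrane-spanning segment) can be sketched as below. This is a swapped-in textbook technique, not the authors' routine; dedicated predictors such as TMHMM are considerably more accurate, and the toy sequence is hypothetical.

```python
from typing import List, Tuple

# Kyte-Doolittle (1982) hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every full window, indexed by window start."""
    # Nonstandard residues (X, U, ...) are scored 0.0, an arbitrary choice.
    values = [KD.get(aa, 0.0) for aa in seq.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    scores = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one
        scores.append(total / window)
    return scores


def candidate_tm_segments(
    seq: str, window: int = 19, threshold: float = 1.6
) -> List[Tuple[int, int]]:
    """Merge overlapping above-threshold windows into (start, end) spans.

    Coordinates are 0-based and end-exclusive.
    """
    segments: List[Tuple[int, int]] = []
    for start, score in enumerate(hydropathy_profile(seq, window)):
        if score <= threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # extend the current span
        else:
            segments.append((start, end))
    return segments


# Toy sequence: short N-terminus, 20-residue hydrophobic stretch, charged tail.
toy = "MKRS" + "L" * 20 + "DEKR" * 5
print(candidate_tm_segments(toy))
```

Overlapping windows are merged because a single helix produces many consecutive above-threshold windows; the merged span is therefore wider than the helix itself.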

Relevance: 20.00%

Abstract:

In silico analyses of Leishmania spp. genome data are a powerful resource for improving the understanding of these pathogens' biology. Trypanosomatids such as Leishmania spp. have their protein-coding genes grouped in long polycistronic units of functionally unrelated genes, and gene expression is controlled by a variety of posttranscriptional mechanisms. The high degree of synteny among Leishmania species is accompanied by highly conserved coding sequences (CDS) and poorly conserved intercoding untranslated sequences. To identify elements involved in the control of gene expression, we conducted an in silico investigation to find conserved intercoding sequences (CICS) in the genomes of L. major, L. infantum, and L. braziliensis. We used a combination of computational tools, such as Linux shell, Perl and R scripting, BLAST, MSPcrunch, SSAKE, and the Pred-A-Term algorithm, to construct a pipeline able to: (i) search for conservation in target regions, (ii) eliminate CICS redundancy and mask repeat elements, (iii) predict mRNA extremities, (iv) analyze the distribution of orthologous genes within the generated LeishCICS-clusters, (v) assign GO terms to the LeishCICS-clusters, and (vi) provide statistical support for the gene-enrichment annotation. We associated the LeishCICS-cluster data generated at the end of the pipeline with the expression profile of L. donovani genes during promastigote-amastigote differentiation, as previously evaluated by others (GEO accession: GSE21936). A Pearson's correlation coefficient greater than 0.5 was observed for 730 LeishCICS-clusters containing from 2 to 17 genes. The designed computational pipeline is a useful tool, and its application identified potential regulatory cis elements and putative regulons in Leishmania. © 2012 Elsevier B.V. All rights reserved.
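The abstract reports a Pearson's correlation above 0.5 for 730 clusters but does not say how a single coefficient was obtained per cluster. A minimal sketch under one plausible assumption, namely that a cluster counts as co-expressed when the mean pairwise Pearson correlation of its members' expression profiles exceeds 0.5; the gene names and expression values are hypothetical, not GSE21936 data.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("a constant profile has undefined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / den


def cluster_coexpressed(
    profiles: Dict[str, Sequence[float]], threshold: float = 0.5
) -> bool:
    """True if the mean pairwise correlation within the cluster exceeds threshold."""
    pairs = list(combinations(profiles.values(), 2))
    if not pairs:
        raise ValueError("a cluster needs at least two genes")
    mean_r = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    return mean_r > threshold


# Hypothetical expression values at four differentiation time points.
cluster = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.1, 5.9, 8.2],
    "gene3": [1.0, 2.0, 3.0, 5.0],
}
print(cluster_coexpressed(cluster))
```

Averaging over all pairs treats every gene in the cluster symmetrically; an alternative would be to correlate each gene with the cluster's mean profile.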

Relevance: 20.00%

Abstract:

A taxonomic and annotated functional description of microbial life was deduced from 53 Mb of metagenomic sequence retrieved from a planktonic fraction of the Neotropical high-Andean (3,973 meters above sea level) acidic hot spring El Coquito (EC). Classification of unassembled metagenomic reads against different databases showed a high proportion of Gammaproteobacteria and Alphaproteobacteria among all affiliated reads, and taxonomic affiliation of 16S rRNA gene fragments revealed the presence of Proteobacteria, microalgal chloroplasts, and Firmicutes. Reads mapped against the genomes of Acidiphilium cryptum JF-5, Legionella pneumophila str. Corby, and Acidithiobacillus caldus revealed transposase-like sequences potentially involved in horizontal gene transfer. Functional annotation and hierarchical comparison with datasets obtained by pyrosequencing of other ecosystems showed that the microbial community also contained extensive DNA repair systems, possibly to cope with ultraviolet radiation at such high altitude. Analysis of genes involved in the nitrogen cycle indicated the presence of dissimilatory nitrate reduction to N2 (narGHI, nirS, norBCDQ, and nosZ), associated with Proteobacteria-like sequences. Genes involved in the sulfur cycle (cysDN, cysNC, and aprA), affiliated with several bacterial species, indicated adenylsulfate and sulfite production. In summary, the metagenomic sequence data provided insight into the structure and possible functions of this hot spring microbial community, describing some groups potentially involved in nitrogen and sulfur cycling in this environment. Citation: Jimenez DJ, Andreote FD, Chaves D, Montana JS, Osorio-Forero C, et al. (2012) Structural and Functional Insights from the Metagenome of an Acidic Hot Spring Microbial Planktonic Community in the Colombian Andes. PLoS ONE 7(12): e52069. doi:10.1371/journal.pone.0052069

Relevance: 20.00%

Abstract:

Human cells are constantly exposed to DNA damage. Without repair, damage can result in genetic instability and eventually cancer. The strong association between deficient DNA damage repair, mutations, and cancer is dramatically demonstrated by a number of cancer-prone human syndromes, such as xeroderma pigmentosum (XP), ataxia-telangiectasia (AT), and Fanconi anemia (FA). This review focuses on the historical discoveries related to these three diseases and describes their impact on the understanding of DNA repair mechanisms and the causes of human cancer. Because deficiencies in DNA repair are also often associated with progeroid symptoms, unrepaired damage and aging appear to be related. Several other pathologies associated with DNA repair defects, genetic instability, and increased cancer risk are also discussed. Studies with cells from these syndromes have helped in understanding important levels of protection against cancer and aging, although they have so far yielded little therapeutic benefit for patients. Finally, recent advances in combined basic and translational research on DNA repair and chemotherapy are presented.

Relevance: 20.00%

Abstract:

Background: Decreasing costs of DNA sequencing have made prokaryotic draft genome sequences increasingly common. A contig scaffold is an ordering of contigs, each in its correct orientation. A scaffold can aid genome comparisons and guide gap-closure efforts. One popular technique for obtaining contig scaffolds is to map contigs onto a reference genome. However, rearrangements between the query and reference genomes may result in incorrect scaffolds if they are not taken into account. Large-scale inversions are common rearrangement events in prokaryotic genomes. Even in draft genomes, the presence of inversions can be detected given sufficient sequencing coverage and a sufficiently close reference genome. Results: We present a linear-time algorithm that, given a reference genome, generates a set of contig scaffolds for a draft genome sequence represented as contigs. The algorithm is aimed at prokaryotic genomes and relies on matching sequence patterns between the query and reference genomes that can be interpreted as the result of large-scale inversions; we call these patterns inversion signatures. Our algorithm correctly generates a scaffold if at least one member of every inversion signature pair is present in the contigs and no inversion signatures have been overwritten during evolution. The algorithm can also generate scaffolds in the presence of any kind of inversion, although in this general case there is no guarantee that all scaffolds in the scaffold set will be correct. We compared the performance of SIS, the program that implements the algorithm, with seven other scaffold-generating programs; the results of our tests show that SIS performs better overall. Conclusions: SIS is a new, easy-to-use tool for generating contig scaffolds, available both as a stand-alone program and as a web server. The good performance of SIS in our tests adds evidence that large-scale inversions are widespread in prokaryotic genomes.
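The Background paragraph describes the baseline that SIS improves on: map each contig to the reference and order contigs by their mapped position. A minimal sketch of that naive baseline (not the SIS inversion-signature algorithm), assuming each contig already has an alignment given as a hypothetical (contig, reference start, strand) record; because it trusts the reference order completely, it yields an incorrect scaffold wherever the query genome carries an inversion relative to the reference.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record of one contig-to-reference hit."""
    contig: str
    ref_start: int  # 0-based start of the hit on the reference
    strand: str     # "+" forward, "-" reverse complement


def naive_scaffold(
    contigs: List[str], alignments: List[Alignment]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Order and orient contigs by their position on the reference.

    Keeps the first alignment seen for each contig (a stand-in for choosing
    the best-scoring hit) and trusts the reference order completely, so an
    inversion in the query genome yields a wrong scaffold.
    """
    best: Dict[str, Alignment] = {}
    for aln in alignments:
        best.setdefault(aln.contig, aln)
    placed = sorted(best.values(), key=lambda a: a.ref_start)
    scaffold = [(a.contig, a.strand) for a in placed]
    unplaced = [c for c in contigs if c not in best]
    return scaffold, unplaced


contigs = ["c1", "c2", "c3", "c4"]
hits = [
    Alignment("c2", 5000, "-"),
    Alignment("c1", 100, "+"),
    Alignment("c3", 12000, "+"),
]
print(naive_scaffold(contigs, hits))
```

The sort makes this baseline O(n log n) in the number of placed contigs; SIS instead interprets inversion signatures between query and reference so that inverted regions are not forced into the reference order.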