988 resultados para Functional Annotation
Resumo:
BACKGROUND: Genetic association studies are conducted to discover genetic loci that contribute to an inherited trait, identify the variants behind these associations and ascertain their functional role in determining the phenotype. To date, functional annotations of the genetic variants have rarely played more than an indirect role in assessing evidence for association. Here, we demonstrate how these data can be systematically integrated into an association study's analysis plan. RESULTS: We developed a Bayesian statistical model for the prior probability of phenotype-genotype association that incorporates data from past association studies and publicly available functional annotation data regarding the susceptibility variants under study. The model takes the form of a binary regression of association status on a set of annotation variables whose coefficients were estimated through an analysis of associated SNPs in the GWAS Catalog (GC). The functional predictors examined included measures that have been demonstrated to correlate with the association status of SNPs in the GC and some whose utility in this regard is speculative: summaries of the UCSC Human Genome Browser ENCODE super-track data, dbSNP function class, sequence conservation summaries, proximity to genomic variants in the Database of Genomic Variants and known regulatory elements in the Open Regulatory Annotation database, PolyPhen-2 probabilities and RegulomeDB categories. Because we expected that only a fraction of the annotations would contribute to predicting association, we employed a penalized likelihood method to reduce the impact of non-informative predictors and evaluated the model's ability to predict GC SNPs not used to construct the model. We show that the functional data alone are predictive of a SNP's presence in the GC. Further, using data from a genome-wide study of ovarian cancer, we demonstrate that their use as prior data when testing for association is practical at the genome-wide scale and improves power to detect associations. CONCLUSIONS: We show how diverse functional annotations can be efficiently combined to create 'functional signatures' that predict the a priori odds of a variant's association to a trait and how these signatures can be integrated into a standard genome-wide-scale association analysis, resulting in improved power to detect truly associated variants.
Resumo:
Affiliation: Centre Robert-Cedergren de l'Université de Montréal en bio-informatique et génomique & Département de biochimie, Université de Montréal
Resumo:
Blumeria graminis is an economically important obligate plant-pathogenic fungus, whose entire genome was recently sequenced and manually annotated using ab initio in silico predictions [7]. Employing large scale proteogenomic analysis we are now able to verify independently the existence of proteins predicted by 24% of open reading frame models. We compared the haustoria and sporulating hyphae proteomes and identified 71 proteins exclusively in haustoria, the feeding and effector-delivery organs of the pathogen. These proteins are ‘significantly smaller than the rest of the protein pool and predicted to be secreted. Most do not share any similarities with Swiss–Prot or Trembl entries nor possess any identifiable Pfam domains. We used a novel automated prediction pipeline to model the 3D structures of the proteins, identify putative ligand binding sites and predict regions of intrinsic disorder. This revealed that the protein set found exclusively in haustoria is significantly less disordered than the rest of the identified Blumeria proteins or random (and representative) protein sets generated from the yeast proteome. For most of the haustorial proteins with unknown functions no good templates could be found, from which to generate high quality models. Thus, these unknown proteins present potentially new protein folds that can be specific to the interaction of the pathogen with its host.
Resumo:
To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST),program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged.
Resumo:
The striped catfish (Pangasianodon hypophthalmus) culture industry in the Mekong Delta in Vietnam has developed rapidly over the past decade. The culture industry now however, faces some significant challenges, especially related to climate change impacts notably from predicted extensive saltwater intrusion into many low topographical coastal provinces across the Mekong Delta. This problem highlights a need for development of culture stocks that can tolerate more saline culture environments as a response to expansion of saline water-intruded land. While a traditional artificial selection program can potentially address this need, understanding the genomic basis of salinity tolerance can assist development of more productive culture lines. The current study applied a transcriptomic approach using Ion PGM technology to generate expressed sequence tag (EST) resources from the intestine and swim bladder from striped catfish reared at a salinity level of 9 ppt which showed best growth performance. Total sequence data generated was 467.8 Mbp, consisting of 4,116,424 reads with an average length of 112 bp. De novo assembly was employed that generated 51,188 contigs, and allowed identification of 16,116 putative genes based on the GenBank non-redundant database. GO annotation, KEGG pathway mapping, and functional annotation of the EST sequences recovered with a wide diversity of biological functions and processes. In addition, more than 11,600 simple sequence repeats were also detected. This is the first comprehensive analysis of a striped catfish transcriptome, and provides a valuable genomic resource for future selective breeding programs and functional or evolutionary studies of genes that influence salinity tolerance in this important culture species.
Resumo:
In the current era of high-throughput sequencing and structure determination, functional annotation has become a bottleneck in biomedical science. Here, we show that automated inference of molecular function using functional linkages among genes increases the accuracy of functional assignments by >= 8% and enriches functional descriptions in >= 34% of top assignments. Furthermore, biochemical literature supports >80% of automated inferences for previously unannotated proteins. These results emphasize the benefit of incorporating functional linkages in protein annotation.
Resumo:
Of the similar to 4000 ORFs identified through the genome sequence of Mycobacterium tuberculosis (TB) H37Rv, experimentally determined structures are available for 312. Since knowledge of protein structures is essential to obtain a high-resolution understanding of the underlying biology, we seek to obtain a structural annotation for the genome, using computational methods. Structural models were obtained and validated for similar to 2877 ORFs, covering similar to 70% of the genome. Functional annotation of each protein was based on fold-based functional assignments and a novel binding site based ligand association. New algorithms for binding site detection and genome scale binding site comparison at the structural level, recently reported from the laboratory, were utilized. Besides these, the annotation covers detection of various sequence and sub-structural motifs and quaternary structure predictions based on the corresponding templates. The study provides an opportunity to obtain a global perspective of the fold distribution in the genome. The annotation indicates that cellular metabolism can be achieved with only 219 folds. New insights about the folds that predominate in the genome, as well as the fold-combinations that make up multi-domain proteins are also obtained. 1728 binding pockets have been associated with ligands through binding site identification and sub-structure similarity analyses. The resource (http://proline.physics.iisc.ernet.in/Tbstructuralannotation), being one of the first to be based on structure-derived functional annotations at a genome scale, is expected to be useful for better understanding of TB and for application in drug discovery. The reported annotation pipeline is fairly generic and can be applied to other genomes as well.
Resumo:
A computational pipeline PocketAnnotate for functional annotation of proteins at the level of binding sites has been proposed in this study. The pipeline integrates three in-house algorithms for site-based function annotation: PocketDepth, for prediction of binding sites in protein structures; PocketMatch, for rapid comparison of binding sites and PocketAlign, to obtain detailed alignment between pair of binding sites. A novel scheme has been developed to rapidly generate a database of non-redundant binding sites. For a given input protein structure, putative ligand-binding sites are identified, matched in real time against the database and the query substructure aligned with the promising hits, to obtain a set of possible ligands that the given protein could bind to. The input can be either whole protein structures or merely the substructures corresponding to possible binding sites. Structure-based function annotation at the level of binding sites thus achieved could prove very useful for cases where no obvious functional inference can be obtained based purely on sequence or fold-level analyses. An attempt has also been made to analyse proteins of no known function from Protein Data Bank. PocketAnnotate would be a valuable tool for the scientific community and contribute towards structure-based functional inference. The web server can be freely accessed at http://proline.biochem.iisc.ernet.in/pocketannotate/.
Resumo:
A high-quality cDNA library was constructed from whole body tissues of the zhikong scallop, Chlamys farreri, challenged by Listonella anguillarum. A total of 5720 clones were sequenced, yielding 5123 expressed sequence tags (ESTs). Among the 3326 unique genes identified, 2289 (69%) genes had no significant (E-value < 1e-5) matches to known sequences in public databases and 194 (6%) matched proteins of unknown functions. The remaining 843 (25%) genes that exhibited homology with genes of known functions, showed broad involvement in metabolic processes (31%), cell structure and motility (20%), gene and protein expression (12%), cell signaling and cell communication (8%), cell division (4%), and notably, 25% of those genes were related to immune function. They included stress response genes, complement-like genes, proteinase and proteinase inhibitors, immune recognition receptors and immune effectors. The EST collection obtained in this study provides a useful resource for gene discovery and especially for the identification of host-defense genes and systems in scallops and other molluscs. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
El marcaje de proteínas con ubiquitina, conocido como ubiquitinación, cumple diferentes funciones que incluyen la regulación de varios procesos celulares, tales como: la degradación de proteínas por medio del proteosoma, la reparación del ADN, la señalización mediada por receptores de membrana, y la endocitosis, entre otras (1). Las moléculas de ubiquitina pueden ser removidas de sus sustratos gracias a la acción de un gran grupo de proteasas, llamadas enzimas deubiquitinizantes (DUBs) (2). Las DUBs son esenciales para la manutención de la homeostasis de la ubiquitina y para la regulación del estado de ubiquitinación de diferentes sustratos. El gran número y la diversidad de DUBs descritas refleja tanto su especificidad como su utilización para regular un amplio espectro de sustratos y vías celulares. Aunque muchas DUBs han sido estudiadas a profundidad, actualmente se desconocen los sustratos y las funciones biológicas de la mayoría de ellas. En este trabajo se investigaron las funciones de las DUBs: USP19, USP4 y UCH-L1. Utilizando varias técnicas de biología molecular y celular se encontró que: i) USP19 es regulada por las ubiquitin ligasas SIAH1 y SIAH2 ii) USP19 es importante para regular HIF-1α, un factor de transcripción clave en la respuesta celular a hipoxia, iii) USP4 interactúa con el proteosoma, iv) La quimera mCherry-UCH-L1 reproduce parcialmente los fenotipos que nuestro grupo ha descrito previamente al usar otros constructos de la misma enzima, y v) UCH-L1 promueve la internalización de la bacteria Yersinia pseudotuberculosis.
Resumo:
Severe acute respiratory syndrome (SARS) coronavirus infection and growth are dependent on initiating signaling and enzyme actions upon viral entry into the host cell. Proteins packaged during virus assembly may subsequently form the first line of attack and host manipulation upon infection. A complete characterization of virion components is therefore important to understanding the dynamics of early stages of infection. Mass spectrometry and kinase profiling techniques identified nearly 200 incorporated host and viral proteins. We used published interaction data to identify hubs of connectivity with potential significance for virion formation. Surprisingly, the hub with the most potential connections was not the viral M protein but the nonstructurall protein 3 (nsp3), which is one of the novel virion components identified by mass spectrometry. Based on new experimental data and a bioinformatics analysis across the Coronaviridae, we propose a higher-resolution functional domain architecture for nsp3 that determines the interaction capacity of this protein. Using recombinant protein domains expressed in Escherichia coli, we identified two additional RNA-binding domains of nsp3. One of these domains is located within the previously described SARS-unique domain, and there is a nucleic acid chaperone-like domain located immediately downstream of the papain-like proteinase domain. We also identified a novel cysteine-coordinated metal ion-binding domain. Analyses of interdomain interactions and provisional functional annotation of the remaining, so-far-uncharacterized domains are presented. Overall, the ensemble of data surveyed here paint a more complete picture of nsp3 as a conserved component of the viral protein processing machinery, which is intimately associated with viral RNA in its role as a virion component.
Resumo:
A taxonomic and annotated functional description of microbial life was deduced from 53 Mb of metagenomic sequence retrieved from a planktonic fraction of the Neotropical high Andean (3,973 meters above sea level) acidic hot spring El Coquito (EC). A classification of unassembled metagenomic reads using different databases showed a high proportion of Gammaproteobacteria and Alphaproteobacteria (in total read affiliation), and through taxonomic affiliation of 16S rRNA gene fragments we observed the presence of Proteobacteria, micro-algae chloroplast and Firmicutes. Reads mapped against the genomes Acidiphilium cryptum JF-5, Legionella pneumophila str. Corby and Acidithiobacillus caldus revealed the presence of transposase-like sequences, potentially involved in horizontal gene transfer. Functional annotation and hierarchical comparison with different datasets obtained by pyrosequencing in different ecosystems showed that the microbial community also contained extensive DNA repair systems, possibly to cope with ultraviolet radiation at such high altitudes. Analysis of genes involved in the nitrogen cycle indicated the presence of dissimilatory nitrate reduction to N2 (narGHI, nirS, norBCDQ and nosZ), associated with Proteobacteria-like sequences. Genes involved in the sulfur cycle (cysDN, cysNC and aprA) indicated adenylsulfate and sulfite production that were affiliated to several bacterial species. In summary, metagenomic sequence data provided insight regarding the structure and possible functions of this hot spring microbial community, describing some groups potentially involved in the nitrogen and sulfur cycling in this environment. Citation: Jimenez DJ, Andreote FD, Chaves D, Montana JS, Osorio-Forero C, et al. (2012) Structural and Functional Insights from the Metagenome of an Acidic Hot Spring Microbial Planktonic Community in the Colombian Andes. PLoS ONE 7(12): e52069. doi:10.1371/journal.pone.0052069
Resumo:
Background: Even before having its genome sequence published in 2004, Kluyveromyces lactis had long been considered a model organism for studies in genetics and physiology. Research on Kluyveromyces lactis is quite advanced and this yeast species is one of the few with which it is possible to perform formal genetic analysis. Nevertheless, until now, no complete metabolic functional annotation has been performed to the proteins encoded in the Kluyveromyces lactis genome. Results: In this work, a new metabolic genome-wide functional re-annotation of the proteins encoded in the Kluyveromyces lactis genome was performed, resulting in the annotation of 1759 genes with metabolic functions, and the development of a methodology supported by merlin (software developed in-house). The new annotation includes novelties, such as the assignment of transporter superfamily numbers to genes identified as transporter proteins. Thus, the genes annotated with metabolic functions could be exclusively enzymatic (1410 genes), transporter proteins encoding genes (301 genes) or have both metabolic activities (48 genes). The new annotation produced by this work largely surpassed the Kluyveromyces lactis currently available annotations. A comparison with KEGG’s annotation revealed a match with 844 (~90%) of the genes annotated by KEGG, while adding 850 new gene annotations. Moreover, there are 32 genes with annotations different from KEGG. Conclusions: The methodology developed throughout this work can be used to re-annotate any yeast or, with a little tweak of the reference organism, the proteins encoded in any sequenced genome. The new annotation provided by this study offers basic knowledge which might be useful for the scientific community working on this model yeast, because new functions have been identified for the so-called metabolic genes. Furthermore, it served as the basis for the reconstruction of a compartmentalized, genome-scale metabolic model of Kluyveromyces lactis, which is currently being finished.
Resumo:
Pochonia chlamydosporia is a worldwide-distributed soil fungus with a great capacity to infect and destroy the eggs and kill females of plant-parasitic nematodes. Additionally, it has the ability to colonize endophytically roots of economically-important crop plants, thereby promoting their growth and eliciting plant defenses. This multitrophic behavior makes P. chlamydosporia a potentially useful tool for sustainable agriculture approaches. We sequenced and assembled ∼41 Mb of P. chlamydosporia genomic DNA and predicted 12,122 gene models, of which many were homologous to genes of fungal pathogens of invertebrates and fungal plant pathogens. Predicted genes (65%) were functionally annotated according to Gene Ontology, and 16% of them found to share homology with genes in the Pathogen Host Interactions (PHI) database. The genome of this fungus is highly enriched in genes encoding hydrolytic enzymes, such as proteases, glycoside hydrolases and carbohydrate esterases. We used RNA-Seq technology in order to identify the genes expressed during endophytic behavior of P. chlamydosporia when colonizing barley roots. Functional annotation of these genes showed that hydrolytic enzymes and transporters are expressed during endophytism. This structural and functional analysis of the P. chlamydosporia genome provides a starting point for understanding the molecular mechanisms involved in the multitrophic lifestyle of this fungus. The genomic information provided here should also prove useful for enhancing the capabilities of this fungus as a biocontrol agent of plant-parasitic nematodes and as a plant growth-promoting organism.