3 resultados para Functional annotation
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
The development of Next Generation Sequencing promotes Biology in the Big Data era. The ever-increasing gap between proteins with known sequences and those with a complete functional annotation requires computational methods for automatic structure and functional annotation. My research has been focusing on proteins and led so far to the development of three novel tools, DeepREx, E-SNPs&GO and ISPRED-SEQ, based on Machine and Deep Learning approaches. DeepREx computes the solvent exposure of residues in a protein chain. This problem is relevant for the definition of structural constraints regarding the possible folding of the protein. DeepREx exploits Long Short-Term Memory layers to capture residue-level interactions between positions distant in the sequence, achieving state-of-the-art performances. With DeepRex, I conducted a large-scale analysis investigating the relationship between solvent exposure of a residue and its probability to be pathogenic upon mutation. E-SNPs&GO predicts the pathogenicity of a Single Residue Variation. Variations occurring on a protein sequence can have different effects, possibly leading to the onset of diseases. E-SNPs&GO exploits protein embeddings generated by two novel Protein Language Models (PLMs), as well as a new way of representing functional information coming from the Gene Ontology. The method achieves state-of-the-art performances and is extremely time-efficient when compared to traditional approaches. ISPRED-SEQ predicts the presence of Protein-Protein Interaction sites in a protein sequence. Knowing how a protein interacts with other molecules is crucial for accurate functional characterization. ISPRED-SEQ exploits a convolutional layer to parse local context after embedding the protein sequence with two novel PLMs, greatly surpassing the current state-of-the-art. All methods are published in international journals and are available as user-friendly web servers. They have been developed keeping in mind standard guidelines for FAIRness (FAIR: Findable, Accessible, Interoperable, Reusable) and are integrated into the public collection of tools provided by ELIXIR, the European infrastructure for Bioinformatics.
Resumo:
Bioinformatics, in the last few decades, has played a fundamental role to give sense to the huge amount of data produced. Obtained the complete sequence of a genome, the major problem of knowing as much as possible of its coding regions, is crucial. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As it has been recently pointed out by the Critical Assessment of Function Annotations (CAFA), most accurate methods are those based on the transfer-by-homology approach and the most incisive contribution is given by cross-genome comparisons. In the present thesis it is described a non-hierarchical sequence clustering method for protein automatic large-scale annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 millions protein sequences characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) inside clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three dimensional structure (when a template is available). This is possible by the way of cluster-specific HMM profiles that can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+ based applications have been developed during my doctorate including the prediction of Magnesium binding sites in human proteins, the ABC transporters superfamily classification and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, as a web server for the functional and structural protein sequence annotation, BAR+ is freely available at http://bar.biocomp.unibo.it/bar2.0.
Resumo:
Grape berry is considered a non climacteric fruit, but there are some evidences that ethylene plays a role in the control of berry ripening. This PhD thesis aimed to give insights in the role of ethylene and ethylene-related genes in the regulation of grape berry ripening. During this study a small increase in ethylene concentration one week before véraison has been measured in Vitis vinifera L. ‘Pinot Noir’ grapes confirming previous findings in ‘Cabernet Sauvignon’. In addition, ethylene-related genes have been identified in the grapevine genome sequence. Similarly to other species, biosynthesis and ethylene receptor genes are present in grapevine as multi-gene families and their expression appeared tissue or developmental specific. All the other elements of the ethylene signal transduction cascade were also identified in the grape genome. Among them, there were ethylene response factors (ERF) which modulate the transcription of many effector genes in response to ethylene. In this study seven grapevine ERFs have been characterized and they showed tissue and berry development specific expression profiles. Two sequences, VvERF045 and VvERF063, seemed likely involved in berry ripening control due to their expression profiles and their sequence annotation. VvERF045 was induced before véraison and was specific of the ripe berry, by sequence similarity it was likely a transcription activator. VvERF063 displayed high sequence similarity to repressors of transcription and its expression, very high in green berries, was lowest at véraison and during ripening. To functionally characterize VvERF045 and VvERF063, a stable transformation strategy was chosen. Both sequences were cloned in vectors for over-expression and silencing and transferred in grape by Agrobacterium-mediated or biolistic-mediated gene transfer. In vitro, transgenic VvERF045 over-expressing plants displayed an epinastic phenotype whose extent was correlated to the transgene expression level. Four pathogen stress response genes were significantly induced in the transgenic plants, suggesting a putative function of VvERF045 in biotic stress defense during berry ripening. Further molecular analysis on the transgenic plants will help in identifying the actual VvERF045 target genes and together with the phenotypic characterization of the adult transgenic plants, will allow to extensively define the role of VvERF045 in berry ripening.