A clustering method for robust and reliable large scale functional and structural protein sequence annotation


Author(s): Piovesan, Damiano
Contributor(s)

Casadio, Rita

Date(s)

18/04/2013

Abstract

Over the last few decades, bioinformatics has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the crucial problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, given the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most significant contribution comes from cross-genome comparisons. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences, filtered with a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) within clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure when a template is available. This is made possible by cluster-specific HMM profiles, which can be used to compute reliable template-to-target alignments even for distantly related proteins (sequence identity < 30%). Other BAR+ based applications were developed during my doctorate, including the prediction of magnesium binding sites in human proteins, the classification of the ABC transporter superfamily, and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, BAR+ is freely available as a web server for the functional and structural annotation of protein sequences at http://bar.biocomp.unibo.it/bar2.0.
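
The abstract gives no implementation details, so the following is only a minimal sketch of how a non-hierarchical clustering over all-against-all alignment results might look. It assumes that clusters are the connected components of a graph whose edges are pairwise alignments passing an identity and a coverage cutoff. The function name `cluster_sequences`, the layout of the hit tuples, and the 40% / 90% cutoffs are illustrative assumptions, not code or values taken from the thesis.

```python
from collections import defaultdict

# Hypothetical cutoffs, chosen only for illustration: the abstract says the
# metric is "very stringent" but does not give numbers.
MIN_IDENTITY = 0.40
MIN_COVERAGE = 0.90


def cluster_sequences(seq_ids, hits,
                      min_identity=MIN_IDENTITY,
                      min_coverage=MIN_COVERAGE):
    """Group sequences into connected components of the similarity graph.

    seq_ids: iterable of sequence identifiers.
    hits: iterable of (id_a, id_b, identity, coverage) tuples, for example
          parsed from the output of an all-against-all alignment run.
    Returns a list of clusters, each a sorted list of identifiers.
    """
    seq_ids = list(seq_ids)
    parent = {s: s for s in seq_ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        # Only alignments passing both cutoffs become edges.
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(set)
    for s in seq_ids:
        clusters[find(s)].add(s)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Toy input: P1-P2-P3 are linked; P4 and P5 fail one cutoff each.
hits = [
    ("P1", "P2", 0.85, 0.95),
    ("P2", "P3", 0.45, 0.92),
    ("P3", "P4", 0.35, 0.99),  # identity below cutoff
    ("P4", "P5", 0.90, 0.50),  # coverage below cutoff
]
clusters = cluster_sequences(["P1", "P2", "P3", "P4", "P5"], hits)
print(clusters)
```

In this toy run, P1, P2 and P3 end up in one cluster while P4 and P5 remain singletons. Connected components are non-hierarchical in the sense that no tree of nested groups is built; whether BAR+ uses exactly this construction is not stated in the abstract.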

Format

application/pdf

Identifier

http://amsdottorato.unibo.it/5627/1/piovesan_damiano_tesi.pdf

urn:nbn:it:unibo-10414

Piovesan, Damiano (2013) A clustering method for robust and reliable large scale functional and structural protein sequence annotation, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. Dottorato di ricerca in Biotecnologie, farmacologia e tossicologia: progetto n. 1 "Biotecnologie cellulari e molecolari" <http://amsdottorato.unibo.it/view/dottorati/DOT419/>, 25 Ciclo. DOI 10.6092/unibo/amsdottorato/5627.

Language(s)

en

Publisher

Alma Mater Studiorum - Università di Bologna

Relation

http://amsdottorato.unibo.it/5627/

Rights

info:eu-repo/semantics/openAccess

Keywords #BIO/10 Biochimica
Type

Doctoral thesis

NonPeerReviewed