4 results for Subcellular localization prediction
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
The continuous increase in genome sequencing projects has produced a huge amount of data in the last 10 years: currently more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publicly available. However, sequencing a genome yields only raw nucleotide sequences. This is just the first step of the genome annotation process, which deals with assigning biological information to each sequence. Annotation is carried out at every level of biological information processing, from DNA to protein, and cannot be accomplished by in vitro analysis alone, which is extremely expensive and time consuming when applied at such a large scale. Thus, in silico methods are needed to accomplish the task. The aim of this work was to implement predictive computational methods that allow fast, reliable, and automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on the implementation of a new machine learning based method for predicting the subcellular localization of soluble eukaryotic proteins. The method is called BaCelLo and was developed in 2006. Its main peculiarity is that it is independent of biases present in the training dataset, which cause over-prediction of the most represented classes in all the other predictors developed so far. This important result was achieved through a modification, made by myself, to the standard Support Vector Machine (SVM) algorithm, yielding the so-called Balanced SVM. BaCelLo is able to predict the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo was reported to outperform all the currently available state-of-the-art methods for this prediction task.
BaCelLo was subsequently used to completely annotate 5 eukaryotic genomes by integrating it into a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed by integrating, for each amino acid sequence extracted from the genome, the predicted subcellular localization with experimental and similarity-based annotations. In the second part of the work a new machine learning based method was implemented for the prediction of GPI-anchored proteins. The method predicts, from the raw amino acid sequence, both the presence of the GPI anchor (by means of an SVM) and the position in the sequence of the post-translational modification event, the so-called ω-site (by means of a Hidden Markov Model (HMM)). The method is called GPIPE and was reported to greatly improve the prediction of GPI-anchored proteins over all previously developed methods. GPIPE was able to predict up to 88% of the experimentally annotated GPI-anchored proteins while keeping the false positive rate as low as 0.1%. GPIPE was used to completely annotate 81 eukaryotic genomes, and more than 15000 putative GPI-anchored proteins were predicted, 561 of which are found in H. sapiens. On average, 1% of a proteome is predicted to be GPI-anchored. A statistical analysis of the composition of the regions surrounding the ω-site allowed the definition of specific amino acid abundances in the different regions considered. Furthermore, the hypothesis proposed in the literature that compositional biases are present among the four major eukaryotic kingdoms was tested and rejected. All the developed predictors and databases are freely available at: BaCelLo http://gpcr.biocomp.unibo.it/bacello eSLDB http://gpcr.biocomp.unibo.it/esldb GPIPE http://gpcr.biocomp.unibo.it/gpipe
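The two GPIPE figures quoted in the abstract above (88% of annotated GPI-anchored proteins recovered, 0.1% false positives) correspond to what is usually called sensitivity and false positive rate. The sketch below shows how such rates are computed from a confusion matrix. The counts are hypothetical, chosen only to reproduce the quoted percentages; they are not taken from the thesis, and the function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true GPI-anchored proteins that the predictor recovers."""
    return tp / (tp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of non-GPI-anchored proteins wrongly predicted as anchored."""
    return fp / (fp + tn)


# Hypothetical counts (assumption, not data from the thesis):
# 100 annotated GPI-anchored proteins and 10000 negatives.
tp, fn = 88, 12
fp, tn = 10, 9990

print(f"sensitivity         = {sensitivity(tp, fn):.1%}")
print(f"false positive rate = {false_positive_rate(fp, tn):.1%}")
```

A low false positive rate matters here because negatives vastly outnumber positives in a proteome: if only about 1% of proteins are GPI-anchored, even a small false positive rate contributes a noticeable share of the predicted set.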
Abstract:
Beet necrotic yellow vein virus (BNYVV), the leading infectious agent affecting sugar beet, belongs to the group of viruses transmitted through the soil by plasmodiophorids such as Polymyxa betae. BNYVV is the causal agent of Rhizomania, which induces abnormal rootlet proliferation and is widespread in the sugar beet growing areas of Europe, Asia and America; for a review see Peltier et al. (2008). In America, Beet soil-borne mosaic virus (BSBMV) has also been identified (Lee et al., 2001); it belongs to the genus Benyvirus together with BNYVV, and both are vectored by P. betae. BSBMV is widely distributed only in the United States and has not yet been reported in other countries. It was first identified in Texas as a sugar beet virus morphologically similar to, but serologically distinct from, BNYVV. Subsequent sequence analysis of BSBMV RNAs revealed a genomic organization similar to that of BNYVV, but with sufficient molecular differences to place BSBMV and BNYVV in two different species (Rush et al., 2003). Benyvirus field isolates usually consist of four RNA species, but some BNYVV isolates contain a fifth RNA. RNA-1 contains a single long ORF encoding a polypeptide that shares amino acid homology with known viral RNA-dependent RNA polymerases (RdRp) and helicases. RNA-2 contains six ORFs: the capsid protein (CP), one readthrough protein, the triple gene block (TGB) proteins required for cell-to-cell virus movement, and a sixth ORF encoding a 14 kDa post-transcriptional gene silencing suppressor. RNA-3 is involved in disease symptoms and is essential for systemic virus movement. BSBMV RNA-3 can be trans-replicated and trans-encapsidated by the BNYVV helper strain (RNA-1 and -2) (Ratti et al., 2009). BNYVV RNA-4 encodes a 31 kDa protein and is essential for vector interactions and virus transmission by P. betae (Rahim et al., 2007). BNYVV RNA-5 encodes a 26 kDa protein that improves virus infection and accumulation in the hosts.
We are interested in the effect of BSBMV on Rhizomania, which we study using powerful tools such as full-length infectious cDNA clones. B-type full-length infectious cDNA clones are available (Quillet et al., 1989), as are A/P-type RNA-3, -4 and -5 from BNYVV (unpublished). A-type BNYVV full-length clones are also available, but the RNA-1 cDNA clone still needs to be modified. During the PhD program, we started the production of BSBMV full-length cDNA clones and investigated the molecular interactions between plants and benyviruses, exploiting the biological, epidemiological and molecular similarities and divergences between BSBMV and BNYVV. During my PhD research we obtained full-length infectious cDNA clones of BSBMV RNA-1 and -2, and we demonstrated that their transcripts are replicated and packaged in planta and are able to substitute for BNYVV RNA-1 or RNA-2 in a chimeric viral progeny (BSBMV RNA-1 + BNYVV RNA-2 or BNYVV RNA-1 + BSBMV RNA-2). During the production of BSBMV full-length cDNA clones, an unexpected 1,730 nt long form of BSBMV RNA-4 was detected in sugar beet roots grown on BSBMV-infected soil. Sequence analysis of the new BSBMV RNA-4 form revealed high identity (~100%) with the published BSBMV RNA-4 sequence (NC_003508) between nucleotides 1-608 and 1,138-1,730; however, the new form contains 528 additional nucleotides between positions 608 and 1,138 (FJ424610). Two putative ORFs have been identified: the first (nucleotides 383 to 1,234) encodes a protein with a predicted mass of 32 kDa (p32), and the second (nucleotides 885 to 1,244) encodes an expected product of 13 kDa (p13). As for BSBMV RNA-3 (Ratti et al., 2009), the full-length BSBMV RNA-4 cDNA clone made it possible to obtain infectious transcripts that the BNYVV viral machinery (Stras12) is able to replicate and encapsidate in planta. Moreover, we demonstrated that BSBMV RNA-4 can substitute for BNYVV RNA-4 for efficient transmission through the vector P. betae in Beta vulgaris plants, demonstrating a very high correlation between BNYVV and BSBMV.
At the same time, using the BNYVV helper strain, we studied the expression of the BSBMV RNA-4 proteins in planta. We associated a local necrotic lesion phenotype with p32 expression in mechanically inoculated C. quinoa. FLAG- or GFP-tagged sequences of p32 and p13 were expressed in a viral context using Rep3 replicons, based on BNYVV RNA-3. Western blot analyses of local lesion contents, using a FLAG-specific antibody, revealed a high molecular weight protein, which suggests either a strong interaction of the BSBMV RNA-4 protein with host protein(s) or post-translational modifications. GFP-fusion sequences allowed the subcellular localization of the BSBMV RNA-4 proteins. Moreover, we demonstrated the absence of self-activation domains in p32 using a yeast two-hybrid approach. We also confirmed that the p32 protein is essential for virus transmission by P. betae, using the BNYVV helper strain and BNYVV RNA-3, and we investigated its role using different deleted forms of p32. Serial mechanical inoculations of wild-type BSBMV on C. quinoa plants were performed every 7 days. A deleted form of BSBMV RNA-4 (1298 bp) appeared after 14 passages, and its sequence analysis showed a deletion of 433 nucleotides between positions 611 and 1044 of the new RNA-4 form. We demonstrated that this deleted form cannot support transmission by P. betae using the BNYVV helper strain and BNYVV RNA-3; moreover, we confirmed our hypothesis that the BSBMV RNA-4 described by Lee et al. (2001) is a deleted form. Interestingly, after 21 passages we identified one chimeric form of BSBMV RNA-4 and BSBMV RNA-3 (1146 bp). Two putative ORFs have been identified in its sequence: the first (nucleotides 383 to 562) encodes a protein with a predicted mass of 7 kDa (p7), corresponding to the N-terminus of the p32 protein encoded by BSBMV RNA-4; the second (nucleotides 562 to 789) encodes an expected product of 9 kDa (p9), corresponding to the C-terminus of p29 encoded by BSBMV RNA-3.
The results of our research on this topic have opened new research lines that our laboratories will develop in the near future. In particular, BSBMV p32 and its mutated forms will be used to identify factors, such as host or vector protein(s), involved in virus transmission through P. betae. The new results could allow the selection or production of sugar beet plants able to prevent virus transmission and thus to reduce the viral inoculum in the soil.
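The predicted masses quoted for the putative ORFs in the abstract above (p32, p13, p7, p9) can be roughly cross-checked from the nucleotide coordinates alone. The sketch below assumes that the coordinates are inclusive and include the stop codon, and uses the common rule of thumb of about 110 Da per amino acid residue; both are assumptions for illustration, not the method used in the thesis, and the function name is hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average residue mass


def orf_mass_kda(start: int, end: int) -> float:
    """Estimate protein mass (kDa) from inclusive ORF coordinates.

    Assumes the ORF spans start codon through stop codon, so one codon
    is subtracted before converting residues to mass.
    """
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    residues = length_nt // 3 - 1  # drop the stop codon
    return residues * AVG_RESIDUE_MASS_DA / 1000.0


# Coordinates as quoted in the abstract; the labels are the names used there.
orfs = {
    "p32": (383, 1234),
    "p13": (885, 1244),
    "p7": (383, 562),
    "p9": (562, 789),
}

for name, (start, end) in orfs.items():
    print(f"{name}: {end - start + 1} nt, ~{orf_mass_kda(start, end):.1f} kDa")
```

Under these assumptions the estimates land within about 1 kDa of the masses quoted in the abstract, which is as close as a composition-independent rule of thumb can be expected to get.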
Abstract:
OPA3 is a protein encoded by the nuclear genome that, thanks to a mitochondrial targeting sequence, is directed to mitochondria after its synthesis. Mutations in the OPA3 gene are associated with two neurodegenerative diseases: Costeff syndrome, caused by recessive mutations, and a form of dominant optic atrophy that presents with cataract and often deafness. The exact function and regulation of the protein have not yet been fully clarified, nor has its localization in the outer or inner mitochondrial membrane. The aim of this thesis was to shed light on the function of the OPA3 protein, with particular interest in mitochondrial dynamics and autophagy, and on its subcellular localization, and finally to define the pathogenic mechanism in the neurodegenerative diseases caused by mutations in this gene. To this end we used both a neuroblastoma cell line stably silenced for OPA3 and primary cell lines derived from patients. The results of the present study show that the reduction of OPA3, induced in the neuroblastoma cells and present in patient-derived fibroblasts, produces alterations in the mitochondrial network with an imbalance in favor of fusion. This phenomenon is probably due to the increase in the long form of the OPA1 protein, which was found in both cellular models. In addition, although in apparently opposite directions, we observed altered regulation of autophagy in both models. Finally, we confirmed that OPA3 localizes to the inner mitochondrial membrane and is largely exposed to the matrix. Moreover, a signal for the protein was also found in the mitochondria-associated membranes, suggesting a possible role for OPA3 in lipid transfer between mitochondria and the endoplasmic reticulum.
We detected an interaction of the OPA3 protein with phosphatidic acid that had never been reported before. These observations are compatible with the alterations in mitochondrial dynamics and the dysregulation of autophagy documented in the models studied.
Abstract:
The goal of this thesis work is to develop a computational method based on machine learning techniques for predicting the disulfide-bonding states of cysteine residues in proteins, a sub-problem of the larger and still unsolved problem of protein structure prediction. Improved prediction of the disulfide-bonding states of cysteine residues helps to constrain the three-dimensional (3D) space of the corresponding protein structure, and thus will eventually help in predicting the 3D structure of proteins. The results of this work have direct implications for site-directed mutational studies of proteins, protein engineering and the problem of protein folding. We used a combination of an Artificial Neural Network (ANN) and a Hidden Markov Model (HMM), the so-called Hidden Neural Network (HNN), as the machine learning technique for our prediction method. By using different global and local features of proteins (specifically profiles, parity of cysteine residues, average cysteine conservation, correlated mutation, sub-cellular localization, and signal peptide) as inputs, and by considering Eukaryotes and Prokaryotes separately, we reached an accuracy of 94% on a per-cysteine basis for both the Eukaryotic and Prokaryotic datasets, and an accuracy of 90% and 93% on a per-protein basis for the Eukaryotic and Prokaryotic datasets, respectively. These are the highest accuracies reached so far by any existing prediction method; our method thus outperforms all previously developed approaches and is therefore more reliable. The most interesting part of this thesis work is the difference in prediction performance between Eukaryotes and Prokaryotes at the basic level of input coding, when 'profile' information was given as input to our prediction method.
One of the reasons for this, we found, is the difference in the amino acid composition of the local environment of bonded and free cysteine residues in Eukaryotes and Prokaryotes. Eukaryotic bonded cysteine examples have a 'symmetric-cysteine-rich' environment, whereas Prokaryotic bonded examples lack it.
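The abstract above reports accuracy both per cysteine and per protein, and the two can differ noticeably. The sketch below illustrates the distinction under a common convention, assumed here rather than taken from the thesis: a protein counts as correct only if every one of its cysteines is assigned the right bonding state. The data and function names are hypothetical.

```python
from typing import List, Tuple

# Each protein is a list of (predicted_bonded, actually_bonded) pairs,
# one pair per cysteine residue.
Protein = List[Tuple[bool, bool]]


def per_cysteine_accuracy(proteins: List[Protein]) -> float:
    """Fraction of individual cysteines whose bonding state is correct."""
    pairs = [pair for protein in proteins for pair in protein]
    return sum(pred == true for pred, true in pairs) / len(pairs)


def per_protein_accuracy(proteins: List[Protein]) -> float:
    """Fraction of proteins in which every cysteine is predicted correctly
    (assumed definition of 'protein basis')."""
    correct = sum(all(pred == true for pred, true in p) for p in proteins)
    return correct / len(proteins)


# Hypothetical toy dataset: three proteins, six cysteines, one error.
toy = [
    [(True, True), (True, True)],
    [(False, False), (True, False), (False, False)],
    [(False, False)],
]

print(f"per-cysteine accuracy: {per_cysteine_accuracy(toy):.2f}")
print(f"per-protein accuracy:  {per_protein_accuracy(toy):.2f}")
```

With this definition a single wrong cysteine invalidates its whole protein, so per-protein accuracy is usually the stricter of the two measures.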