7 results for Modeling Rapport Using Hidden Markov Models

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance: 100.00%

Abstract:

The goal of this thesis is to develop a computational method, based on machine learning techniques, for predicting the disulfide-bonding states of cysteine residues in proteins. This is a sub-problem of the larger and still unsolved problem of protein structure prediction. Better prediction of cysteine bonding states places constraints on the three-dimensional (3D) space of the corresponding protein structure, and can therefore help in predicting the 3D structure of proteins. The results of this work have direct implications for site-directed mutagenesis studies, protein engineering, and the protein folding problem. As the machine learning technique for our prediction method we used a combination of an Artificial Neural Network (ANN) and a Hidden Markov Model (HMM), the so-called Hidden Neural Network (HNN). We used different global and local protein features as inputs (specifically profiles, parity of cysteine residues, average cysteine conservation, correlated mutations, subcellular localization, and signal peptides), and we treated Eukaryotes and Prokaryotes separately. With this setup we reached an accuracy of 94% on a per-cysteine basis for both the Eukaryotic and the Prokaryotic datasets. On a per-protein basis, accuracy was 90% for the Eukaryotic dataset and 93% for the Prokaryotic dataset. These are the highest accuracies reached so far by any existing prediction method, so our method outperforms all previously developed approaches. The most interesting finding of this work is the difference in prediction performance between Eukaryotes and Prokaryotes at the basic level of input coding, when only 'profile' information was given as input to the method.
One reason we identified for this is a difference in the amino acid composition of the local environment of bonded and free cysteine residues between Eukaryotes and Prokaryotes. Eukaryotic bonded cysteines have a 'symmetric-cysteine-rich' environment, whereas Prokaryotic bonded cysteines lack it.
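The abstract reports accuracy on two bases. The sketch below shows how the two figures differ. It assumes a protein counts as correct only when every one of its cysteines gets the right state; this is the usual convention, but the abstract does not spell it out. The input format, a list of (true, predicted) pairs per protein, is hypothetical:

```python
from typing import List, Tuple

# One protein = list of (true_state, predicted_state) per cysteine.
# True = bonded, False = free. This input format is a hypothetical choice.
Protein = List[Tuple[bool, bool]]


def cysteine_accuracy(proteins: List[Protein]) -> float:
    """Fraction of all cysteines whose bonding state is predicted correctly."""
    pairs = [pair for protein in proteins for pair in protein]
    return sum(1 for true, pred in pairs if true == pred) / len(pairs)


def protein_accuracy(proteins: List[Protein]) -> float:
    """Fraction of proteins in which every cysteine is predicted correctly."""
    return sum(1 for protein in proteins if all(t == p for t, p in protein)) / len(proteins)


if __name__ == "__main__":
    data = [
        [(True, True), (True, True)],     # both cysteines right
        [(False, False), (True, False)],  # one bonded cysteine missed
        [(False, False)],                 # right
    ]
    print(cysteine_accuracy(data), protein_accuracy(data))  # 4/5 cysteines, 2/3 proteins
```

A single error makes a whole protein wrong, so the per-protein figure is the stricter of the two. This is consistent with the per-protein accuracies reported above being lower.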

Relevance: 100.00%

Abstract:

The objective of this thesis is the refined estimation of seismic source parameters. To this end we used two different approaches, one in the frequency domain and the other in the time domain. In the frequency domain, we analyzed P- and S-wave displacement spectra to estimate the spectral parameters, that is, corner frequencies and low-frequency spectral amplitudes. We used a parametric modeling approach combined with a multi-step, non-linear inversion strategy that includes corrections for attenuation and site effects. The iterative multi-step procedure was applied to about 700 microearthquakes in the moment range 10^11 to 10^14 N·m. These events were recorded by the dense, wide-dynamic-range seismic networks operating in the Southern Apennines (Italy). The analysis of source parameters is often complicated when the propagation cannot be modeled accurately. In this case the empirical Green function approach is a very useful tool for studying seismic source properties. Empirical Green Functions (EGFs) represent the contribution of propagation and site effects to the signal without relying on approximate velocity models. An EGF is a recorded three-component set of time histories of a small earthquake whose source mechanism and propagation path are similar to those of the master event. Thus, in the time domain, the deconvolution method of Vallée (2004) was applied to calculate the relative source time functions (RSTFs) and to accurately estimate source size and rupture velocity. This technique was applied to three sets of events. The first is a large event, the Mw 6.3 2009 L’Aquila mainshock (Central Italy). The second is a set of moderate events, a cluster of earthquakes from the 2009 L’Aquila sequence with moment magnitudes between 3 and 5.6. The third is a small event, the Mw 2.9 Laviano mainshock (Southern Italy).
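The abstract does not name the parametric spectral model. The sketch below assumes an omega-square (Brune-type) source spectrum multiplied by a frequency-independent attenuation term exp(-π f t*), with site terms omitted. Under those assumptions it shows how a corner frequency fc and a low-frequency plateau Ω0 can be fitted to a displacement spectrum. The grid search and the function names are illustrative; this is not the multi-step inversion used in the thesis. The sketch also converts seismic moment to moment magnitude with Mw = (2/3)(log10 M0 - 9.1), M0 in N·m:

```python
import math
from typing import Sequence, Tuple


def brune_spectrum(f: float, omega0: float, fc: float, t_star: float) -> float:
    """Omega-square displacement spectrum with a whole-path attenuation term.

    omega0: low-frequency plateau; fc: corner frequency (Hz);
    t_star: travel time / Q, assumed frequency independent. Site terms are ignored.
    """
    return omega0 * math.exp(-math.pi * f * t_star) / (1.0 + (f / fc) ** 2)


def fit_spectrum(freqs: Sequence[float], amps: Sequence[float], t_star: float,
                 fc_grid: Sequence[float]) -> Tuple[float, float]:
    """Grid search over fc; for each trial fc, log(omega0) has a closed-form
    least-squares solution in log space (the mean log residual)."""
    log_amps = [math.log(a) for a in amps]
    best = (0.0, 0.0, math.inf)  # (omega0, fc, misfit)
    for fc in fc_grid:
        shape = [math.log(brune_spectrum(f, 1.0, fc, t_star)) for f in freqs]
        residuals = [la - s for la, s in zip(log_amps, shape)]
        log_omega0 = sum(residuals) / len(residuals)
        misfit = sum((r - log_omega0) ** 2 for r in residuals)
        if misfit < best[2]:
            best = (math.exp(log_omega0), fc, misfit)
    return best[0], best[1]


def moment_magnitude(m0: float) -> float:
    """Moment magnitude from seismic moment m0 in N·m."""
    return (2.0 / 3.0) * (math.log10(m0) - 9.1)


if __name__ == "__main__":
    freqs = [0.5 * i for i in range(1, 81)]  # 0.5 to 40 Hz
    synthetic = [brune_spectrum(f, 2e-6, 12.0, 0.02) for f in freqs]
    omega0, fc = fit_spectrum(freqs, synthetic, 0.02, [0.5 * i for i in range(2, 101)])
    print(omega0, fc)
    print(moment_magnitude(1e11), moment_magnitude(1e14))
```

With that relation, the moment range of 10^11 to 10^14 N·m corresponds to Mw of roughly 1.3 to 3.3. A fuller inversion would typically estimate t* and site terms as well, instead of fixing them. The grid search is used here only to keep the example free of dependencies.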

Relevance: 100.00%

Abstract:

The continuous increase in genome sequencing projects has produced a huge amount of data over the last 10 years. Currently more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publicly available. However, sequencing a genome by itself yields only raw nucleotide sequences. This is only the first step of genome annotation, the process of assigning biological information to each sequence. Annotation is needed at every level of biological information processing, from DNA to protein. It cannot be accomplished by in vitro analyses alone, which become extremely expensive and time-consuming at such a large scale, so in silico methods are needed. The aim of this work was to implement predictive computational methods that allow fast, reliable, and automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on a new machine-learning-based method for predicting the subcellular localization of soluble eukaryotic proteins. The method, called BaCelLo, was developed in 2006. Its main distinguishing feature is that it is independent of biases present in the training dataset. Such biases cause all the other predictors developed so far to over-predict the most represented classes. This result was achieved through a modification, which I made, of the standard Support Vector Machine (SVM) algorithm, yielding the so-called Balanced SVM. BaCelLo predicts the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo was reported to outperform all the state-of-the-art methods then available for this prediction task.
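The abstract does not describe how the Balanced SVM modifies the standard algorithm. The sketch below only illustrates the underlying problem, using a generic cost-sensitive linear SVM rather than BaCelLo's actual method. It trains by subgradient descent with the common inverse-frequency class weights n / (k n_c), so that an over-represented class cannot dominate the hinge loss. All names and hyperparameters are hypothetical:

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def balanced_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Inverse-frequency weights n / (k * n_c): every class gets the same total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def train_weighted_linear_svm(X: List[List[float]], y: List[int], lam: float = 0.01,
                              epochs: int = 200, lr: float = 0.1) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on
    lam/2 * ||w||^2 + (1/n) * sum_i weight[y_i] * max(0, 1 - y_i * (w.x_i + b)),
    with labels y_i in {-1, +1}."""
    weight = balanced_weights(y)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # hinge loss is active for this example
                c = weight[yi] / n
                for j in range(d):
                    grad_w[j] -= c * yi * xi[j]
                grad_b -= c * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    X = [[2.0], [3.0], [4.0], [-2.0]]
    y = [1, 1, 1, -1]
    print(balanced_weights(y))
    w, b = train_weighted_linear_svm(X, y)
    print([predict(w, b, x) for x in X])
```

In this example there is one negative and three positive examples. The weights come out as 2.0 for the minority class and 2/3 for each majority example, so each class contributes the same total weight to the loss.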
BaCelLo was subsequently used to completely annotate 5 eukaryotic genomes. To do so it was integrated into a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed that combines, for each amino acid sequence extracted from the genome, the predicted subcellular localization with experimental and similarity-based annotations. In the second part of the work, a new machine-learning-based method was implemented for the prediction of GPI-anchored proteins. From the raw amino acid sequence, the method efficiently predicts both the presence of the GPI anchor (by means of an SVM) and the position of the post-translational modification event in the sequence. This position is the so-called ω-site, and it is predicted by means of a Hidden Markov Model (HMM). The method, called GPIPE, was reported to greatly improve the prediction of GPI-anchored proteins over all previously developed methods. GPIPE predicted up to 88% of the experimentally annotated GPI-anchored proteins while keeping the false positive rate as low as 0.1%. GPIPE was used to completely annotate 81 eukaryotic genomes, and more than 15000 putative GPI-anchored proteins were predicted, 561 of which are found in H. sapiens. On average, 1% of a proteome is predicted as GPI-anchored. A statistical analysis of the composition of the regions surrounding the ω-site allowed specific amino acid abundances to be defined for each of the regions considered. Furthermore, the literature has proposed that compositional biases exist among the four major eukaryotic kingdoms; this hypothesis was tested and rejected. All the developed predictors and databases are freely available at: BaCelLo http://gpcr.biocomp.unibo.it/bacello, eSLDB http://gpcr.biocomp.unibo.it/esldb, GPIPE http://gpcr.biocomp.unibo.it/gpipe
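The abstract does not describe GPIPE's HMM architecture. The sketch below shows how an HMM can locate a single site along a sequence, using a hypothetical three-state model: before the ω-site, the ω-site itself, and the C-terminal region after it. The observations use an invented reduced alphabet: s for a small residue, h for hydrophobic, o for other. Viterbi decoding returns the most probable state path, and the position of the ω state is the predicted site. All probabilities are made up for illustration:

```python
import math
from typing import Dict, List, Sequence

Probs = Dict[str, float]


def _log(p: float) -> float:
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(obs: Sequence[str], states: Sequence[str], start_p: Probs,
            trans_p: Dict[str, Probs], emit_p: Dict[str, Probs]) -> List[str]:
    """Most probable hidden-state path for obs, computed in log space."""
    score = {s: _log(start_p.get(s, 0.0)) + _log(emit_p[s].get(obs[0], 0.0))
             for s in states}
    backpointers: List[Dict[str, str]] = []
    for symbol in obs[1:]:
        new_score: Dict[str, float] = {}
        pointer: Dict[str, str] = {}
        for s in states:
            # best predecessor of state s at this position
            prev = max(states, key=lambda r: score[r] + _log(trans_p[r].get(s, 0.0)))
            new_score[s] = (score[prev] + _log(trans_p[prev].get(s, 0.0))
                            + _log(emit_p[s].get(symbol, 0.0)))
            pointer[s] = prev
        score = new_score
        backpointers.append(pointer)
    # trace back from the best final state
    path = [max(states, key=lambda s: score[s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


# Hypothetical toy model: N = before the omega-site, W = omega-site, C = C-terminal region.
# Reduced alphabet (invented): s = small residue, h = hydrophobic, o = other.
STATES = ["N", "W", "C"]
START = {"N": 1.0}
TRANS = {"N": {"N": 0.9, "W": 0.1}, "W": {"C": 1.0}, "C": {"C": 1.0}}
EMIT = {
    "N": {"o": 0.6, "s": 0.2, "h": 0.2},
    "W": {"o": 0.05, "s": 0.9, "h": 0.05},
    "C": {"o": 0.15, "s": 0.15, "h": 0.7},
}

if __name__ == "__main__":
    path = viterbi("ooososhhhh", STATES, START, TRANS, EMIT)
    print("".join(path), "omega-site index:", path.index("W"))
```

W can only be followed by C-terminal states, so any decoded path contains at most one ω-site. For this input the best path puts it at index 5, the small residue directly before the hydrophobic stretch.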

Relevance: 100.00%

Abstract:

The silent demographic revolution under way in the main industrialized countries is an unavoidable factor with major economic, social, cultural, and psychological implications. This thesis studies the main consequences of population ageing and their connections with the phenomenon of migration. The theoretical analysis is developed using Overlapping Generations (OLG) models. The thesis is divided into the following four chapters: 1) "A Model for Determining Consumption and Social Assistance Demand in Uncertainty Conditions" focuses on the relation between demographic impact and social insurance, and proposes the institution of a non-self-sufficiency fund for the elderly. 2) "Population Ageing, Longevity and Health" analyzes the effects of health investment on intertemporal individual behaviour and capital accumulation. 3) "Population Ageing and the Nursing Flow" studies the consequences of migration in the nursing sector. 4) "Quality of Multiculturalism and Minorities' Assimilation" focuses on the assimilation and integration of minorities.

Relevance: 100.00%

Abstract:

The vast majority of known proteins have not yet been experimentally characterized, and little is known about their function. Computational tools can provide insight into protein function based on a protein's sequence, structure, evolutionary history, and association with other proteins. Knowledge of a protein's three-dimensional (3D) structure can lead to a deep understanding of its mode of action and interactions. However, the structures of fewer than 1% of sequences have currently been solved experimentally. For this reason, it has become urgent to develop new methods that can computationally extract relevant information from protein sequence and structure. The starting point of my work was the study of the properties of contacts between protein residues, since these contacts constrain protein folding and characterize different protein structures. Predicting residue contacts is an interesting problem whose solution may be useful for fold recognition and de novo design. Predicting these contacts requires studying the inter-residue distances associated with each specific type of amino acid pair, which are encoded in the so-called contact map. A new way of analyzing these structures emerged with the introduction of network studies, with pivotal papers demonstrating that protein contact networks also exhibit small-world behavior. My aims were to highlight constraints for the prediction of protein contact maps, and to support applications in protein structure prediction and/or reconstruction from experimentally determined contact maps. To these ends, I studied to what extent the characteristic path length and clustering coefficient of the protein contact network reveal characteristic features of protein contact maps.
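The sketch below computes the two network measures named above. It assumes a contact is defined between any two residues whose Cα atoms lie within a distance cutoff; 8 Å is used here, a common choice but an assumption, since the abstract gives no definition. The clustering coefficient is averaged over all residues. The characteristic path length is the mean shortest-path length over connected residue pairs:

```python
import math
from collections import deque
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple

Coord = Tuple[float, float, float]
Graph = Dict[int, Set[int]]


def contact_graph(coords: Sequence[Coord], cutoff: float = 8.0) -> Graph:
    """Residues i != j are linked if their (assumed C-alpha) distance is <= cutoff (Å)."""
    adj: Graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering_coefficient(adj: Graph) -> float:
    """Mean local clustering coefficient; residues with fewer than 2 contacts count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj: Graph) -> float:
    """Mean shortest-path length (in edges) over all connected ordered pairs, via BFS."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())  # the source itself contributes 0
        pairs += len(dist) - 1
    return total / pairs if pairs else math.inf


if __name__ == "__main__":
    chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]  # idealised straight chain
    g = contact_graph(chain)
    print(clustering_coefficient(g), characteristic_path_length(g))
```

Consider an idealised straight chain of five residues with 3.8 Å spacing. Each residue contacts the residues one and two positions away. This gives a clustering coefficient of 23/30 and a characteristic path length of 1.3.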
If the residue contacts of a protein sequence are known, the major features of its 3D structure could be deduced by combining this knowledge with correctly predicted secondary structure motifs. In the second part of my work I focused on a particular protein structural motif, the coiled coil, which is known to mediate a variety of fundamental biological interactions. Coiled coils occur in a variety of structural forms and in a wide range of proteins. These range from small units, such as the leucine zippers that drive the dimerization of many transcription factors, to more complex structures, such as the family of viral proteins responsible for virus-host membrane fusion. The coiled-coil structural motif is estimated to account for 5-10% of the protein sequences in the various genomes. Given their biological importance, I introduced a Hidden Markov Model (HMM) that exploits evolutionary information derived from multiple sequence alignments. The HMM predicts coiled-coil regions and discriminates coiled-coil sequences. The results indicate that the new HMM outperforms all existing programs and can be adopted for coiled-coil prediction and for large-scale genome annotation. Genome annotation is a key issue in modern computational biology, because it is the starting point for understanding the complex processes involved in biological networks. The rapid growth in the number of available protein sequences and structures poses new fundamental problems that still await interpretation. Nevertheless, these data form the basis for new strategies to tackle problems such as the prediction of protein structure and function. Determining experimentally the functions of all these proteins would be a hugely time-consuming and costly task, and in most instances this has not been carried out. For example, only about 20% of the annotated proteins in the Homo sapiens genome have currently been experimentally characterized.
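The abstract says the coiled-coil HMM exploits evolutionary information from multiple sequence alignments, but not how. One way to do this, assumed here, is to feed the model a sequence profile. At each alignment column i, the emission probability e_s(x) of a single residue is then replaced with the profile-weighted sum Σ_x P_i(x) e_s(x). The sketch below applies the forward algorithm under that assumption, using a hypothetical two-state model (coil versus coiled coil) with invented probabilities:

```python
from typing import Dict, Sequence

Probs = Dict[str, float]


def profile_emission(column: Probs, emit: Probs) -> float:
    """Profile-weighted emission: sum over residues x of P_i(x) * e_s(x)."""
    return sum(freq * emit.get(res, 0.0) for res, freq in column.items())


def forward(profile: Sequence[Probs], states: Sequence[str], start_p: Probs,
            trans_p: Dict[str, Probs], emit_p: Dict[str, Probs]) -> float:
    """Forward algorithm over profile columns: sums the score of every state path.

    Unscaled for clarity; long sequences need scaling or log-space arithmetic.
    """
    alpha = {s: start_p.get(s, 0.0) * profile_emission(profile[0], emit_p[s])
             for s in states}
    for column in profile[1:]:
        # alpha on the right-hand side still holds the previous column's values
        alpha = {
            s: sum(alpha[r] * trans_p[r].get(s, 0.0) for r in states)
            * profile_emission(column, emit_p[s])
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state model: "coil" versus "cc" (coiled coil); numbers are invented.
STATES = ["coil", "cc"]
START = {"coil": 0.5, "cc": 0.5}
TRANS = {"coil": {"coil": 0.9, "cc": 0.1}, "cc": {"cc": 0.8, "coil": 0.2}}
EMIT = {
    "coil": {"L": 0.2, "E": 0.3, "K": 0.3, "G": 0.2},
    "cc": {"L": 0.5, "E": 0.2, "K": 0.25, "G": 0.05},
}
# Each column holds the residue frequencies observed at one alignment position.
PROFILE = [{"L": 0.7, "E": 0.3}, {"E": 0.5, "K": 0.5}, {"L": 0.9, "G": 0.1}]

if __name__ == "__main__":
    print(forward(PROFILE, STATES, START, TRANS, EMIT))
```

When every profile column is one-hot (a single sequence), the weighted sum collapses to the ordinary emission probability, and this becomes the standard forward algorithm. Predicting regions would then use Viterbi or posterior decoding on the same emission scores.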
A commonly adopted procedure for annotating protein sequences relies on "inheritance through homology", based on the notion that similar sequences share similar functions and structures. The procedure assigns sequences to a specific group of functionally related sequences that were previously grouped by clustering techniques. Clustering is based on suitable similarity rules, since predicting protein structure and function from sequence depends largely on the level of sequence identity. However, several factors add complexity. These include multi-domain proteins and proteins that share common domains but not necessarily the same function. A further factor is the finding that different combinations of shared domains can lead to different biological roles. In the last part of this study I developed and validated a system that contributes to sequence annotation. It uses a validated procedure for transferring molecular functions and structural templates by inheritance. After a cross-genome comparison with the BLAST program, clusters were built on the basis of two stringent constraints, one on sequence identity and one on coverage of the alignment. This measure explicitly addresses the problem of annotating multi-domain proteins. It also allows a fine-grained division of the whole set of proteomes used, ensuring that clusters are homogeneous in sequence length. Structural templates cover a high fraction of the length of the protein sequences within clusters. This ensures that multi-domain proteins, when present, can serve as templates for sequences of similar length. This annotation procedure can reliably transfer statistically validated functions and structures to sequences, using the information available in current databases of molecular functions and structures.
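The abstract names the two constraints but not their values or the clustering algorithm. The sketch below assumes a hit layout modelled on BLAST tabular output (pident, qstart, qend, sstart, send) and hypothetical thresholds of 40% identity and 90% coverage. It groups sequences by single linkage using union-find. A pair is linked only if the alignment covers at least the threshold fraction of both sequences:

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One pairwise alignment; fields modelled on BLAST tabular output columns."""
    query: str
    subject: str
    pident: float  # percent identity
    qstart: int
    qend: int
    sstart: int
    send: int


def passes(hit: Hit, lengths: Dict[str, int], min_identity: float, min_coverage: float) -> bool:
    """Identity threshold plus alignment coverage of BOTH sequences."""
    q_cov = (abs(hit.qend - hit.qstart) + 1) / lengths[hit.query]
    s_cov = (abs(hit.send - hit.sstart) + 1) / lengths[hit.subject]
    return hit.pident >= min_identity and min(q_cov, s_cov) >= min_coverage


def cluster(lengths: Dict[str, int], hits: Iterable[Hit],
            min_identity: float = 40.0, min_coverage: float = 0.9) -> List[List[str]]:
    """Single-linkage clusters (union-find) over hits that pass both constraints."""
    parent = {name: name for name in lengths}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for hit in hits:
        if hit.query != hit.subject and passes(hit, lengths, min_identity, min_coverage):
            parent[find(hit.query)] = find(hit.subject)

    groups: Dict[str, List[str]] = {}
    for name in lengths:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    lengths = {"A": 100, "B": 105, "C": 300, "D": 100}  # hypothetical proteins
    hits = [Hit("A", "B", 60.0, 1, 100, 3, 102),
            Hit("A", "C", 70.0, 1, 100, 1, 100),   # covers only a third of C
            Hit("D", "B", 30.0, 1, 100, 1, 105)]   # identity too low
    print(cluster(lengths, hits))
```

Requiring coverage on both sides stops a single-domain protein from being merged with a much longer multi-domain protein that merely contains it. This is the length homogeneity the abstract describes.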

Relevance: 100.00%

Abstract:

The introduction of technical swimsuits has brought unprecedented improvements in swimming performance. The literature attributes the gains in swimming speed to reductions in the hydrodynamic resistance acting on the swimmer. However, the specific effects of wearing this type of suit have not yet been fully clarified. This thesis aimed to investigate the effects of technical swimsuits on static buoyancy, body position, and hydrodynamic drag during passive gliding. The preliminary study measured the hydrostatic buoyant force, dynamic lung volumes, and chest circumference of 9 swimmers. Each swimmer wore either a traditional swimsuit or a technical suit made of synthetic rubber. Wearing the technical suit significantly reduced static buoyancy. The chest compression caused by this type of suit may be related to the significant reduction in lung volumes measured when the swimmer wears it. A subsequent analysis involved passively towing 14 swimmers, who held their best streamlined gliding position. Each swimmer was towed wearing a traditional suit, a fabric technical suit, and a rubber technical suit. Body position during towing was measured with kinematic analysis. Passive drag was significantly lower with both technical suits than with the traditional suit. Analysis with linear regression models showed that part of the reduction in passive drag was related to intrinsic properties of the technical suits. However, two further factors also had a marked influence on passive hydrodynamic drag. The first was the frontal area, determined by the trunk inclination of the gliding subject. The second was the inclination of the lower limbs.
Therefore, the reduction in hydrodynamic drag during passive gliding in a technical swimsuit can be attributed not only to the suit material but also to a change in the swimmer's body position.
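The standard quadratic drag relation makes explicit the two contributions that the regression result above separates. It is used here as an assumed idealisation for a passively towed body; the thesis does not give it. In the relation, ρ is water density, C_D the drag coefficient, A the frontal area, and v the towing speed. Taking the ratio of two trials at the same ρ and v cancels every term except C_D and A:

```latex
D = \tfrac{1}{2}\,\rho\,C_D\,A\,v^{2},
\qquad
\frac{D_{\text{tech}}}{D_{\text{trad}}}
  = \frac{C_{D,\text{tech}}}{C_{D,\text{trad}}}\cdot\frac{A_{\text{tech}}}{A_{\text{trad}}}
  \quad (\text{same } \rho,\ v)
```

At equal ρ and v, a technical suit can lower drag through two routes. One is C_D, which includes the suit's material and surface effects. The other is A, which depends on trunk and lower-limb inclination. For example, with C_D unchanged, a 5% smaller frontal area gives a 5% smaller passive drag.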

Relevance: 100.00%

Abstract:

Sheet pile walls are one of the oldest earth retention systems used in civil engineering projects. They serve various purposes, such as excavation support systems, cofferdams, cut-off walls under dams, slope stabilization, waterfront structures, and flood walls. Sheet pile walls are one of the most common types of quay walls used in port construction. The worldwide increase in the use of large ships for transportation has created an urgent need to deepen the seabed within port areas. This in turn requires the rehabilitation of the port wharves. Several methods can be used to increase the load-carrying capacity of sheet-piling walls. One of the most practical and appropriate solutions for stabilizing and rehabilitating an existing quay wall is to add anchored tie rods. These rods are grouted into the backfill soil and arranged along the exposed wall height. The Ravenna Port Authority initiated a project to deepen the harbor bottom at selected wharves. An extensive parametric study was carried out with the finite element program PLAXIS 2D, version 2012. It investigated how the submerged grouted anchor technique improves the load response of a sheet-piling quay wall. The influence of the grouted tie area, grouted body length, anchor inclination, and anchor location was evaluated. A comparative study was also conducted with PLAXIS 2D and 3D to investigate the behavior of these sheet pile quay walls. This comparison covered horizontal displacements along the wall, ground surface settlements, anchor force, and the calculated factor of safety. Finally, a comprehensive study compared two constitutive models for the mechanical behavior of the soil, Mohr-Coulomb and Hardening Soil. It investigated the effect of each model on the behavior of these sheet pile quay walls.
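In PLAXIS, the Hardening Soil model keeps a Mohr-Coulomb failure criterion, τ_f = c' + σ'_n tan φ', and adds stress-dependent stiffness. The same strength criterion therefore underlies both models compared above. The sketch below implements that criterion and the idea of strength (phi-c) reduction, in which c' and tan φ' are divided by a common factor until failure. PLAXIS uses this idea in its Safety calculation, reported as ΣMsf. However, the abstract does not say how the factor of safety was obtained, and the soil parameters below are hypothetical:

```python
import math
from typing import Tuple


def mohr_coulomb_strength(c: float, phi_deg: float, sigma_n: float) -> float:
    """Shear strength tau_f = c' + sigma'_n * tan(phi'); effective stresses in kPa."""
    return c + sigma_n * math.tan(math.radians(phi_deg))


def reduced_strength_parameters(c: float, phi_deg: float, msf: float) -> Tuple[float, float]:
    """Divide c' and tan(phi') by the same factor msf, as in a phi-c reduction."""
    phi_reduced = math.degrees(math.atan(math.tan(math.radians(phi_deg)) / msf))
    return c / msf, phi_reduced


if __name__ == "__main__":
    c, phi, sigma = 5.0, 30.0, 100.0  # hypothetical c' (kPa), phi' (deg), sigma'_n (kPa)
    tau = mohr_coulomb_strength(c, phi, sigma)
    c_r, phi_r = reduced_strength_parameters(c, phi, 1.5)
    print(tau, c_r, phi_r, mohr_coulomb_strength(c_r, phi_r, sigma))
```

Dividing both c' and tan φ' by the same factor scales the shear strength at every normal stress by exactly that factor. This is why the factor at which the model first fails can be read as a factor of safety on strength.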