4 resultados para TF-map alignment

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The vast majority of known proteins have not yet been experimentally characterized and little is known about their function. The design and implementation of computational tools can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history and their association with other proteins. Knowledge of the three-dimensional (3D) structure of a protein can lead to a deep understanding of its mode of action and interaction, but currently the structures of <1% of sequences have been experimentally solved. For this reason, it became urgent to develop new methods that are able to computationally extract relevant information from protein sequence and structure. The starting point of my work has been the study of the properties of contacts between protein residues, since they constrain protein folding and characterize different protein structures. Prediction of residue contacts in proteins is an interesting problem whose solution may be useful in protein folding recognition and de novo design. The prediction of these contacts requires the study of the protein inter-residue distances related to the specific type of amino acid pair that are encoded in the so-called contact map. An interesting new way of analyzing those structures came out when network studies were introduced, with pivotal papers demonstrating that protein contact networks also exhibit small-world behavior. In order to highlight constraints for the prediction of protein contact maps and for applications in the field of protein structure prediction and/or reconstruction from experimentally determined contact maps, I studied to which extent the characteristic path length and clustering coefficient of the protein contacts network are values that reveal characteristic features of protein contact maps. Provided that residue contacts are known for a protein sequence, the major features of its 3D structure could be deduced by combining this knowledge with correctly predicted motifs of secondary structure. In the second part of my work I focused on a particular protein structural motif, the coiled-coil, known to mediate a variety of fundamental biological interactions. Coiled-coils are found in a variety of structural forms and in a wide range of proteins including, for example, small units such as leucine zippers that drive the dimerization of many transcription factors or more complex structures such as the family of viral proteins responsible for virus-host membrane fusion. The coiled-coil structural motif is estimated to account for 5-10% of the protein sequences in the various genomes. Given their biological importance, in my work I introduced a Hidden Markov Model (HMM) that exploits the evolutionary information derived from multiple sequence alignments, to predict coiled-coil regions and to discriminate coiled-coil sequences. The results indicate that the new HMM outperforms all the existing programs and can be adopted for the coiled-coil prediction and for large-scale genome annotation. Genome annotation is a key issue in modern computational biology, being the starting point towards the understanding of the complex processes involved in biological networks. The rapid growth in the number of protein sequences and structures available poses new fundamental problems that still deserve an interpretation. Nevertheless, these data are at the basis of the design of new strategies for tackling problems such as the prediction of protein structure and function. Experimental determination of the functions of all these proteins would be a hugely time-consuming and costly task and, in most instances, has not been carried out. As an example, currently, approximately only 20% of annotated proteins in the Homo sapiens genome have been experimentally characterized. A commonly adopted procedure for annotating protein sequences relies on the "inheritance through homology" based on the notion that similar sequences share similar functions and structures. This procedure consists in the assignment of sequences to a specific group of functionally related sequences which had been grouped through clustering techniques. The clustering procedure is based on suitable similarity rules, since predicting protein structure and function from sequence largely depends on the value of sequence identity. However, additional levels of complexity are due to multi-domain proteins, to proteins that share common domains but that do not necessarily share the same function, to the finding that different combinations of shared domains can lead to different biological roles. In the last part of this study I developed and validate a system that contributes to sequence annotation by taking advantage of a validated transfer through inheritance procedure of the molecular functions and of the structural templates. After a cross-genome comparison with the BLAST program, clusters were built on the basis of two stringent constraints on sequence identity and coverage of the alignment. The adopted measure explicity answers to the problem of multi-domain proteins annotation and allows a fine grain division of the whole set of proteomes used, that ensures cluster homogeneity in terms of sequence length. A high level of coverage of structure templates on the length of protein sequences within clusters ensures that multi-domain proteins when present can be templates for sequences of similar length. This annotation procedure includes the possibility of reliably transferring statistically validated functions and structures to sequences considering information available in the present data bases of molecular functions and structures.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An extensive sample (2%) of private vehicles in Italy are equipped with a GPS device that periodically measures their position and dynamical state for insurance purposes. Having access to this type of data allows to develop theoretical and practical applications of great interest: the real-time reconstruction of traffic state in a certain region, the development of accurate models of vehicle dynamics, the study of the cognitive dynamics of drivers. In order for these applications to be possible, we first need to develop the ability to reconstruct the paths taken by vehicles on the road network from the raw GPS data. In fact, these data are affected by positioning errors and they are often very distanced from each other (~2 Km). For these reasons, the task of path identification is not straightforward. This thesis describes the approach we followed to reliably identify vehicle paths from this kind of low-sampling data. The problem of matching data with roads is solved with a bayesian approach of maximum likelihood. While the identification of the path taken between two consecutive GPS measures is performed with a specifically developed optimal routing algorithm, based on A* algorithm. The procedure was applied on an off-line urban data sample and proved to be robust and accurate. Future developments will extend the procedure to real-time execution and nation-wide coverage.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Environmental computer models are deterministic models devoted to predict several environmental phenomena such as air pollution or meteorological events. Numerical model output is given in terms of averages over grid cells, usually at high spatial and temporal resolution. However, these outputs are often biased with unknown calibration and not equipped with any information about the associated uncertainty. Conversely, data collected at monitoring stations is more accurate since they essentially provide the true levels. Due the leading role played by numerical models, it now important to compare model output with observations. Statistical methods developed to combine numerical model output and station data are usually referred to as data fusion. In this work, we first combine ozone monitoring data with ozone predictions from the Eta-CMAQ air quality model in order to forecast real-time current 8-hour average ozone level defined as the average of the previous four hours, current hour, and predictions for the next three hours. We propose a Bayesian downscaler model based on first differences with a flexible coefficient structure and an efficient computational strategy to fit model parameters. Model validation for the eastern United States shows consequential improvement of our fully inferential approach compared with the current real-time forecasting system. Furthermore, we consider the introduction of temperature data from a weather forecast model into the downscaler, showing improved real-time ozone predictions. Finally, we introduce a hierarchical model to obtain spatially varying uncertainty associated with numerical model output. We show how we can learn about such uncertainty through suitable stochastic data fusion modeling using some external validation data. We illustrate our Bayesian model by providing the uncertainty map associated with a temperature output over the northeastern United States.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Landslide hazard and risk are growing as a consequence of climate change and demographic pressure. Land‐use planning represents a powerful tool to manage this socio‐economic problem and build sustainable and landslide resilient communities. Landslide inventory maps are a cornerstone of land‐use planning and, consequently, their quality assessment represents a burning issue. This work aimed to define the quality parameters of a landslide inventory and assess its spatial and temporal accuracy with regard to its possible applications to land‐use planning. In this sense, I proceeded according to a two‐steps approach. An overall assessment of the accuracy of data geographic positioning was performed on four case study sites located in the Italian Northern Apennines. The quantification of the overall spatial and temporal accuracy, instead, focused on the Dorgola Valley (Province of Reggio Emilia). The assessment of spatial accuracy involved a comparison between remotely sensed and field survey data, as well as an innovative fuzzylike analysis of a multi‐temporal landslide inventory map. Conversely, long‐ and short‐term landslide temporal persistence was appraised over a period of 60 years with the aid of 18 remotely sensed image sets. These results were eventually compared with the current Territorial Plan for Provincial Coordination (PTCP) of the Province of Reggio Emilia. The outcome of this work suggested that geomorphologically detected and mapped landslides are a significant approximation of a more complex reality. In order to convey to the end‐users this intrinsic uncertainty, a new form of cartographic representation is needed. In this sense, a fuzzy raster landslide map may be an option. With regard to land‐use planning, landslide inventory maps, if appropriately updated, confirmed to be essential decision‐support tools. This research, however, proved that their spatial and temporal uncertainty discourages any direct use as zoning maps, especially when zoning itself is associated to statutory or advisory regulations.