6 resultados para Probabilistic graphical models
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
Different types of proteins exist with diverse functions that are essential for living organisms. An important class of proteins is represented by transmembrane proteins which are specifically designed to be inserted into biological membranes and devised to perform very important functions in the cell such as cell communication and active transport across the membrane. Transmembrane β-barrels (TMBBs) are a sub-class of membrane proteins largely under-represented in structure databases because of the extreme difficulty in experimental structure determination. For this reason, computational tools that are able to predict the structure of TMBBs are needed. In this thesis, two computational problems related to TMBBs were addressed: the detection of TMBBs in large datasets of proteins and the prediction of the topology of TMBB proteins. Firstly, a method for TMBB detection was presented based on a novel neural network framework for variable-length sequence classification. The proposed approach was validated on a non-redundant dataset of proteins. Furthermore, we carried-out genome-wide detection using the entire Escherichia coli proteome. In both experiments, the method significantly outperformed other existing state-of-the-art approaches, reaching very high PPV (92%) and MCC (0.82). Secondly, a method was also introduced for TMBB topology prediction. The proposed approach is based on grammatical modelling and probabilistic discriminative models for sequence data labeling. The method was evaluated using a newly generated dataset of 38 TMBB proteins obtained from high-resolution data in the PDB. Results have shown that the model is able to correctly predict topologies of 25 out of 38 protein chains in the dataset. When tested on previously released datasets, the performances of the proposed approach were measured as comparable or superior to the current state-of-the-art of TMBB topology prediction.
Resumo:
Biology is now a “Big Data Science” thanks to technological advancements allowing the characterization of the whole macromolecular content of a cell or a collection of cells. This opens interesting perspectives, but only a small portion of this data may be experimentally characterized. From this derives the demand of accurate and efficient computational tools for automatic annotation of biological molecules. This is even more true when dealing with membrane proteins, on which my research project is focused leading to the development of two machine learning-based methods: BetAware-Deep and SVMyr. BetAware-Deep is a tool for the detection and topology prediction of transmembrane beta-barrel proteins found in Gram-negative bacteria. These proteins are involved in many biological processes and primary candidates as drug targets. BetAware-Deep exploits the combination of a deep learning framework (bidirectional long short-term memory) and a probabilistic graphical model (grammatical-restrained hidden conditional random field). Moreover, it introduced a modified formulation of the hydrophobic moment, designed to include the evolutionary information. BetAware-Deep outperformed all the available methods in topology prediction and reported high scores in the detection task. Glycine myristoylation in Eukaryotes is the binding of a myristic acid on an N-terminal glycine. SVMyr is a fast method based on support vector machines designed to predict this modification in dataset of proteomic scale. It uses as input octapeptides and exploits computational scores derived from experimental examples and mean physicochemical features. SVMyr outperformed all the available methods for co-translational myristoylation prediction. In addition, it allows (as a unique feature) the prediction of post-translational myristoylation. Both the tools here described are designed having in mind best practices for the development of machine learning-based tools outlined by the bioinformatics community. Moreover, they are made available via user-friendly web servers. All this make them valuable tools for filling the gap between sequential and annotated data.
Resumo:
The inherent stochastic character of most of the physical quantities involved in engineering models has led to an always increasing interest for probabilistic analysis. Many approaches to stochastic analysis have been proposed. However, it is widely acknowledged that the only universal method available to solve accurately any kind of stochastic mechanics problem is Monte Carlo Simulation. One of the key parts in the implementation of this technique is the accurate and efficient generation of samples of the random processes and fields involved in the problem at hand. In the present thesis an original method for the simulation of homogeneous, multi-dimensional, multi-variate, non-Gaussian random fields is proposed. The algorithm has proved to be very accurate in matching both the target spectrum and the marginal probability. The computational efficiency and robustness are very good too, even when dealing with strongly non-Gaussian distributions. What is more, the resulting samples posses all the relevant, welldefined and desired properties of “translation fields”, including crossing rates and distributions of extremes. The topic of the second part of the thesis lies in the field of non-destructive parametric structural identification. Its objective is to evaluate the mechanical characteristics of constituent bars in existing truss structures, using static loads and strain measurements. In the cases of missing data and of damages that interest only a small portion of the bar, Genetic Algorithm have proved to be an effective tool to solve the problem.
Resumo:
The advent of distributed and heterogeneous systems has laid the foundation for the birth of new architectural paradigms, in which many separated and autonomous entities collaborate and interact to the aim of achieving complex strategic goals, impossible to be accomplished on their own. A non exhaustive list of systems targeted by such paradigms includes Business Process Management, Clinical Guidelines and Careflow Protocols, Service-Oriented and Multi-Agent Systems. It is largely recognized that engineering these systems requires novel modeling techniques. In particular, many authors are claiming that an open, declarative perspective is needed to complement the closed, procedural nature of the state of the art specification languages. For example, the ConDec language has been recently proposed to target the declarative and open specification of Business Processes, overcoming the over-specification and over-constraining issues of classical procedural approaches. On the one hand, the success of such novel modeling languages strongly depends on their usability by non-IT savvy: they must provide an appealing, intuitive graphical front-end. On the other hand, they must be prone to verification, in order to guarantee the trustworthiness and reliability of the developed model, as well as to ensure that the actual executions of the system effectively comply with it. In this dissertation, we claim that Computational Logic is a suitable framework for dealing with the specification, verification, execution, monitoring and analysis of these systems. We propose to adopt an extended version of the ConDec language for specifying interaction models with a declarative, open flavor. We show how all the (extended) ConDec constructs can be automatically translated to the CLIMB Computational Logic-based language, and illustrate how its corresponding reasoning techniques can be successfully exploited to provide support and verification capabilities along the whole life cycle of the targeted systems.
Resumo:
The determination of skeletal loading conditions in vivo and their relationship to the health of bone tissues, remain an open question. Computational modeling of the musculoskeletal system is the only practicable method providing a valuable approach to muscle and joint loading analyses, although crucial shortcomings limit the translation process of computational methods into the orthopedic and neurological practice. A growing attention focused on subject-specific modeling, particularly when pathological musculoskeletal conditions need to be studied. Nevertheless, subject-specific data cannot be always collected in the research and clinical practice, and there is a lack of efficient methods and frameworks for building models and incorporating them in simulations of motion. The overall aim of the present PhD thesis was to introduce improvements to the state-of-the-art musculoskeletal modeling for the prediction of physiological muscle and joint loads during motion. A threefold goal was articulated as follows: (i) develop state-of-the art subject-specific models and analyze skeletal load predictions; (ii) analyze the sensitivity of model predictions to relevant musculotendon model parameters and kinematic uncertainties; (iii) design an efficient software framework simplifying the effort-intensive phases of subject-specific modeling pre-processing. The first goal underlined the relevance of subject-specific musculoskeletal modeling to determine physiological skeletal loads during gait, corroborating the choice of full subject-specific modeling for the analyses of pathological conditions. The second goal characterized the sensitivity of skeletal load predictions to major musculotendon parameters and kinematic uncertainties, and robust probabilistic methods were applied for methodological and clinical purposes. The last goal created an efficient software framework for subject-specific modeling and simulation, which is practical, user friendly and effort effective. Future research development aims at the implementation of more accurate models describing lower-limb joint mechanics and musculotendon paths, and the assessment of an overall scenario of the crucial model parameters affecting the skeletal load predictions through probabilistic modeling.
Resumo:
Spatial prediction of hourly rainfall via radar calibration is addressed. The change of support problem (COSP), arising when the spatial supports of different data sources do not coincide, is faced in a non-Gaussian setting; in fact, hourly rainfall in Emilia-Romagna region, in Italy, is characterized by abundance of zero values and right-skeweness of the distribution of positive amounts. Rain gauge direct measurements on sparsely distributed locations and hourly cumulated radar grids are provided by the ARPA-SIMC Emilia-Romagna. We propose a three-stage Bayesian hierarchical model for radar calibration, exploiting rain gauges as reference measure. Rain probability and amounts are modeled via linear relationships with radar in the log scale; spatial correlated Gaussian effects capture the residual information. We employ a probit link for rainfall probability and Gamma distribution for rainfall positive amounts; the two steps are joined via a two-part semicontinuous model. Three model specifications differently addressing COSP are presented; in particular, a stochastic weighting of all radar pixels, driven by a latent Gaussian process defined on the grid, is employed. Estimation is performed via MCMC procedures implemented in C, linked to R software. Communication and evaluation of probabilistic, point and interval predictions is investigated. A non-randomized PIT histogram is proposed for correctly assessing calibration and coverage of two-part semicontinuous models. Predictions obtained with the different model specifications are evaluated via graphical tools (Reliability Plot, Sharpness Histogram, PIT Histogram, Brier Score Plot and Quantile Decomposition Plot), proper scoring rules (Brier Score, Continuous Rank Probability Score) and consistent scoring functions (Root Mean Square Error and Mean Absolute Error addressing the predictive mean and median, respectively). Calibration is reached and the inclusion of neighbouring information slightly improves predictions. All specifications outperform a benchmark model with incorrelated effects, confirming the relevance of spatial correlation for modeling rainfall probability and accumulation.