3 resultados para Probabilistic graphical model
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
                                
Resumo:
Biology is now a “Big Data Science” thanks to technological advancements allowing the characterization of the whole macromolecular content of a cell or a collection of cells. This opens interesting perspectives, but only a small portion of this data may be experimentally characterized. From this derives the demand of accurate and efficient computational tools for automatic annotation of biological molecules. This is even more true when dealing with membrane proteins, on which my research project is focused leading to the development of two machine learning-based methods: BetAware-Deep and SVMyr. BetAware-Deep is a tool for the detection and topology prediction of transmembrane beta-barrel proteins found in Gram-negative bacteria. These proteins are involved in many biological processes and primary candidates as drug targets. BetAware-Deep exploits the combination of a deep learning framework (bidirectional long short-term memory) and a probabilistic graphical model (grammatical-restrained hidden conditional random field). Moreover, it introduced a modified formulation of the hydrophobic moment, designed to include the evolutionary information. BetAware-Deep outperformed all the available methods in topology prediction and reported high scores in the detection task. Glycine myristoylation in Eukaryotes is the binding of a myristic acid on an N-terminal glycine. SVMyr is a fast method based on support vector machines designed to predict this modification in dataset of proteomic scale. It uses as input octapeptides and exploits computational scores derived from experimental examples and mean physicochemical features. SVMyr outperformed all the available methods for co-translational myristoylation prediction. In addition, it allows (as a unique feature) the prediction of post-translational myristoylation. Both the tools here described are designed having in mind best practices for the development of machine learning-based tools outlined by the bioinformatics community. Moreover, they are made available via user-friendly web servers. All this make them valuable tools for filling the gap between sequential and annotated data.
                                
Resumo:
In recent years, radars have been used in many applications such as precision agriculture and advanced driver assistant systems. Optimal techniques for the estimation of the number of targets and of their coordinates require solving multidimensional optimization problems entailing huge computational efforts. This has motivated the development of sub-optimal estimation techniques able to achieve good accuracy at a manageable computational cost. Another technical issue in advanced driver assistant systems is the tracking of multiple targets. Even if various filtering techniques have been developed, new efficient and robust algorithms for target tracking can be devised exploiting a probabilistic approach, based on the use of the factor graph and the sum-product algorithm. The two contributions provided by this dissertation are the investigation of the filtering and smoothing problems from a factor graph perspective and the development of efficient algorithms for two and three-dimensional radar imaging. Concerning the first contribution, a new factor graph for filtering is derived and the sum-product rule is applied to this graphical model; this allows to interpret known algorithms and to develop new filtering techniques. Then, a general method, based on graphical modelling, is proposed to derive filtering algorithms that involve a network of interconnected Bayesian filters. Finally, the proposed graphical approach is exploited to devise a new smoothing algorithm. Numerical results for dynamic systems evidence that our algorithms can achieve a better complexity-accuracy tradeoff and tracking capability than other techniques in the literature. Regarding radar imaging, various algorithms are developed for frequency modulated continuous wave radars; these algorithms rely on novel and efficient methods for the detection and estimation of multiple superimposed tones in noise. The accuracy achieved in the presence of multiple closely spaced targets is assessed on the basis of both synthetically generated data and of the measurements acquired through two commercial multiple-input multiple-output radars.
                                
Resumo:
Spatial prediction of hourly rainfall via radar calibration is addressed. The change of support problem (COSP), arising when the spatial supports of different data sources do not coincide, is faced in a non-Gaussian setting; in fact, hourly rainfall in Emilia-Romagna region, in Italy, is characterized by abundance of zero values and right-skeweness of the distribution of positive amounts. Rain gauge direct measurements on sparsely distributed locations and hourly cumulated radar grids are provided by the ARPA-SIMC Emilia-Romagna. We propose a three-stage Bayesian hierarchical model for radar calibration, exploiting rain gauges as reference measure. Rain probability and amounts are modeled via linear relationships with radar in the log scale; spatial correlated Gaussian effects capture the residual information. We employ a probit link for rainfall probability and Gamma distribution for rainfall positive amounts; the two steps are joined via a two-part semicontinuous model. Three model specifications differently addressing COSP are presented; in particular, a stochastic weighting of all radar pixels, driven by a latent Gaussian process defined on the grid, is employed. Estimation is performed via MCMC procedures implemented in C, linked to R software. Communication and evaluation of probabilistic, point and interval predictions is investigated. A non-randomized PIT histogram is proposed for correctly assessing calibration and coverage of two-part semicontinuous models. Predictions obtained with the different model specifications are evaluated via graphical tools (Reliability Plot, Sharpness Histogram, PIT Histogram, Brier Score Plot and Quantile Decomposition Plot), proper scoring rules (Brier Score, Continuous Rank Probability Score) and consistent scoring functions (Root Mean Square Error and Mean Absolute Error addressing the predictive mean and median, respectively). Calibration is reached and the inclusion of neighbouring information slightly improves predictions. All specifications outperform a benchmark model with incorrelated effects, confirming the relevance of spatial correlation for modeling rainfall probability and accumulation.
 
                    