13 results for Bayesian Mixture Model, Cavalieri Method, Trapezoidal Rule
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
A new methodology is devised for ensemble ocean forecasting using distributions of the surface wind field derived from a Bayesian Hierarchical Model (BHM). The ocean members are forced with samples from the posterior distribution of the wind during the assimilation of satellite and in-situ ocean data. The initial condition perturbations are then consistent with the best available knowledge of the ocean state at the beginning of the forecast and amplify the ocean response to uncertainty in the forcing alone. The ECMWF Ensemble Prediction System (EPS) surface winds are also used to generate a reference ocean ensemble to evaluate the performance of the BHM method, which proves effective in concentrating the forecast uncertainty at the ocean meso-scale. An eight-month experiment of weekly BHM ensemble forecasts was performed in the framework of the operational Mediterranean Forecasting System. The statistical properties of the ensemble are compared with model errors throughout the seasonal cycle, demonstrating a strong relationship between the forecast uncertainties due to atmospheric forcing and the seasonal cycle.
Abstract:
Environmental computer models are deterministic models designed to predict several environmental phenomena such as air pollution or meteorological events. Numerical model output is given in terms of averages over grid cells, usually at high spatial and temporal resolution. However, these outputs are often biased, with unknown calibration, and are not equipped with any information about the associated uncertainty. Conversely, data collected at monitoring stations are more accurate, since they essentially provide the true levels. Given the leading role played by numerical models, it is now important to compare model output with observations. Statistical methods developed to combine numerical model output and station data are usually referred to as data fusion. In this work, we first combine ozone monitoring data with ozone predictions from the Eta-CMAQ air quality model in order to forecast the real-time current 8-hour average ozone level, defined as the average of the previous four hours, the current hour, and the predictions for the next three hours. We propose a Bayesian downscaler model based on first differences with a flexible coefficient structure and an efficient computational strategy to fit the model parameters. Model validation for the eastern United States shows a substantial improvement of our fully inferential approach over the current real-time forecasting system. Furthermore, we consider the introduction of temperature data from a weather forecast model into the downscaler, showing improved real-time ozone predictions. Finally, we introduce a hierarchical model to obtain spatially varying uncertainty associated with numerical model output. We show how such uncertainty can be learned through suitable stochastic data fusion modeling using some external validation data. We illustrate our Bayesian model by providing the uncertainty map associated with a temperature output over the northeastern United States.
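The definition of the real-time 8-hour average above is concrete enough for a small worked example. The sketch below (plain Python with NumPy; the function name and the sample values are hypothetical, and it does not implement the Bayesian downscaler itself) simply assembles the eight-hour window from four past observations, the current hour and three model forecasts.

```python
import numpy as np

def current_8hr_avg(obs_past4, obs_current, forecast_next3):
    """Real-time current 8-hour average ozone as defined in the abstract:
    the mean of the previous four observed hours, the current hour and the
    model forecasts for the next three hours."""
    window = np.concatenate([obs_past4, [obs_current], forecast_next3])
    assert window.size == 8, "expected exactly 8 hourly values"
    return window.mean()

# Hypothetical hourly ozone values (ppb): four past observations, the current
# observation and three model forecasts.
print(current_8hr_avg([52.0, 55.0, 58.0, 60.0], 63.0, [61.0, 59.0, 56.0]))
```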
Resumo:
The recent advent of Next-generation sequencing technologies has revolutionized the way of analyzing the genome. This innovation allows to get deeper information at a lower cost and in less time, and provides data that are discrete measurements. One of the most important applications with these data is the differential analysis, that is investigating if one gene exhibit a different expression level in correspondence of two (or more) biological conditions (such as disease states, treatments received and so on). As for the statistical analysis, the final aim will be statistical testing and for modeling these data the Negative Binomial distribution is considered the most adequate one especially because it allows for "over dispersion". However, the estimation of the dispersion parameter is a very delicate issue because few information are usually available for estimating it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled first-type errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Afterwards, three consistent statistical tests are developed for differential expression analysis. We show that the proposed method improves the sensitivity of detecting differentially expressed genes with respect to the common procedures, since it is the best one in reaching the nominal value for the first-type error, while keeping elevate power. The method is finally illustrated on prostate cancer RNA-seq data.
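To make the testing framework discussed above concrete, the sketch below shows a likelihood-ratio test for one gene under a Negative Binomial model with a plug-in dispersion. It is not the mixture model proposed in the thesis, and the counts and the dispersion value are invented; it only illustrates the kind of plug-in procedure whose type-I error behaviour the thesis questions.

```python
import numpy as np
from scipy import stats

def nb_loglik(counts, mu, phi):
    """Negative Binomial log-likelihood with mean mu and dispersion phi
    (Var = mu + phi * mu**2); scipy's nbinom uses size n = 1/phi and
    p = n / (n + mu)."""
    n = 1.0 / phi
    p = n / (n + mu)
    return stats.nbinom.logpmf(counts, n, p).sum()

def lrt_two_conditions(counts_a, counts_b, phi):
    """Likelihood-ratio test of a common mean against condition-specific means
    for one gene, with a plug-in dispersion phi (with phi fixed, the MLE of the
    mean is the sample mean)."""
    counts_all = np.concatenate([counts_a, counts_b])
    ll_null = nb_loglik(counts_all, counts_all.mean(), phi)
    ll_alt = nb_loglik(counts_a, counts_a.mean(), phi) + nb_loglik(counts_b, counts_b.mean(), phi)
    lr = 2.0 * (ll_alt - ll_null)
    return lr, stats.chi2.sf(lr, df=1)

# Invented read counts for one gene in two conditions and an invented dispersion.
a = np.array([120, 150, 98, 135])
b = np.array([210, 260, 190, 240])
print(lrt_two_conditions(a, b, phi=0.1))
```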
Abstract:
Spatial prediction of hourly rainfall via radar calibration is addressed. The change of support problem (COSP), arising when the spatial supports of different data sources do not coincide, is faced in a non-Gaussian setting; in fact, hourly rainfall in the Emilia-Romagna region, in Italy, is characterized by an abundance of zero values and right-skewness of the distribution of positive amounts. Rain gauge direct measurements at sparsely distributed locations and hourly cumulated radar grids are provided by ARPA-SIMC Emilia-Romagna. We propose a three-stage Bayesian hierarchical model for radar calibration, exploiting rain gauges as the reference measure. Rain probability and amounts are modeled via linear relationships with radar on the log scale; spatially correlated Gaussian effects capture the residual information. We employ a probit link for rainfall probability and a Gamma distribution for positive rainfall amounts; the two steps are joined via a two-part semicontinuous model. Three model specifications that address the COSP differently are presented; in particular, a stochastic weighting of all radar pixels, driven by a latent Gaussian process defined on the grid, is employed. Estimation is performed via MCMC procedures implemented in C, linked to the R software. Communication and evaluation of probabilistic, point and interval predictions are investigated. A non-randomized PIT histogram is proposed for correctly assessing calibration and coverage of two-part semicontinuous models. Predictions obtained with the different model specifications are evaluated via graphical tools (Reliability Plot, Sharpness Histogram, PIT Histogram, Brier Score Plot and Quantile Decomposition Plot), proper scoring rules (Brier Score, Continuous Ranked Probability Score) and consistent scoring functions (Root Mean Square Error and Mean Absolute Error, addressing the predictive mean and median, respectively). Calibration is reached, and the inclusion of neighbouring information slightly improves predictions. All specifications outperform a benchmark model with uncorrelated effects, confirming the relevance of spatial correlation for modeling rainfall probability and accumulation.
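A minimal generative sketch of a two-part semicontinuous rainfall model of the kind described above is given below. The coefficients, the Gamma shape and the radar values are hypothetical placeholders, and the spatially correlated Gaussian effects and the change-of-support machinery are omitted.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate_two_part(radar, beta0_p=-0.5, beta1_p=0.8,
                      beta0_a=0.2, beta1_a=0.9, shape=2.0):
    """Generative sketch of a two-part semicontinuous rainfall model
    (all coefficients are hypothetical, not the thesis estimates):
    - rain occurrence via a probit link on log-radar,
    - positive amounts via a Gamma whose log-mean is linear in log-radar."""
    x = np.log1p(radar)                               # radar value on the log scale
    p_rain = stats.norm.cdf(beta0_p + beta1_p * x)    # probit link for rain probability
    occurs = rng.random(radar.shape) < p_rain
    mu = np.exp(beta0_a + beta1_a * x)                # Gamma mean, log link
    amounts = rng.gamma(shape, mu / shape)            # scale = mean / shape
    return np.where(occurs, amounts, 0.0)

radar_pixels = np.array([0.0, 0.5, 2.0, 5.0, 12.0])   # hypothetical hourly radar values
print(simulate_two_part(radar_pixels))
```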
Abstract:
Long-term monitoring of acoustical environments is gaining popularity thanks to the relevant amount of scientific and engineering insight that it provides. The increasing interest is due to the constant growth of storage capacity and of the computational power needed to process large amounts of data. In this perspective, machine learning (ML) provides a broad family of data-driven statistical techniques for dealing with large databases. Nowadays, the conventional praxis of sound level meter measurements limits the global description of a sound scene to an energetic point of view: the equivalent continuous level Leq is indeed the main metric used to define an acoustic environment. Finer analyses involve the use of statistical levels. However, acoustic percentiles are based on temporal assumptions, which are not always reliable. A statistical approach based on the study of the occurrences of sound pressure levels brings a different perspective to the analysis of long-term monitoring. Depicting a sound scene through the most probable sound pressure level, rather than through portions of energy, provides more specific information about the activity carried out during the measurements. The statistical mode of the occurrences can capture typical behaviours of specific kinds of sound sources. The present work proposes an ML-based method to identify, separate and measure coexisting sound sources in real-world scenarios. It is based on long-term monitoring and is addressed to acousticians focused on the analysis of environmental noise in a variety of contexts. The presented method is based on clustering analysis: two algorithms, the Gaussian Mixture Model and K-means clustering, form the core of a process to investigate different active spaces monitored through sound level meters. The procedure has been applied in two different contexts, university lecture halls and offices. The proposed method yields robust and reliable results in describing the acoustic scenario and could represent an important analytical tool for acousticians.
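As an illustration of the clustering idea, the sketch below fits a Gaussian Mixture Model and K-means to a synthetic distribution of sound pressure level occurrences (the levels and the two-source scenario are invented, and the code is not the thesis pipeline); the component means play the role of the most probable levels of the coexisting sources.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical long-term monitoring: per-second A-weighted levels (dB) mixing a
# quiet background around 35 dB with speech activity around 60 dB.
spl = np.concatenate([rng.normal(35, 2, 5000), rng.normal(60, 4, 2000)]).reshape(-1, 1)

# Gaussian Mixture Model: each component's mean approximates the most probable
# level of one coexisting source, its weight the share of time it is active.
gmm = GaussianMixture(n_components=2, random_state=0).fit(spl)
print("GMM means (dB):", gmm.means_.ravel(), "weights:", gmm.weights_)

# K-means on the same occurrences as a simpler, variance-free counterpart.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(spl)
print("K-means centres (dB):", km.cluster_centers_.ravel())
```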
Abstract:
In the current context of increasing anthropogenic impacts and global climate change, there is a clear need to understand their possible effects on ecosystems, regarded as providers of essential services and functions on which entire economic and social fabrics are based. The predictive study of ecosystems clashes with their high complexity and with an equally severe scarcity of integrated observations. The modelling approach appears the most suitable for analysing the complex dynamics of ecosystems and for placing experimental results and empirical observations in their complex context. The reductionist-deterministic approach usually adopted in model implementation has, however, so far proved unable to reach the highest levels of complexity within the ecosystem structure. The component that best describes ecosystem complexity is the biotic one, owing to its strong dependence on the other components and on their interactions. This thesis proposes a stochastic modelling approach based on a naive Bayes classifier operating in a fuzzy environment. The joint use of fuzzy logic and the naive Bayes approach helps to handle the level of complexity, and hence of uncertainty, inherent in ecosystems. The resulting generative models, called Fuzzy Bayesian Ecological Models (FBEM), appear capable of modelling ecosystem states as a function of the large number of interactions involved in determining them. FBEM models were used to assess the environmental risk for the intertidal habitat of sandy beaches under the coastal flooding events expected over the period 2010-2100. The application was carried out within the EU project "Theseus", in which the FBEM models were also used for a long-term simulation and for computing habitat-specific tipping points under flooding events of different intensity.
Abstract:
Traditionally, the goal of calibrating a rainfall-runoff model has always been to obtain a parameter set (or a probability distribution of the parameters) that maximizes the fit of the simulated data to the observed reality, only partially addressing the intended application of the model. This thesis proposes a calibration methodology motivated by the evidence that the agreement between observed and simulated data is not always the most appropriate criterion for calibrating a hydrological model. For practical purposes, a better representation of one particular aspect of the hydrograph may in fact be more useful than another. The proposed calibration method aims to evaluate model performance by estimating its utility in the intended application. Using suitable functions, the utility of the simulation is evaluated at each time step. Calibration is then carried out by maximizing an objective function given by the sum of the utilities estimated at the individual time steps, as sketched below. The analyses show how, with such objective functions, model performance can be improved where it matters most for the intended application.
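A toy illustration of this utility-based calibration follows, under stated assumptions: the rainfall-runoff model, the flood-oriented utility function and all numbers are hypothetical stand-ins, but the objective is, as in the thesis, the sum of per-timestep utilities.

```python
import numpy as np
from scipy.optimize import differential_evolution

def toy_rr_model(params, rain):
    """Toy linear-reservoir rainfall-runoff model (a stand-in, not the thesis model):
    storage s receives a fraction c of rainfall and releases k*s at each step."""
    k, c = params
    s, q = 0.0, []
    for r in rain:
        s += c * r
        out = k * s
        s -= out
        q.append(out)
    return np.array(q)

def utility(q_sim, q_obs):
    """Hypothetical per-timestep utility: errors on high flows (above the 75th
    percentile, a flood-oriented application) are penalised more heavily."""
    w = np.where(q_obs > np.quantile(q_obs, 0.75), 3.0, 1.0)
    return -w * (q_sim - q_obs) ** 2

rng = np.random.default_rng(3)
rain = rng.gamma(0.3, 5.0, 365)                              # hypothetical daily rainfall
q_obs = toy_rr_model((0.3, 0.6), rain) + rng.normal(0, 0.1, 365)

# Calibration = maximise the sum of per-timestep utilities (minimise its negative).
res = differential_evolution(lambda p: -utility(toy_rr_model(p, rain), q_obs).sum(),
                             bounds=[(0.01, 0.99), (0.1, 1.0)], seed=0)
print("calibrated (k, c):", res.x)
```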
Abstract:
In this work we propose a new approach for preliminary epidemiological studies on Standardized Mortality Ratios (SMR) collected in many spatial regions. A preliminary study on SMRs aims to formulate hypotheses to be investigated via individual epidemiological studies that avoid the bias carried by aggregated analyses. Starting from the collected disease counts and the expected disease counts calculated from reference population disease rates, in each area an SMR is derived as the MLE under the Poisson assumption on each observation. Such estimators have high standard errors in small areas, i.e. where the expected count is low either because of the low population underlying the area or because of the rarity of the disease under study. Disease mapping models and other techniques for screening disease rates across the map, aiming to detect anomalies and possible high-risk areas, have been proposed in the literature within both the classical and the Bayesian paradigm. Our proposal approaches this issue with a decision-oriented method that focuses on multiple testing control, without however abandoning the preliminary-study perspective that an analysis of SMR indicators is asked to take. We implement control of the False Discovery Rate (FDR), a quantity largely used to address multiple comparison problems in the field of microarray data analysis but not usually employed in disease mapping. Controlling the FDR means providing an estimate of the FDR for a set of rejected null hypotheses. The small-areas issue raises difficulties in applying traditional methods for FDR estimation, which are usually based only on knowledge of the p-values (Benjamini and Hochberg, 1995; Storey, 2003). Tests evaluated by a traditional p-value have weak power in small areas, where the expected number of disease cases is small. Moreover, the tests cannot be assumed independent when spatial correlation between SMRs is expected, nor are they identically distributed when the population underlying the map is heterogeneous. The Bayesian paradigm offers a way to overcome the inappropriateness of p-value-based methods. Another peculiarity of the present work is to propose a hierarchical fully Bayesian model for FDR estimation when testing many null hypotheses of absence of risk. We use concepts from Bayesian disease mapping models, referring in particular to the Besag, York and Mollié (1991) model, often used in practice for its flexible prior assumption on the distribution of risks across regions. The borrowing of strength between prior and likelihood, typical of a hierarchical Bayesian model, has the advantage of evaluating a single test (i.e. a test in a single area) by means of all observations in the map under study, rather than only the single observation. This improves the power of the test in small areas and addresses more appropriately the spatial correlation issue, which suggests that relative risks are closer in spatially contiguous regions. The proposed model estimates the FDR by means of the MCMC-estimated posterior probabilities b_i of the null hypothesis (absence of risk) in each area. An estimate of the expected FDR conditional on the data, denoted FDR-hat, can be calculated for any set of b_i's relative to areas declared at high risk (where the null hypothesis is rejected) by averaging the b_i's themselves. The FDR-hat can be used to provide an easy decision rule for selecting high-risk areas, i.e. selecting as many areas as possible such that the FDR-hat does not exceed a prefixed value; we call these FDR-hat based decision (or selection) rules.
The sensitivity and specificity of such a rule depend on the accuracy of the FDR estimate: over-estimation of the FDR causes a loss of power, while under-estimation produces a loss of specificity. Moreover, our model has the interesting feature of still providing an estimate of the relative risk values, as in the Besag, York and Mollié (1991) model. A simulation study was set up to evaluate the model performance in terms of accuracy of the FDR estimation, sensitivity and specificity of the decision rule, and goodness of estimation of the relative risks. We chose a real map from which we generated several spatial scenarios whose disease counts vary according to the degree of spatial correlation, the size of the areas, the number of areas where the null hypothesis is true, and the risk level in the remaining (high-risk) areas. In summarizing the simulation results we always consider the FDR estimation in sets constituted by all areas whose b_i is lower than a threshold t. We show graphs of the FDR-hat and the true FDR (known by simulation) plotted against the threshold t to assess the FDR estimation. By varying the threshold we can learn which FDR values can be accurately estimated by a practitioner willing to apply the model (from the closeness between FDR-hat and the true FDR). By plotting the calculated sensitivity and specificity (both known by simulation) against the FDR-hat we can check the sensitivity and specificity of the corresponding FDR-hat based decision rules. To investigate the over-smoothing of the relative risk estimates we compare box-plots of such estimates in the high-risk areas (known by simulation) obtained by both our model and the classic Besag, York and Mollié model. All the summary tools are worked out for all simulated scenarios (54 in total). Results show that the FDR is well estimated (in the worst case we get an over-estimation, hence a conservative FDR control) in scenarios with small areas, low risk levels and spatially correlated risks, which are our primary aims. In such scenarios we obtain good estimates of the FDR for all values less than or equal to 0.10. The sensitivity of the FDR-hat based decision rules is generally low but the specificity is high; in these scenarios selection rules based on FDR-hat = 0.05 or FDR-hat = 0.10 can be suggested. In cases where the number of true alternative hypotheses (the number of true high-risk areas) is small, FDR values up to 0.15 are also well estimated, and a decision rule based on FDR-hat = 0.15 gains power while maintaining a high specificity. On the other hand, in scenarios with non-small areas and non-small risk levels the FDR is under-estimated except for very small values (much lower than 0.05); this results in a loss of specificity of a decision rule based on FDR-hat = 0.05. In such scenarios decision rules based on FDR-hat = 0.05 or, even worse, FDR-hat = 0.10 cannot be suggested, because the true FDR is actually much higher. As regards relative risk estimation, our model achieves almost the same results as the classic Besag, York and Mollié model. For this reason, our model is interesting for its ability to perform both the estimation of relative risk values and FDR control, except in scenarios with non-small areas and large risk levels. A case study is finally presented to show how the method can be used in epidemiology.
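The FDR-hat selection rule described in this abstract lends itself to a short sketch. The code below is illustrative only (the posterior null probabilities are invented rather than MCMC output): it rejects areas in order of increasing posterior probability of the null and keeps the largest set whose average, i.e. the estimated FDR, does not exceed the prefixed level.

```python
import numpy as np

def select_high_risk_areas(post_null, alpha=0.05):
    """Hedged sketch of the FDR-hat decision rule described above.
    post_null: posterior probabilities of the null hypothesis (absence of risk),
    one per area. Areas are rejected starting from the smallest posterior null
    probability; the estimated FDR of a rejection set is the average of its
    posterior null probabilities, and the largest set with estimated
    FDR <= alpha is returned."""
    b = np.asarray(post_null, dtype=float)
    order = np.argsort(b)                                     # most likely high-risk first
    running_fdr = np.cumsum(b[order]) / np.arange(1, b.size + 1)
    n_reject = int(np.sum(running_fdr <= alpha))              # running_fdr is nondecreasing
    return order[:n_reject], (running_fdr[n_reject - 1] if n_reject else 0.0)

# Hypothetical posterior null probabilities for 8 areas.
b_hat = [0.01, 0.03, 0.04, 0.20, 0.45, 0.60, 0.85, 0.95]
areas, fdr_hat = select_high_risk_areas(b_hat, alpha=0.05)
print("high-risk areas:", areas, "estimated FDR:", round(fdr_hat, 3))
```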
Abstract:
The aim of the thesis is to formulate a suitable Item Response Theory (IRT) based model to measure HRQoL (as a latent variable) using a mixed-response questionnaire and relaxing the hypothesis of a normally distributed latent variable. The new model is a combination of two models already presented in the literature, namely a latent trait model for mixed responses and an IRT model with a Skew Normal latent variable. It is developed in a Bayesian framework, and a Markov chain Monte Carlo procedure is used to generate samples from the posterior distribution of the parameters of interest. The proposed model is tested on a questionnaire composed of five discrete items and one continuous item, used to measure HRQoL in children: the EQ-5D-Y questionnaire. A large sample of children recruited in schools was used. In comparison with a model for discrete responses only and a model for mixed responses with a normal latent variable, the new model performs better in terms of deviance information criterion (DIC), chain convergence times and precision of the estimates.
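As a rough illustration of the ingredients named above, the sketch below evaluates a joint log-density combining probit item-response terms for binary items (a simplification of the discrete EQ-5D-Y items), a Gaussian term for the continuous item and a skew-normal prior on the latent trait; all parameter values are hypothetical, and an MCMC sampler would call such a function repeatedly.

```python
import numpy as np
from scipy import stats

def log_joint(theta, y_bin, y_cont, a, b, lam, sigma, skew=3.0):
    """Illustrative building block (not the thesis code) for a mixed-response
    IRT model: probit 2PL-type Bernoulli items, one Gaussian continuous item,
    and a skew-normal prior on the latent HRQoL trait theta."""
    p = stats.norm.cdf(a * theta - b)                      # probit item response
    ll_bin = np.sum(y_bin * np.log(p) + (1 - y_bin) * np.log1p(-p))
    ll_cont = stats.norm.logpdf(y_cont, loc=lam[0] + lam[1] * theta, scale=sigma)
    lp_theta = stats.skewnorm.logpdf(theta, skew)          # skew-normal latent trait
    return ll_bin + ll_cont + lp_theta

# Hypothetical responses: five binary items and one continuous item (e.g. a VAS score).
y_bin = np.array([1, 0, 1, 1, 0])
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])     # item discriminations (hypothetical)
b = np.array([0.0, 0.5, -0.3, 0.2, 1.0])    # item difficulties (hypothetical)
print(log_joint(theta=0.4, y_bin=y_bin, y_cont=0.75, a=a, b=b,
                lam=(0.5, 0.3), sigma=0.15))
```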
Abstract:
In this PhD thesis the crashworthiness topic is studied with the perspective of the development of a small-scale experimental test able to characterize a material in terms of energy absorption. The material properties obtained are then used to validate a numerical model of the experimental test itself. Consequently, the numerical model, calibrated on the specific material, can be extended to more complex structures and used to simulate their energy absorption behavior. The experimental activity started at the University of Washington in Seattle, WA (USA) and continued at the Second Faculty of Engineering, University of Bologna, Forlì (Italy), where the numerical model for the simulation of the experimental test was implemented and optimized.
Abstract:
Forest models are tools for explaining and predicting the dynamics of forest ecosystems. They simulate forest behavior by integrating information on the underlying processes in trees, soil and atmosphere. Bayesian calibration is the application of probability theory to parameter estimation. It is a method, applicable to all models, that quantifies output uncertainty and identifies key parameters and variables. This study aims at testing the Bayesian calibration procedure on different types of forest models, evaluating their performances and the uncertainties associated with them. In particular, we aimed at 1) applying a Bayesian framework to calibrate forest models and testing their performances in different biomes and environmental conditions, 2) identifying and solving structure-related issues in simple models, and 3) identifying the advantages of the additional information made available when calibrating forest models with a Bayesian approach. In Chapter 2 we applied the Bayesian framework to calibrate the Prelued model on eight Italian eddy-covariance sites. The ability of Prelued to reproduce the estimated Gross Primary Productivity (GPP) was tested over contrasting natural vegetation types that represented a wide range of climatic and environmental conditions. The issues related to Prelued's multiplicative structure were the main topic of Chapter 3: several different MCMC-based procedures were applied within a Bayesian framework to calibrate the model, and their performances were compared. A more complex model was applied in Chapter 4, focusing on the application of the physiology-based model HYDRALL to the forest ecosystem of Lavarone (IT), to evaluate the importance of additional information in the calibration procedure and its impact on model performance, model uncertainties and parameter estimation. Overall, the Bayesian technique proved to be an excellent and versatile tool for successfully calibrating forest models of different structure and complexity, on different kinds and numbers of variables and with different numbers of parameters involved.
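A compact sketch of the Bayesian calibration idea, with a toy light-response model standing in for Prelued or HYDRALL and invented data, is given below; it draws posterior samples of two parameters with a plain Metropolis random walk.

```python
import numpy as np

rng = np.random.default_rng(1)

def toy_gpp_model(par, light):
    """Toy stand-in for a forest model (not Prelued/HYDRALL): a saturating
    light-response curve GPP = alpha * L / (1 + alpha * L / gpp_max)."""
    alpha, gpp_max = par
    return alpha * light / (1.0 + alpha * light / gpp_max)

def log_posterior(par, light, gpp_obs, sigma=1.0):
    """Gaussian likelihood with a flat prior on a box (all choices illustrative)."""
    if np.any(par <= 0) or par[0] > 1 or par[1] > 50:
        return -np.inf
    resid = gpp_obs - toy_gpp_model(par, light)
    return -0.5 * np.sum((resid / sigma) ** 2)

# Hypothetical daily light and eddy-covariance GPP estimates.
light = rng.uniform(5, 40, 100)
gpp_obs = toy_gpp_model((0.05, 12.0), light) + rng.normal(0, 1.0, 100)

# Plain Metropolis random walk: a minimal example of Bayesian calibration.
par = np.array([0.02, 8.0])
lp = log_posterior(par, light, gpp_obs)
samples = []
for _ in range(5000):
    prop = par + rng.normal(0, [0.005, 0.5])
    lp_prop = log_posterior(prop, light, gpp_obs)
    if np.log(rng.random()) < lp_prop - lp:
        par, lp = prop, lp_prop
    samples.append(par.copy())
print("posterior means:", np.mean(samples[2500:], axis=0))
```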
Abstract:
In recent years, radars have been used in many applications such as precision agriculture and advanced driver assistance systems. Optimal techniques for estimating the number of targets and their coordinates require solving multidimensional optimization problems that entail huge computational efforts. This has motivated the development of sub-optimal estimation techniques able to achieve good accuracy at a manageable computational cost. Another technical issue in advanced driver assistance systems is the tracking of multiple targets. Even though various filtering techniques have been developed, new efficient and robust algorithms for target tracking can be devised by exploiting a probabilistic approach based on factor graphs and the sum-product algorithm. The two contributions of this dissertation are the investigation of the filtering and smoothing problems from a factor graph perspective and the development of efficient algorithms for two- and three-dimensional radar imaging. Concerning the first contribution, a new factor graph for filtering is derived and the sum-product rule is applied to this graphical model; this makes it possible to reinterpret known algorithms and to develop new filtering techniques. Then, a general method based on graphical modelling is proposed to derive filtering algorithms that involve a network of interconnected Bayesian filters. Finally, the proposed graphical approach is exploited to devise a new smoothing algorithm. Numerical results for dynamic systems show that our algorithms can achieve a better complexity-accuracy tradeoff and better tracking capability than other techniques in the literature. Regarding radar imaging, various algorithms are developed for frequency-modulated continuous-wave radars; these algorithms rely on novel and efficient methods for the detection and estimation of multiple superimposed tones in noise. The accuracy achieved in the presence of multiple closely spaced targets is assessed on the basis of both synthetically generated data and measurements acquired through two commercial multiple-input multiple-output radars.
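For the radar-imaging part, a baseline sketch of detecting multiple superimposed tones in noise is shown below: a zero-padded periodogram with simple peak picking. The signal parameters are invented and this is not the refined estimator developed in the dissertation.

```python
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(0)

# Hypothetical FMCW beat signal: each target appears as a complex sinusoid whose
# frequency is proportional to its range, so range estimation reduces to detecting
# superimposed tones in noise (the amplitudes and frequencies below are invented).
fs, n = 1.0e6, 1024
t = np.arange(n) / fs
tones = [(1.0, 80e3), (0.6, 86e3), (0.3, 150e3)]          # (amplitude, frequency in Hz)
x = sum(a * np.exp(2j * np.pi * f * t) for a, f in tones)
x = x + 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

# Baseline detection: Hann-windowed, zero-padded periodogram with simple peak
# picking; 'distance' keeps at most one peak per mainlobe width (4 original bins).
nfft = 8 * n
spec = np.abs(np.fft.fft(x * np.hanning(n), nfft)) ** 2
freqs = np.fft.fftfreq(nfft, 1.0 / fs)
peaks, _ = find_peaks(spec[:nfft // 2], height=spec.max() / 100, distance=32)
print("detected tone frequencies (kHz):", np.round(freqs[peaks] / 1e3, 2))
```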