914 resultados para Parameter Inference
Resumo:
PySSM is a Python package that has been developed for the analysis of time series using linear Gaussian state space models (SSM). PySSM is easy to use; models can be set up quickly and efficiently and a variety of different settings are available to the user. It also takes advantage of scientific libraries Numpy and Scipy and other high level features of the Python language. PySSM is also used as a platform for interfacing between optimised and parallelised Fortran routines. These Fortran routines heavily utilise Basic Linear Algebra (BLAS) and Linear Algebra Package (LAPACK) functions for maximum performance. PySSM contains classes for filtering, classical smoothing as well as simulation smoothing.
Resumo:
Background This paper presents a novel approach to searching electronic medical records that is based on concept matching rather than keyword matching. Aim The concept-based approach is intended to overcome specific challenges we identified in searching medical records. Method Queries and documents were transformed from their term-based originals into medical concepts as defined by the SNOMED-CT ontology. Results Evaluation on a real-world collection of medical records showed our concept-based approach outperformed a keyword baseline by 25% in Mean Average Precision. Conclusion The concept-based approach provides a framework for further development of inference based search systems for dealing with medical data.
Inherent errors in pollutant build-up estimation in considering urban land use as a lumped parameter
Resumo:
Stormwater quality modelling results is subject to uncertainty. The variability of input parameters is an important source of overall model error. An in-depth understanding of the variability associated with input parameters can provide knowledge on the uncertainty associated with these parameters and consequently assist in uncertainty analysis of stormwater quality models and the decision making based on modelling outcomes. This paper discusses the outcomes of a research study undertaken to analyse the variability related to pollutant build-up parameters in stormwater quality modelling. The study was based on the analysis of pollutant build-up samples collected from 12 road surfaces in residential, commercial and industrial land uses. It was found that build-up characteristics vary appreciably even within the same land use. Therefore, using land use as a lumped parameter would contribute significant uncertainties in stormwater quality modelling. Additionally, it was also found that the variability in pollutant build-up can also be significant depending on the pollutant type. This underlines the importance of taking into account specific land use characteristics and targeted pollutant species when undertaking uncertainty analysis of stormwater quality models or in interpreting the modelling outcomes.
Resumo:
Advances in algorithms for approximate sampling from a multivariable target function have led to solutions to challenging statistical inference problems that would otherwise not be considered by the applied scientist. Such sampling algorithms are particularly relevant to Bayesian statistics, since the target function is the posterior distribution of the unobservables given the observables. In this thesis we develop, adapt and apply Bayesian algorithms, whilst addressing substantive applied problems in biology and medicine as well as other applications. For an increasing number of high-impact research problems, the primary models of interest are often sufficiently complex that the likelihood function is computationally intractable. Rather than discard these models in favour of inferior alternatives, a class of Bayesian "likelihoodfree" techniques (often termed approximate Bayesian computation (ABC)) has emerged in the last few years, which avoids direct likelihood computation through repeated sampling of data from the model and comparing observed and simulated summary statistics. In Part I of this thesis we utilise sequential Monte Carlo (SMC) methodology to develop new algorithms for ABC that are more efficient in terms of the number of model simulations required and are almost black-box since very little algorithmic tuning is required. In addition, we address the issue of deriving appropriate summary statistics to use within ABC via a goodness-of-fit statistic and indirect inference. Another important problem in statistics is the design of experiments. That is, how one should select the values of the controllable variables in order to achieve some design goal. The presences of parameter and/or model uncertainty are computational obstacles when designing experiments but can lead to inefficient designs if not accounted for correctly. The Bayesian framework accommodates such uncertainties in a coherent way. If the amount of uncertainty is substantial, it can be of interest to perform adaptive designs in order to accrue information to make better decisions about future design points. This is of particular interest if the data can be collected sequentially. In a sense, the current posterior distribution becomes the new prior distribution for the next design decision. Part II of this thesis creates new algorithms for Bayesian sequential design to accommodate parameter and model uncertainty using SMC. The algorithms are substantially faster than previous approaches allowing the simulation properties of various design utilities to be investigated in a more timely manner. Furthermore the approach offers convenient estimation of Bayesian utilities and other quantities that are particularly relevant in the presence of model uncertainty. Finally, Part III of this thesis tackles a substantive medical problem. A neurological disorder known as motor neuron disease (MND) progressively causes motor neurons to no longer have the ability to innervate the muscle fibres, causing the muscles to eventually waste away. When this occurs the motor unit effectively ‘dies’. There is no cure for MND, and fatality often results from a lack of muscle strength to breathe. The prognosis for many forms of MND (particularly amyotrophic lateral sclerosis (ALS)) is particularly poor, with patients usually only surviving a small number of years after the initial onset of disease. Measuring the progress of diseases of the motor units, such as ALS, is a challenge for clinical neurologists. Motor unit number estimation (MUNE) is an attempt to directly assess underlying motor unit loss rather than indirect techniques such as muscle strength assessment, which generally is unable to detect progressions due to the body’s natural attempts at compensation. Part III of this thesis builds upon a previous Bayesian technique, which develops a sophisticated statistical model that takes into account physiological information about motor unit activation and various sources of uncertainties. More specifically, we develop a more reliable MUNE method by applying marginalisation over latent variables in order to improve the performance of a previously developed reversible jump Markov chain Monte Carlo sampler. We make other subtle changes to the model and algorithm to improve the robustness of the approach.
Resumo:
This chapter contains sections titled: Introduction Case study: Estimating transmission rates of nosocomial pathogens Models and methods Data analysis and results Discussion References
Resumo:
In this paper we present a new simulation methodology in order to obtain exact or approximate Bayesian inference for models for low-valued count time series data that have computationally demanding likelihood functions. The algorithm fits within the framework of particle Markov chain Monte Carlo (PMCMC) methods. The particle filter requires only model simulations and, in this regard, our approach has connections with approximate Bayesian computation (ABC). However, an advantage of using the PMCMC approach in this setting is that simulated data can be matched with data observed one-at-a-time, rather than attempting to match on the full dataset simultaneously or on a low-dimensional non-sufficient summary statistic, which is common practice in ABC. For low-valued count time series data we find that it is often computationally feasible to match simulated data with observed data exactly. Our particle filter maintains $N$ particles by repeating the simulation until $N+1$ exact matches are obtained. Our algorithm creates an unbiased estimate of the likelihood, resulting in exact posterior inferences when included in an MCMC algorithm. In cases where exact matching is computationally prohibitive, a tolerance is introduced as per ABC. A novel aspect of our approach is that we introduce auxiliary variables into our particle filter so that partially observed and/or non-Markovian models can be accommodated. We demonstrate that Bayesian model choice problems can be easily handled in this framework.
Resumo:
We estimated the heritability and correlations between body and carcass weight traits in a cultured stock of giant freshwater prawn (GFP) (Macrobrachium rosenbergii) selected for harvest body weight in Vietnam. The data set consisted of 18,387 body and 1,730 carcass records, as well as full pedigree information collected over four generations. Variance and covariance components were estimated by restricted maximum likelihood fitting a multi-trait animal model. Across generations, estimates of heritability for body and carcass weight traits were moderate and ranged from 0.14 to 0.19 and 0.17 to 0.21, respectively. Body trait heritabilities estimated for females were significantly higher than for males whereas carcass weight trait heritabilities estimated for females and males were not significantly different (P>. 0.05). Maternal effects for body traits accounted for 4 to 5% of the total variance and were greater in females than in males. Genetic correlations among body traits were generally high in the mixed sexes. Genetic correlations between body and carcass weight traits were also high. Although some issues remain regarding the best statistical model to be fitted to GFP data, our results suggest that selection for high harvest body weight based on breeding values estimated by fitting an animal model to the data can significantly improve mean body and carcass weight in GFP.
Resumo:
Current Bayesian network software packages provide good graphical interface for users who design and develop Bayesian networks for various applications. However, the intended end-users of these networks may not necessarily find such an interface appealing and at times it could be overwhelming, particularly when the number of nodes in the network is large. To circumvent this problem, this paper presents an intuitive dashboard, which provides an additional layer of abstraction, enabling the end-users to easily perform inferences over the Bayesian networks. Unlike most software packages, which display the nodes and arcs of the network, the developed tool organises the nodes based on the cause-and-effect relationship, making the user-interaction more intuitive and friendly. In addition to performing various types of inferences, the users can conveniently use the tool to verify the behaviour of the developed Bayesian network. The tool has been developed using QT and SMILE libraries in C++.
Resumo:
Indirect inference (II) is a methodology for estimating the parameters of an intractable (generative) model on the basis of an alternative parametric (auxiliary) model that is both analytically and computationally easier to deal with. Such an approach has been well explored in the classical literature but has received substantially less attention in the Bayesian paradigm. The purpose of this paper is to compare and contrast a collection of what we call parametric Bayesian indirect inference (pBII) methods. One class of pBII methods uses approximate Bayesian computation (referred to here as ABC II) where the summary statistic is formed on the basis of the auxiliary model, using ideas from II. Another approach proposed in the literature, referred to here as parametric Bayesian indirect likelihood (pBIL), we show to be a fundamentally different approach to ABC II. We devise new theoretical results for pBIL to give extra insights into its behaviour and also its differences with ABC II. Furthermore, we examine in more detail the assumptions required to use each pBII method. The results, insights and comparisons developed in this paper are illustrated on simple examples and two other substantive applications. The first of the substantive examples involves performing inference for complex quantile distributions based on simulated data while the second is for estimating the parameters of a trivariate stochastic process describing the evolution of macroparasites within a host based on real data. We create a novel framework called Bayesian indirect likelihood (BIL) which encompasses pBII as well as general ABC methods so that the connections between the methods can be established.
Resumo:
This paper addresses one of the foundational components of beginning infernce, namely variation, with 5 classes of Year 4 students undertaking a measurement activity using scaled instruments in two contexts: all students measuring one person's arm span and recording the values obtained, and each student having his/her own arm span measured and recorded. The results included documentation of students' explicit appreciation of the variety of ways in which varitation can occur, including outliers, and their ability to create and describe valid representations of their data.
Resumo:
In this paper, we propose a novel online hidden Markov model (HMM) parameter estimator based on Kerridge inaccuracy rate (KIR) concepts. Under mild identifiability conditions, we prove that our online KIR-based estimator is strongly consistent. In simulation studies, we illustrate the convergence behaviour of our proposed online KIR-based estimator and provide a counter-example illustrating the local convergence properties of the well known recursive maximum likelihood estimator (arguably the best existing solution).
Resumo:
A major challenge for robot localization and mapping systems is maintaining reliable operation in a changing environment. Vision-based systems in particular are susceptible to changes in illumination and weather, and the same location at another time of day may appear radically different to a system using a feature-based visual localization system. One approach for mapping changing environments is to create and maintain maps that contain multiple representations of each physical location in a topological framework or manifold. However, this requires the system to be able to correctly link two or more appearance representations to the same spatial location, even though the representations may appear quite dissimilar. This paper proposes a method of linking visual representations from the same location without requiring a visual match, thereby allowing vision-based localization systems to create multiple appearance representations of physical locations. The most likely position on the robot path is determined using particle filter methods based on dead reckoning data and recent visual loop closures. In order to avoid erroneous loop closures, the odometry-based inferences are only accepted when the inferred path's end point is confirmed as correct by the visual matching system. Algorithm performance is demonstrated using an indoor robot dataset and a large outdoor camera dataset.
Resumo:
The study of the relationship between macroscopic traffic parameters, such as flow, speed and travel time, is essential to the understanding of the behaviour of freeway and arterial roads. However, the temporal dynamics of these parameters are difficult to model, especially for arterial roads, where the process of traffic change is driven by a variety of variables. The introduction of the Bluetooth technology into the transportation area has proven exceptionally useful for monitoring vehicular traffic, as it allows reliable estimation of travel times and traffic demands. In this work, we propose an approach based on Bayesian networks for analyzing and predicting the complex dynamics of flow or volume, based on travel time observations from Bluetooth sensors. The spatio-temporal relationship between volume and travel time is captured through a first-order transition model, and a univariate Gaussian sensor model. The two models are trained and tested on travel time and volume data, from an arterial link, collected over a period of six days. To reduce the computational costs of the inference tasks, volume is converted into a discrete variable. The discretization process is carried out through a Self-Organizing Map. Preliminary results show that a simple Bayesian network can effectively estimate and predict the complex temporal dynamics of arterial volumes from the travel time data. Not only is the model well suited to produce posterior distributions over single past, current and future states; but it also allows computing the estimations of joint distributions, over sequences of states. Furthermore, the Bayesian network can achieve excellent prediction, even when the stream of travel time observation is partially incomplete.
Resumo:
Many model-based investigation techniques, such as sensitivity analysis, optimization, and statistical inference, require a large number of model evaluations to be performed at different input and/or parameter values. This limits the application of these techniques to models that can be implemented in computationally efficient computer codes. Emulators, by providing efficient interpolation between outputs of deterministic simulation models, can considerably extend the field of applicability of such computationally demanding techniques. So far, the dominant techniques for developing emulators have been priors in the form of Gaussian stochastic processes (GASP) that were conditioned with a design data set of inputs and corresponding model outputs. In the context of dynamic models, this approach has two essential disadvantages: (i) these emulators do not consider our knowledge of the structure of the model, and (ii) they run into numerical difficulties if there are a large number of closely spaced input points as is often the case in the time dimension of dynamic models. To address both of these problems, a new concept of developing emulators for dynamic models is proposed. This concept is based on a prior that combines a simplified linear state space model of the temporal evolution of the dynamic model with Gaussian stochastic processes for the innovation terms as functions of model parameters and/or inputs. These innovation terms are intended to correct the error of the linear model at each output step. Conditioning this prior to the design data set is done by Kalman smoothing. This leads to an efficient emulator that, due to the consideration of our knowledge about dominant mechanisms built into the simulation model, can be expected to outperform purely statistical emulators at least in cases in which the design data set is small. The feasibility and potential difficulties of the proposed approach are demonstrated by the application to a simple hydrological model.