24 resultados para bayesian inference
em Aston University Research Archive
Resumo:
In many problems in spatial statistics it is necessary to infer a global problem solution by combining local models. A principled approach to this problem is to develop a global probabilistic model for the relationships between local variables and to use this as the prior in a Bayesian inference procedure. We show how a Gaussian process with hyper-parameters estimated from Numerical Weather Prediction Models yields meteorologically convincing wind fields. We use neural networks to make local estimates of wind vector probabilities. The resulting inference problem cannot be solved analytically, but Markov Chain Monte Carlo methods allow us to retrieve accurate wind fields.
Resumo:
Efficient new Bayesian inference technique is employed for studying critical properties of the Ising linear perceptron and for signal detection in code division multiple access (CDMA). The approach is based on a recently introduced message passing technique for densely connected systems. Here we study both critical and non-critical regimes. Results obtained in the non-critical regime give rise to a highly efficient signal detection algorithm in the context of CDMA; while in the critical regime one observes a first-order transition line that ends in a continuous phase transition point. Finite size effects are also studied. © 2006 Elsevier B.V. All rights reserved.
Resumo:
In many problems in spatial statistics it is necessary to infer a global problem solution by combining local models. A principled approach to this problem is to develop a global probabilistic model for the relationships between local variables and to use this as the prior in a Bayesian inference procedure. We show how a Gaussian process with hyper-parameters estimated from Numerical Weather Prediction Models yields meteorologically convincing wind fields. We use neural networks to make local estimates of wind vector probabilities. The resulting inference problem cannot be solved analytically, but Markov Chain Monte Carlo methods allow us to retrieve accurate wind fields.
Resumo:
In this paper, we present a framework for Bayesian inference in continuous-time diffusion processes. The new method is directly related to the recently proposed variational Gaussian Process approximation (VGPA) approach to Bayesian smoothing of partially observed diffusions. By adopting a basis function expansion (BF-VGPA), both the time-dependent control parameters of the approximate GP process and its moment equations are projected onto a lower-dimensional subspace. This allows us both to reduce the computational complexity and to eliminate the time discretisation used in the previous algorithm. The new algorithm is tested on an Ornstein-Uhlenbeck process. Our preliminary results show that BF-VGPA algorithm provides a reasonably accurate state estimation using a small number of basis functions.
Resumo:
This work introduces a new variational Bayes data assimilation method for the stochastic estimation of precipitation dynamics using radar observations for short term probabilistic forecasting (nowcasting). A previously developed spatial rainfall model based on the decomposition of the observed precipitation field using a basis function expansion captures the precipitation intensity from radar images as a set of ‘rain cells’. The prior distributions for the basis function parameters are carefully chosen to have a conjugate structure for the precipitation field model to allow a novel variational Bayes method to be applied to estimate the posterior distributions in closed form, based on solving an optimisation problem, in a spirit similar to 3D VAR analysis, but seeking approximations to the posterior distribution rather than simply the most probable state. A hierarchical Kalman filter is used to estimate the advection field based on the assimilated precipitation fields at two times. The model is applied to tracking precipitation dynamics in a realistic setting, using UK Met Office radar data from both a summer convective event and a winter frontal event. The performance of the model is assessed both traditionally and using probabilistic measures of fit based on ROC curves. The model is shown to provide very good assimilation characteristics, and promising forecast skill. Improvements to the forecasting scheme are discussed
Resumo:
An efficient Bayesian inference method for problems that can be mapped onto dense graphs is presented. The approach is based on message passing where messages are averaged over a large number of replicated variable systems exposed to the same evidential nodes. An assumption about the symmetry of the solutions is required for carrying out the averages; here we extend the previous derivation based on a replica-symmetric- (RS)-like structure to include a more complex one-step replica-symmetry-breaking-like (1RSB-like) ansatz. To demonstrate the potential of the approach it is employed for studying critical properties of the Ising linear perceptron and for multiuser detection in code division multiple access (CDMA) under different noise models. Results obtained under the RS assumption in the noncritical regime give rise to a highly efficient signal detection algorithm in the context of CDMA; while in the critical regime one observes a first-order transition line that ends in a continuous phase transition point. Finite size effects are also observed. While the 1RSB ansatz is not required for the original problems, it was applied to the CDMA signal detection problem with a more complex noise model that exhibits RSB behavior, resulting in an improvement in performance. © 2007 The American Physical Society.
Resumo:
The generative topographic mapping (GTM) model was introduced by Bishop et al. (1998, Neural Comput. 10(1), 215-234) as a probabilistic re- formulation of the self-organizing map (SOM). It offers a number of advantages compared with the standard SOM, and has already been used in a variety of applications. In this paper we report on several extensions of the GTM, including an incremental version of the EM algorithm for estimating the model parameters, the use of local subspace models, extensions to mixed discrete and continuous data, semi-linear models which permit the use of high-dimensional manifolds whilst avoiding computational intractability, Bayesian inference applied to hyper-parameters, and an alternative framework for the GTM based on Gaussian processes. All of these developments directly exploit the probabilistic structure of the GTM, thereby allowing the underlying modelling assumptions to be made explicit. They also highlight the advantages of adopting a consistent probabilistic framework for the formulation of pattern recognition algorithms.
Resumo:
It is generally assumed when using Bayesian inference methods for neural networks that the input data contains no noise or corruption. For real-world (errors in variable) problems this is clearly an unsafe assumption. This paper presents a Bayesian neural network framework which allows for input noise given that some model of the noise process exists. In the limit where this noise process is small and symmetric it is shown, using the Laplace approximation, that there is an additional term to the usual Bayesian error bar which depends on the variance of the input noise process. Further, by treating the true (noiseless) input as a hidden variable and sampling this jointly with the network's weights, using Markov Chain Monte Carlo methods, it is demonstrated that it is possible to infer the unbiassed regression over the noiseless input.
Resumo:
It is generally assumed when using Bayesian inference methods for neural networks that the input data contains no noise. For real-world (errors in variable) problems this is clearly an unsafe assumption. This paper presents a Bayesian neural network framework which accounts for input noise provided that a model of the noise process exists. In the limit where the noise process is small and symmetric it is shown, using the Laplace approximation, that this method adds an extra term to the usual Bayesian error bar which depends on the variance of the input noise process. Further, by treating the true (noiseless) input as a hidden variable, and sampling this jointly with the network’s weights, using a Markov chain Monte Carlo method, it is demonstrated that it is possible to infer the regression over the noiseless input. This leads to the possibility of training an accurate model of a system using less accurate, or more uncertain, data. This is demonstrated on both the, synthetic, noisy sine wave problem and a real problem of inferring the forward model for a satellite radar backscatter system used to predict sea surface wind vectors.
Resumo:
Large monitoring networks are becoming increasingly common and can generate large datasets from thousands to millions of observations in size, often with high temporal resolution. Processing large datasets using traditional geostatistical methods is prohibitively slow and in real world applications different types of sensor can be found across a monitoring network. Heterogeneities in the error characteristics of different sensors, both in terms of distribution and magnitude, presents problems for generating coherent maps. An assumption in traditional geostatistics is that observations are made directly of the underlying process being studied and that the observations are contaminated with Gaussian errors. Under this assumption, sub–optimal predictions will be obtained if the error characteristics of the sensor are effectively non–Gaussian. One method, model based geostatistics, assumes that a Gaussian process prior is imposed over the (latent) process being studied and that the sensor model forms part of the likelihood term. One problem with this type of approach is that the corresponding posterior distribution will be non–Gaussian and computationally demanding as Monte Carlo methods have to be used. An extension of a sequential, approximate Bayesian inference method enables observations with arbitrary likelihoods to be treated, in a projected process kriging framework which is less computationally intensive. The approach is illustrated using a simulated dataset with a range of sensor models and error characteristics.
Resumo:
We present and analyze three different online algorithms for learning in discrete Hidden Markov Models (HMMs) and compare their performance with the Baldi-Chauvin Algorithm. Using the Kullback-Leibler divergence as a measure of the generalization error we draw learning curves in simplified situations and compare the results. The performance for learning drifting concepts of one of the presented algorithms is analyzed and compared with the Baldi-Chauvin algorithm in the same situations. A brief discussion about learning and symmetry breaking based on our results is also presented. © 2006 American Institute of Physics.
Resumo:
Bayesian algorithms pose a limit to the performance learning algorithms can achieve. Natural selection should guide the evolution of information processing systems towards those limits. What can we learn from this evolution and what properties do the intermediate stages have? While this question is too general to permit any answer, progress can be made by restricting the class of information processing systems under study. We present analytical and numerical results for the evolution of on-line algorithms for learning from examples for neural network classifiers, which might include or not a hidden layer. The analytical results are obtained by solving a variational problem to determine the learning algorithm that leads to maximum generalization ability. Simulations using evolutionary programming, for programs that implement learning algorithms, confirm and expand the results. The principal result is not just that the evolution is towards a Bayesian limit. Indeed it is essentially reached. In addition we find that evolution is driven by the discovery of useful structures or combinations of variables and operators. In different runs the temporal order of the discovery of such combinations is unique. The main result is that combinations that signal the surprise brought by an example arise always before combinations that serve to gauge the performance of the learning algorithm. This latter structures can be used to implement annealing schedules. The temporal ordering can be understood analytically as well by doing the functional optimization in restricted functional spaces. We also show that there is data suggesting that the appearance of these traits also follows the same temporal ordering in biological systems. © 2006 American Institute of Physics.
Resumo:
This thesis is concerned with approximate inference in dynamical systems, from a variational Bayesian perspective. When modelling real world dynamical systems, stochastic differential equations appear as a natural choice, mainly because of their ability to model the noise of the system by adding a variant of some stochastic process to the deterministic dynamics. Hence, inference in such processes has drawn much attention. Here two new extended frameworks are derived and presented that are based on basis function expansions and local polynomial approximations of a recently proposed variational Bayesian algorithm. It is shown that the new extensions converge to the original variational algorithm and can be used for state estimation (smoothing). However, the main focus is on estimating the (hyper-) parameters of these systems (i.e. drift parameters and diffusion coefficients). The new methods are numerically validated on a range of different systems which vary in dimensionality and non-linearity. These are the Ornstein-Uhlenbeck process, for which the exact likelihood can be computed analytically, the univariate and highly non-linear, stochastic double well and the multivariate chaotic stochastic Lorenz '63 (3-dimensional model). The algorithms are also applied to the 40 dimensional stochastic Lorenz '96 system. In this investigation these new approaches are compared with a variety of other well known methods such as the ensemble Kalman filter / smoother, a hybrid Monte Carlo sampler, the dual unscented Kalman filter (for jointly estimating the systems states and model parameters) and full weak-constraint 4D-Var. Empirical analysis of their asymptotic behaviour as a function of observation density or length of time window increases is provided.
Resumo:
The problem of evaluating different learning rules and other statistical estimators is analysed. A new general theory of statistical inference is developed by combining Bayesian decision theory with information geometry. It is coherent and invariant. For each sample a unique ideal estimate exists and is given by an average over the posterior. An optimal estimate within a model is given by a projection of the ideal estimate. The ideal estimate is a sufficient statistic of the posterior, so practical learning rules are functions of the ideal estimator. If the sole purpose of learning is to extract information from the data, the learning rule must also approximate the ideal estimator. This framework is applicable to both Bayesian and non-Bayesian methods, with arbitrary statistical models, and to supervised, unsupervised and reinforcement learning schemes.