27 results for Asymptotic behaviour, Bayesian methods, Mixture models, Overfitting, Posterior concentration

in Aston University Research Archive


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Bayesian techniques have been developed over many years in a range of different fields, but have only recently been applied to the problem of learning in neural networks. As well as providing a consistent framework for statistical pattern recognition, the Bayesian approach offers a number of practical advantages including a potential solution to the problem of over-fitting. This chapter aims to provide an introductory overview of the application of Bayesian methods to neural networks. It assumes the reader is familiar with standard feed-forward network models and how to train them using conventional techniques.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The problem of evaluating different learning rules and other statistical estimators is analysed. A new general theory of statistical inference is developed by combining Bayesian decision theory with information geometry. It is coherent and invariant. For each sample a unique ideal estimate exists and is given by an average over the posterior. An optimal estimate within a model is given by a projection of the ideal estimate. The ideal estimate is a sufficient statistic of the posterior, so practical learning rules are functions of the ideal estimator. If the sole purpose of learning is to extract information from the data, the learning rule must also approximate the ideal estimator. This framework is applicable to both Bayesian and non-Bayesian methods, with arbitrary statistical models, and to supervised, unsupervised and reinforcement learning schemes.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis is concerned with approximate inference in dynamical systems, from a variational Bayesian perspective. When modelling real-world dynamical systems, stochastic differential equations appear as a natural choice, mainly because of their ability to model the noise of the system by adding a variant of some stochastic process to the deterministic dynamics. Hence, inference in such processes has drawn much attention. Here two new extended frameworks are derived and presented that are based on basis function expansions and local polynomial approximations of a recently proposed variational Bayesian algorithm. It is shown that the new extensions converge to the original variational algorithm and can be used for state estimation (smoothing). However, the main focus is on estimating the (hyper-) parameters of these systems (i.e. drift parameters and diffusion coefficients). The new methods are numerically validated on a range of different systems which vary in dimensionality and non-linearity. These are the Ornstein-Uhlenbeck process, for which the exact likelihood can be computed analytically, the univariate, highly non-linear stochastic double well and the multivariate chaotic stochastic Lorenz '63 (3-dimensional model). The algorithms are also applied to the 40-dimensional stochastic Lorenz '96 system. In this investigation these new approaches are compared with a variety of other well-known methods such as the ensemble Kalman filter / smoother, a hybrid Monte Carlo sampler, the dual unscented Kalman filter (for jointly estimating the system states and model parameters) and full weak-constraint 4D-Var. An empirical analysis of their asymptotic behaviour as the observation density or the length of the time window increases is provided.
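The remark that the Ornstein-Uhlenbeck likelihood can be computed analytically can be made concrete with a small sketch. It assumes the common parameterisation dX = -theta X dt + sigma dW with observations at a fixed spacing dt; the function names, the parameter names and the zero-mean form are illustrative assumptions, not taken from the thesis. Under these assumptions each transition is Gaussian, so the log-likelihood is a sum of Gaussian log-densities.

```python
import math
import random


def ou_transition(theta, sigma, dt):
    """Decay factor and variance of the exact one-step OU transition."""
    decay = math.exp(-theta * dt)
    var = sigma ** 2 * (1.0 - decay ** 2) / (2.0 * theta)
    return decay, var


def ou_exact_loglik(xs, dt, theta, sigma):
    """Exact log-likelihood of a path xs sampled every dt, conditioned on xs[0].

    Assumed model (illustrative): dX = -theta * X dt + sigma dW, theta > 0.
    """
    decay, var = ou_transition(theta, sigma, dt)
    loglik = 0.0
    for prev, curr in zip(xs, xs[1:]):
        resid = curr - prev * decay  # deviation from the conditional mean
        loglik += -0.5 * (math.log(2.0 * math.pi * var) + resid ** 2 / var)
    return loglik


def simulate_ou(n, dt, theta, sigma, x0=0.0, seed=0):
    """Draw n exact transitions of the same assumed OU model."""
    rng = random.Random(seed)
    decay, var = ou_transition(theta, sigma, dt)
    sd = math.sqrt(var)
    xs = [x0]
    for _ in range(n):
        xs.append(xs[-1] * decay + sd * rng.gauss(0.0, 1.0))
    return xs
```

Evaluating `ou_exact_loglik` over a grid of `theta` and `sigma` values on a simulated path gives a reference surface against which an approximate inference scheme could be checked; this is only one plausible use and not necessarily how the thesis performs its validation.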

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This work is concerned with approximate inference in dynamical systems, from a variational Bayesian perspective. When modelling real-world dynamical systems, stochastic differential equations appear as a natural choice, mainly because of their ability to model the noise of the system by adding a variation of some stochastic process to the deterministic dynamics. Hence, inference in such processes has drawn much attention. Here a new extended framework is derived that is based on a local polynomial approximation of a recently proposed variational Bayesian algorithm. The paper begins by showing that the new extension of this variational algorithm can be used for state estimation (smoothing) and converges to the original algorithm. However, the main focus is on estimating the (hyper-) parameters of these systems (i.e. drift parameters and diffusion coefficients). The new approach is validated on a range of different systems which vary in dimensionality and non-linearity. These are the Ornstein–Uhlenbeck process, the exact likelihood of which can be computed analytically, the univariate, highly non-linear stochastic double well and the multivariate chaotic stochastic Lorenz ’63 (3D model). As a special case the algorithm is also applied to the 40-dimensional stochastic Lorenz ’96 system. In our investigation we compare this new approach with a variety of other well-known methods, such as the hybrid Monte Carlo, the dual unscented Kalman filter and the full weak-constraint 4D-Var algorithm, and analyse empirically their asymptotic behaviour as the observation density or the length of the time window increases. In particular we show that we are able to estimate parameters in both the drift (deterministic) and the diffusion (stochastic) part of the model evolution equations using our new methods.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Two probabilistic interpretations of the n-tuple recognition method are put forward in order to allow this technique to be analysed with the same Bayesian methods used in connection with other neural network models. Elementary demonstrations are then given of the use of maximum likelihood and maximum entropy methods for tuning the model parameters and assisting their interpretation. One of the models can be used to illustrate the significance of overlapping n-tuple samples with respect to correlations in the patterns.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The principled statistical application of Gaussian random field models used in geostatistics has historically been limited to data sets of a small size. This limitation is imposed by the requirement to store and invert the covariance matrix of all the samples to obtain a predictive distribution at unsampled locations, or to use likelihood-based covariance estimation. Various ad hoc approaches to solve this problem have been adopted, such as selecting a neighborhood region and/or a small number of observations to use in the kriging process, but these have no sound theoretical basis and it is unclear what information is being lost. In this article, we present a Bayesian method for estimating the posterior mean and covariance structures of a Gaussian random field using a sequential estimation algorithm. By imposing sparsity in a well-defined framework, the algorithm retains a subset of “basis vectors” that best represent the “true” posterior Gaussian random field model in the relative entropy sense. This allows a principled treatment of Gaussian random field models on very large data sets. The method is particularly appropriate when the Gaussian random field model is regarded as a latent variable model, which may be nonlinearly related to the observations. We show the application of the sequential, sparse Bayesian estimation in Gaussian random field models and discuss its merits and drawbacks.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The assessment of the reliability of systems which learn from data is a key issue to investigate thoroughly before the actual application of information processing techniques to real-world problems. In recent years Gaussian processes and Bayesian neural networks have come to the fore and in this thesis their generalisation capabilities are analysed from theoretical and empirical perspectives. Upper and lower bounds on the learning curve of Gaussian processes are investigated in order to estimate the amount of data required to guarantee a certain level of generalisation performance. In this thesis we analyse the effects on the bounds and the learning curve induced by the smoothness of stochastic processes described by four different covariance functions. We also explain the early, linearly-decreasing behaviour of the curves and we investigate the asymptotic behaviour of the upper bounds. The effect of the noise and the characteristic lengthscale of the stochastic process on the tightness of the bounds are also discussed. The analysis is supported by several numerical simulations. The generalisation error of a Gaussian process is affected by the dimension of the input vector and may be decreased by input-variable reduction techniques. In conventional approaches to Gaussian process regression, the positive definite matrix estimating the distance between input points is often taken to be diagonal. In this thesis we show that a general distance matrix is able to estimate the effective dimensionality of the regression problem as well as to discover the linear transformation from the manifest variables to the hidden-feature space, with a significant reduction of the input dimension. Numerical simulations confirm the significant superiority of the general distance matrix with respect to the diagonal one. In the thesis we also present an empirical investigation of the generalisation errors of neural networks trained by two Bayesian algorithms, the Markov Chain Monte Carlo method and the evidence framework; the neural networks have been trained on the task of labelling segmented outdoor images.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The object of this research was to investigate the behaviour of birdcage scaffolding as used in falsework structures, assess the suitability of existing design methods and make recommendations for a set of design rules. Since excessive deflection is as undesirable in a structure as total collapse, the project was divided into two sections. These were to determine the ultimate vertical and horizontal load-carrying capacity and also the deflection characteristics of any falsework. Theoretical analyses were therefore developed to ascertain the ability of both the individual standards to resist vertical load, and of the bracing to resist horizontal load. Furthermore, a model was evolved which would predict the horizontal deflection of a scaffold under load using strain energy methods. These models were checked by three series of experiments. The first was on individual standards under vertical load only. The second series was carried out on full-scale falsework structures loaded vertically and horizontally to failure. Finally, experiments were conducted on scaffold couplers to provide additional verification of the method of predicting deflections. This thesis gives the history of the project and an introduction to the field of scaffolding. It details the experiments conducted, the theories developed and the correlation between theory and experiment. Finally it makes recommendations for a design method to be employed by scaffolding designers.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We present results that compare the performance of neural networks trained with two Bayesian methods, (i) the Evidence Framework of MacKay (1992) and (ii) a Markov Chain Monte Carlo method due to Neal (1996), on a task of classifying segmented outdoor images. We also investigate the use of the Automatic Relevance Determination method for input feature selection.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Following adaptation to an oriented (1-d) signal in central vision, the orientation of subsequently viewed test signals may appear repelled away from or attracted towards the adapting orientation. Small angular differences between the adaptor and test yield 'repulsive' shifts, while large angular differences yield 'attractive' shifts. In peripheral vision, however, both small and large angular differences yield repulsive shifts. To account for these tilt after-effects (TAEs), a cascaded model of orientation estimation that is optimized using hierarchical Bayesian methods is proposed. The model accounts for orientation bias through adaptation-induced losses in information that arise because of signal uncertainties and neural constraints placed upon the propagation of visual information. Repulsive (direct) TAEs arise at early stages of visual processing from adaptation of orientation-selective units with peak sensitivity at the orientation of the adaptor (theta). Attractive (indirect) TAEs result from adaptation of second-stage units with peak sensitivity at theta and theta+90 degrees, which arise from an efficient stage of linear compression that pools across the responses of the first-stage orientation-selective units. A spatial orientation vector is estimated from the transformed oriented unit responses. The change from attractive to repulsive TAEs in peripheral vision can be explained by the differing harmonic biases resulting from constraints on signal power (in central vision) versus signal uncertainties in orientation (in peripheral vision). The proposed model is consistent with recent work by computational neuroscientists in supposing that visual bias reflects the adjustment of a rational system in the light of uncertain signals and system constraints.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Conventional feed-forward neural networks have used the sum-of-squares cost function for training. A new cost function is presented here with a description length interpretation based on Rissanen's Minimum Description Length principle. It is a heuristic that has a rough interpretation as the number of data points fit by the model. Not concerned with finding optimal descriptions, the cost function prefers to form minimum descriptions in a naive way for computational convenience. The cost function is called the Naive Description Length cost function. Finding minimum description models will be shown to be closely related to the identification of clusters in the data. As a consequence the minimum of this cost function approximates the most probable mode of the data, whereas the minimum of the sum-of-squares cost function approximates the mean. The new cost function is shown to provide information about the structure of the data. This is done by inspecting the dependence of the error on the amount of regularisation. This structure provides a method of selecting regularisation parameters as an alternative or supplement to Bayesian methods. The new cost function is tested on a number of multi-valued problems such as a simple inverse kinematics problem. It is also tested on a number of classification and regression problems. The mode-seeking property of this cost function is shown to improve prediction in time series problems. Description length principles are used in a similar fashion to derive a regulariser to control network complexity.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The complexity of adapting software during runtime has spawned interest in how models can be used to validate, monitor and adapt runtime behaviour. The use of models during runtime extends the use of modeling techniques beyond the design and implementation phases. The goal of this workshop is to look at issues related to developing appropriate model-driven approaches to managing and monitoring the execution of systems and, also, to allow the system to reason about itself. We aim to continue the discussion of research ideas and proposals from researchers who work in relevant areas such as MDE, software architectures, reflection, and autonomic and self-adaptive systems, and provide a 'state-of-the-art' research assessment expressed in terms of challenges and achievements.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Previous attempts to formulate mixture models for PCA have therefore to some extent been ad hoc. In this paper, PCA is formulated within a maximum-likelihood framework, based on a specific form of Gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analysers, whose parameters can be determined using an EM algorithm. We discuss the advantages of this model in the context of clustering, density modelling and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.
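For orientation, the Gaussian latent variable model referred to above is usually written as follows. The symbols (W, z, mu, sigma, pi, q, d, M) are assumed notation rather than quoted from the paper, and the isotropic noise term reflects the standard probabilistic PCA formulation.

```latex
% Single probabilistic PCA model: d-dimensional data x, q-dimensional latent z (q < d)
x = W z + \mu + \epsilon, \qquad
z \sim \mathcal{N}(0, I_q), \qquad
\epsilon \sim \mathcal{N}(0, \sigma^2 I_d)

% Marginal density of the data after integrating out z
p(x) = \mathcal{N}\!\left(x \mid \mu,\; W W^{\mathsf{T}} + \sigma^2 I_d\right)

% Mixture of M such models with mixing proportions \pi_i
p(x) = \sum_{i=1}^{M} \pi_i \,
       \mathcal{N}\!\left(x \mid \mu_i,\; W_i W_i^{\mathsf{T}} + \sigma_i^2 I_d\right),
\qquad \pi_i \ge 0, \quad \sum_{i=1}^{M} \pi_i = 1
```

Because each component is a proper density, the components can be combined with mixing proportions in the usual way, which is what makes the mixture well defined.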

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a novel approach to water pollution detection from remotely sensed low-platform mounted visible band camera images. We examine the feasibility of unsupervised segmentation for slick (oily spills on the water surface) region labelling. Adaptive and non-adaptive filtering is combined with density modeling of the obtained textural features. Particular effort is concentrated on textural feature extraction from raw intensity images using filter banks and on adaptive feature extraction from the obtained output coefficients. Segmentation in the extracted feature space is achieved using Gaussian mixture models (GMM).
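As a rough illustration of segmentation with a Gaussian mixture model, the sketch below fits a mixture to scalar feature values with EM and labels each value by its most probable component. It is a toy under stated assumptions: the paper works with multi-dimensional textural features, and the use of EM, the one-dimensional features and all function names here are hypothetical choices made for brevity.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative only)."""
    lo, hi = min(xs), max(xs)
    # Spread the initial means evenly across the data range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    centre = sum(xs) / len(xs)
    spread = sum((x - centre) ** 2 for x in xs) / len(xs)
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid collapse
    return weights, means, variances


def segment(xs, weights, means, variances):
    """Label each feature value with its most probable mixture component."""
    labels = []
    for x in xs:
        p = [w * normal_pdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
        labels.append(max(range(len(p)), key=p.__getitem__))
    return labels
```

In an image setting, `xs` would hold one feature value per pixel or patch, and the returned labels would be reshaped back onto the image grid to give the segmentation map.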