932 resultados para Learning Bayesian Networks


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis is a study of the generation of topographic mappings - dimension reducing transformations of data that preserve some element of geometric structure - with feed-forward neural networks. As an alternative to established methods, a transformational variant of Sammon's method is proposed, where the projection is effected by a radial basis function neural network. This approach is related to the statistical field of multidimensional scaling, and from that the concept of a 'subjective metric' is defined, which permits the exploitation of additional prior knowledge concerning the data in the mapping process. This then enables the generation of more appropriate feature spaces for the purposes of enhanced visualisation or subsequent classification. A comparison with established methods for feature extraction is given for data taken from the 1992 Research Assessment Exercise for higher educational institutions in the United Kingdom. This is a difficult high-dimensional dataset, and illustrates well the benefit of the new topographic technique. A generalisation of the proposed model is considered for implementation of the classical multidimensional scaling (¸mds}) routine. This is related to Oja's principal subspace neural network, whose learning rule is shown to descend the error surface of the proposed ¸mds model. Some of the technical issues concerning the design and training of topographic neural networks are investigated. It is shown that neural network models can be less sensitive to entrapment in the sub-optimal global minima that badly affect the standard Sammon algorithm, and tend to exhibit good generalisation as a result of implicit weight decay in the training process. It is further argued that for ideal structure retention, the network transformation should be perfectly smooth for all inter-data directions in input space. Finally, there is a critique of optimisation techniques for topographic mappings, and a new training algorithm is proposed. A convergence proof is given, and the method is shown to produce lower-error mappings more rapidly than previous algorithms.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

It is well known that one of the obstacles to effective forecasting of exchange rates is heteroscedasticity (non-stationary conditional variance). The autoregressive conditional heteroscedastic (ARCH) model and its variants have been used to estimate a time dependent variance for many financial time series. However, such models are essentially linear in form and we can ask whether a non-linear model for variance can improve results just as non-linear models (such as neural networks) for the mean have done. In this paper we consider two neural network models for variance estimation. Mixture Density Networks (Bishop 1994, Nix and Weigend 1994) combine a Multi-Layer Perceptron (MLP) and a mixture model to estimate the conditional data density. They are trained using a maximum likelihood approach. However, it is known that maximum likelihood estimates are biased and lead to a systematic under-estimate of variance. More recently, a Bayesian approach to parameter estimation has been developed (Bishop and Qazaz 1996) that shows promise in removing the maximum likelihood bias. However, up to now, this model has not been used for time series prediction. Here we compare these algorithms with two other models to provide benchmark results: a linear model (from the ARIMA family), and a conventional neural network trained with a sum-of-squares error function (which estimates the conditional mean of the time series with a constant variance noise model). This comparison is carried out on daily exchange rate data for five currencies.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

For neural networks with a wide class of weight priors, it can be shown that in the limit of an infinite number of hidden units, the prior over functions tends to a gaussian process. In this article, analytic forms are derived for the covariance function of the gaussian processes corresponding to networks with sigmoidal and gaussian hidden units. This allows predictions to be made efficiently using networks with an infinite number of hidden units and shows, somewhat paradoxically, that it may be easier to carry out Bayesian prediction with infinite networks rather than finite ones.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We analyse the dynamics of a number of second order on-line learning algorithms training multi-layer neural networks, using the methods of statistical mechanics. We first consider on-line Newton's method, which is known to provide optimal asymptotic performance. We determine the asymptotic generalization error decay for a soft committee machine, which is shown to compare favourably with the result for standard gradient descent. Matrix momentum provides a practical approximation to this method by allowing an efficient inversion of the Hessian. We consider an idealized matrix momentum algorithm which requires access to the Hessian and find close correspondence with the dynamics of on-line Newton's method. In practice, the Hessian will not be known on-line and we therefore consider matrix momentum using a single example approximation to the Hessian. In this case good asymptotic performance may still be achieved, but the algorithm is now sensitive to parameter choice because of noise in the Hessian estimate. On-line Newton's method is not appropriate during the transient learning phase, since a suboptimal unstable fixed point of the gradient descent dynamics becomes stable for this algorithm. A principled alternative is to use Amari's natural gradient learning algorithm and we show how this method provides a significant reduction in learning time when compared to gradient descent, while retaining the asymptotic performance of on-line Newton's method.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In developing neural network techniques for real world applications it is still very rare to see estimates of confidence placed on the neural network predictions. This is a major deficiency, especially in safety-critical systems. In this paper we explore three distinct methods of producing point-wise confidence intervals using neural networks. We compare and contrast Bayesian, Gaussian Process and Predictive error bars evaluated on real data. The problem domain is concerned with the calibration of a real automotive engine management system for both air-fuel ratio determination and on-line ignition timing. This problem requires real-time control and is a good candidate for exploring the use of confidence predictions due to its safety-critical nature.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We analyse natural gradient learning in a two-layer feed-forward neural network using a statistical mechanics framework which is appropriate for large input dimension. We find significant improvement over standard gradient descent in both the transient and asymptotic phases of learning.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In many problems in spatial statistics it is necessary to infer a global problem solution by combining local models. A principled approach to this problem is to develop a global probabilistic model for the relationships between local variables and to use this as the prior in a Bayesian inference procedure. We show how a Gaussian process with hyper-parameters estimated from Numerical Weather Prediction Models yields meteorologically convincing wind fields. We use neural networks to make local estimates of wind vector probabilities. The resulting inference problem cannot be solved analytically, but Markov Chain Monte Carlo methods allow us to retrieve accurate wind fields.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The ERS-1 Satellite was launched in July 1991 by the European Space Agency into a polar orbit at about km800, carrying a C-band scatterometer. A scatterometer measures the amount of radar back scatter generated by small ripples on the ocean surface induced by instantaneous local winds. Operational methods that extract wind vectors from satellite scatterometer data are based on the local inversion of a forward model, mapping scatterometer observations to wind vectors, by the minimisation of a cost function in the scatterometer measurement space.par This report uses mixture density networks, a principled method for modelling conditional probability density functions, to model the joint probability distribution of the wind vectors given the satellite scatterometer measurements in a single cell (the `inverse' problem). The complexity of the mapping and the structure of the conditional probability density function are investigated by varying the number of units in the hidden layer of the multi-layer perceptron and the number of kernels in the Gaussian mixture model of the mixture density network respectively. The optimal model for networks trained per trace has twenty hidden units and four kernels. Further investigation shows that models trained with incidence angle as an input have results comparable to those models trained by trace. A hybrid mixture density network that incorporates geophysical knowledge of the problem confirms other results that the conditional probability distribution is dominantly bimodal.par The wind retrieval results improve on previous work at Aston, but do not match other neural network techniques that use spatial information in the inputs, which is to be expected given the ambiguity of the inverse problem. Current work uses the local inverse model for autonomous ambiguity removal in a principled Bayesian framework. Future directions in which these models may be improved are given.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The dynamics of on-line learning is investigated for structurally unrealizable tasks in the context of two-layer neural networks with an arbitrary number of hidden neurons. Within a statistical mechanics framework, a closed set of differential equations describing the learning dynamics can be derived, for the general case of unrealizable isotropic tasks. In the asymptotic regime one can solve the dynamics analytically in the limit of large number of hidden neurons, providing an analytical expression for the residual generalization error, the optimal and critical asymptotic training parameters, and the corresponding prefactor of the generalization error decay.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The dynamics of supervised learning in layered neural networks were studied in the regime where the size of the training set is proportional to the number of inputs. The evolution of macroscopic observables, including the two relevant performance measures can be predicted by using the dynamical replica theory. Three approximation schemes aimed at eliminating the need to solve a functional saddle-point equation at each time step have been derived.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We study the dynamics of on-line learning in multilayer neural networks where training examples are sampled with repetition and where the number of examples scales with the number of network weights. The analysis is carried out using the dynamical replica method aimed at obtaining a closed set of coupled equations for a set of macroscopic variables from which both training and generalization errors can be calculated. We focus on scenarios whereby training examples are corrupted by additive Gaussian output noise and regularizers are introduced to improve the network performance. The dependence of the dynamics on the noise level, with and without regularizers, is examined, as well as that of the asymptotic values obtained for both training and generalization errors. We also demonstrate the ability of the method to approximate the learning dynamics in structurally unrealizable scenarios. The theoretical results show good agreement with those obtained by computer simulations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we review recent theoretical approaches for analysing the dynamics of on-line learning in multilayer neural networks using methods adopted from statistical physics. The analysis is based on monitoring a set of macroscopic variables from which the generalisation error can be calculated. A closed set of dynamical equations for the macroscopic variables is derived analytically and solved numerically. The theoretical framework is then employed for defining optimal learning parameters and for analysing the incorporation of second order information into the learning process using natural gradient descent and matrix-momentum based methods. We will also briefly explain an extension of the original framework for analysing the case where training examples are sampled with repetition.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Based on a statistical mechanics approach, we develop a method for approximately computing average case learning curves and their sample fluctuations for Gaussian process regression models. We give examples for the Wiener process and show that universal relations (that are independent of the input distribution) between error measures can be derived.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A novel approach, based on statistical mechanics, to analyze typical performance of optimum code-division multiple-access (CDMA) multiuser detectors is reviewed. A `black-box' view ot the basic CDMA channel is introduced, based on which the CDMA multiuser detection problem is regarded as a `learning-from-examples' problem of the `binary linear perceptron' in the neural network literature. Adopting Bayes framework, analysis of the performance of the optimum CDMA multiuser detectors is reduced to evaluation of the average of the cumulant generating function of a relevant posterior distribution. The evaluation of the average cumulant generating function is done, based on formal analogy with a similar calculation appearing in the spin glass theory in statistical mechanics, by making use of the replica method, a method developed in the spin glass theory.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A practical Bayesian approach for inference in neural network models has been available for ten years, and yet it is not used frequently in medical applications. In this chapter we show how both regularisation and feature selection can bring significant benefits in diagnostic tasks through two case studies: heart arrhythmia classification based on ECG data and the prognosis of lupus. In the first of these, the number of variables was reduced by two thirds without significantly affecting performance, while in the second, only the Bayesian models had an acceptable accuracy. In both tasks, neural networks outperformed other pattern recognition approaches.