932 resultados para Learning Bayesian Networks


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper reviews some basic issues and methods involved in using neural networks to respond in a desired fashion to a temporally-varying environment. Some popular network models and training methods are introduced. A speech recognition example is then used to illustrate the central difficulty of temporal data processing: learning to notice and remember relevant contextual information. Feedforward network methods are applicable to cases where this problem is not severe. The application of these methods are explained and applications are discussed in the areas of pure mathematics, chemical and physical systems, and economic systems. A more powerful but less practical algorithm for temporal problems, the moving targets algorithm, is sketched and discussed. For completeness, a few remarks are made on reinforcement learning.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper reports preliminary progress on a principled approach to modelling nonstationary phenomena using neural networks. We are concerned with both parameter and model order complexity estimation. The basic methodology assumes a Bayesian foundation. However to allow the construction of pragmatic models, successive approximations have to be made to permit computational tractibility. The lowest order corresponds to the (Extended) Kalman filter approach to parameter estimation which has already been applied to neural networks. We illustrate some of the deficiencies of the existing approaches and discuss our preliminary generalisations, by considering the application to nonstationary time series.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Neural network learning rules can be viewed as statistical estimators. They should be studied in Bayesian framework even if they are not Bayesian estimators. Generalisation should be measured by the divergence between the true distribution and the estimated distribution. Information divergences are invariant measurements of the divergence between two distributions. The posterior average information divergence is used to measure the generalisation ability of a network. The optimal estimators for multinomial distributions with Dirichlet priors are studied in detail. This confirms that the definition is compatible with intuition. The results also show that many commonly used methods can be put under this unified framework, by assume special priors and special divergences.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Neural networks can be regarded as statistical models, and can be analysed in a Bayesian framework. Generalisation is measured by the performance on independent test data drawn from the same distribution as the training data. Such performance can be quantified by the posterior average of the information divergence between the true and the model distributions. Averaging over the Bayesian posterior guarantees internal coherence; Using information divergence guarantees invariance with respect to representation. The theory generalises the least mean squares theory for linear Gaussian models to general problems of statistical estimation. The main results are: (1)~the ideal optimal estimate is always given by average over the posterior; (2)~the optimal estimate within a computational model is given by the projection of the ideal estimate to the model. This incidentally shows some currently popular methods dealing with hyperpriors are in general unnecessary and misleading. The extension of information divergence to positive normalisable measures reveals a remarkable relation between the dlt dual affine geometry of statistical manifolds and the geometry of the dual pair of Banach spaces Ld and Ldd. It therefore offers conceptual simplification to information geometry. The general conclusion on the issue of evaluating neural network learning rules and other statistical inference methods is that such evaluations are only meaningful under three assumptions: The prior P(p), describing the environment of all the problems; the divergence Dd, specifying the requirement of the task; and the model Q, specifying available computing resources.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Neural networks are statistical models and learning rules are estimators. In this paper a theory for measuring generalisation is developed by combining Bayesian decision theory with information geometry. The performance of an estimator is measured by the information divergence between the true distribution and the estimate, averaged over the Bayesian posterior. This unifies the majority of error measures currently in use. The optimal estimators also reveal some intricate interrelationships among information geometry, Banach spaces and sufficient statistics.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Two probabilistic interpretations of the n-tuple recognition method are put forward in order to allow this technique to be analysed with the same Bayesian methods used in connection with other neural network models. Elementary demonstrations are then given of the use of maximum likelihood and maximum entropy methods for tuning the model parameters and assisting their interpretation. One of the models can be used to illustrate the significance of overlapping n-tuple samples with respect to correlations in the patterns.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The problem of evaluating different learning rules and other statistical estimators is analysed. A new general theory of statistical inference is developed by combining Bayesian decision theory with information geometry. It is coherent and invariant. For each sample a unique ideal estimate exists and is given by an average over the posterior. An optimal estimate within a model is given by a projection of the ideal estimate. The ideal estimate is a sufficient statistic of the posterior, so practical learning rules are functions of the ideal estimator. If the sole purpose of learning is to extract information from the data, the learning rule must also approximate the ideal estimator. This framework is applicable to both Bayesian and non-Bayesian methods, with arbitrary statistical models, and to supervised, unsupervised and reinforcement learning schemes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the Bayesian framework, predictions for a regression problem are expressed in terms of a distribution of output values. The mode of this distribution corresponds to the most probable output, while the uncertainty associated with the predictions can conveniently be expressed in terms of error bars. In this paper we consider the evaluation of error bars in the context of the class of generalized linear regression models. We provide insights into the dependence of the error bars on the location of the data points and we derive an upper bound on the true error bars in terms of the contributions from individual data points which are themselves easily evaluated.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose a Bayesian framework for regression problems, which covers areas which are usually dealt with by function approximation. An online learning algorithm is derived which solves regression problems with a Kalman filter. Its solution always improves with increasing model complexity, without the risk of over-fitting. In the infinite dimension limit it approaches the true Bayesian posterior. The issues of prior selection and over-fitting are also discussed, showing that some of the commonly held beliefs are misleading. The practical implementation is summarised. Simulations using 13 popular publicly available data sets are used to demonstrate the method and highlight important issues concerning the choice of priors.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We investigate the dependence of Bayesian error bars on the distribution of data in input space. For generalized linear regression models we derive an upper bound on the error bars which shows that, in the neighbourhood of the data points, the error bars are substantially reduced from their prior values. For regions of high data density we also show that the contribution to the output variance due to the uncertainty in the weights can exhibit an approximate inverse proportionality to the probability density. Empirical results support these conclusions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Neural networks are usually curved statistical models. They do not have finite dimensional sufficient statistics, so on-line learning on the model itself inevitably loses information. In this paper we propose a new scheme for training curved models, inspired by the ideas of ancillary statistics and adaptive critics. At each point estimate an auxiliary flat model (exponential family) is built to locally accommodate both the usual statistic (tangent to the model) and an ancillary statistic (normal to the model). The auxiliary model plays a role in determining credit assignment analogous to that played by an adaptive critic in solving temporal problems. The method is illustrated with the Cauchy model and the algorithm is proved to be asymptotically efficient.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Neural networks have often been motivated by superficial analogy with biological nervous systems. Recently, however, it has become widely recognised that the effective application of neural networks requires instead a deeper understanding of the theoretical foundations of these models. Insight into neural networks comes from a number of fields including statistical pattern recognition, computational learning theory, statistics, information geometry and statistical mechanics. As an illustration of the importance of understanding the theoretical basis for neural network models, we consider their application to the solution of multi-valued inverse problems. We show how a naive application of the standard least-squares approach can lead to very poor results, and how an appreciation of the underlying statistical goals of the modelling process allows the development of a more general and more powerful formalism which can tackle the problem of multi-modality.