2 resultados para Bivariate geometric distributions
em Aston University Research Archive
Resumo:
A family of measurements of generalisation is proposed for estimators of continuous distributions. In particular, they apply to neural network learning rules associated with continuous neural networks. The optimal estimators (learning rules) in this sense are Bayesian decision methods with information divergence as loss function. The Bayesian framework guarantees internal coherence of such measurements, while the information geometric loss function guarantees invariance. The theoretical solution for the optimal estimator is derived by a variational method. It is applied to the family of Gaussian distributions and the implications are discussed. This is one in a series of technical reports on this topic; it generalises the results of ¸iteZhu95:prob.discrete to continuous distributions and serve as a concrete example of a larger picture ¸iteZhu95:generalisation.
Resumo:
Neural networks can be regarded as statistical models, and can be analysed in a Bayesian framework. Generalisation is measured by the performance on independent test data drawn from the same distribution as the training data. Such performance can be quantified by the posterior average of the information divergence between the true and the model distributions. Averaging over the Bayesian posterior guarantees internal coherence; Using information divergence guarantees invariance with respect to representation. The theory generalises the least mean squares theory for linear Gaussian models to general problems of statistical estimation. The main results are: (1)~the ideal optimal estimate is always given by average over the posterior; (2)~the optimal estimate within a computational model is given by the projection of the ideal estimate to the model. This incidentally shows some currently popular methods dealing with hyperpriors are in general unnecessary and misleading. The extension of information divergence to positive normalisable measures reveals a remarkable relation between the dlt dual affine geometry of statistical manifolds and the geometry of the dual pair of Banach spaces Ld and Ldd. It therefore offers conceptual simplification to information geometry. The general conclusion on the issue of evaluating neural network learning rules and other statistical inference methods is that such evaluations are only meaningful under three assumptions: The prior P(p), describing the environment of all the problems; the divergence Dd, specifying the requirement of the task; and the model Q, specifying available computing resources.