61 resultados para learning networks
em Aston University Research Archive
Resumo:
The performance of feed-forward neural networks in real applications can be often be improved significantly if use is made of a-priori information. For interpolation problems this prior knowledge frequently includes smoothness requirements on the network mapping, and can be imposed by the addition to the error function of suitable regularization terms. The new error function, however, now depends on the derivatives of the network mapping, and so the standard back-propagation algorithm cannot be applied. In this paper, we derive a computationally efficient learning algorithm, for a feed-forward network of arbitrary topology, which can be used to minimize the new error function. Networks having a single hidden layer, for which the learning algorithm simplifies, are treated as a special case.
Resumo:
We consider the problem of on-line gradient descent learning for general two-layer neural networks. An analytic solution is presented and used to investigate the role of the learning rate in controlling the evolution and convergence of the learning process.
Resumo:
We present an analytic solution to the problem of on-line gradient-descent learning for two-layer neural networks with an arbitrary number of hidden units in both teacher and student networks. The technique, demonstrated here for the case of adaptive input-to-hidden weights, becomes exact as the dimensionality of the input space increases.
Resumo:
An adaptive back-propagation algorithm is studied and compared with gradient descent (standard back-propagation) for on-line learning in two-layer neural networks with an arbitrary number of hidden units. Within a statistical mechanics framework, both numerical studies and a rigorous analysis show that the adaptive back-propagation method results in faster training by breaking the symmetry between hidden units more efficiently and by providing faster convergence to optimal generalization than gradient descent.
Resumo:
We study the effect of two types of noise, data noise and model noise, in an on-line gradient-descent learning scenario for general two-layer student network with an arbitrary number of hidden units. Training examples are randomly drawn input vectors labeled by a two-layer teacher network with an arbitrary number of hidden units. Data is then corrupted by Gaussian noise affecting either the output or the model itself. We examine the effect of both types of noise on the evolution of order parameters and the generalization error in various phases of the learning process.
Resumo:
We complement recent advances in thermodynamic limit analyses of mean on-line gradient descent learning dynamics in multi-layer networks by calculating fluctuations possessed by finite dimensional systems. Fluctuations from the mean dynamics are largest at the onset of specialisation as student hidden unit weight vectors begin to imitate specific teacher vectors, increasing with the degree of symmetry of the initial conditions. In light of this, we include a term to stimulate asymmetry in the learning process, which typically also leads to a significant decrease in training time.
Resumo:
We study the effect of regularization in an on-line gradient-descent learning scenario for a general two-layer student network with an arbitrary number of hidden units. Training examples are randomly drawn input vectors labelled by a two-layer teacher network with an arbitrary number of hidden units which may be corrupted by Gaussian output noise. We examine the effect of weight decay regularization on the dynamical evolution of the order parameters and generalization error in various phases of the learning process, in both noiseless and noisy scenarios.
Resumo:
We present a framework for calculating globally optimal parameters, within a given time frame, for on-line learning in multilayer neural networks. We demonstrate the capability of this method by computing optimal learning rates in typical learning scenarios. A similar treatment allows one to determine the relevance of related training algorithms based on modifications to the basic gradient descent rule as well as to compare different training methods.
Resumo:
We present a method for determining the globally optimal on-line learning rule for a soft committee machine under a statistical mechanics framework. This rule maximizes the total reduction in generalization error over the whole learning process. A simple example demonstrates that the locally optimal rule, which maximizes the rate of decrease in generalization error, may perform poorly in comparison.
Resumo:
The influence of biases on the learning dynamics of a two-layer neural network, a normalized soft-committee machine, is studied for on-line gradient descent learning. Within a statistical mechanics framework, numerical studies show that the inclusion of adjustable biases dramatically alters the learning dynamics found previously. The symmetric phase which has often been predominant in the original model all but disappears for a non-degenerate bias task. The extended model furthermore exhibits a much richer dynamical behavior, e.g. attractive suboptimal symmetric phases even for realizable cases and noiseless data.
Resumo:
On-line learning is examined for the radial basis function network, an important and practical type of neural network. The evolution of generalization error is calculated within a framework which allows the phenomena of the learning process, such as the specialization of the hidden units, to be analyzed. The distinct stages of training are elucidated, and the role of the learning rate described. The three most important stages of training, the symmetric phase, the symmetry-breaking phase, and the convergence phase, are analyzed in detail; the convergence phase analysis allows derivation of maximal and optimal learning rates. As well as finding the evolution of the mean system parameters, the variances of these parameters are derived and shown to be typically small. Finally, the analytic results are strongly confirmed by simulations.
Resumo:
An adaptive back-propagation algorithm parameterized by an inverse temperature 1/T is studied and compared with gradient descent (standard back-propagation) for on-line learning in two-layer neural networks with an arbitrary number of hidden units. Within a statistical mechanics framework, we analyse these learning algorithms in both the symmetric and the convergence phase for finite learning rates in the case of uncorrelated teachers of similar but arbitrary length T. These analyses show that adaptive back-propagation results generally in faster training by breaking the symmetry between hidden units more efficiently and by providing faster convergence to optimal generalization than gradient descent.
Resumo:
An analytic investigation of the average case learning and generalization properties of Radial Basis Function Networks (RBFs) is presented, utilising on-line gradient descent as the learning rule. The analytic method employed allows both the calculation of generalization error and the examination of the internal dynamics of the network. The generalization error and internal dynamics are then used to examine the role of the learning rate and the specialization of the hidden units, which gives insight into decreasing the time required for training. The realizable and over-realizable cases are studied in detail; the phase of learning in which the hidden units are unspecialized (symmetric phase) and the phase in which asymptotic convergence occurs are analyzed, and their typical properties found. Finally, simulations are performed which strongly confirm the analytic results.
Resumo:
A method for calculating the globally optimal learning rate in on-line gradient-descent training of multilayer neural networks is presented. The method is based on a variational approach which maximizes the decrease in generalization error over a given time frame. We demonstrate the method by computing optimal learning rates in typical learning scenarios. The method can also be employed when different learning rates are allowed for different parameter vectors as well as to determine the relevance of related training algorithms based on modifications to the basic gradient descent rule.
Resumo:
On-line learning is one of the most powerful and commonly used techniques for training large layered networks and has been used successfully in many real-world applications. Traditional analytical methods have been recently complemented by ones from statistical physics and Bayesian statistics. This powerful combination of analytical methods provides more insight and deeper understanding of existing algorithms and leads to novel and principled proposals for their improvement. This book presents a coherent picture of the state-of-the-art in the theoretical analysis of on-line learning. An introduction relates the subject to other developments in neural networks and explains the overall picture. Surveys by leading experts in the field combine new and established material and enable non-experts to learn more about the techniques and methods used. This book, the first in the area, provides a comprehensive view of the subject and will be welcomed by mathematicians, scientists and engineers, whether in industry or academia.