16 resultados para neural algorithms
Resumo:
The performance of seven minimization algorithms are compared on five neural network problems. These include a variable-step-size algorithm, conjugate gradient, and several methods with explicit analytic or numerical approximations to the Hessian.
Resumo:
Attractor properties of a popular discrete-time neural network model are illustrated through numerical simulations. The most complex dynamics is found to occur within particular ranges of parameters controlling the symmetry and magnitude of the weight matrix. A small network model is observed to produce fixed points, limit cycles, mode-locking, the Ruelle-Takens route to chaos, and the period-doubling route to chaos. Training algorithms for tuning this dynamical behaviour are discussed. Training can be an easy or difficult task, depending whether the problem requires the use of temporal information distributed over long time intervals. Such problems require training algorithms which can handle hidden nodes. The most prominent of these algorithms, back propagation through time, solves the temporal credit assignment problem in a way which can work only if the relevant information is distributed locally in time. The Moving Targets algorithm works for the more general case, but is computationally intensive, and prone to local minima.
Resumo:
An overview of neural networks, covering multilayer perceptrons, radial basis functions, constructive algorithms, Kohonen and K-means unupervised algorithms, RAMnets, first and second order training methods, and Bayesian regularisation methods.
Resumo:
Introductory accounts of artificial neural networks often rely for motivation on analogies with models of information processing in biological networks. One limitation of such an approach is that it offers little guidance on how to find optimal algorithms, or how to verify the correct performance of neural network systems. A central goal of this paper is to draw attention to a quite different viewpoint in which neural networks are seen as algorithms for statistical pattern recognition based on a principled, i.e. theoretically well-founded, framework. We illustrate the concept of a principled viewpoint by considering a specific issue concerned with the interpretation of the outputs of a trained network. Finally, we discuss the relevance of such an approach to the issue of the validation and verification of neural network systems.
Resumo:
We present an analytic solution to the problem of on-line gradient-descent learning for two-layer neural networks with an arbitrary number of hidden units in both teacher and student networks. The technique, demonstrated here for the case of adaptive input-to-hidden weights, becomes exact as the dimensionality of the input space increases.
Resumo:
Neural networks are usually curved statistical models. They do not have finite dimensional sufficient statistics, so on-line learning on the model itself inevitably loses information. In this paper we propose a new scheme for training curved models, inspired by the ideas of ancillary statistics and adaptive critics. At each point estimate an auxiliary flat model (exponential family) is built to locally accommodate both the usual statistic (tangent to the model) and an ancillary statistic (normal to the model). The auxiliary model plays a role in determining credit assignment analogous to that played by an adaptive critic in solving temporal problems. The method is illustrated with the Cauchy model and the algorithm is proved to be asymptotically efficient.
Resumo:
This thesis is a study of the generation of topographic mappings - dimension reducing transformations of data that preserve some element of geometric structure - with feed-forward neural networks. As an alternative to established methods, a transformational variant of Sammon's method is proposed, where the projection is effected by a radial basis function neural network. This approach is related to the statistical field of multidimensional scaling, and from that the concept of a 'subjective metric' is defined, which permits the exploitation of additional prior knowledge concerning the data in the mapping process. This then enables the generation of more appropriate feature spaces for the purposes of enhanced visualisation or subsequent classification. A comparison with established methods for feature extraction is given for data taken from the 1992 Research Assessment Exercise for higher educational institutions in the United Kingdom. This is a difficult high-dimensional dataset, and illustrates well the benefit of the new topographic technique. A generalisation of the proposed model is considered for implementation of the classical multidimensional scaling (¸mds}) routine. This is related to Oja's principal subspace neural network, whose learning rule is shown to descend the error surface of the proposed ¸mds model. Some of the technical issues concerning the design and training of topographic neural networks are investigated. It is shown that neural network models can be less sensitive to entrapment in the sub-optimal global minima that badly affect the standard Sammon algorithm, and tend to exhibit good generalisation as a result of implicit weight decay in the training process. It is further argued that for ideal structure retention, the network transformation should be perfectly smooth for all inter-data directions in input space. Finally, there is a critique of optimisation techniques for topographic mappings, and a new training algorithm is proposed. A convergence proof is given, and the method is shown to produce lower-error mappings more rapidly than previous algorithms.
Resumo:
It is well known that one of the obstacles to effective forecasting of exchange rates is heteroscedasticity (non-stationary conditional variance). The autoregressive conditional heteroscedastic (ARCH) model and its variants have been used to estimate a time dependent variance for many financial time series. However, such models are essentially linear in form and we can ask whether a non-linear model for variance can improve results just as non-linear models (such as neural networks) for the mean have done. In this paper we consider two neural network models for variance estimation. Mixture Density Networks (Bishop 1994, Nix and Weigend 1994) combine a Multi-Layer Perceptron (MLP) and a mixture model to estimate the conditional data density. They are trained using a maximum likelihood approach. However, it is known that maximum likelihood estimates are biased and lead to a systematic under-estimate of variance. More recently, a Bayesian approach to parameter estimation has been developed (Bishop and Qazaz 1996) that shows promise in removing the maximum likelihood bias. However, up to now, this model has not been used for time series prediction. Here we compare these algorithms with two other models to provide benchmark results: a linear model (from the ARIMA family), and a conventional neural network trained with a sum-of-squares error function (which estimates the conditional mean of the time series with a constant variance noise model). This comparison is carried out on daily exchange rate data for five currencies.
Resumo:
We present a framework for calculating globally optimal parameters, within a given time frame, for on-line learning in multilayer neural networks. We demonstrate the capability of this method by computing optimal learning rates in typical learning scenarios. A similar treatment allows one to determine the relevance of related training algorithms based on modifications to the basic gradient descent rule as well as to compare different training methods.
Resumo:
A method for calculating the globally optimal learning rate in on-line gradient-descent training of multilayer neural networks is presented. The method is based on a variational approach which maximizes the decrease in generalization error over a given time frame. We demonstrate the method by computing optimal learning rates in typical learning scenarios. The method can also be employed when different learning rates are allowed for different parameter vectors as well as to determine the relevance of related training algorithms based on modifications to the basic gradient descent rule.
Resumo:
On-line learning is one of the most powerful and commonly used techniques for training large layered networks and has been used successfully in many real-world applications. Traditional analytical methods have been recently complemented by ones from statistical physics and Bayesian statistics. This powerful combination of analytical methods provides more insight and deeper understanding of existing algorithms and leads to novel and principled proposals for their improvement. This book presents a coherent picture of the state-of-the-art in the theoretical analysis of on-line learning. An introduction relates the subject to other developments in neural networks and explains the overall picture. Surveys by leading experts in the field combine new and established material and enable non-experts to learn more about the techniques and methods used. This book, the first in the area, provides a comprehensive view of the subject and will be welcomed by mathematicians, scientists and engineers, whether in industry or academia.
Resumo:
We derive a mean field algorithm for binary classification with Gaussian processes which is based on the TAP approach originally proposed in Statistical Physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error which is computed with no extra computational cost. We show that from the TAP approach, it is possible to derive both a simpler 'naive' mean field theory and support vector machines (SVM) as limiting cases. For both mean field algorithms and support vectors machines, simulation results for three small benchmark data sets are presented. They show 1. that one may get state of the art performance by using the leave-one-out estimator for model selection and 2. the built-in leave-one-out estimators are extremely precise when compared to the exact leave-one-out estimate. The latter result is a taken as a strong support for the internal consistency of the mean field approach.
Resumo:
Are the learning procedures of genetic algorithms (GAs) able to generate optimal architectures for artificial neural networks (ANNs) in high frequency data? In this experimental study,GAs are used to identify the best architecture for ANNs. Additional learning is undertaken by the ANNs to forecast daily excess stock returns. No ANN architectures were able to outperform a random walk,despite the finding of non-linearity in the excess returns. This failure is attributed to the absence of suitable ANN structures and further implies that researchers need to be cautious when making inferences from ANN results that use high frequency data.
Resumo:
Introductory accounts of artificial neural networks often rely for motivation on analogies with models of information processing in biological networks. One limitation of such an approach is that it offers little guidance on how to find optimal algorithms, or how to verify the correct performance of neural network systems. A central goal of this paper is to draw attention to a quite different viewpoint in which neural networks are seen as algorithms for statistical pattern recognition based on a principled, i.e. theoretically well-founded, framework. We illustrate the concept of a principled viewpoint by considering a specific issue concerned with the interpretation of the outputs of a trained network. Finally, we discuss the relevance of such an approach to the issue of the validation and verification of neural network systems.
Resumo:
The assessment of the reliability of systems which learn from data is a key issue to investigate thoroughly before the actual application of information processing techniques to real-world problems. Over the recent years Gaussian processes and Bayesian neural networks have come to the fore and in this thesis their generalisation capabilities are analysed from theoretical and empirical perspectives. Upper and lower bounds on the learning curve of Gaussian processes are investigated in order to estimate the amount of data required to guarantee a certain level of generalisation performance. In this thesis we analyse the effects on the bounds and the learning curve induced by the smoothness of stochastic processes described by four different covariance functions. We also explain the early, linearly-decreasing behaviour of the curves and we investigate the asymptotic behaviour of the upper bounds. The effect of the noise and the characteristic lengthscale of the stochastic process on the tightness of the bounds are also discussed. The analysis is supported by several numerical simulations. The generalisation error of a Gaussian process is affected by the dimension of the input vector and may be decreased by input-variable reduction techniques. In conventional approaches to Gaussian process regression, the positive definite matrix estimating the distance between input points is often taken diagonal. In this thesis we show that a general distance matrix is able to estimate the effective dimensionality of the regression problem as well as to discover the linear transformation from the manifest variables to the hidden-feature space, with a significant reduction of the input dimension. Numerical simulations confirm the significant superiority of the general distance matrix with respect to the diagonal one.In the thesis we also present an empirical investigation of the generalisation errors of neural networks trained by two Bayesian algorithms, the Markov Chain Monte Carlo method and the evidence framework; the neural networks have been trained on the task of labelling segmented outdoor images.