26 resultados para Error Function
em Aston University Research Archive
Resumo:
Regression problems are concerned with predicting the values of one or more continuous quantities, given the values of a number of input variables. For virtually every application of regression, however, it is also important to have an indication of the uncertainty in the predictions. Such uncertainties are expressed in terms of the error bars, which specify the standard deviation of the distribution of predictions about the mean. Accurate estimate of error bars is of practical importance especially when safety and reliability is an issue. The Bayesian view of regression leads naturally to two contributions to the error bars. The first arises from the intrinsic noise on the target data, while the second comes from the uncertainty in the values of the model parameters which manifests itself in the finite width of the posterior distribution over the space of these parameters. The Hessian matrix which involves the second derivatives of the error function with respect to the weights is needed for implementing the Bayesian formalism in general and estimating the error bars in particular. A study of different methods for evaluating this matrix is given with special emphasis on the outer product approximation method. The contribution of the uncertainty in model parameters to the error bars is a finite data size effect, which becomes negligible as the number of data points in the training set increases. A study of this contribution is given in relation to the distribution of data in input space. It is shown that the addition of data points to the training set can only reduce the local magnitude of the error bars or leave it unchanged. Using the asymptotic limit of an infinite data set, it is shown that the error bars have an approximate relation to the density of data in input space.
Resumo:
The performance of feed-forward neural networks in real applications can be often be improved significantly if use is made of a-priori information. For interpolation problems this prior knowledge frequently includes smoothness requirements on the network mapping, and can be imposed by the addition to the error function of suitable regularization terms. The new error function, however, now depends on the derivatives of the network mapping, and so the standard back-propagation algorithm cannot be applied. In this paper, we derive a computationally efficient learning algorithm, for a feed-forward network of arbitrary topology, which can be used to minimize the new error function. Networks having a single hidden layer, for which the learning algorithm simplifies, are treated as a special case.
Resumo:
A simple method for training the dynamical behavior of a neural network is derived. It is applicable to any training problem in discrete-time networks with arbitrary feedback. The algorithm resembles back-propagation in that an error function is minimized using a gradient-based method, but the optimization is carried out in the hidden part of state space either instead of, or in addition to weight space. Computational results are presented for some simple dynamical training problems, one of which requires response to a signal 100 time steps in the past.
Resumo:
Minimization of a sum-of-squares or cross-entropy error function leads to network outputs which approximate the conditional averages of the target data, conditioned on the input vector. For classifications problems, with a suitably chosen target coding scheme, these averages represent the posterior probabilities of class membership, and so can be regarded as optimal. For problems involving the prediction of continuous variables, however, the conditional averages provide only a very limited description of the properties of the target variables. This is particularly true for problems in which the mapping to be learned is multi-valued, as often arises in the solution of inverse problems, since the average of several correct target values is not necessarily itself a correct value. In order to obtain a complete description of the data, for the purposes of predicting the outputs corresponding to new input vectors, we must model the conditional probability distribution of the target data, again conditioned on the input vector. In this paper we introduce a new class of network models obtained by combining a conventional neural network with a mixture density model. The complete system is called a Mixture Density Network, and can in principle represent arbitrary conditional probability distributions in the same way that a conventional neural network can represent arbitrary functions. We demonstrate the effectiveness of Mixture Density Networks using both a toy problem and a problem involving robot inverse kinematics.
Resumo:
It is well known that the addition of noise to the input data of a neural network during training can, in some circumstances, lead to significant improvements in generalization performance. Previous work has shown that such training with noise is equivalent to a form of regularization in which an extra term is added to the error function. However, the regularization term, which involves second derivatives of the error function, is not bounded below, and so can lead to difficulties if used directly in a learning algorithm based on error minimization. In this paper we show that, for the purposes of network training, the regularization term can be reduced to a positive definite form which involves only first derivatives of the network mapping. For a sum-of-squares error function, the regularization term belongs to the class of generalized Tikhonov regularizers. Direct minimization of the regularized error function provides a practical alternative to training with noise.
Resumo:
Mixture Density Networks (MDNs) are a well-established method for modelling the conditional probability density which is useful for complex multi-valued functions where regression methods (such as MLPs) fail. In this paper we extend earlier research of a regularisation method for a special case of MDNs to the general case using evidence based regularisation and we show how the Hessian of the MDN error function can be evaluated using R-propagation. The method is tested on two data sets and compared with early stopping.
Resumo:
In most treatments of the regression problem it is assumed that the distribution of target data can be described by a deterministic function of the inputs, together with additive Gaussian noise having constant variance. The use of maximum likelihood to train such models then corresponds to the minimization of a sum-of-squares error function. In many applications a more realistic model would allow the noise variance itself to depend on the input variables. However, the use of maximum likelihood to train such models would give highly biased results. In this paper we show how a Bayesian treatment can allow for an input-dependent variance while overcoming the bias of maximum likelihood.
Resumo:
It is well known that one of the obstacles to effective forecasting of exchange rates is heteroscedasticity (non-stationary conditional variance). The autoregressive conditional heteroscedastic (ARCH) model and its variants have been used to estimate a time dependent variance for many financial time series. However, such models are essentially linear in form and we can ask whether a non-linear model for variance can improve results just as non-linear models (such as neural networks) for the mean have done. In this paper we consider two neural network models for variance estimation. Mixture Density Networks (Bishop 1994, Nix and Weigend 1994) combine a Multi-Layer Perceptron (MLP) and a mixture model to estimate the conditional data density. They are trained using a maximum likelihood approach. However, it is known that maximum likelihood estimates are biased and lead to a systematic under-estimate of variance. More recently, a Bayesian approach to parameter estimation has been developed (Bishop and Qazaz 1996) that shows promise in removing the maximum likelihood bias. However, up to now, this model has not been used for time series prediction. Here we compare these algorithms with two other models to provide benchmark results: a linear model (from the ARIMA family), and a conventional neural network trained with a sum-of-squares error function (which estimates the conditional mean of the time series with a constant variance noise model). This comparison is carried out on daily exchange rate data for five currencies.
Resumo:
Training Mixture Density Network (MDN) configurations within the NETLAB framework takes time due to the nature of the computation of the error function and the gradient of the error function. By optimising the computation of these functions, so that gradient information is computed in parameter space, training time is decreased by at least a factor of sixty for the example given. Decreased training time increases the spectrum of problems to which MDNs can be practically applied making the MDN framework an attractive method to the applied problem solver.
Resumo:
Obtaining wind vectors over the ocean is important for weather forecasting and ocean modelling. Several satellite systems used operationally by meteorological agencies utilise scatterometers to infer wind vectors over the oceans. In this paper we present the results of using novel neural network based techniques to estimate wind vectors from such data. The problem is partitioned into estimating wind speed and wind direction. Wind speed is modelled using a multi-layer perceptron (MLP) and a sum of squares error function. Wind direction is a periodic variable and a multi-valued function for a given set of inputs; a conventional MLP fails at this task, and so we model the full periodic probability density of direction conditioned on the satellite derived inputs using a Mixture Density Network (MDN) with periodic kernel functions. A committee of the resulting MDNs is shown to improve the results.
Resumo:
Obtaining wind vectors over the ocean is important for weather forecasting and ocean modelling. Several satellite systems used operationally by meteorological agencies utilise scatterometers to infer wind vectors over the oceans. In this paper we present the results of using novel neural network based techniques to estimate wind vectors from such data. The problem is partitioned into estimating wind speed and wind direction. Wind speed is modelled using a multi-layer perceptron (MLP) and a sum of squares error function. Wind direction is a periodic variable and a multi-valued function for a given set of inputs; a conventional MLP fails at this task, and so we model the full periodic probability density of direction conditioned on the satellite derived inputs using a Mixture Density Network (MDN) with periodic kernel functions. A committee of the resulting MDNs is shown to improve the results.
Resumo:
The kinematic mapping of a rigid open-link manipulator is a homomorphism between Lie groups. The homomorphisrn has solution groups that act on an inverse kinematic solution element. A canonical representation of solution group operators that act on a solution element of three and seven degree-of-freedom (do!) dextrous manipulators is determined by geometric analysis. Seven canonical solution groups are determined for the seven do! Robotics Research K-1207 and Hollerbach arms. The solution element of a dextrous manipulator is a collection of trivial fibre bundles with solution fibres homotopic to the Torus. If fibre solutions are parameterised by a scalar, a direct inverse funct.ion that maps the scalar and Cartesian base space coordinates to solution element fibre coordinates may be defined. A direct inverse pararneterisation of a solution element may be approximated by a local linear map generated by an inverse augmented Jacobian correction of a linear interpolation. The action of canonical solution group operators on a local linear approximation of the solution element of inverse kinematics of dextrous manipulators generates cyclical solutions. The solution representation is proposed as a model of inverse kinematic transformations in primate nervous systems. Simultaneous calibration of a composition of stereo-camera and manipulator kinematic models is under-determined by equi-output parameter groups in the composition of stereo-camera and Denavit Hartenberg (DH) rnodels. An error measure for simultaneous calibration of a composition of models is derived and parameter subsets with no equi-output groups are determined by numerical experiments to simultaneously calibrate the composition of homogeneous or pan-tilt stereo-camera with DH models. For acceleration of exact Newton second-order re-calibration of DH parameters after a sequential calibration of stereo-camera and DH parameters, an optimal numerical evaluation of DH matrix first order and second order error derivatives with respect to a re-calibration error function is derived, implemented and tested. A distributed object environment for point and click image-based tele-command of manipulators and stereo-cameras is specified and implemented that supports rapid prototyping of numerical experiments in distributed system control. The environment is validated by a hierarchical k-fold cross validated calibration to Cartesian space of a radial basis function regression correction of an affine stereo model. Basic design and performance requirements are defined for scalable virtual micro-kernels that broker inter-Java-virtual-machine remote method invocations between components of secure manageable fault-tolerant open distributed agile Total Quality Managed ISO 9000+ conformant Just in Time manufacturing systems.
Resumo:
This thesis seeks to describe the development of an inexpensive and efficient clustering technique for multivariate data analysis. The technique starts from a multivariate data matrix and ends with graphical representation of the data and pattern recognition discriminant function. The technique also results in distances frequency distribution that might be useful in detecting clustering in the data or for the estimation of parameters useful in the discrimination between the different populations in the data. The technique can also be used in feature selection. The technique is essentially for the discovery of data structure by revealing the component parts of the data. lhe thesis offers three distinct contributions for cluster analysis and pattern recognition techniques. The first contribution is the introduction of transformation function in the technique of nonlinear mapping. The second contribution is the us~ of distances frequency distribution instead of distances time-sequence in nonlinear mapping, The third contribution is the formulation of a new generalised and normalised error function together with its optimal step size formula for gradient method minimisation. The thesis consists of five chapters. The first chapter is the introduction. The second chapter describes multidimensional scaling as an origin of nonlinear mapping technique. The third chapter describes the first developing step in the technique of nonlinear mapping that is the introduction of "transformation function". The fourth chapter describes the second developing step of the nonlinear mapping technique. This is the use of distances frequency distribution instead of distances time-sequence. The chapter also includes the new generalised and normalised error function formulation. Finally, the fifth chapter, the conclusion, evaluates all developments and proposes a new program. for cluster analysis and pattern recognition by integrating all the new features.
Resumo:
Background/aim: The technique of photoretinoscopy is unique in being able to measure the dynamics of the oculomotor system (ocular accommodation, vergence, and pupil size) remotely (working distance typically 1 metre) and objectively in both eyes simultaneously. The aim af this study was to evaluate clinically the measurement of refractive error by a recent commercial photoretinoscopic device, the PowerRefractor (PlusOptiX, Germany). Method: The validity and repeatability of the PowerRefractor was compared to: subjective (non-cycloplegic) refraction on 100 adult subjects (mean age 23.8 (SD 5.7) years) and objective autarefractian (Shin-Nippon SRW-5000, Japan) on 150 subjects (20.1 (4.2) years). Repeatability was assessed by examining the differences between autorefractor readings taken from each eye and by re-measuring the objective prescription of 100 eyes at a subsequent session. Results: On average the PowerRefractor prescription was not significantly different from the subjective refraction, although quite variable (difference -0.05 (0.63) D, p = 0.41) and more negative than the SRW-5000 prescription (by -0.20 (0.72) D, p<0.001). There was no significant bias in the accuracy of the instrument with regard to the type or magnitude of refractive error. The PowerRefractor was found to be repeatable over the prescription range of -8.75D to +4.00D (mean spherical equivalent) examined. Conclusion: The PowerRefractor is a useful objective screening instrument and because of its remote and rapid measurement of both eyes simultaneously is able to assess the oculomotor response in a variety of unrestricted viewing conditions and patient types.
Resumo:
On-line learning is examined for the radial basis function network, an important and practical type of neural network. The evolution of generalization error is calculated within a framework which allows the phenomena of the learning process, such as the specialization of the hidden units, to be analyzed. The distinct stages of training are elucidated, and the role of the learning rate described. The three most important stages of training, the symmetric phase, the symmetry-breaking phase, and the convergence phase, are analyzed in detail; the convergence phase analysis allows derivation of maximal and optimal learning rates. As well as finding the evolution of the mean system parameters, the variances of these parameters are derived and shown to be typically small. Finally, the analytic results are strongly confirmed by simulations.