79 resultados para neural networks teaching
Resumo:
For neural networks with a wide class of weight-priors, it can be shown that in the limit of an infinite number of hidden units the prior over functions tends to a Gaussian process. In this paper analytic forms are derived for the covariance function of the Gaussian processes corresponding to networks with sigmoidal and Gaussian hidden units. This allows predictions to be made efficiently using networks with an infinite number of hidden units, and shows that, somewhat paradoxically, it may be easier to compute with infinite networks than finite ones.
Resumo:
Mixture Density Networks (MDNs) are a well-established method for modelling the conditional probability density which is useful for complex multi-valued functions where regression methods (such as MLPs) fail. In this paper we extend earlier research of a regularisation method for a special case of MDNs to the general case using evidence based regularisation and we show how the Hessian of the MDN error function can be evaluated using R-propagation. The method is tested on two data sets and compared with early stopping.
Resumo:
This paper reports the initial results of a joint research project carried out by Aston University and Lloyd's Register to develop a practical method of assessing neural network applications. A set of assessment guidelines for neural network applications were developed and tested on two applications. These case studies showed that it is practical to assess neural networks in a statistical pattern recognition framework. However there is need for more standardisation in neural network technology and a wider takeup of good development practice amongst the neural network community.
Resumo:
An adaptive back-propagation algorithm parameterized by an inverse temperature 1/T is studied and compared with gradient descent (standard back-propagation) for on-line learning in two-layer neural networks with an arbitrary number of hidden units. Within a statistical mechanics framework, we analyse these learning algorithms in both the symmetric and the convergence phase for finite learning rates in the case of uncorrelated teachers of similar but arbitrary length T. These analyses show that adaptive back-propagation results generally in faster training by breaking the symmetry between hidden units more efficiently and by providing faster convergence to optimal generalization than gradient descent.
Resumo:
Radial Basis Function networks with linear outputs are often used in regression problems because they can be substantially faster to train than Multi-layer Perceptrons. For classification problems, the use of linear outputs is less appropriate as the outputs are not guaranteed to represent probabilities. In this paper we show how RBFs with logistic and softmax outputs can be trained efficiently using algorithms derived from Generalised Linear Models. This approach is compared with standard non-linear optimisation algorithms on a number of datasets.
Resumo:
Mixture Density Networks are a principled method to model conditional probability density functions which are non-Gaussian. This is achieved by modelling the conditional distribution for each pattern with a Gaussian Mixture Model for which the parameters are generated by a neural network. This thesis presents a novel method to introduce regularisation in this context for the special case where the mean and variance of the spherical Gaussian Kernels in the mixtures are fixed to predetermined values. Guidelines for how these parameters can be initialised are given, and it is shown how to apply the evidence framework to mixture density networks to achieve regularisation. This also provides an objective stopping criteria that can replace the `early stopping' methods that have previously been used. If the neural network used is an RBF network with fixed centres this opens up new opportunities for improved initialisation of the network weights, which are exploited to start training relatively close to the optimum. The new method is demonstrated on two data sets. The first is a simple synthetic data set while the second is a real life data set, namely satellite scatterometer data used to infer the wind speed and wind direction near the ocean surface. For both data sets the regularisation method performs well in comparison with earlier published results. Ideas on how the constraint on the kernels may be relaxed to allow fully adaptable kernels are presented.
Resumo:
It is generally assumed when using Bayesian inference methods for neural networks that the input data contains no noise or corruption. For real-world (errors in variable) problems this is clearly an unsafe assumption. This paper presents a Bayesian neural network framework which allows for input noise given that some model of the noise process exists. In the limit where this noise process is small and symmetric it is shown, using the Laplace approximation, that there is an additional term to the usual Bayesian error bar which depends on the variance of the input noise process. Further, by treating the true (noiseless) input as a hidden variable and sampling this jointly with the network's weights, using Markov Chain Monte Carlo methods, it is demonstrated that it is possible to infer the unbiassed regression over the noiseless input.
Resumo:
Using methods of Statistical Physics, we investigate the generalization performance of support vector machines (SVMs), which have been recently introduced as a general alternative to neural networks. For nonlinear classification rules, the generalization error saturates on a plateau, when the number of examples is too small to properly estimate the coefficients of the nonlinear part. When trained on simple rules, we find that SVMs overfit only weakly. The performance of SVMs is strongly enhanced, when the distribution of the inputs has a gap in feature space.
Resumo:
In this paper we explore the practical use of neural networks for controlling complex non-linear systems. The system used to demonstrate this approach is a simulation of a gas turbine engine typical of those used to power commercial aircraft. The novelty of the work lies in the requirement for multiple controllers which are used to maintain system variables in safe operating regions as well as governing the engine thrust.
Resumo:
Understanding a complex network's structure holds the key to understanding its function. The physics community has contributed a multitude of methods and analyses to this cross-disciplinary endeavor. Structural features exist on both the microscopic level, resulting from differences between single node properties, and the mesoscopic level resulting from properties shared by groups of nodes. Disentangling the determinants of network structure on these different scales has remained a major, and so far unsolved, challenge. Here we show how multiscale generative probabilistic exponential random graph models combined with efficient, distributive message-passing inference techniques can be used to achieve this separation of scales, leading to improved detection accuracy of latent classes as demonstrated on benchmark problems. It sheds new light on the statistical significance of motif-distributions in neural networks and improves the link-prediction accuracy as exemplified for gene-disease associations in the highly consequential Online Mendelian Inheritance in Man database. © 2011 Reichardt et al.
Resumo:
Satellite-borne scatterometers are used to measure backscattered micro-wave radiation from the ocean surface. This data may be used to infer surface wind vectors where no direct measurements exist. Inherent in this data are outliers owing to aberrations on the water surface and measurement errors within the equipment. We present two techniques for identifying outliers using neural networks; the outliers may then be removed to improve models derived from the data. Firstly the generative topographic mapping (GTM) is used to create a probability density model; data with low probability under the model may be classed as outliers. In the second part of the paper, a sensor model with input-dependent noise is used and outliers are identified based on their probability under this model. GTM was successfully modified to incorporate prior knowledge of the shape of the observation manifold; however, GTM could not learn the double skinned nature of the observation manifold. To learn this double skinned manifold necessitated the use of a sensor model which imposes strong constraints on the mapping. The results using GTM with a fixed noise level suggested the noise level may vary as a function of wind speed. This was confirmed by experiments using a sensor model with input-dependent noise, where the variation in noise is most sensitive to the wind speed input. Both models successfully identified gross outliers with the largest differences between models occurring at low wind speeds. © 2003 Elsevier Science Ltd. All rights reserved.
Resumo:
It is generally assumed when using Bayesian inference methods for neural networks that the input data contains no noise. For real-world (errors in variable) problems this is clearly an unsafe assumption. This paper presents a Bayesian neural network framework which accounts for input noise provided that a model of the noise process exists. In the limit where the noise process is small and symmetric it is shown, using the Laplace approximation, that this method adds an extra term to the usual Bayesian error bar which depends on the variance of the input noise process. Further, by treating the true (noiseless) input as a hidden variable, and sampling this jointly with the network’s weights, using a Markov chain Monte Carlo method, it is demonstrated that it is possible to infer the regression over the noiseless input. This leads to the possibility of training an accurate model of a system using less accurate, or more uncertain, data. This is demonstrated on both the, synthetic, noisy sine wave problem and a real problem of inferring the forward model for a satellite radar backscatter system used to predict sea surface wind vectors.
Resumo:
The main theme of research of this project concerns the study of neutral networks to control uncertain and non-linear control systems. This involves the control of continuous time, discrete time, hybrid and stochastic systems with input, state or output constraints by ensuring good performances. A great part of this project is devoted to the opening of frontiers between several mathematical and engineering approaches in order to tackle complex but very common non-linear control problems. The objectives are: 1. Design and develop procedures for neutral network enhanced self-tuning adaptive non-linear control systems; 2. To design, as a general procedure, neural network generalised minimum variance self-tuning controller for non-linear dynamic plants (Integration of neural network mapping with generalised minimum variance self-tuning controller strategies); 3. To develop a software package to evaluate control system performances using Matlab, Simulink and Neural Network toolbox. An adaptive control algorithm utilising a recurrent network as a model of a partial unknown non-linear plant with unmeasurable state is proposed. Appropriately, it appears that structured recurrent neural networks can provide conveniently parameterised dynamic models for many non-linear systems for use in adaptive control. Properties of static neural networks, which enabled successful design of stable adaptive control in the state feedback case, are also identified. A survey of the existing results is presented which puts them in a systematic framework showing their relation to classical self-tuning adaptive control application of neural control to a SISO/MIMO control. Simulation results demonstrate that the self-tuning design methods may be practically applicable to a reasonably large class of unknown linear and non-linear dynamic control systems.
Resumo:
This paper introduces a mechanism for generating a series of rules that characterize the money price relationship for the USA, defined as the relationship between the rate of growth of the money supply and inflation. Monetary component data is used to train a selection of candidate feedforward neural networks. The selected network is mined for rules, expressed in human-readable and machine-executable form. The rule and network accuracy are compared, and expert commentary is made on the readability and reliability of the extracted rule set. The ultimate goal of this research is to produce rules that meaningfully and accurately describe inflation in terms of the monetary component dataset.
Resumo:
Background - The Met allele of the catechol-O-methyltransferase (COMT) valine-to-methionine (Val158Met) polymorphism is known to affect dopamine-dependent affective regulation within amygdala-prefrontal cortical (PFC) networks. It is also thought to increase the risk of a number of disorders characterized by affective morbidity including bipolar disorder (BD), major depressive disorder (MDD) and anxiety disorders. The disease risk conferred is small, suggesting that this polymorphism represents a modifier locus. Therefore our aim was to investigate how the COMT Val158Met may contribute to phenotypic variation in clinical diagnosis using sad facial affect processing as a probe for its neural action. Method - We employed functional magnetic resonance imaging to measure activation in the amygdala, ventromedial PFC (vmPFC) and ventrolateral PFC (vlPFC) during sad facial affect processing in family members with BD (n=40), MDD and anxiety disorders (n=22) or no psychiatric diagnosis (n=25) and 50 healthy controls. Results - Irrespective of clinical phenotype, the Val158 allele was associated with greater amygdala activation and the Met allele with greater signal change in the vmPFC and vlPFC. Signal changes in the amygdala and vmPFC were not associated with disease expression. However, in the right vlPFC the Met158 allele was associated with greater activation in all family members with affective morbidity compared with relatives without a psychiatric diagnosis and healthy controls. Conclusions - Our results suggest that the COMT Val158Met polymorphism has a pleiotropic effect within the neural networks subserving emotional processing. Furthermore the Met158 allele further reduces cortical efficiency in the vlPFC in individuals with affective morbidity. © 2010 Cambridge University Press.