700 results for Multilayer Perceptron
Abstract:
The storage capacity of multilayer networks with overlapping receptive fields is investigated for a constructive algorithm within a one-step replica symmetry breaking (RSB) treatment. We find that the storage capacity increases logarithmically with the number of hidden units K without saturating the Mitchison-Durbin bound. The slope of the logarithmic increase decays exponentially with the stability with which the patterns have been stored.
Abstract:
Error rates of a Boolean perceptron with threshold and either a spherical or an Ising constraint on the weight vector are calculated, within a one-step replica symmetry breaking (RSB) treatment, for storing patterns from biased input and output distributions. For an unbiased output distribution and non-zero stability of the patterns, we find a critical load, α_p, above which two solutions to the saddlepoint equations appear: one with higher free energy and zero threshold, and a dominant solution with non-zero threshold. We examine this second-order phase transition and the dependence of α_p on the required pattern stability, κ, for both one-step RSB and replica symmetry (RS) in the spherical case and for one-step RSB in the Ising case.
Abstract:
A formalism for modelling the dynamics of Genetic Algorithms (GAs) using methods from statistical mechanics, originally due to Prügel-Bennett and Shapiro, is reviewed, generalized and improved upon. This formalism can be used to predict the averaged trajectory of macroscopic statistics describing the GA's population. These macroscopics are chosen to average well between runs, so that fluctuations from mean behaviour can often be neglected. Where necessary, non-trivial terms are determined by assuming maximum entropy with constraints on known macroscopics. Problems of realistic size are described in compact form and finite population effects are included, often proving to be of fundamental importance. The macroscopics used here are cumulants of an appropriate quantity within the population and the mean correlation (Hamming distance) within the population. Including the correlation as an explicit macroscopic provides a significant improvement over the original formulation. The formalism is applied to a number of simple optimization problems in order to determine its predictive power and to gain insight into GA dynamics. Problems which are most amenable to analysis come from the class where alleles within the genotype contribute additively to the phenotype. This class can be treated with some generality, including problems with inhomogeneous contributions from each site, non-linear or noisy fitness measures, simple diploid representations and temporally varying fitness. The results can also be applied to a simple learning problem, generalization in a binary perceptron, and a limit is identified for which the optimal training batch size can be determined for this problem. The theory is compared to averaged results from a real GA in each case, showing excellent agreement if the maximum entropy principle holds. Some situations where this approximation breaks down are identified.
In order to fully test the formalism, an attempt is made on the strongly NP-hard problem of storing random patterns in a binary perceptron. Here, the relationship between the genotype and phenotype (training error) is strongly non-linear. Mutation is modelled under the assumption that perceptron configurations are typical of perceptrons with a given training error. Unfortunately, this assumption does not provide a good approximation in general. It is conjectured that perceptron configurations would have to be constrained by other statistics in order to accurately model mutation for this problem. Issues arising from this study are discussed in conclusion and some possible areas of further research are outlined.
Abstract:
This paper presents a forecasting technique for forward energy prices, one day ahead. This technique combines a wavelet transform and forecasting models such as a multi-layer perceptron, linear regression or GARCH. These techniques are applied to real data from the UK gas markets to evaluate their performance. The results show that the forecasting accuracy is improved significantly by using the wavelet transform. The methodology can also be applied to forecasting market clearing prices and electricity/gas loads.
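As a rough illustration of how a wavelet-style decomposition can be combined with a simple forecasting model, the sketch below splits a series into a smooth part and a detail part, forecasts each part one step ahead and sums the two forecasts. The Haar-style smoothing, the least-squares AR(1) model used as a stand-in for the models named in the abstract, the function names and the sample values are all assumptions made for illustration; none of them is taken from the paper.

```python
def decompose(series):
    """Additive one-level Haar-style split so that series[t] == smooth[t] + detail[t]."""
    smooth = [series[0]] + [0.5 * (series[t] + series[t - 1]) for t in range(1, len(series))]
    detail = [x - s for x, s in zip(series, smooth)]
    return smooth, detail


def ar1_forecast(component):
    """One-step forecast phi * last value, with phi fitted by least squares (no intercept)."""
    num = sum(a * b for a, b in zip(component[1:], component[:-1]))
    den = sum(b * b for b in component[:-1])
    phi = num / den if den > 0.0 else 0.0  # an all-zero component is forecast as zero
    return phi * component[-1]


def combined_forecast(series):
    """Forecast each component separately, then add the component forecasts."""
    smooth, detail = decompose(series)
    return ar1_forecast(smooth) + ar1_forecast(detail)


prices = [10.0, 10.4, 10.1, 10.8, 11.0, 10.7, 11.3, 11.1]  # made-up values
next_price = combined_forecast(prices)
```

Because the split is additive, the forecast of the original series is simply the sum of the component forecasts; a real study would replace the AR(1) stand-in with the models being compared.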
Abstract:
It is well known that one of the obstacles to effective forecasting of exchange rates is heteroscedasticity (non-stationary conditional variance). The autoregressive conditional heteroscedastic (ARCH) model and its variants have been used to estimate a time dependent variance for many financial time series. However, such models are essentially linear in form and we can ask whether a non-linear model for variance can improve results just as non-linear models (such as neural networks) for the mean have done. In this paper we consider two neural network models for variance estimation. Mixture Density Networks (Bishop 1994, Nix and Weigend 1994) combine a Multi-Layer Perceptron (MLP) and a mixture model to estimate the conditional data density. They are trained using a maximum likelihood approach. However, it is known that maximum likelihood estimates are biased and lead to a systematic under-estimate of variance. More recently, a Bayesian approach to parameter estimation has been developed (Bishop and Qazaz 1996) that shows promise in removing the maximum likelihood bias. However, up to now, this model has not been used for time series prediction. Here we compare these algorithms with two other models to provide benchmark results: a linear model (from the ARIMA family), and a conventional neural network trained with a sum-of-squares error function (which estimates the conditional mean of the time series with a constant variance noise model). This comparison is carried out on daily exchange rate data for five currencies.
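The maximum likelihood bias mentioned in this abstract can be seen in the simplest textbook setting, which is used here only as an illustration and is not the conditional model of the paper. Assume $N$ independent samples $x_1, \dots, x_N$ from a Gaussian with unknown mean $\mu$ and variance $\sigma^2$ (the symbols are introduced here for the example):

```latex
\hat{\mu}_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n ,
\qquad
\hat{\sigma}^2_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} \left( x_n - \hat{\mu}_{\mathrm{ML}} \right)^2 ,
\qquad
\mathbb{E}\!\left[ \hat{\sigma}^2_{\mathrm{ML}} \right] = \frac{N-1}{N}\, \sigma^2 < \sigma^2 .
```

The variance is under-estimated because the deviations are measured from the fitted mean rather than the true mean.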
Abstract:
A theoretical model is presented which describes selection in a genetic algorithm (GA) under a stochastic fitness measure and correctly accounts for finite population effects. Although this model describes a number of selection schemes, we only consider Boltzmann selection in detail here as results for this form of selection are particularly transparent when fitness is corrupted by additive Gaussian noise. Finite population effects are shown to be of fundamental importance in this case, as the noise has no effect in the infinite population limit. In the limit of weak selection we show how the effects of any Gaussian noise can be removed by increasing the population size appropriately. The theory is tested on two closely related problems: the one-max problem corrupted by Gaussian noise and generalization in a perceptron with binary weights. The averaged dynamics can be accurately modelled for both problems using a formalism which describes the dynamics of the GA using methods from statistical mechanics. The second problem is a simple example of a learning problem and by considering this problem we show how the accurate characterization of noise in the fitness evaluation may be relevant in machine learning. The training error (negative fitness) is the number of misclassified training examples in a batch and can be considered as a noisy version of the generalization error if an independent batch is used for each evaluation. The noise is due to the finite batch size and in the limit of large problem size and weak selection we show how the effect of this noise can be removed by increasing the population size. This allows the optimal batch size to be determined, which minimizes computation time as well as the total number of training examples required.
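A minimal sketch of Boltzmann selection under a noisy fitness measure may help make the setting concrete. It assumes that each individual is resampled with probability proportional to exp(beta × noisy fitness), and it uses the one-max problem with additive Gaussian noise as in the abstract. The function names, population size, noise level and selection strength are illustrative assumptions, not values from the paper.

```python
import math
import random


def one_max(bits):
    """Noise-free fitness: the number of ones in the bit string."""
    return sum(bits)


def make_noisy_one_max(sigma, rng):
    """Return a fitness function corrupted by additive Gaussian noise of std sigma."""
    def fitness(bits):
        return one_max(bits) + rng.gauss(0.0, sigma)
    return fitness


def boltzmann_select(population, fitness, beta, rng):
    """Resample the population with probability proportional to exp(beta * fitness)."""
    scores = [fitness(ind) for ind in population]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (s - top)) for s in scores]
    return rng.choices(population, weights=weights, k=len(population))


rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(32)] for _ in range(200)]
selected = boltzmann_select(population, make_noisy_one_max(1.0, rng), beta=0.3, rng=rng)
```

Each fitness evaluation draws fresh noise, so the selection weights fluctuate from call to call. This is the finite-population effect the abstract refers to.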
Abstract:
Obtaining wind vectors over the ocean is important for weather forecasting and ocean modelling. Several satellite systems used operationally by meteorological agencies utilise scatterometers to infer wind vectors over the oceans. In this paper we present the results of using novel neural network based techniques to estimate wind vectors from such data. The problem is partitioned into estimating wind speed and wind direction. Wind speed is modelled using a multi-layer perceptron (MLP) and a sum of squares error function. Wind direction is a periodic variable and a multi-valued function for a given set of inputs; a conventional MLP fails at this task, and so we model the full periodic probability density of direction conditioned on the satellite derived inputs using a Mixture Density Network (MDN) with periodic kernel functions. A committee of the resulting MDNs is shown to improve the results.
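To illustrate what a periodic probability density built from periodic kernels can look like, the sketch below evaluates a mixture of von Mises densities over an angle. The choice of von Mises kernels is an assumption, since the abstract only says "periodic kernel functions". In a mixture density network the mixing weights, centres and concentrations would be outputs of an MLP; here they are fixed, made-up numbers.

```python
import math


def bessel_i0(x, terms=30):
    """Modified Bessel function I0 via its power series sum_k (x/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def von_mises_pdf(theta, mu, kappa):
    """Density of a von Mises distribution with centre mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def circular_mixture_pdf(theta, weights, centres, kappas):
    """Weighted sum of von Mises kernels; the weights are assumed to sum to one."""
    return sum(w * von_mises_pdf(theta, m, k) for w, m, k in zip(weights, centres, kappas))


# Hypothetical parameters: two modes half a turn apart.
weights = [0.6, 0.4]
centres = [0.5, 0.5 + math.pi]
kappas = [4.0, 4.0]
density_at_mode = circular_mixture_pdf(0.5, weights, centres, kappas)
```

Unlike a single conditional mean, such a mixture can put probability mass on several directions at once, which is what a multi-valued target requires.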
Abstract:
The ERS-1 satellite was launched in July 1991 by the European Space Agency into a polar orbit at about 800 km, carrying a C-band scatterometer. A scatterometer measures the amount of radar backscatter generated by small ripples on the ocean surface induced by instantaneous local winds. Operational methods that extract wind vectors from satellite scatterometer data are based on the local inversion of a forward model, mapping scatterometer observations to wind vectors, by the minimisation of a cost function in the scatterometer measurement space.
This report uses mixture density networks, a principled method for modelling conditional probability density functions, to model the joint probability distribution of the wind vectors given the satellite scatterometer measurements in a single cell (the 'inverse' problem). The complexity of the mapping and the structure of the conditional probability density function are investigated by varying the number of units in the hidden layer of the multi-layer perceptron and the number of kernels in the Gaussian mixture model of the mixture density network respectively. The optimal model for networks trained per trace has twenty hidden units and four kernels. Further investigation shows that models trained with incidence angle as an input have results comparable to those models trained by trace. A hybrid mixture density network that incorporates geophysical knowledge of the problem confirms other results that the conditional probability distribution is dominantly bimodal.
The wind retrieval results improve on previous work at Aston, but do not match other neural network techniques that use spatial information in the inputs, which is to be expected given the ambiguity of the inverse problem. Current work uses the local inverse model for autonomous ambiguity removal in a principled Bayesian framework. Future directions in which these models may be improved are given.
Abstract:
We study the dynamics of on-line learning in multilayer neural networks where training examples are sampled with repetition and where the number of examples scales with the number of network weights. The analysis is carried out using the dynamical replica method aimed at obtaining a closed set of coupled equations for a set of macroscopic variables from which both training and generalization errors can be calculated. We focus on scenarios whereby training examples are corrupted by additive Gaussian output noise and regularizers are introduced to improve the network performance. The dependence of the dynamics on the noise level, with and without regularizers, is examined, as well as that of the asymptotic values obtained for both training and generalization errors. We also demonstrate the ability of the method to approximate the learning dynamics in structurally unrealizable scenarios. The theoretical results show good agreement with those obtained by computer simulations.
Abstract:
In this paper we review recent theoretical approaches for analysing the dynamics of on-line learning in multilayer neural networks using methods adopted from statistical physics. The analysis is based on monitoring a set of macroscopic variables from which the generalisation error can be calculated. A closed set of dynamical equations for the macroscopic variables is derived analytically and solved numerically. The theoretical framework is then employed for defining optimal learning parameters and for analysing the incorporation of second order information into the learning process using natural gradient descent and matrix-momentum based methods. We will also briefly explain an extension of the original framework for analysing the case where training examples are sampled with repetition.
Abstract:
A novel approach, based on statistical mechanics, to analyze typical performance of optimum code-division multiple-access (CDMA) multiuser detectors is reviewed. A 'black-box' view of the basic CDMA channel is introduced, based on which the CDMA multiuser detection problem is regarded as a 'learning-from-examples' problem of the 'binary linear perceptron' in the neural network literature. Adopting a Bayesian framework, analysis of the performance of the optimum CDMA multiuser detectors is reduced to evaluation of the average of the cumulant generating function of a relevant posterior distribution. The evaluation of the average cumulant generating function is done by making use of the replica method, based on a formal analogy with a similar calculation appearing in spin glass theory in statistical mechanics, where the method was developed.
Abstract:
This thesis describes the study of various grating-based optical fibre sensors for applications in refractive index sensing. The sensitivity of these sensors has been studied and in some cases enhanced using novel techniques. The major areas of development are as follows. The sensitivity of long period gratings (LPGs) to surrounding medium refractive index (SRI) was investigated for various periods. The most sensitive LPG period was found to be around 160 µm; this was due to the core mode coupling to a single cladding mode but phase matching at two wavelength locations, creating two attenuation peaks close to the waveguide dispersion turning point. Large angle tilted fibre gratings (TFGs) behave similarly to LPGs, in that they couple to the co-propagating cladding modes. The tilted structure of the index modulation within the core of the fibre gives rise to a polarisation dependency, which distinguishes the large angle TFG from an LPG. Since large angle TFGs couple to the cladding modes they are SRI sensitive, and the sensitivity to SRI can be further increased through cladding etching using HF acid. The investigation found that thinning the cladding layer caused a reordering of the cladding modes and a shift to more SRI-sensitive cladding modes. In an SRI range of 1.36 to 1.40, a sensitivity of 506.9 nm/URI was achieved for the etched large angle TFG, which is greater than that of the dual resonance LPG. UV inscribed LPGs were coated with sol-gel materials with high RIs. The high RI of the coating caused an increase in cladding mode effective index, which in turn caused an increase in the LPG sensitivity to SRI. LPGs of various periods were coated with sol-gel TiO2 and the optimal thickness was found to vary for each period.
By coating the already highly SRI-sensitive 160 µm period LPG (a dual resonance LPG) with sol-gel TiO2, the SRI sensitivity was further increased to a peak value of 1458 nm/URI, an almost threefold increase compared to the uncoated LPG. LPGs were also inscribed using a femtosecond laser, which produced a highly focused index change that was not uniform throughout the core of the optical fibre. The inscription technique gave rise to a large polarisation sensitivity and the ability to couple to multiple azimuthal cladding mode sets, not seen with uniform UV inscribed gratings. Through coupling of the core mode to multiple sets of cladding modes, attenuation peaks with opposite wavelength shifts for increasing SRI were observed. By combining these opposite wavelength shifts, an SRI sensitivity greater than that of any single observed attenuation peak was achieved. The maximum SRI sensitivity achieved was 1680 nm/URI for a femtosecond inscribed LPG of period 400 µm. Three different types of surface plasmon resonance (SPR) sensors with a multilayer metal top coating were investigated in D-shaped optical fibre. The sensors could be separated into two types: one utilized a pre UV inscribed tilted Bragg grating and the other employed a post UV exposure to generate a surface relief grating structure. This surface perturbation aided the out-coupling of light from the core but also changed the sensing mechanism from SPR to localised surface plasmon resonance (LSPR). This greatly increased the SRI sensitivity compared to the SPR sensors, with the gold coated top layer surface relief sensor producing the largest SRI sensitivity of 2111.5 nm/URI. The platinum and silver coated top layer surface relief sensors also gave high SRI sensitivities, together with the ability to produce resonances in air (not previously seen with the SPR sensors). These properties were employed in two applications.
The silver and platinum surface relief devices were used as gas sensors and were shown to be capable of detecting the minute RI change of different gases. The calculated maximum sensitivities were 1882.1 dB/URI and 1493.5 nm/URI for silver and platinum, respectively. Using a DFB laser and power meter, a cheap alternative approach was investigated, which showed the ability of the sensors to distinguish between different gases and flow rates of those gases. The gold surface relief sensor was coated with a bio compound called an aptamer and was able to detect various concentrations of a biological compound called thrombin, ranging from 1 mM to as low as 10 fM. A solution of 2 M NaCl was found to give the best stripping results for thrombin from the aptamer and showed the reusability of the sensor. The association and dissociation constants were calculated to be 1.0638×10^6 Ms^-1 and 0.2482 s^-1, respectively, showing the high affinity of the aptamer to thrombin. This supports existing work stating that aptamers could be an alternative to enzymes for chemical detection and also helps to explain the low detection limit of the gold surface relief sensor.
Abstract:
An efficient Bayesian inference method for problems that can be mapped onto dense graphs is presented. The approach is based on message passing where messages are averaged over a large number of replicated variable systems exposed to the same evidential nodes. An assumption about the symmetry of the solutions is required for carrying out the averages; here we extend the previous derivation based on a replica-symmetric (RS)-like structure to include a more complex one-step replica-symmetry-breaking-like (1RSB-like) ansatz. To demonstrate the potential of the approach it is employed for studying critical properties of the Ising linear perceptron and for multiuser detection in code division multiple access (CDMA) under different noise models. Results obtained under the RS assumption in the noncritical regime give rise to a highly efficient signal detection algorithm in the context of CDMA, while in the critical regime one observes a first-order transition line that ends in a continuous phase transition point. Finite size effects are also observed. While the 1RSB ansatz is not required for the original problems, it was applied to the CDMA signal detection problem with a more complex noise model that exhibits RSB behavior, resulting in an improvement in performance. © 2007 The American Physical Society.
Abstract:
This paper presents some forecasting techniques for energy demand and price prediction, one day ahead. These techniques combine the wavelet transform (WT) with fixed and adaptive machine learning/time series models (multi-layer perceptron (MLP), radial basis functions, linear regression, or GARCH). To create an adaptive model, we use an extended Kalman filter or particle filter to update the parameters continuously on the test set. The adaptive GARCH model is a new contribution, broadening the applicability of GARCH methods. We empirically compared two approaches to combining the WT with prediction models: multicomponent forecasts and direct forecasts. These techniques are applied to large sets of real data (both stationary and non-stationary) from the UK energy markets, so as to provide comparative results that are statistically stronger than those previously reported. The results showed that the forecasting accuracy is significantly improved by using the WT and adaptive models. The best models for the electricity demand/gas price forecasts are the adaptive MLP/GARCH with the multicomponent forecast; their MSEs are 0.02314 and 0.15384 respectively.