91 resultados para Clipping Noise
em Aston University Research Archive
Resumo:
It is well known that the addition of noise to the input data of a neural network during training can, in some circumstances, lead to significant improvements in generalization performance. Previous work has shown that such training with noise is equivalent to a form of regularization in which an extra term is added to the error function. However, the regularization term, which involves second derivatives of the error function, is not bounded below, and so can lead to difficulties if used directly in a learning algorithm based on error minimization. In this paper we show that, for the purposes of network training, the regularization term can be reduced to a positive definite form which involves only first derivatives of the network mapping. For a sum-of-squares error function, the regularization term belongs to the class of generalized Tikhonov regularizers. Direct minimization of the regularized error function provides a practical alternative to training with noise.
Resumo:
We study the effect of two types of noise, data noise and model noise, in an on-line gradient-descent learning scenario for general two-layer student network with an arbitrary number of hidden units. Training examples are randomly drawn input vectors labeled by a two-layer teacher network with an arbitrary number of hidden units. Data is then corrupted by Gaussian noise affecting either the output or the model itself. We examine the effect of both types of noise on the evolution of order parameters and the generalization error in various phases of the learning process.
Resumo:
Gaussian processes provide natural non-parametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output, and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then it is natural to model the noise variance using a second Gaussian process, in addition to the Gaussian process governing the noise-free output value. We show that prior uncertainty about the parameters controlling both processes can be handled and that the posterior distribution of the noise rate can be sampled from using Markov chain Monte Carlo methods. Our results on a synthetic data set give a posterior noise variance that well-approximates the true variance.
Resumo:
In most treatments of the regression problem it is assumed that the distribution of target data can be described by a deterministic function of the inputs, together with additive Gaussian noise having constant variance. The use of maximum likelihood to train such models then corresponds to the minimization of a sum-of-squares error function. In many applications a more realistic model would allow the noise variance itself to depend on the input variables. However, the use of maximum likelihood to train such models would give highly biased results. In this paper we show how a Bayesian treatment can allow for an input-dependent variance while overcoming the bias of maximum likelihood.
Resumo:
The ERS-1 satellite carries a scatterometer which measures the amount of radiation scattered back toward the satellite by the ocean's surface. These measurements can be used to infer wind vectors. The implementation of a neural network based forward model which maps wind vectors to radar backscatter is addressed. Input noise cannot be neglected. To account for this noise, a Bayesian framework is adopted. However, Markov Chain Monte Carlo sampling is too computationally expensive. Instead, gradient information is used with a non-linear optimisation algorithm to find the maximum em a posteriori probability values of the unknown variables. The resulting models are shown to compare well with the current operational model when visualised in the target space.
Resumo:
We study the dynamics of on-line learning in multilayer neural networks where training examples are sampled with repetition and where the number of examples scales with the number of network weights. The analysis is carried out using the dynamical replica method aimed at obtaining a closed set of coupled equations for a set of macroscopic variables from which both training and generalization errors can be calculated. We focus on scenarios whereby training examples are corrupted by additive Gaussian output noise and regularizers are introduced to improve the network performance. The dependence of the dynamics on the noise level, with and without regularizers, is examined, as well as that of the asymptotic values obtained for both training and generalization errors. We also demonstrate the ability of the method to approximate the learning dynamics in structurally unrealizable scenarios. The theoretical results show good agreement with those obtained by computer simulations.
Resumo:
We determine the critical noise level for decoding low density parity check error correcting codes based on the magnetization enumerator , rather than on the weight enumerator employed in the information theory literature. The interpretation of our method is appealingly simple, and the relation between the different decoding schemes such as typical pairs decoding, MAP, and finite temperature decoding (MPM) becomes clear. In addition, our analysis provides an explanation for the difference in performance between MN and Gallager codes. Our results are more optimistic than those derived via the methods of information theory and are in excellent agreement with recent results from another statistical physics approach.
Resumo:
The replica method, developed in statistical physics, is employed in conjunction with Gallager's methodology to accurately evaluate zero error noise thresholds for Gallager code ensembles. Our approach generally provides more optimistic evaluations than those reported in the information theory literature for sparse matrices; the difference vanishes as the parity check matrix becomes dense.
Resumo:
The ERS-1 satellite carries a scatterometer which measures the amount of radiation scattered back toward the satellite by the ocean's surface. These measurements can be used to infer wind vectors. The implementation of a neural network based forward model which maps wind vectors to radar backscatter is addressed. Input noise cannot be neglected. To account for this noise, a Bayesian framework is adopted. However, Markov Chain Monte Carlo sampling is too computationally expensive. Instead, gradient information is used with a non-linear optimisation algorithm to find the maximum em a posteriori probability values of the unknown variables. The resulting models are shown to compare well with the current operational model when visualised in the target space.
Resumo:
On 20 October 1997 the London Stock Exchange introduced a new trading system called SETS. This system was to replace the dealer system SEAQ, which had been in operation since 1986. Using the iterative sum of squares test introduced by Inclan and Tiao (1994), we investigate whether there was a change in the unconditional variance of opening and closing returns, at the time SETS was introduced. We show that for the FTSE-100 stocks traded on SETS, on the days following its introduction, there was a widespread increase in the volatility of both opening and closing returns. However, no synchronous volatility changes were found to be associated with the FTSE-100 index or FTSE-250 stocks. We conclude therefore that the introduction of the SETS trading mechanism caused an increase in noise at the time the system was introduced.
Resumo:
In this article a partial-adjustment model, which shows how equity prices fail to adjust instantaneously to new information, is estimated using a Kalman filter. For the components of the Dow Jones Industrial 30 index I aim to identify whether overreaction or noise is the cause of serial correlation and high volatility associated with opening returns. I find that the tendency for overreaction in opening prices is much stronger than for closing prices; therefore, overreaction rather than noise may account for differences in the return behavior of opening and closing returns.
Resumo:
Visual detection performance (d') is usually an accelerating function of stimulus contrast, which could imply a smooth, threshold-like nonlinearity in the sensory response. Alternatively, Pelli (1985 Journal of the Optical Society of America A 2 1508 - 1532) developed the 'uncertainty model' in which responses were linear with contrast, but the observer was uncertain about which of many noisy channels contained the signal. Such internal uncertainty effectively adds noise to weak signals, and predicts the nonlinear psychometric function. We re-examined these ideas by plotting psychometric functions (as z-scores) for two observers (SAW, PRM) with high precision. The task was to detect a single, vertical, blurred line at the fixation point, or identify its polarity (light vs dark). Detection of a known polarity was nearly linear for SAW but very nonlinear for PRM. Randomly interleaving light and dark trials reduced performance and rendered it non-linear for SAW, but had little effect for PRM. This occurred for both single-interval and 2AFC procedures. The whole pattern of results was well predicted by our Monte Carlo simulation of Pelli's model, with only two free parameters. SAW (highly practised) had very low uncertainty. PRM (with little prior practice) had much greater uncertainty, resulting in lower contrast sensitivity, nonlinear performance, and no effect of external (polarity) uncertainty. For SAW, identification was about v2 better than detection, implying statistically independent channels for stimuli of opposite polarity, rather than an opponent (light - dark) channel. These findings strongly suggest that noise and uncertainty, rather than sensory nonlinearity, limit visual detection.
Resumo:
The ability to distinguish one visual stimulus from another slightly different one depends on the variability of their internal representations. In a recent paper on human visual-contrast discrimination, Kontsevich et al (2002 Vision Research 42 1771 - 1784) re-considered the long-standing question whether the internal noise that limits discrimination is fixed (contrast-invariant) or variable (contrast-dependent). They tested discrimination performance for 3 cycles deg-1 gratings over a wide range of incremental contrast levels at three masking contrasts, and showed that a simple model with an expansive response function and response-dependent noise could fit the data very well. Their conclusion - that noise in visual-discrimination tasks increases markedly with contrast - has profound implications for our understanding and modelling of vision. Here, however, we re-analyse their data, and report that a standard gain-control model with a compressive response function and fixed additive noise can also fit the data remarkably well. Thus these experimental data do not allow us to decide between the two models. The question remains open. [Supported by EPSRC grant GR/S74515/01]
Resumo:
We studied the visual mechanisms that serve to encode spatial contrast at threshold and supra-threshold levels. In a 2AFC contrast-discrimination task, observers had to detect the presence of a vertical 1 cycle deg-1 test grating (of contrast dc) that was superimposed on a similar vertical 1 cycle deg-1 pedestal grating, whereas in pattern masking the test grating was accompanied by a very different masking grating (horizontal 1 cycle deg-1, or oblique 3 cycles deg-1). When expressed as threshold contrast (dc at 75% correct) versus mask contrast (c) our results confirm previous ones in showing a characteristic 'dipper function' for contrast discrimination but a smoothly increasing threshold for pattern masking. However, fresh insight is gained by analysing and modelling performance (p; percent correct) as a joint function of (c, dc) - the performance surface. In contrast discrimination, psychometric functions (p versus logdc) are markedly less steep when c is above threshold, but in pattern masking this reduction of slope did not occur. We explored a standard gain-control model with six free parameters. Three parameters control the contrast response of the detection mechanism and one parameter weights the mask contrast in the cross-channel suppression effect. We assume that signal-detection performance (d') is limited by additive noise of constant variance. Noise level and lapse rate are also fitted parameters of the model. We show that this model accounts very accurately for the whole performance surface in both types of masking, and thus explains the threshold functions and the pattern of variation in psychometric slopes. The cross-channel weight is about 0.20. The model shows that the mechanism response to contrast increment (dc) is linearised by the presence of pedestal contrasts but remains nonlinear in pattern masking.