922 resultados para sums of squares


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Minimization of a sum-of-squares or cross-entropy error function leads to network outputs which approximate the conditional averages of the target data, conditioned on the input vector. For classifications problems, with a suitably chosen target coding scheme, these averages represent the posterior probabilities of class membership, and so can be regarded as optimal. For problems involving the prediction of continuous variables, however, the conditional averages provide only a very limited description of the properties of the target variables. This is particularly true for problems in which the mapping to be learned is multi-valued, as often arises in the solution of inverse problems, since the average of several correct target values is not necessarily itself a correct value. In order to obtain a complete description of the data, for the purposes of predicting the outputs corresponding to new input vectors, we must model the conditional probability distribution of the target data, again conditioned on the input vector. In this paper we introduce a new class of network models obtained by combining a conventional neural network with a mixture density model. The complete system is called a Mixture Density Network, and can in principle represent arbitrary conditional probability distributions in the same way that a conventional neural network can represent arbitrary functions. We demonstrate the effectiveness of Mixture Density Networks using both a toy problem and a problem involving robot inverse kinematics.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

It is well known that the addition of noise to the input data of a neural network during training can, in some circumstances, lead to significant improvements in generalization performance. Previous work has shown that such training with noise is equivalent to a form of regularization in which an extra term is added to the error function. However, the regularization term, which involves second derivatives of the error function, is not bounded below, and so can lead to difficulties if used directly in a learning algorithm based on error minimization. In this paper we show that, for the purposes of network training, the regularization term can be reduced to a positive definite form which involves only first derivatives of the network mapping. For a sum-of-squares error function, the regularization term belongs to the class of generalized Tikhonov regularizers. Direct minimization of the regularized error function provides a practical alternative to training with noise.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In most treatments of the regression problem it is assumed that the distribution of target data can be described by a deterministic function of the inputs, together with additive Gaussian noise having constant variance. The use of maximum likelihood to train such models then corresponds to the minimization of a sum-of-squares error function. In many applications a more realistic model would allow the noise variance itself to depend on the input variables. However, the use of maximum likelihood to train such models would give highly biased results. In this paper we show how a Bayesian treatment can allow for an input-dependent variance while overcoming the bias of maximum likelihood.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

It is well known that one of the obstacles to effective forecasting of exchange rates is heteroscedasticity (non-stationary conditional variance). The autoregressive conditional heteroscedastic (ARCH) model and its variants have been used to estimate a time dependent variance for many financial time series. However, such models are essentially linear in form and we can ask whether a non-linear model for variance can improve results just as non-linear models (such as neural networks) for the mean have done. In this paper we consider two neural network models for variance estimation. Mixture Density Networks (Bishop 1994, Nix and Weigend 1994) combine a Multi-Layer Perceptron (MLP) and a mixture model to estimate the conditional data density. They are trained using a maximum likelihood approach. However, it is known that maximum likelihood estimates are biased and lead to a systematic under-estimate of variance. More recently, a Bayesian approach to parameter estimation has been developed (Bishop and Qazaz 1996) that shows promise in removing the maximum likelihood bias. However, up to now, this model has not been used for time series prediction. Here we compare these algorithms with two other models to provide benchmark results: a linear model (from the ARIMA family), and a conventional neural network trained with a sum-of-squares error function (which estimates the conditional mean of the time series with a constant variance noise model). This comparison is carried out on daily exchange rate data for five currencies.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Obtaining wind vectors over the ocean is important for weather forecasting and ocean modelling. Several satellite systems used operationally by meteorological agencies utilise scatterometers to infer wind vectors over the oceans. In this paper we present the results of using novel neural network based techniques to estimate wind vectors from such data. The problem is partitioned into estimating wind speed and wind direction. Wind speed is modelled using a multi-layer perceptron (MLP) and a sum of squares error function. Wind direction is a periodic variable and a multi-valued function for a given set of inputs; a conventional MLP fails at this task, and so we model the full periodic probability density of direction conditioned on the satellite derived inputs using a Mixture Density Network (MDN) with periodic kernel functions. A committee of the resulting MDNs is shown to improve the results.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We employ the methods of statistical physics to study the performance of Gallager type error-correcting codes. In this approach, the transmitted codeword comprises Boolean sums of the original message bits selected by two randomly-constructed sparse matrices. We show that a broad range of these codes potentially saturate Shannon's bound but are limited due to the decoding dynamics used. Other codes show sub-optimal performance but are not restricted by the decoding dynamics. We show how these codes may also be employed as a practical public-key cryptosystem and are of competitive performance to modern cyptographical methods.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Obtaining wind vectors over the ocean is important for weather forecasting and ocean modelling. Several satellite systems used operationally by meteorological agencies utilise scatterometers to infer wind vectors over the oceans. In this paper we present the results of using novel neural network based techniques to estimate wind vectors from such data. The problem is partitioned into estimating wind speed and wind direction. Wind speed is modelled using a multi-layer perceptron (MLP) and a sum of squares error function. Wind direction is a periodic variable and a multi-valued function for a given set of inputs; a conventional MLP fails at this task, and so we model the full periodic probability density of direction conditioned on the satellite derived inputs using a Mixture Density Network (MDN) with periodic kernel functions. A committee of the resulting MDNs is shown to improve the results.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In January 2001 Greece joined the eurozone. The aim of this article is to examine whether an intention to join the eurozone had any impact on exchange rate volatility. We apply the Iterated Cumulative Sum of Squares (ICSS) algorithm of Inclan and Tiao (1994) to a set of Greek drachma exchange rate changes. We find evidence to suggest that the unconditional volatility of the drachma exchange rate against the dollar, British pound, yen, German mark and ECU/Euro was nonstationary, exhibiting a large number of volatility changes prior to European Monetary Union (EMU) membership. We then use a news archive service to identify the events that might have caused exchange rate volatility to shift. We find that devaluation of the drachma increased exchange rate volatility but ERM membership and a commitment to joining the eurozone led to lower volatility. Our findings therefore suggest that a strong commitment to join the eurozone may be sufficient to reduce some exchange rate volatility which has implications for countries intending to join the eurozone in the future.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper tests one of the fundamental assumptions of regional policy makers over the last 20 years. Western governments, in seeking to attract internationally mobile capital have spent significant sums of public money on subsidies and grants. This is justified on the basis that the social returns to FDI are significantly greater than the private returns, due to productivity or technology spillovers from inward investors to domestic industry. However, this paper generates some estimates of these spillovers for both assisted areas and non-assisted areas in the UK, and questions the size of these social returns, arguing that productivity spillovers do not occur in regions where significant inward investment incentives are available.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Conventional feed forward Neural Networks have used the sum-of-squares cost function for training. A new cost function is presented here with a description length interpretation based on Rissanen's Minimum Description Length principle. It is a heuristic that has a rough interpretation as the number of data points fit by the model. Not concerned with finding optimal descriptions, the cost function prefers to form minimum descriptions in a naive way for computational convenience. The cost function is called the Naive Description Length cost function. Finding minimum description models will be shown to be closely related to the identification of clusters in the data. As a consequence the minimum of this cost function approximates the most probable mode of the data rather than the sum-of-squares cost function that approximates the mean. The new cost function is shown to provide information about the structure of the data. This is done by inspecting the dependence of the error to the amount of regularisation. This structure provides a method of selecting regularisation parameters as an alternative or supplement to Bayesian methods. The new cost function is tested on a number of multi-valued problems such as a simple inverse kinematics problem. It is also tested on a number of classification and regression problems. The mode-seeking property of this cost function is shown to improve prediction in time series problems. Description length principles are used in a similar fashion to derive a regulariser to control network complexity.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This thesis studied the effect of (i) the number of grating components and (ii) parameter randomisation on root-mean-square (r.m.s.) contrast sensitivity and spatial integration. The effectiveness of spatial integration without external spatial noise depended on the number of equally spaced orientation components in the sum of gratings. The critical area marking the saturation of spatial integration was found to decrease when the number of components increased from 1 to 5-6 but increased again at 8-16 components. The critical area behaved similarly as a function of the number of grating components when stimuli consisted of 3, 6 or 16 components with different orientations and/or phases embedded in spatial noise. Spatial integration seemed to depend on the global Fourier structure of the stimulus. Spatial integration was similar for sums of two vertical cosine or sine gratings with various Michelson contrasts in noise. The critical area for a grating sum was found to be a sum of logarithmic critical areas for the component gratings weighted by their relative Michelson contrasts. The human visual system was modelled as a simple image processor where the visual stimuli is first low-pass filtered by the optical modulation transfer function of the human eye and secondly high-pass filtered, up to the spatial cut-off frequency determined by the lowest neural sampling density, by the neural modulation transfer function of the visual pathways. The internal noise is then added before signal interpretation occurs in the brain. The detection is mediated by a local spatially windowed matched filter. The model was extended to include complex stimuli and its applicability to the data was found to be successful. The shape of spatial integration function was similar for non-randomised and randomised simple and complex gratings. However, orientation and/or phase randomised reduced r.m.s contrast sensitivity by a factor of 2. The effect of parameter randomisation on spatial integration was modelled under the assumption that human observers change the observer strategy from cross-correlation (i.e., a matched filter) to auto-correlation detection when uncertainty is introduced to the task. The model described the data accurately.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The aim of this paper is to examine the short term dynamics of foreign exchange rate spreads. Using a vector autoregressive model (VAR) we show that most of the variation in the spread comes from the long run dependencies between past and future spreads rather than being caused by changes in inventory, adverse selection, cost of carry or order processing costs. We apply the Integrated Cumulative Sum of Squares (ICSS) algorithm of Inclan and Tiao (1994) to discover how often spread volatility changes. We find that spread volatility shifts are relatively uncommon and shifts in one currency spread tend not to spillover to other currency spreads. © 2013.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The paper has been presented at the 12th International Conference on Applications of Computer Algebra, Varna, Bulgaria, June, 2006

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The general ordinary quasi-differential expression M of n-th order with complex coefficients and its formal adjoint M + are considered over a regoin (a, b) on the real line, −∞ ≤ a < b ≤ ∞, on which the operator may have a finite number of singular points. By considering M over various subintervals on which singularities occur only at the ends, restrictions of the maximal operator generated by M in L2|w (a, b) which are regularly solvable with respect to the minimal operators T0 (M ) and T0 (M + ). In addition to direct sums of regularly solvable operators defined on the separate subintervals, there are other regularly solvable restrications of the maximal operator which involve linking the various intervals together in interface like style.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We consider the problem of minimizing the max of two convex functions from both approximation and sensitivity point of view.This lead up to study the epiconvergence of a sequence of level sums of convex functions and the related dual problems.