999 results for "forecast evaluation"
Abstract:
This note develops general model-free adjustment procedures for the calculation of unbiased volatility loss functions based on practically feasible realized volatility benchmarks. The procedures, which exploit the recent asymptotic distributional results in Barndorff-Nielsen and Shephard (2002a), are both easy to implement and highly accurate in empirically realistic situations. On properly accounting for the measurement errors in the volatility forecast evaluations reported in Andersen, Bollerslev, Diebold and Labys (2003), the adjustments result in markedly higher estimates for the true degree of return-volatility predictability.
Abstract:
The evaluation of forecast performance plays a central role both in the interpretation and use of forecast systems and in their development. Different evaluation measures (scores) are available, often quantifying different characteristics of forecast performance. The properties of several proper scores for probabilistic forecast evaluation are contrasted and then used to interpret decadal probability hindcasts of global mean temperature. The Continuous Ranked Probability Score (CRPS), Proper Linear (PL) score, and IJ Good’s logarithmic score (also referred to as Ignorance) are compared; although information from all three may be useful, the logarithmic score has an immediate interpretation and is not insensitive to forecast busts. Neither CRPS nor PL is local; this is shown to produce counterintuitive evaluations by CRPS. Benchmark forecasts from empirical models like Dynamic Climatology place the scores in context. Comparing scores for forecast systems based on physical models (in this case HadCM3, from the CMIP5 decadal archive) against such benchmarks is more informative than internal comparisons of systems based on similar physical simulation models with each other. It is shown that a forecast system based on HadCM3 outperforms Dynamic Climatology in decadal global mean temperature hindcasts; Dynamic Climatology previously outperformed a forecast system based upon HadGEM2, and reasons for these results are suggested. Forecasts of aggregate data (5-year means of global mean temperature) are, of course, narrower than forecasts of annual averages due to the suppression of variance; while the average “distance” between the forecasts and a target may be expected to decrease, little if any discernible improvement in probabilistic skill is achieved.
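Both density scores contrasted in the abstract have closed forms for a Gaussian forecast, which makes the comparison easy to reproduce. Below is a minimal Python sketch; the Gaussian forecast and the example numbers are illustrative assumptions, not taken from the paper.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ignorance(mu, sigma, x):
    """IJ Good's logarithmic score in bits: -log2 of the forecast density at the outcome."""
    return -math.log2(norm_pdf((x - mu) / sigma) / sigma)

def crps_gaussian(mu, sigma, x):
    """Closed-form CRPS for a Gaussian forecast N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return sigma * (z * (2.0 * norm_cdf(z) - 1.0)
                    + 2.0 * norm_pdf(z) - 1.0 / math.sqrt(math.pi))

# A "bust": a sharp forecast centred far from the outcome.
# Ignorance penalises it far more heavily than CRPS does,
# while CRPS grows only roughly linearly in the miss distance.
sharp_miss = (ignorance(0.0, 0.1, 1.0), crps_gaussian(0.0, 0.1, 1.0))
wide_hit = (ignorance(0.0, 1.0, 1.0), crps_gaussian(0.0, 1.0, 1.0))
```

The contrast between `sharp_miss` and `wide_hit` illustrates why the abstract notes the logarithmic score is not insensitive to forecast busts.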
Abstract:
We propose new methods for evaluating predictive densities. The methods include Kolmogorov-Smirnov and Cramér-von Mises-type tests for the correct specification of predictive densities robust to dynamic mis-specification. The novelty is that the tests can detect mis-specification in the predictive densities even if it appears only over a fraction of the sample, due to the presence of instabilities. Our results indicate that our tests are well sized and have good power in detecting mis-specification in predictive densities, even when it is time-varying. An application to density forecasts of the Survey of Professional Forecasters demonstrates the usefulness of the proposed methodologies.
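The idea underlying Kolmogorov-Smirnov-type tests of density forecasts can be sketched with probability integral transforms (PITs): if the predictive densities are correct, the PITs are iid Uniform(0,1). The sketch below is a plain KS distance on simulated data, not the paper's robust statistics or critical values.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ks_uniform(pits):
    """Kolmogorov-Smirnov distance between the empirical CDF of the PITs
    and the Uniform(0,1) CDF; correctly specified densities give uniform PITs."""
    u = sorted(pits)
    n = len(u)
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))

random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(2000)]
# Forecast N(0,1): well specified, small KS distance.
d_correct = ks_uniform([norm_cdf(v) for v in y])
# Forecast N(0,4): too wide, PITs pile up near 0.5, large KS distance.
d_too_wide = ks_uniform([norm_cdf(v / 2.0) for v in y])
```

Comparing `d_correct` and `d_too_wide` shows how PIT-based statistics separate well-specified from mis-specified densities.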
Abstract:
This paper proposes new methodologies for evaluating out-of-sample forecasting performance that are robust to the choice of the estimation window size. The methodologies involve evaluating the predictive ability of forecasting models over a wide range of window sizes. We show that the tests proposed in the literature may lack the power to detect predictive ability and might be subject to data snooping across different window sizes if used repeatedly. An empirical application shows the usefulness of the methodologies for evaluating exchange rate models' forecasting ability.
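The dependence of out-of-sample results on the estimation window can be illustrated with a toy exercise: one-step-ahead rolling-mean forecasts of an iid series, evaluated over a grid of window sizes on a common evaluation sample. This is an illustrative assumption, not the paper's test statistics.

```python
import random

def window_mse(y, window, start):
    """MSE of one-step-ahead forecasts from a rolling mean of the last `window` observations,
    evaluated from index `start` so that all window sizes share the same evaluation sample."""
    errs = [(y[t] - sum(y[t - window:t]) / window) ** 2 for t in range(start, len(y))]
    return sum(errs) / len(errs)

random.seed(1)
y = [random.gauss(0.0, 1.0) for _ in range(20000)]
start = 200  # at least the largest window, so every forecast is feasible
mses = {w: window_mse(y, w, start) for w in (20, 50, 100, 200)}
```

Reporting results across the whole grid, as the paper advocates, avoids flattering a single window choice; repeatedly searching over windows for a significant result is exactly the data-snooping risk the abstract describes.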
Abstract:
This study examines the information content of alternative implied volatility measures for the 30 components of the Dow Jones Industrial Average Index from 1996 until 2007. Along with the popular Black-Scholes and "model-free" implied volatility expectations, the recently proposed corridor implied volatility (CIV) measures are explored. For all pair-wise comparisons, it is found that a CIV measure closely related to the model-free implied volatility nearly always delivers the most accurate forecasts for the majority of the firms. This finding remains consistent across different forecast horizons, volatility definitions, loss functions and forecast evaluation settings.
Abstract:
Forecasts of differences in growth between countries serve an important role in the justification of governments’ fiscal policy stances, but their accuracy is not tested as part of the current range of forecast evaluation methods. This paper examines forecast and outturn growth differentials between countries to assess whether forecasts of “relative” growth are useful. Using OECD forecasts and outturn values for GDP growth for (combinations of) the G7 countries between 1984 and 2010, the paper finds that the OECD predicted the relative growth of G7 countries well during this period. For each two-country combination, results indicate that relative growth forecasts are less useful for countries with smaller outturn growth differentials.
Abstract:
This paper combines multivariate density forecasts of output growth, inflation and interest rates from a suite of models. An out-of-sample weighting scheme based on the predictive likelihood as proposed by Eklund and Karlsson (2005) and Andersson and Karlsson (2007) is used to combine the models. Three classes of models are considered: a Bayesian vector autoregression (BVAR), a factor-augmented vector autoregression (FAVAR) and a medium-scale dynamic stochastic general equilibrium (DSGE) model. Using Australian data, we find that, at short forecast horizons, the Bayesian VAR model is assigned the most weight, while at intermediate and longer horizons the factor model is preferred. The DSGE model is assigned little weight at all horizons, a result that can be attributed to the DSGE model producing density forecasts that are very wide when compared with the actual distribution of observations. While a density forecast evaluation exercise reveals little formal evidence that the optimally combined densities are superior to those from the best-performing individual model, or a simple equal-weighting scheme, this may be a result of the short sample available.
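The core of a predictive-likelihood weighting scheme is that each model's combination weight is proportional to the exponential of its out-of-sample log predictive likelihood. The sketch below shows that mechanism only; the log-likelihood values for the three model classes are hypothetical numbers, not the paper's estimates.

```python
import math

def predictive_likelihood_weights(log_pl):
    """Combination weights proportional to exp(out-of-sample log predictive likelihood),
    computed with a max-shift for numerical stability."""
    m = max(log_pl)
    raw = [math.exp(s - m) for s in log_pl]
    total = sum(raw)
    return [x / total for x in raw]

# Hypothetical hold-out log predictive likelihoods for BVAR, FAVAR, DSGE.
weights = predictive_likelihood_weights([-120.0, -118.0, -150.0])
```

Because weights are exponential in the log score, a model whose densities fit the hold-out sample poorly (like the overly wide DSGE densities described in the abstract, here mimicked by the hypothetical -150) receives essentially zero weight.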
Abstract:
The main goal of this article is to provide an answer to the question: "Does anything forecast exchange rates, and if so, which variables?". It is well known that exchange rate fluctuations are very difficult to predict using economic models, and that a random walk forecasts exchange rates better than any economic model (the Meese and Rogoff puzzle). However, the recent literature has identified a series of fundamentals/methodologies that claim to have resolved the puzzle. This article provides a critical review of the recent literature on exchange rate forecasting and illustrates the new methodologies and fundamentals that have been recently proposed in an up-to-date, thorough empirical analysis. Overall, our analysis of the literature and the data suggests that the answer to the question "Are exchange rates predictable?" is "it depends": on the choice of predictor, forecast horizon, sample period, model, and forecast evaluation method. Predictability is most apparent when one or more of the following hold: the predictors are Taylor-rule fundamentals or net foreign assets, the model is linear, and a small number of parameters are estimated. The toughest benchmark is the random walk without drift.
Abstract:
We propose new methods for evaluating predictive densities that focus on the models' actual predictive ability in finite samples. The tests offer a simple way of evaluating the correct specification of predictive densities, either parametric or non-parametric. The results indicate that our tests are well sized and have good power in detecting mis-specification in predictive densities. An empirical application to the Survey of Professional Forecasters and a baseline Dynamic Stochastic General Equilibrium model shows the usefulness of our methodology.
Abstract:
Recent research has suggested that forecast evaluation on the basis of standard statistical loss functions could prefer models which are sub-optimal when used in a practical setting. This paper explores a number of statistical models for predicting the daily volatility of several key UK financial time series. The out-of-sample forecasting performance of various linear and GARCH-type models of volatility are compared with forecasts derived from a multivariate approach. The forecasts are evaluated using traditional metrics, such as mean squared error, and also by how adequately they perform in a modern risk management setting. We find that the relative accuracies of the various methods are highly sensitive to the measure used to evaluate them. Such results have implications for any econometric time series forecasts which are subsequently employed in financial decision-making.
Abstract:
A number of methods of evaluating the validity of interval forecasts of financial data are analysed, and illustrated using intraday FTSE100 index futures returns. Some existing interval forecast evaluation techniques, such as the Markov chain approach of Christoffersen (1998), are shown to be inappropriate in the presence of periodic heteroscedasticity. Instead, we consider a regression-based test, and a modified version of Christoffersen's Markov chain test for independence, and analyse their properties when the financial time series exhibit periodic volatility. These approaches lead to different conclusions when interval forecasts of FTSE100 index futures returns generated by various GARCH(1,1) and periodic GARCH(1,1) models are evaluated.
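Christoffersen's Markov-chain test of independence for interval forecasts works on the 0/1 sequence of interval violations ("hits"): under independence, the probability of a hit should not depend on whether the previous period was a hit. A minimal sketch of the basic likelihood-ratio version follows; the modified version the paper analyses under periodic volatility differs.

```python
import math

def lr_independence(hits):
    """Christoffersen (1998) LR test of independence for a 0/1 interval-hit sequence.
    Under independence the statistic is asymptotically chi-square with 1 dof."""
    n = [[0, 0], [0, 0]]
    for prev, cur in zip(hits, hits[1:]):
        n[prev][cur] += 1
    n00, n01, n10, n11 = n[0][0], n[0][1], n[1][0], n[1][1]
    pi01 = n01 / (n00 + n01)        # P(hit | no hit yesterday)
    pi11 = n11 / (n10 + n11)        # P(hit | hit yesterday)
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)  # unconditional hit rate

    def ll(p, zeros, ones):
        """Binomial log-likelihood, guarding against log(0)."""
        out = 0.0
        if zeros:
            out += zeros * math.log(1.0 - p)
        if ones:
            out += ones * math.log(p)
        return out

    l0 = ll(pi, n00 + n10, n01 + n11)              # restricted: independence
    l1 = ll(pi01, n00, n01) + ll(pi11, n10, n11)   # unrestricted: first-order Markov
    return 2.0 * (l1 - l0)

# Violations arriving in bursts (volatility clustering) are strong dependence.
clustered = [1, 1, 1, 1, 0, 0, 0, 0] * 25
lr_clustered = lr_independence(clustered)
```

Under periodic heteroscedasticity, hits cluster at predictable times of day, which is why the paper argues the unmodified test can be misleading for intraday data.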
Abstract:
We compare linear autoregressive (AR) models and self-exciting threshold autoregressive (SETAR) models in terms of their point forecast performance, and their ability to characterize the uncertainty surrounding those forecasts, i.e. interval or density forecasts. A two-regime SETAR process is used as the data-generating process in an extensive set of Monte Carlo simulations, and we consider the discriminatory power of recently developed methods of forecast evaluation for different degrees of non-linearity. We find that the interval and density evaluation methods are unlikely to show the linear model to be deficient on samples of the size typical of macroeconomic data.
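A two-regime SETAR data-generating process of the kind used in such Monte Carlo studies can be sketched as follows; the coefficients, threshold and delay here are illustrative assumptions, not the paper's experimental design.

```python
import random

def simulate_setar(n, phi_low=0.9, phi_high=0.2, threshold=0.0, seed=42):
    """Simulate a two-regime SETAR process with delay 1:
    y_t = phi_low  * y_{t-1} + e_t  if y_{t-1} <= threshold,
    y_t = phi_high * y_{t-1} + e_t  otherwise, with e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(n):
        phi = phi_low if prev <= threshold else phi_high
        prev = phi * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y

y = simulate_setar(2000)
```

A linear AR(1) fitted to such data averages over the two regimes; the question the simulations address is whether interval and density evaluation methods can detect that deficiency in samples of typical macroeconomic length.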
Abstract:
We consider evaluating the UK Monetary Policy Committee's inflation density forecasts using probability integral transform goodness-of-fit tests. These tests evaluate the whole forecast density. We also consider whether the probabilities assigned to inflation being in certain ranges are well calibrated, where the ranges are chosen to be those of particular relevance to the MPC, given its remit of maintaining inflation in a band around its target rate per annum. Finally, we discuss the decision-based approach to forecast evaluation in relation to the MPC forecasts.
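Checking whether probabilities assigned to inflation falling in a range are well calibrated amounts to comparing the observed frequency of the event with the average assigned probability. The sketch below is one simple illustrative check under an independence assumption, not the paper's procedure; the probabilities and outcomes are made up.

```python
import math

def calibration_z(probs, outcomes):
    """Approximate z-statistic comparing the observed frequency of an event
    (e.g. inflation falling in the target range) with the mean assigned probability.
    Assumes independent forecast occasions; outcomes are 0/1 indicators."""
    n = len(probs)
    pbar = sum(probs) / n
    freq = sum(outcomes) / n
    se = math.sqrt(sum(p * (1.0 - p) for p in probs)) / n
    return (freq - pbar) / se

# Forecasts that say 70% while the event occurs only 50% of the time
# are poorly calibrated, and the z-statistic flags it.
z = calibration_z([0.7] * 100, [1] * 50 + [0] * 50)
```

A |z| well beyond 2 indicates the assigned range probabilities are out of line with how often the event actually occurs.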
Abstract:
Simulation models are widely employed to make probability forecasts of future conditions on seasonal to annual lead times. Added value in such forecasts is reflected in the information they add, either to purely empirical statistical models or to simpler simulation models. An evaluation of seasonal probability forecasts from the Development of a European Multimodel Ensemble system for seasonal to inTERannual prediction (DEMETER) and ENSEMBLES multi-model ensemble experiments is presented. Two particular regions are considered: Nino3.4 in the Pacific and the Main Development Region in the Atlantic; these regions were chosen before any spatial distribution of skill was examined. The ENSEMBLES models are found to have skill against the climatological distribution on seasonal time-scales. For models in ENSEMBLES that have a clearly defined predecessor model in DEMETER, the improvement from DEMETER to ENSEMBLES is discussed. Due to the long lead times of the forecasts and the evolution of observation technology, the forecast-outcome archive for seasonal forecast evaluation is small; arguably, evaluation data for seasonal forecasting will always be precious. Issues of information contamination from in-sample evaluation are discussed and impacts (both positive and negative) of variations in cross-validation protocol are demonstrated. Other difficulties due to the small forecast-outcome archive are identified. The claim that the multi-model ensemble provides a ‘better’ probability forecast than the best single model is examined and challenged. Significant forecast information beyond the climatological distribution is also demonstrated in a persistence probability forecast. The ENSEMBLES probability forecasts add significantly more information to empirical probability forecasts on seasonal time-scales than on decadal scales. Current operational forecasts might be enhanced by melding information from both simulation models and empirical models. 
Simulation models based on physical principles are sometimes expected, in principle, to outperform empirical models; direct comparison of their forecast skill provides information on progress toward that goal.