24 results for forecast evaluation.
in CentAUR: Central Archive University of Reading - UK
Abstract:
The evaluation of forecast performance plays a central role both in the interpretation and use of forecast systems and in their development. Different evaluation measures (scores) are available, often quantifying different characteristics of forecast performance. The properties of several proper scores for probabilistic forecast evaluation are contrasted and then used to interpret decadal probability hindcasts of global mean temperature. The Continuous Ranked Probability Score (CRPS), Proper Linear (PL) score, and I. J. Good’s logarithmic score (also referred to as Ignorance) are compared; although information from all three may be useful, the logarithmic score has an immediate interpretation and is not insensitive to forecast busts. Neither CRPS nor PL is local; this is shown to produce counterintuitive evaluations by CRPS. Benchmark forecasts from empirical models like Dynamic Climatology place the scores in context. Comparing scores for forecast systems based on physical models (in this case HadCM3, from the CMIP5 decadal archive) against such benchmarks is more informative than internal comparison of systems based on similar physical simulation models with each other. It is shown that a forecast system based on HadCM3 outperforms Dynamic Climatology in decadal global mean temperature hindcasts; Dynamic Climatology previously outperformed a forecast system based upon HadGEM2, and reasons for these results are suggested. Forecasts of aggregate data (5-year means of global mean temperature) are, of course, narrower than forecasts of annual averages due to the suppression of variance; while the average “distance” between the forecasts and a target may be expected to decrease, little if any discernible improvement in probabilistic skill is achieved.
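The contrast between the logarithmic score and CRPS can be sketched for the simplest case. The following is a minimal illustration, assuming a Gaussian forecast density (an assumption made here for convenience, not something the abstract specifies); the function names are hypothetical. It uses the standard closed-form CRPS for a normal distribution and reports Ignorance in bits.

```python
import math


def ignorance_gaussian(mu, sigma, y):
    """Ignorance (negative log2 of the forecast density) for N(mu, sigma^2) at outcome y."""
    z = (y - mu) / sigma
    log_pdf = -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    return -log_pdf / math.log(2.0)


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a Gaussian forecast N(mu, sigma^2) at outcome y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


# A "bust": the outcome lands far in the tail of the forecast distribution.
for y in (0.0, 3.0, 6.0):
    print(y, round(ignorance_gaussian(0.0, 1.0, y), 3), round(crps_gaussian(0.0, 1.0, y), 3))
```

In this sketch the Ignorance penalty grows quadratically with the standardized error while CRPS grows roughly linearly, which is one way to see why the logarithmic score reacts more strongly to busts.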
Abstract:
Recent research has suggested that forecast evaluation on the basis of standard statistical loss functions could prefer models which are sub-optimal when used in a practical setting. This paper explores a number of statistical models for predicting the daily volatility of several key UK financial time series. The out-of-sample forecasting performance of various linear and GARCH-type models of volatility is compared with forecasts derived from a multivariate approach. The forecasts are evaluated using traditional metrics, such as mean squared error, and also by how adequately they perform in a modern risk management setting. We find that the relative accuracies of the various methods are highly sensitive to the measure used to evaluate them. Such results have implications for any econometric time series forecasts which are subsequently employed in financial decision-making.
Abstract:
A number of methods of evaluating the validity of interval forecasts of financial data are analysed, and illustrated using intraday FTSE100 index futures returns. Some existing interval forecast evaluation techniques, such as the Markov chain approach of Christoffersen (1998), are shown to be inappropriate in the presence of periodic heteroscedasticity. Instead, we consider a regression-based test, and a modified version of Christoffersen's Markov chain test for independence, and analyse their properties when the financial time series exhibit periodic volatility. These approaches lead to different conclusions when interval forecasts of FTSE100 index futures returns generated by various GARCH(1,1) and periodic GARCH(1,1) models are evaluated.
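For orientation, the standard Markov chain independence test can be sketched as a likelihood-ratio test on the sequence of interval violations. This is a minimal sketch of the textbook form of the test, not the modified version the abstract proposes; the function names are hypothetical, and `hits` is assumed to be a list of integers (1 when the outcome fell outside the forecast interval, 0 otherwise) with at least two entries.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def christoffersen_independence(hits):
    """Likelihood-ratio test that violations follow no first-order Markov dependence.

    Returns (LR statistic, p-value); LR is asymptotically chi-square with 1 dof.
    """
    n = [[0, 0], [0, 0]]  # n[i][j]: transitions from state i to state j
    for prev, curr in zip(hits[:-1], hits[1:]):
        n[prev][curr] += 1
    n00, n01, n10, n11 = n[0][0], n[0][1], n[1][0], n[1][1]

    pi01 = n01 / (n00 + n01) if (n00 + n01) else 0.0
    pi11 = n11 / (n10 + n11) if (n10 + n11) else 0.0
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)

    ll_null = _xlogy(n00 + n10, 1.0 - pi) + _xlogy(n01 + n11, pi)
    ll_alt = (_xlogy(n00, 1.0 - pi01) + _xlogy(n01, pi01)
              + _xlogy(n10, 1.0 - pi11) + _xlogy(n11, pi11))
    lr = -2.0 * (ll_null - ll_alt)

    # Survival function of chi-square with 1 dof: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value


# Strictly alternating violations are strongly dependent, so the test rejects.
print(christoffersen_independence([0, 1] * 20))
```

Because the test only looks at first-order transitions, a periodic volatility pattern in the underlying returns can distort it, which is the weakness the abstract addresses.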
Abstract:
We compare linear autoregressive (AR) models and self-exciting threshold autoregressive (SETAR) models in terms of their point forecast performance, and their ability to characterize the uncertainty surrounding those forecasts, i.e. interval or density forecasts. A two-regime SETAR process is used as the data-generating process in an extensive set of Monte Carlo simulations, and we consider the discriminatory power of recently developed methods of forecast evaluation for different degrees of non-linearity. We find that the interval and density evaluation methods are unlikely to show the linear model to be deficient on samples of the size typical for macroeconomic data.
Abstract:
We consider evaluating the UK Monetary Policy Committee's inflation density forecasts using probability integral transform goodness-of-fit tests. These tests evaluate the whole forecast density. We also consider whether the probabilities assigned to inflation being in certain ranges are well calibrated, where the ranges are chosen to be those of particular relevance to the MPC, given its remit of maintaining inflation rates in a band around per annum. Finally, we discuss the decision-based approach to forecast evaluation in relation to the MPC forecasts.
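The idea behind a probability integral transform (PIT) check can be sketched as follows: each outcome is passed through its own forecast CDF, and if the densities are correct the resulting values should look uniform on [0, 1]. This is a minimal sketch assuming Gaussian forecast densities purely for illustration (the abstract does not say which density family is used), with hypothetical function names and a Kolmogorov-Smirnov distance as one possible goodness-of-fit measure.

```python
import math


def gaussian_cdf(y, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def pit_values(outcomes, means, sigmas):
    """Probability integral transforms: each outcome through its own forecast CDF."""
    return [gaussian_cdf(y, m, s) for y, m, s in zip(outcomes, means, sigmas)]


def ks_statistic_uniform(u):
    """Kolmogorov-Smirnov distance between the sample u and the uniform(0, 1) CDF."""
    u = sorted(u)
    n = len(u)
    d_plus = max((i + 1) / n - x for i, x in enumerate(u))
    d_minus = max(x - i / n for i, x in enumerate(u))
    return max(d_plus, d_minus)


# Illustrative numbers only: outcomes sitting exactly at the forecast mean give PIT = 0.5.
print(pit_values([2.0, 2.5], [2.0, 2.5], [0.5, 0.5]))
```

A large KS distance would suggest the forecast densities are miscalibrated; with short forecast records the test has limited power, so a small distance is weak evidence of calibration.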
Abstract:
Simulation models are widely employed to make probability forecasts of future conditions on seasonal to annual lead times. Added value in such forecasts is reflected in the information they add, either to purely empirical statistical models or to simpler simulation models. An evaluation of seasonal probability forecasts from the Development of a European Multimodel Ensemble system for seasonal to inTERannual prediction (DEMETER) and ENSEMBLES multi-model ensemble experiments is presented. Two particular regions are considered: Nino3.4 in the Pacific and the Main Development Region in the Atlantic; these regions were chosen before any spatial distribution of skill was examined. The ENSEMBLES models are found to have skill against the climatological distribution on seasonal time-scales. For models in ENSEMBLES that have a clearly defined predecessor model in DEMETER, the improvement from DEMETER to ENSEMBLES is discussed. Due to the long lead times of the forecasts and the evolution of observation technology, the forecast-outcome archive for seasonal forecast evaluation is small; arguably, evaluation data for seasonal forecasting will always be precious. Issues of information contamination from in-sample evaluation are discussed and impacts (both positive and negative) of variations in cross-validation protocol are demonstrated. Other difficulties due to the small forecast-outcome archive are identified. The claim that the multi-model ensemble provides a ‘better’ probability forecast than the best single model is examined and challenged. Significant forecast information beyond the climatological distribution is also demonstrated in a persistence probability forecast. The ENSEMBLES probability forecasts add significantly more information to empirical probability forecasts on seasonal time-scales than on decadal scales. Current operational forecasts might be enhanced by melding information from both simulation models and empirical models. 
Simulation models based on physical principles are sometimes expected, in principle, to outperform empirical models; direct comparison of their forecast skill provides information on progress toward that goal.
Abstract:
Simulations of the top-of-atmosphere radiative-energy budget from the Met Office global numerical weather-prediction model are evaluated using new data from the Geostationary Earth Radiation Budget (GERB) instrument on board the Meteosat-8 satellite. Systematic discrepancies between the model simulations and GERB measurements greater than 20 W m⁻² in outgoing long-wave radiation (OLR) and greater than 60 W m⁻² in reflected short-wave radiation (RSR) are identified over the period April-September 2006 using 12 UTC data. Convective cloud over equatorial Africa is spatially less organized and less reflective than in the GERB data. This bias depends strongly on convective-cloud cover, which is highly sensitive to changes in the model convective parametrization. Underestimates in model OLR over the Gulf of Guinea coincide with unrealistic southerly cloud outflow from convective centres to the north. Large overestimates in model RSR over the subtropical ocean, greater than 50 W m⁻² at 12 UTC, are explained by unrealistic radiative properties of low-level cloud relating to overestimation of cloud liquid water compared with independent satellite measurements. The results of this analysis contribute to the development and improvement of parametrizations in the global forecast model.
Abstract:
Medium range flood forecasting activities, driven by various meteorological forecasts ranging from high resolution deterministic forecasts to low spatial resolution ensemble prediction systems, share a major challenge in the appropriateness and design of performance measures. In this paper possible limitations of some traditional hydrological and meteorological prediction quality and verification measures are identified. Some simple modifications are applied in order to circumvent the problem of the autocorrelation dominating river discharge time-series and in order to create a benchmark model enabling the decision makers to evaluate the forecast quality and the model quality. Although the performance period is quite short the advantage of a simple cost-loss function as a measure of forecast quality can be demonstrated.
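One common way to turn a cost-loss function into a measure of forecast quality is the static cost-loss model, sketched below. This is an assumption about the general approach rather than the specific function used in the paper: a user pays `cost` to protect whenever a warning is issued and suffers `loss` when an unwarned event occurs. The function names are hypothetical, and the sketch assumes 0 < cost < loss and an event base rate strictly between 0 and 1.

```python
def mean_expense(warnings, events, cost, loss):
    """Average expense per forecast: pay `cost` when warned, `loss` for an unwarned event."""
    total = 0.0
    for warned, occurred in zip(warnings, events):
        if warned:
            total += cost
        elif occurred:
            total += loss
    return total / len(events)


def relative_value(warnings, events, cost, loss):
    """Value relative to climatology (0) and a perfect forecast (1)."""
    base_rate = sum(events) / len(events)
    e_forecast = mean_expense(warnings, events, cost, loss)
    e_climate = min(cost, base_rate * loss)  # always protect, or never protect
    e_perfect = base_rate * cost             # protect only when the event occurs
    return (e_climate - e_forecast) / (e_climate - e_perfect)


# Illustrative binary flood-exceedance record (1 = threshold exceeded).
events = [1, 0, 0, 0]
print(relative_value([1, 0, 0, 0], events, cost=1.0, loss=5.0))  # perfect warnings
print(relative_value([1, 1, 1, 1], events, cost=1.0, loss=5.0))  # always warn
```

The climatological expense acts as the benchmark here, in the same spirit as the benchmark model mentioned in the abstract: a forecast only has value for a given user if it beats the best fixed strategy.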
Abstract:
In this paper we introduce a new testing procedure for evaluating the rationality of fixed-event forecasts based on a pseudo-maximum likelihood estimator. The procedure is designed to be robust to departures from the normality assumption. A model is introduced to show that such departures are likely when forecasters experience a credibility loss when they make large changes to their forecasts. The test is illustrated using monthly fixed-event forecasts produced by four UK institutions. Use of the robust test leads to the conclusion that certain forecasts are rational while use of the Gaussian-based test implies that certain forecasts are irrational. The difference in the results is due to the nature of the underlying data. Copyright © 2001 John Wiley & Sons, Ltd.
Abstract:
Many studies evaluating model boundary-layer schemes focus either on near-surface parameters or on short-term observational campaigns. This reflects the observational datasets that are widely available for use in model evaluation. In this paper we show how surface and long-term Doppler lidar observations, combined in a way to match model representation of the boundary layer as closely as possible, can be used to evaluate the skill of boundary-layer forecasts. We use a 2-year observational dataset from a rural site in the UK to evaluate a climatology of boundary-layer type forecast by the UK Met Office Unified Model. In addition, we demonstrate the use of a binary skill score (Symmetric Extremal Dependence Index, SEDI) to investigate the dependence of forecast skill on season, horizontal resolution and forecast lead time. A clear diurnal and seasonal cycle can be seen in the climatology of both the model and observations, with the main discrepancies being the model overpredicting cumulus-capped and decoupled stratocumulus-capped boundary layers and underpredicting well-mixed boundary layers. Using the SEDI skill score, the model is most skillful at predicting the surface stability. The skill of the model in predicting cumulus-capped and stratocumulus-capped stable boundary layers is low but greater than that of a 24-hour persistence forecast. In contrast, the skill in predicting decoupled boundary layers and boundary layers with multiple cloud layers is lower than that of persistence. This process-based evaluation approach has the potential to be applied to other boundary-layer parameterisation schemes with similar decision structures.
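The Symmetric Extremal Dependence Index is computed from a 2x2 contingency table of forecast and observed occurrences. The following is a minimal sketch of the usual definition in terms of hit rate H and false alarm rate F; the function name and the example counts are hypothetical, and the sketch assumes 0 < H < 1 and 0 < F < 1 so that all logarithms are defined.

```python
import math


def sedi(hits, misses, false_alarms, correct_negatives):
    """Symmetric Extremal Dependence Index from a 2x2 contingency table."""
    h = hits / (hits + misses)                             # hit rate H
    f = false_alarms / (false_alarms + correct_negatives)  # false alarm rate F
    numerator = math.log(f) - math.log(h) - math.log(1.0 - f) + math.log(1.0 - h)
    denominator = math.log(f) + math.log(h) + math.log(1.0 - f) + math.log(1.0 - h)
    return numerator / denominator


# Illustrative counts only: H = 0.8, F = 0.2.
print(round(sedi(40, 10, 10, 40), 4))
```

A score of 0 corresponds to no skill (H equal to F) and values approaching 1 indicate strong association between forecast and observed boundary-layer type.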
Abstract:
A parametrization for ice supersaturation is introduced into the ECMWF Integrated Forecast System (IFS), compatible with the cloud scheme that allows partial cloud coverage. It is based on the simple, but often justifiable, diagnostic assumption that the ice nucleation and subsequent depositional growth time-scales are short compared to the model time step, so that supersaturation is only permitted in the clear-sky portion of the grid cell. Results from model integrations using the new scheme are presented; the scheme is shown to increase upper-tropospheric humidity, decrease high-level cloud cover and, to a much lesser extent, cloud ice amounts, all as expected from simple arguments. Evaluation of the relative distribution of supersaturated humidity amounts shows good agreement with the observed climatology derived from in situ aircraft observations. With the new scheme, the global distribution of frequency of occurrence of supersaturated regions compares well with remotely sensed microwave limb sounder (MLS) data, with the most marked errors of underprediction occurring in regions where the model is known to underpredict deep convection. Finally, it is also demonstrated that the new scheme leads to improved predictions of permanent contrail cloud over southern England, which indirectly implies upper-tropospheric humidity fields are better represented for this region.
Abstract:
This paper analyses historic records of agricultural land use and management for England and Wales from 1931 and 1991 and uses export coefficient modelling to hindcast the impact of these practices on the rates of diffuse nitrogen (N) and phosphorus (P) export to water bodies for each of the major geo-climatic regions of England and Wales. Key trends indicate the importance of animal agriculture as a contributor to the total diffuse agricultural nutrient loading on waters, and the need to bring these sources under control if conditions suitable for sustaining 'Good Ecological Status' under the Water Framework Directive are to be generated. The analysis highlights the importance of measuring changes in nutrient loading in relation to the catchment-specific baseline state for different water bodies. The approach is also used to forecast the likely impact of broad regional scale scenarios on nutrient export to waters and highlights the need to take sensitive land out of production, introduce ceilings on fertilizer use and stocking densities, and controls on agricultural practice in higher risk areas where intensive agriculture is combined with a low intrinsic nutrient retention capacity, although the uncertainties associated with the modelling applied at this scale should be taken into account in the interpretation of model output. The paper advocates the need for a two-tiered approach to nutrient management, combining broad regional policies with targeted management in high risk areas at the catchment and farm scale.
Abstract:
Recent observations from the Argo dataset of temperature and salinity profiles are used to evaluate a series of 3-year data assimilation experiments in a global ice–ocean general circulation model. The experiments are designed to evaluate a new data assimilation system whereby salinity is assimilated along isotherms, S(T). In addition, the role of a balancing salinity increment to maintain water mass properties is investigated. This balancing increment is found to effectively prevent spurious mixing in tropical regions induced by univariate temperature assimilation, allowing the correction of isotherm geometries without adversely influencing temperature–salinity relationships. In addition, the balancing increment is able to correct a fresh bias associated with a weak subtropical gyre in the North Atlantic using only temperature observations. The S(T) assimilation method is found to provide an important improvement over conventional depth level assimilation, with lower root-mean-squared forecast errors over the upper 500 m in the tropical Atlantic and Pacific Oceans. An additional set of experiments is performed whereby Argo data are withheld and used for independent evaluation. The most significant improvements from Argo assimilation are found in less well-observed regions (Indian, South Atlantic and South Pacific Oceans). When Argo salinity data are assimilated in addition to temperature, improvements to modelled temperature fields are obtained due to corrections to model density gradients and the resulting circulation. It is found that observations from the Argo array provide an invaluable tool for both correcting modelled water mass properties through data assimilation and for evaluating the assimilation methods themselves.
Abstract:
Given the significance of forecasting in real estate investment decisions, this paper investigates forecast uncertainty and disagreement in real estate market forecasts. It compares the performance of real estate forecasters with non-real estate forecasters. Using the Investment Property Forum (IPF) quarterly survey amongst UK independent real estate forecasters and a similar survey of macro-economic and capital market forecasters, these forecasts are compared with actual performance to assess a number of forecasting issues in the UK over 1999-2004, including forecast error, bias and consensus. The results suggest that both groups are biased, less volatile compared to market returns and inefficient in that forecast errors tend to persist. The strongest finding is that forecasters display the characteristics associated with a consensus indicating herding.
Abstract:
Several methods are examined that allow forecasts for time series to be produced in the form of probability assignments. The necessary concepts are presented, addressing questions such as how to assess the performance of a probabilistic forecast. A particular class of models, cluster weighted models (CWMs), is given particular attention. CWMs, originally proposed for deterministic forecasts, can be employed for probabilistic forecasting with little modification. Two examples are presented. The first involves estimating the state of (numerically simulated) dynamical systems from noise-corrupted measurements, a problem also known as filtering. There is an optimal solution to this problem, called the optimal filter, to which the considered time series models are compared. (The optimal filter requires the dynamical equations to be known.) In the second example, we aim at forecasting the chaotic oscillations of an experimental bronze spring system. Both examples demonstrate that the considered time series models, and especially the CWMs, provide useful probabilistic information about the underlying dynamical relations. In particular, they provide more than just an approximation to the conditional mean.