880 results for Evaluation models
Abstract:
Many urban surface energy balance models now exist. These vary in complexity from simple schemes that represent the city as a concrete slab, to those which incorporate detailed representations of momentum and energy fluxes distributed within the atmospheric boundary layer. While many of these schemes have been evaluated against observations, with some models even compared with the same data sets, such evaluations have not been undertaken in a controlled manner to enable direct comparison. For other types of climate model, for instance the Project for Intercomparison of Land-Surface Parameterization Schemes (PILPS) experiments (Henderson-Sellers et al., 1993), such controlled comparisons have been shown to provide important insights into both the mechanics of the models and the physics of the real world. This paper describes the progress that has been made to date on a systematic and controlled comparison of urban surface schemes. The models to be considered, and their key attributes, are described, along with the methodology to be used for the evaluation.
Abstract:
This paper presents single-column model (SCM) simulations of a tropical squall-line case observed during the Coupled Ocean-Atmosphere Response Experiment of the Tropical Ocean/Global Atmosphere Programme. This case-study was part of an international model intercomparison project organized by Working Group 4 ‘Precipitating Convective Cloud Systems’ of the GEWEX (Global Energy and Water-cycle Experiment) Cloud System Study. Eight SCM groups using different deep-convection parametrizations participated in this project. The SCMs were forced by temperature and moisture tendencies that had been computed from a reference cloud-resolving model (CRM) simulation using open boundary conditions. The comparison of the SCM results with the reference CRM simulation provided insight into the ability of current convection and cloud schemes to represent organized convection. The CRM results enabled a detailed evaluation of the SCMs in terms of the thermodynamic structure and the convective mass flux of the system, the latter being closely related to the surface convective precipitation. It is shown that the SCMs could reproduce reasonably well the time evolution of the surface convective and stratiform precipitation, the convective mass flux, and the thermodynamic structure of the squall-line system. The thermodynamic structure simulated by the SCMs depended on how the models partitioned the precipitation between convective and stratiform. However, structural differences persisted in the thermodynamic profiles simulated by the SCMs and the CRM. These differences could be attributed to the fact that the total mass flux used to compute the SCM forcing differed from the convective mass flux. The SCMs could not adequately represent these organized mesoscale circulations and the microphysical/radiative forcing associated with the stratiform region. This issue is generally known as the ‘scale-interaction’ problem that can only be properly addressed in fully three-dimensional simulations.
Sensitivity simulations run by several groups showed that the time evolution of the surface convective precipitation was considerably smoothed when the convective closure was based on convective available potential energy instead of moisture convergence. Finally, additional SCM simulations without using a convection parametrization indicated that the impact of a convection parametrization in forced SCM runs was more visible in the moisture profiles than in the temperature profiles because convective transport was particularly important in the moisture budget.
Abstract:
We consider the forecasting performance of two SETAR exchange rate models proposed by Kräger and Kugler [J. Int. Money Fin. 12 (1993) 195]. Assuming that the models are good approximations to the data generating process, we show that whether the non-linearities inherent in the data can be exploited to forecast better than a random walk depends on both how forecast accuracy is assessed and on the ‘state of nature’. Evaluation based on traditional measures, such as (root) mean squared forecast errors, may mask the superiority of the non-linear models. Generalized impulse response functions are also calculated as a means of portraying the asymmetric response to shocks implied by such models.
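The dependence on the ‘state of nature’ can be illustrated with a minimal sketch. It assumes a two-regime SETAR process with one autoregressive lag; the threshold and coefficients below are hypothetical illustration values, not estimates from Kräger and Kugler, and the random walk is taken as the no-change forecast. One-step forecast errors are split by the regime in force at the forecast origin, so that the root mean squared error (RMSE) gain over the random walk can be read off per regime.

```python
import math
import random

# Hypothetical illustration parameters (not estimates from the paper).
THRESHOLD = 0.0
PHI_LOW = 0.9   # persistence when the previous value is at or below the threshold
PHI_HIGH = 0.3  # persistence when the previous value is above the threshold


def setar_forecast(y_prev):
    """One-step conditional mean of the two-regime SETAR process."""
    phi = PHI_LOW if y_prev <= THRESHOLD else PHI_HIGH
    return phi * y_prev


def simulate(n, sigma=1.0, seed=0):
    """Simulate n steps of the process with Gaussian shocks."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        y.append(setar_forecast(y[-1]) + rng.gauss(0.0, sigma))
    return y


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def rmse_by_regime(y):
    """RMSE of SETAR and random-walk forecasts, split by regime at the forecast origin."""
    errors = {
        "low": {"setar": [], "random_walk": []},
        "high": {"setar": [], "random_walk": []},
    }
    for prev, nxt in zip(y, y[1:]):
        regime = "low" if prev <= THRESHOLD else "high"
        errors[regime]["setar"].append(nxt - setar_forecast(prev))
        errors[regime]["random_walk"].append(nxt - prev)  # no-change forecast
    return {
        regime: {name: rmse(errs) for name, errs in by_model.items()}
        for regime, by_model in errors.items()
    }


if __name__ == "__main__":
    for regime, scores in rmse_by_regime(simulate(5000, seed=1)).items():
        print(regime, scores)
```

With these assumed coefficients the low regime is close to a random walk, so the two forecasts score almost the same there, while the gain is much larger in the high regime. A single RMSE pooled over both regimes would blur that difference.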
Abstract:
In this paper we discuss the current state-of-the-art in estimating, evaluating, and selecting among non-linear forecasting models for economic and financial time series. We review theoretical and empirical issues, including predictive density, interval and point evaluation and model selection, loss functions, data-mining, and aggregation. In addition, we argue that although the evidence in favor of constructing forecasts using non-linear models is rather sparse, there is reason to be optimistic. However, much remains to be done. Finally, we outline a variety of topics for future research, and discuss a number of areas which have received considerable attention in the recent literature, but where many questions remain.
Abstract:
As the calibration and evaluation of flood inundation models are a prerequisite for their successful application, there is a clear need to ensure that the performance measures that quantify how well models match the available observations are fit for purpose. This paper evaluates the binary pattern performance measures that are frequently used to compare flood inundation models with observations of flood extent. This evaluation considers whether these measures are able to calibrate and evaluate model predictions in a credible and consistent way, i.e. identifying the underlying model behaviour for a number of different purposes such as comparing models of floods of different magnitudes or on different catchments. Through theoretical examples, it is shown that the binary pattern measures are not consistent for floods of different sizes, such that for the same vertical error in water level, a model of a flood of large magnitude appears to perform better than a model of a smaller magnitude flood. Further, the commonly used Critical Success Index (usually referred to as F<2>) is biased in favour of overprediction of the flood extent, and is also biased towards correctly predicting areas of the domain with smaller topographic gradients. Consequently, it is recommended that future studies consider carefully the implications of reporting conclusions using these performance measures. Additionally, future research should consider whether a more robust and consistent analysis could be achieved by using elevation comparison methods instead.
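The two biases can be reproduced with a small sketch of the Critical Success Index, computed as A / (A + B + C), where A counts cells wet in both maps, B cells wet only in the model, and C cells wet only in the observation. The flat lists of 0/1 flags and the cell counts are illustrative assumptions, as is the simplification that a given water-level error wets or dries the same number of margin cells regardless of flood size.

```python
def critical_success_index(observed, modelled):
    """CSI = A / (A + B + C) for two equal-length sequences of 0/1 wet flags."""
    if len(observed) != len(modelled):
        raise ValueError("maps must have the same number of cells")
    hits = false_alarms = misses = 0
    for obs_wet, mod_wet in zip(observed, modelled):
        if obs_wet and mod_wet:
            hits += 1            # A: wet in both maps
        elif mod_wet:
            false_alarms += 1    # B: wet in the model only (overprediction)
        elif obs_wet:
            misses += 1          # C: wet in the observation only (underprediction)
    denominator = hits + false_alarms + misses
    if denominator == 0:
        raise ValueError("no wet cells in either map")
    return hits / denominator


if __name__ == "__main__":
    small_obs = [1] * 10 + [0] * 10
    large_obs = [1] * 100 + [0] * 10

    # Same two-cell overprediction on a small and on a large flood.
    print(critical_success_index(small_obs, [1] * 12 + [0] * 8))
    print(critical_success_index(large_obs, [1] * 102 + [0] * 8))

    # Two cells too many versus two cells too few on the small flood.
    print(critical_success_index(small_obs, [1] * 8 + [0] * 12))
```

Under these assumptions the same two-cell error scores 10/12 on the small flood and 100/102 on the large one, and overpredicting by two cells (10/12) scores higher than underpredicting by two cells (8/10).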
Abstract:
We present a benchmark system for global vegetation models. This system provides a quantitative evaluation of multiple simulated vegetation properties, including primary production; seasonal net ecosystem production; vegetation cover; composition and height; fire regime; and runoff. The benchmarks are derived from remotely sensed gridded datasets and site-based observations. The datasets allow comparisons of annual average conditions and seasonal and inter-annual variability, and they allow the impact of spatial and temporal biases in means and variability to be assessed separately. Specifically designed metrics quantify model performance for each process, and are compared to scores based on the temporal or spatial mean value of the observations and a "random" model produced by bootstrap resampling of the observations. The benchmark system is applied to three models: a simple light-use efficiency and water-balance model (the Simple Diagnostic Biosphere Model: SDBM), the Lund-Potsdam-Jena (LPJ) and Land Processes and eXchanges (LPX) dynamic global vegetation models (DGVMs). In general, the SDBM performs better than either of the DGVMs. It reproduces independent measurements of net primary production (NPP) but underestimates the amplitude of the observed CO2 seasonal cycle. The two DGVMs show little difference for most benchmarks (including the inter-annual variability in the growth rate and seasonal cycle of atmospheric CO2), but LPX represents burnt fraction demonstrably more accurately. Benchmarking also identified several weaknesses common to both DGVMs. The benchmarking system provides a quantitative approach for evaluating how adequately processes are represented in a model, identifying errors and biases, tracking improvements in performance through model development, and discriminating among models. Adoption of such a system would do much to improve confidence in terrestrial model predictions of climate change impacts and feedbacks.
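The comparison against a mean model and a bootstrap "random" model can be sketched as follows. The normalised mean error used here is an assumed example metric, and the function names, resample count and seed are hypothetical; the abstract itself does not specify them.

```python
import random
from statistics import mean


def normalised_mean_error(simulated, observed):
    """Sum of absolute errors, scaled by the observations' absolute deviation from their mean."""
    obs_mean = mean(observed)
    spread = sum(abs(o - obs_mean) for o in observed)
    if spread == 0:
        raise ValueError("observations have no variability")
    return sum(abs(s - o) for s, o in zip(simulated, observed)) / spread


def mean_model_score(observed):
    """Score of a 'model' that predicts the observed mean everywhere (1.0 by construction)."""
    return normalised_mean_error([mean(observed)] * len(observed), observed)


def random_model_score(observed, n_resamples=1000, seed=0):
    """Average score of 'models' built by resampling the observations with replacement."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_resamples):
        resampled = [rng.choice(observed) for _ in observed]
        scores.append(normalised_mean_error(resampled, observed))
    return mean(scores)


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    sim = [1.5, 2.5, 2.5, 4.5, 4.0, 6.5, 7.5, 7.0]
    print("model: ", normalised_mean_error(sim, obs))
    print("mean:  ", mean_model_score(obs))
    print("random:", random_model_score(obs))
```

With this metric lower is better, so a simulation adds information only if it scores below the mean model, and the random model gives a second reference point for how badly an uninformed prediction performs.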
Abstract:
We have used the BIOME4 biogeography–biogeochemistry model and comparison with palaeovegetation data to evaluate the response of six ocean–atmosphere general circulation models to mid-Holocene changes in orbital forcing in the mid- to high-latitudes of the northern hemisphere. All the models produce: (a) a northward shift of the northern limit of boreal forest, in response to simulated summer warming in high-latitudes. The northward shift is markedly asymmetric, with larger shifts in Eurasia than in North America; (b) an expansion of xerophytic vegetation in mid-continental North America and Eurasia, in response to increased temperatures during the growing season; (c) a northward expansion of temperate forests in eastern North America, in response to simulated winter warming. The northward shift of the northern limit of boreal forest and the northward expansion of temperate forests in North America are supported by palaeovegetation data. The expansion of xerophytic vegetation in mid-continental North America is consistent with palaeodata, although the extent may be over-estimated. The simulated expansion of xerophytic vegetation in Eurasia is not supported by the data. Analysis of an asynchronous coupling of one model to an equilibrium-vegetation model suggests vegetation feedback exacerbates this mid-continental drying and produces conditions more unlike the observations. Not all features of the simulations are robust: some models produce winter warming over Europe while others produce winter cooling. As a result, some models show a northward shift of temperate forests (consistent with, though less marked than, the expansion shown by data) and others produce a reduction in temperate forests. Elucidation of the cause of such differences is a focus of the current phase of the Palaeoclimate Modelling Intercomparison Project.
Abstract:
Urban land surface models (LSMs) are commonly evaluated for short periods (a few weeks to months) because of limited observational data. This makes it difficult to distinguish the impact of initial conditions on model performance or to consider the response of a model to a range of possible atmospheric conditions. Drawing on results from the first urban LSM comparison, these two issues are considered. Assessment shows that the initial soil moisture has a substantial impact on the performance. Models initialised with soils that are too dry are not able to adjust their surface sensible and latent heat fluxes to realistic values until there is sufficient rainfall. Models initialised with soils that are too wet are not able to restrict their evaporation appropriately for periods in excess of a year. This has implications for short-term evaluation studies and implies the need for soil moisture measurements to improve data assimilation and model initialisation. In contrast, initial conditions influencing the thermal storage have a much shorter adjustment timescale than soil moisture. Most models partition too much of the radiative energy at the surface into the sensible heat flux at the probable expense of the net storage heat flux.
Abstract:
Runoff fields over northern Africa (10–25°N, 20°W–30°E) derived from 17 atmospheric general circulation models driven by identical 6 ka BP orbital forcing, sea surface temperatures, and CO2 concentration have been analyzed using a hydrological routing scheme (HYDRA) to simulate changes in lake area. The AGCM-simulated runoff produced six-fold differences in simulated lake area between models, although even the largest simulated changes considerably underestimate the observed changes in lake area during the mid-Holocene. The inter-model differences in simulated lake area are largely due to differences in simulated runoff (the squared correlation coefficient, R2, is 0.84). Most of these differences can be attributed to differences in the simulated precipitation (R2=0.83). The higher correlation between runoff and simulated lake area (R2=0.92) implies that simulated differences in evaporation have a contributory effect. When runoff is calculated using an offline land-surface scheme (BIOME3), the correlation between runoff and simulated lake area is R2=0.94. Finally, the spatial distribution of simulated precipitation can exert an important control on the overall response.
Abstract:
Variations in lake area and depth reflect climatically induced changes in the water balance of overflowing as well as closed lakes. A new global database of lake status has been assembled, and is used to compare two simulations for 6 ka (6000 yr ago) made with successive R15 versions of the NCAR Community Climate Model (CCM). Simulated water balance was expressed as anomalies of annual precipitation minus evaporation (P-E); observed water balance as anomalies of lake status. Comparisons were made visually, by comparing regional averages, and by a statistic that compares the signs of simulated P-E anomalies (smoothly interpolated to the lake sites) with the status anomalies. Both CCM0 and CCM1 showed enhanced Northern-Hemisphere monsoons at 6 ka. Both underestimated the effect, but CCM1 fitted the spatial patterns better. In the northern mid- and high-latitudes the two versions differed more, and fitted the data less satisfactorily. CCM1 performed better than CCM0 in North America and central Eurasia, but not in Europe. Both models (especially CCM0) simulated excessive aridity in interior Eurasia. The models were systematically wrong in the southern mid-latitudes. Problems may have been caused by inadequate treatment of changes in sea-surface conditions in both models. Palaeolake status data will continue to provide a benchmark for the evaluation of modelling improvements.
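The abstract does not define the sign-comparison statistic, so the following is only one simple possibility: the fraction of lake sites at which the simulated P-E anomaly and the lake-status anomaly have the same sign. The function names and the choice to exclude sites where either anomaly is zero are assumptions made for illustration.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def sign_agreement(simulated_pe, lake_status):
    """Fraction of sites where simulated P-E and lake-status anomalies share a sign.

    Sites where either anomaly is exactly zero carry no sign information
    and are excluded (an assumption of this sketch).
    """
    if len(simulated_pe) != len(lake_status):
        raise ValueError("one simulated value is needed per lake site")
    informative = [
        (sign(sim), sign(obs))
        for sim, obs in zip(simulated_pe, lake_status)
        if sim != 0 and obs != 0
    ]
    if not informative:
        raise ValueError("no sites with non-zero anomalies in both series")
    agreeing = sum(1 for sim, obs in informative if sim == obs)
    return agreeing / len(informative)


if __name__ == "__main__":
    # P-E anomalies interpolated to four hypothetical lake sites, and the
    # corresponding status anomalies (+1 wetter, -1 drier, 0 no change).
    print(sign_agreement([0.2, -0.1, 0.4, -0.3], [1, -1, -1, 0]))
```

A statistic of this kind tests only the direction of change, which suits lake-status data because status classes record whether a lake was higher or lower than today rather than by how much.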
Abstract:
The aim of this study was to investigate the effects of numerous milk compositional factors on milk coagulation properties using Partial Least Squares (PLS). Milk from herds of Jersey and Holstein-Friesian cattle was collected across the year and blended (n=55), to maximize variation in composition and coagulation. The milk was analysed for casein, protein, fat, titratable acidity, lactose, Ca2+, urea content, casein micelle size (CMS), fat globule size, somatic cell count and pH. Milk coagulation properties were defined as coagulation time, curd firmness and curd firmness rate measured by a controlled strain rheometer. The models derived from PLS had higher predictive power than previous models, demonstrating the value of measuring more milk components. In addition to the well-established relationships with casein and protein levels, CMS and fat globule size were found to have a strong impact in all three models. The study also found a positive impact of fat on milk coagulation properties and a strong relationship between lactose and curd firmness, and urea and curd firmness rate, all of which warrant further investigation due to the current lack of knowledge of the underlying mechanism. These findings demonstrate the importance of using a wider range of milk compositional variables for the prediction of milk coagulation properties, and hence as indicators of milk suitability for cheese making.
Abstract:
Income growth in highly industrialised countries has resulted in consumer choice of foodstuffs no longer being primarily influenced by basic factors such as price and organoleptic features. From this perspective, the present study sets out to evaluate how and to what extent consumer choice is influenced by the possible negative effects on health and environment caused by the consumption of fruit containing deposits of pesticides and chemical products. The study describes the results of a survey which explores and estimates consumer willingness to pay in two forms: a yearly contribution for the abolition of the use of pesticides on fruit, and a premium price for organically grown apples guaranteed by a certified label. The same questionnaire was administered to two samples. The first was a conventional face-to-face survey of customers of large retail outlets located around Bologna (Italy); the second was an Internet sample. The discrete choice data were analysed by means of probit and tobit models to estimate the utility consumers attribute to organically grown fruit and to a pesticide ban. The research also addresses questions of validity and representativeness as a fundamental problem in web-based surveys.
Abstract:
Changes in the water balance of Eurasia and northern Africa in response to insolation forcing at 6000 y BP simulated by five atmospheric general circulation models have been compared with observations of changes in lake status. All of the simulations show enhancement of the Asian summer monsoon and of the high pressure cells over the Pacific and Central Asia and the Middle East, causing wetter conditions in northern India and southern China and drier conditions along the Chinese coast and west of the monsoon core. All of the models show enhancement of the African monsoon, causing wetter conditions in the zone between ca 10–20 °N. Four of the models show conditions wetter than present in southern Europe and drier than present in northern Europe. Three of the models show conditions similar to present in the mid-latitude continental interior, while the remaining models show conditions somewhat drier than present. The extent and location of each of the simulated changes varies between the models, as does the mechanism producing these changes. The lake data confirm some features of the simulations, but indicate discrepancies between observed and simulated climates. For example, the data show: (1) conditions wetter than present in central Asia, from India to northern China and Mongolia, indicating that the simulated Asian monsoon expansion is too small; (2) conditions wetter than present between ca. 10–30 °N in Africa, indicating that the simulated African monsoon expansion is too small; (3) that northern Europe was drier, but the area of significantly drier conditions was more localized (around the Baltic) than shown in the simulations; (4) that southern Europe was wetter than present, apparently consistent with the simulations, but pollen data suggest that this reflects an increase in summer rainfall whereas the models show winter precipitation, and (5) that the mid-latitude continental interior was generally wetter than present.
Abstract:
We analyse the spatial expression of seasonal climates of the Mediterranean and northern Africa in pre-industrial (piControl) and mid-Holocene (midHolocene, 6000 yr BP) simulations from the fifth phase of the Coupled Model Intercomparison Project (CMIP5). Modern observations show four distinct precipitation regimes characterized by differences in the seasonal distribution and total amount of precipitation: an equatorial band characterized by a double peak in rainfall, the monsoon zone characterized by summer rainfall, the desert characterized by low seasonality and total precipitation, and the Mediterranean zone characterized by summer drought. Most models correctly simulate the position of the Mediterranean and the equatorial climates in the piControl simulations, but overestimate the extent of monsoon influence and underestimate the extent of desert. However, most models fail to reproduce the amount of precipitation in each zone. Model biases in the simulated magnitude of precipitation are unrelated to whether the models reproduce the correct spatial patterns of each regime. In the midHolocene, the models simulate a reduction in winter rainfall in the equatorial zone, and a northward expansion of the monsoon with a significant increase in summer and autumn rainfall. Precipitation is slightly increased in the desert, mainly in summer and autumn, with northward expansion of the monsoon. Changes in the Mediterranean are small, although there is an increase in spring precipitation consistent with palaeo-observations of increased growing-season rainfall. Comparison with reconstructions shows most models underestimate the mid-Holocene changes in annual precipitation, except in the equatorial zone. Biases in the piControl have only a limited influence on midHolocene anomalies in ocean–atmosphere models; carbon-cycle models show no relationship between piControl bias and midHolocene anomalies.
Biases in the prediction of the midHolocene monsoon expansion are unrelated to how well the models simulate changes in Mediterranean climate.
Abstract:
The evaluation of forecast performance plays a central role both in the interpretation and use of forecast systems and in their development. Different evaluation measures (scores) are available, often quantifying different characteristics of forecast performance. The properties of several proper scores for probabilistic forecast evaluation are contrasted and then used to interpret decadal probability hindcasts of global mean temperature. The Continuous Ranked Probability Score (CRPS), Proper Linear (PL) score, and I. J. Good’s logarithmic score (also referred to as Ignorance) are compared; although information from all three may be useful, the logarithmic score has an immediate interpretation and is not insensitive to forecast busts. Neither CRPS nor PL is local; this is shown to produce counterintuitive evaluations by CRPS. Benchmark forecasts from empirical models like Dynamic Climatology place the scores in context. Comparing scores for forecast systems based on physical models (in this case HadCM3, from the CMIP5 decadal archive) against such benchmarks is more informative than comparing forecast systems based on similar physical simulation models with each other. It is shown that a forecast system based on HadCM3 outperforms Dynamic Climatology in decadal global mean temperature hindcasts; Dynamic Climatology previously outperformed a forecast system based upon HadGEM2, and reasons for these results are suggested. Forecasts of aggregate data (5-year means of global mean temperature) are, of course, narrower than forecasts of annual averages due to the suppression of variance; while the average “distance” between the forecasts and a target may be expected to decrease, little if any discernible improvement in probabilistic skill is achieved.
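The contrast between the logarithmic score and CRPS can be made concrete with a minimal sketch that assumes Gaussian forecast distributions; the hindcasts in the study need not be Gaussian, and the function names are hypothetical. Ignorance is the negative base-2 logarithm of the forecast density at the outcome, so it depends only on the probability assigned to what actually happened, which is what makes it local. CRPS uses the standard closed form for a normal distribution and depends on the whole forecast distribution.

```python
import math


def ignorance_gaussian(outcome, mu, sigma):
    """Logarithmic (Ignorance) score in bits: -log2 of the forecast density at the outcome."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (outcome - mu) / sigma
    # Work in log space so that far-tail outcomes do not underflow to log(0).
    log_density = -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    return -log_density / math.log(2.0)


def crps_gaussian(outcome, mu, sigma):
    """CRPS of a normal forecast N(mu, sigma^2), using the standard closed form."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (outcome - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


if __name__ == "__main__":
    # A well-calibrated case and a "bust": an overconfident forecast whose
    # outcome lands ten standard deviations from the forecast mean.
    for label, outcome, mu, sigma in [("good", 0.1, 0.0, 0.1), ("bust", 1.0, 0.0, 0.1)]:
        print(label, ignorance_gaussian(outcome, mu, sigma), crps_gaussian(outcome, mu, sigma))
```

For both scores lower is better. In the bust case the ignorance penalty grows with the square of the standardised error, whereas CRPS grows roughly linearly with the distance between forecast and outcome, so an occasional severe bust weighs far more heavily on the logarithmic score.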