155 results for Model evaluation
in CentAUR: Central Archive University of Reading - UK
Abstract:
Previous assessments of the impacts of climate change on heat-related mortality use the "delta method" to create temperature projection time series that are applied to temperature-mortality models to estimate future mortality impacts. The delta method means that climate model bias in the modelled present does not influence the temperature projection time series and impacts. However, the delta method assumes that climate change will result only in a change in the mean temperature but there is evidence that there will also be changes in the variability of temperature with climate change. The aim of this paper is to demonstrate the importance of considering changes in temperature variability with climate change in impacts assessments of future heat-related mortality. We investigate future heat-related mortality impacts in six cities (Boston, Budapest, Dallas, Lisbon, London and Sydney) by applying temperature projections from the UK Meteorological Office HadCM3 climate model to the temperature-mortality models constructed and validated in Part 1. We investigate the impacts for four cases based on various combinations of mean and variability changes in temperature with climate change. The results demonstrate that higher mortality is attributed to increases in the mean and variability of temperature with climate change rather than with the change in mean temperature alone. This has implications for interpreting existing impacts estimates that have used the delta method. We present a novel method for the creation of temperature projection time series that includes changes in the mean and variability of temperature with climate change and is not influenced by climate model bias in the modelled present. The method should be useful for future impacts assessments. Few studies consider the implications that the limitations of the climate model may have on the heat-related mortality impacts. 
Here, we demonstrate the importance of considering this by conducting an evaluation of the daily and extreme temperatures from HadCM3, which demonstrates that the estimates of future heat-related mortality for Dallas and Lisbon may be overestimated due to positive climate model bias. Likewise, estimates for Boston and London may be underestimated due to negative climate model bias. Finally, we briefly consider uncertainties in the impacts associated with greenhouse gas emissions and acclimatisation. The uncertainties in the mortality impacts due to different emissions scenarios of greenhouse gases in the future varied considerably by location. Allowing for acclimatisation to an extra 2°C in mean temperatures reduced future heat-related mortality by approximately half that of no acclimatisation in each city.
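The contrast between the delta method and a projection that also carries a change in variability can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the novel method of the paper: the function names, the use of a standard-deviation ratio to rescale anomalies, and the toy temperature values are all hypothetical.

```python
from statistics import mean, pstdev

def delta_method(observed, model_present, model_future):
    """Shift the observed series by the modelled change in mean temperature."""
    delta = mean(model_future) - mean(model_present)
    return [t + delta for t in observed]

def delta_with_variability(observed, model_present, model_future):
    """Shift the mean and rescale observed anomalies by the modelled change in spread.

    Assumption: the change in variability is summarised by the ratio of
    standard deviations between the modelled future and modelled present.
    """
    delta = mean(model_future) - mean(model_present)
    ratio = pstdev(model_future) / pstdev(model_present)
    obs_mean = mean(observed)
    return [obs_mean + delta + (t - obs_mean) * ratio for t in observed]

# Toy daily temperatures in degrees C (illustrative values only).
observed = [18.0, 20.0, 22.0, 24.0, 26.0]
model_present = [20.0, 22.0, 24.0, 26.0, 28.0]  # warm bias of 2 degrees C
model_future = [20.0, 23.0, 26.0, 29.0, 32.0]   # warmer mean, wider spread

shifted = delta_method(observed, model_present, model_future)
scaled = delta_with_variability(observed, model_present, model_future)
print(shifted)
print(scaled)
```

In both functions only differences and ratios of model statistics are used, so the constant warm bias in the modelled present cancels out. The second series has the same mean as the first but a hotter upper tail, which is the part of the distribution that matters for heat-related mortality.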
Abstract:
As the calibration and evaluation of flood inundation models are a prerequisite for their successful application, there is a clear need to ensure that the performance measures that quantify how well models match the available observations are fit for purpose. This paper evaluates the binary pattern performance measures that are frequently used to compare flood inundation models with observations of flood extent. This evaluation considers whether these measures are able to calibrate and evaluate model predictions in a credible and consistent way, i.e. identifying the underlying model behaviour for a number of different purposes such as comparing models of floods of different magnitudes or on different catchments. Through theoretical examples, it is shown that the binary pattern measures are not consistent for floods of different sizes, such that for the same vertical error in water level, a model of a flood of large magnitude appears to perform better than a model of a smaller magnitude flood. Further, the commonly used Critical Success Index (usually referred to as F<2>) is biased in favour of overprediction of the flood extent, and is also biased towards correctly predicting areas of the domain with smaller topographic gradients. Consequently, it is recommended that future studies consider carefully the implications of reporting conclusions using these performance measures. Additionally, future research should consider whether a more robust and consistent analysis could be achieved by using elevation comparison methods instead.
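The Critical Success Index is commonly defined as hits divided by the sum of hits, false alarms and misses. The sketch below applies that standard definition to a hypothetical one-dimensional transect of wet (1) and dry (0) cells; the transects and variable names are invented for illustration and do not come from the paper.

```python
def critical_success_index(modelled, observed):
    """CSI = hits / (hits + false alarms + misses) for paired wet/dry cells."""
    pairs = list(zip(modelled, observed))
    hits = sum(1 for m, o in pairs if m and o)
    false_alarms = sum(1 for m, o in pairs if m and not o)
    misses = sum(1 for m, o in pairs if o and not m)
    return hits / (hits + false_alarms + misses)

# Hypothetical transect: 10 observed wet cells out of 20.
observed = [1] * 10 + [0] * 10
over = [1] * 12 + [0] * 8    # flood edge placed 2 cells too far out
under = [1] * 8 + [0] * 12   # flood edge placed 2 cells too far in

csi_over = critical_success_index(over, observed)
csi_under = critical_success_index(under, observed)

# Same 2-cell edge error on a flood ten times larger.
observed_large = [1] * 100 + [0] * 10
over_large = [1] * 102 + [0] * 8
csi_large = critical_success_index(over_large, observed_large)

print(csi_over, csi_under, csi_large)
```

With the same two-cell error, overprediction scores 10/12 while underprediction scores 8/10, and the larger flood scores 100/102. This reproduces in miniature the two inconsistencies the abstract describes. The function raises a division error if no cell is wet in either map, which a real implementation would need to handle.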
Abstract:
Atmospheric pollution over South Asia attracts special attention due to its effects on regional climate, water cycle and human health. These effects are potentially growing owing to rising trends of anthropogenic aerosol emissions. In this study, the spatio-temporal aerosol distributions over South Asia from seven global aerosol models are evaluated against aerosol retrievals from NASA satellite sensors and ground-based measurements for the period of 2000–2007. Overall, substantial underestimations of aerosol loading over South Asia are found systematically in most model simulations. Averaged over the entire South Asia, the annual mean aerosol optical depth (AOD) is underestimated by a range 15 to 44% across models compared to MISR (Multi-angle Imaging SpectroRadiometer), which is the lowest bound among various satellite AOD retrievals (from MISR, SeaWiFS (Sea-Viewing Wide Field-of-View Sensor), MODIS (Moderate Resolution Imaging Spectroradiometer) Aqua and Terra). In particular during the post-monsoon and wintertime periods (i.e., October–January), when agricultural waste burning and anthropogenic emissions dominate, models fail to capture AOD and aerosol absorption optical depth (AAOD) over the Indo–Gangetic Plain (IGP) compared to ground-based Aerosol Robotic Network (AERONET) sunphotometer measurements. The underestimations of aerosol loading in models generally occur in the lower troposphere (below 2 km) based on the comparisons of aerosol extinction profiles calculated by the models with those from Cloud–Aerosol Lidar with Orthogonal Polarization (CALIOP) data. Furthermore, surface concentrations of all aerosol components (sulfate, nitrate, organic aerosol (OA) and black carbon (BC)) from the models are found much lower than in situ measurements in winter. 
Several possible causes for these common problems of underestimating aerosols in models during the post-monsoon and wintertime periods are identified: the aerosol hygroscopic growth and formation of secondary inorganic aerosol are suppressed in the models because relative humidity (RH) is biased far too low in the boundary layer and thus foggy conditions are poorly represented in current models, the nitrate aerosol is either missing or inadequately accounted for, and emissions from agricultural waste burning and biofuel usage are too low in the emission inventories. These common problems and possible causes found in multiple models point out directions for future model improvements in this important region.
Abstract:
The concentrations of sulfate, black carbon (BC) and other aerosols in the Arctic are characterized by high values in late winter and spring (so-called Arctic Haze) and low values in summer. Models have long been struggling to capture this seasonality and especially the high concentrations associated with Arctic Haze. In this study, we evaluate sulfate and BC concentrations from eleven different models driven with the same emission inventory against a comprehensive pan-Arctic measurement data set over a time period of 2 years (2008–2009). The set of models consisted of one Lagrangian particle dispersion model, four chemistry transport models (CTMs), one atmospheric chemistry-weather forecast model and five chemistry climate models (CCMs), of which two were nudged to meteorological analyses and three were running freely. The measurement data set consisted of surface measurements of equivalent BC (eBC) from five stations (Alert, Barrow, Pallas, Tiksi and Zeppelin), elemental carbon (EC) from Station Nord and Alert and aircraft measurements of refractory BC (rBC) from six different campaigns. We find that the models generally captured the measured eBC or rBC and sulfate concentrations quite well, compared to previous comparisons. However, the aerosol seasonality at the surface is still too weak in most models. Concentrations of eBC and sulfate averaged over three surface sites are underestimated in winter/spring in all but one model (model means for January–March underestimated by 59 and 37 % for BC and sulfate, respectively), whereas concentrations in summer are overestimated in the model mean (by 88 and 44 % for July–September), but with overestimates as well as underestimates present in individual models. The most pronounced eBC underestimates, not included in the above multi-site average, are found for the station Tiksi in Siberia where the measured annual mean eBC concentration is 3 times higher than the average annual mean for all other stations. 
This suggests an underestimate of BC sources in Russia in the emission inventory used. Based on the campaign data, biomass burning was identified as another cause of the modeling problems. For sulfate, very large differences were found in the model ensemble, with an apparent anti-correlation between modeled surface concentrations and total atmospheric columns. There is a strong correlation between observed sulfate and eBC concentrations with consistent sulfate/eBC slopes found for all Arctic stations, indicating that the sources contributing to sulfate and BC are similar throughout the Arctic and that the aerosols are internally mixed and undergo similar removal. However, only three models reproduced this finding, whereas sulfate and BC are weakly correlated in the other models. Overall, no class of models (e.g., CTMs, CCMs) performed better than the others and differences are independent of model resolution.
Abstract:
A rapid increase in the variety, quality, and quantity of observations in polar regions is leading to a significant improvement in the understanding of sea ice dynamic and thermodynamic processes and their representation in global climate models. We assess the simulation of sea ice in the new Hadley Centre Global Environmental Model (HadGEM1) against the latest available observations. The HadGEM1 sea ice component uses elastic-viscous-plastic dynamics, multiple ice thickness categories, and zero-layer thermodynamics. The model evaluation is focused on the mean state of the key variables of ice concentration, thickness, velocity, and albedo. The model shows good agreement with observational data sets. The variability of the ice forced by the North Atlantic Oscillation is also found to agree with observations.
Abstract:
Many studies evaluating model boundary-layer schemes focus either on near-surface parameters or on short-term observational campaigns. This reflects the observational datasets that are widely available for use in model evaluation. In this paper we show how surface and long-term Doppler lidar observations, combined in a way to match model representation of the boundary layer as closely as possible, can be used to evaluate the skill of boundary-layer forecasts. We use a 2-year observational dataset from a rural site in the UK to evaluate a climatology of boundary layer type forecast by the UK Met Office Unified Model. In addition, we demonstrate the use of a binary skill score (Symmetric Extremal Dependence Index) to investigate the dependence of forecast skill on season, horizontal resolution and forecast lead time. A clear diurnal and seasonal cycle can be seen in the climatology of both the model and observations, with the main discrepancies being the model overpredicting cumulus capped and decoupled stratocumulus capped boundary-layers and underpredicting well mixed boundary-layers. According to the SEDI skill score, the model is most skillful at predicting the surface stability. The skill of the model in predicting cumulus capped and stratocumulus capped stable boundary layer forecasts is low but greater than a 24 hr persistence forecast. In contrast, the prediction of decoupled boundary-layers and boundary-layers with multiple cloud layers is lower than persistence. This process-based evaluation approach has the potential to be applied to other boundary-layer parameterisation schemes with similar decision structures.
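For readers unfamiliar with the Symmetric Extremal Dependence Index, it is usually written in terms of the hit rate H and the false alarm rate F of a 2x2 contingency table. The sketch below uses that commonly cited form; the function name and the contingency counts are hypothetical and are not taken from the paper.

```python
from math import log

def sedi(hits, false_alarms, misses, correct_negatives):
    """Symmetric Extremal Dependence Index from a 2x2 contingency table.

    H = hits / (hits + misses)
    F = false_alarms / (false_alarms + correct_negatives)
    SEDI = (ln F - ln H - ln(1-F) + ln(1-H)) / (ln F + ln H + ln(1-F) + ln(1-H))
    """
    h = hits / (hits + misses)
    f = false_alarms / (false_alarms + correct_negatives)
    numerator = log(f) - log(h) - log(1 - f) + log(1 - h)
    denominator = log(f) + log(h) + log(1 - f) + log(1 - h)
    return numerator / denominator

# Hypothetical counts for one boundary-layer type.
skilful = sedi(hits=90, false_alarms=10, misses=10, correct_negatives=90)
no_skill = sedi(hits=50, false_alarms=50, misses=50, correct_negatives=50)
print(skilful, no_skill)
```

The score is zero when H equals F (no skill) and approaches one for a forecast with many hits and few false alarms. It is undefined when H or F is exactly 0 or 1, so small samples may need special handling.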
Abstract:
An automatic nonlinear predictive model-construction algorithm is introduced based on forward regression and the predicted-residual-sums-of-squares (PRESS) statistic. The proposed algorithm is based on the fundamental concept of evaluating a model's generalisation capability through cross-validation. This is achieved by using the PRESS statistic as a cost function to optimise model structure. In particular, the proposed algorithm is developed with the aim of achieving computational efficiency, such that the computational effort, which would usually be extensive in the computation of the PRESS statistic, is reduced or minimised. The computation of PRESS is simplified by avoiding a matrix inversion through the use of the orthogonalisation procedure inherent in forward regression, and is further reduced significantly by the introduction of a forward-recursive formula. Based on the properties of the PRESS statistic, the proposed algorithm can achieve a fully automated procedure without resort to any other validation data set for iterative model evaluation. Numerical examples are used to demonstrate the efficacy of the algorithm.
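The reason PRESS can be computed cheaply is that, for least-squares models, each leave-one-out residual equals the ordinary residual divided by one minus the leverage of that point. The sketch below checks this identity for a straight-line fit. It is not the orthogonal forward-regression algorithm of the paper; the helper names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return slope, y_bar - slope * x_bar

def press_closed_form(xs, ys):
    """PRESS from a single fit: sum of (residual / (1 - leverage))^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope, intercept = fit_line(xs, ys)
    total = 0.0
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - x_bar) ** 2 / sxx
        residual = y - (slope * x + intercept)
        total += (residual / (1.0 - leverage)) ** 2
    return total

def press_brute_force(xs, ys):
    """PRESS by refitting the line n times, each time leaving one point out."""
    total = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return total

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.3]
fast = press_closed_form(xs, ys)
slow = press_brute_force(xs, ys)
print(fast, slow)
```

Both routes give the same value up to rounding, but the closed form needs only one fit. This is why PRESS can serve as a model-selection cost without a separate validation set.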
Abstract:
The climate and natural variability of the large-scale stratospheric circulation simulated by a newly developed general circulation model are evaluated against available global observations. The simulation consisted of a 30-year annual cycle integration performed with a comprehensive model of the troposphere and stratosphere. The observations consisted of a 15-year dataset from global operational analyses of the troposphere and stratosphere. The model evaluation concentrates on the simulation of the evolution of the extratropical stratospheric circulation in both hemispheres. The December–February climatology of the observed zonal mean winter circulation is found to be reasonably well captured by the model, although in the Northern Hemisphere upper stratosphere the simulated westerly winds are systematically stronger and a cold bias is apparent in the polar stratosphere. This Northern Hemisphere stratospheric cold bias virtually disappears during spring (March–May), consistent with a realistic simulation of the spring weakening of the mean westerly winds in the model. A considerable amount of monthly interannual variability is also found in the simulation in the Northern Hemisphere in late winter and early spring. The simulated interannual variability is predominantly caused by polar warmings of the stratosphere, in agreement with observations. The breakdown of the Northern Hemisphere stratospheric polar vortex appears therefore to occur in a realistic way in the model. However, in early winter the model severely underestimates the interannual variability, especially in the upper troposphere. The Southern Hemisphere winter (June–August) zonal mean temperature is systematically colder in the model, and the simulated winds are somewhat too strong in the upper stratosphere. Contrary to the results for the Northern Hemisphere spring, this model cold bias worsens during the Southern Hemisphere spring (September–November). 
Significant discrepancies between the model results and the observations are therefore found during the breakdown of the Southern Hemisphere polar vortex. For instance, the simulated Southern Hemisphere stratosphere westerly jet continuously decreases in intensity more or less in situ from June to November, while the observed stratospheric jet moves downward and poleward.
Abstract:
Earth system models are increasing in complexity and incorporating more processes than their predecessors, making them important tools for studying the global carbon cycle. However, their coupled behaviour has only recently been examined in any detail, and has yielded a very wide range of outcomes, with coupled climate-carbon cycle models that represent land-use change simulating total land carbon stores by 2100 that vary by as much as 600 Pg C given the same emissions scenario. This large uncertainty is associated with differences in how key processes are simulated in different models, and illustrates the necessity of determining which models are most realistic using rigorous model evaluation methodologies. Here we assess the state-of-the-art with respect to evaluation of Earth system models, with a particular emphasis on the simulation of the carbon cycle and associated biospheric processes. We examine some of the new advances and remaining uncertainties relating to (i) modern and palaeo data and (ii) metrics for evaluation, and discuss a range of strategies, such as the inclusion of pre-calibration, combined process- and system-level evaluation, and the use of emergent constraints, that can contribute towards the development of more robust evaluation schemes. An increasingly data-rich environment offers more opportunities for model evaluation, but it is also a challenge, as more knowledge about data uncertainties is required in order to determine robust evaluation methodologies that move the field of ESM evaluation from "beauty contest" toward the development of useful constraints on model behaviour.
Abstract:
We propose a new sparse model construction method aimed at maximizing a model’s generalisation capability for a large class of linear-in-the-parameters models. The coordinate descent optimization algorithm is employed with a modified l1-penalized least squares cost function in order to estimate a single parameter and its regularization parameter simultaneously based on the leave one out mean square error (LOOMSE). Our original contribution is to derive a closed form of optimal LOOMSE regularization parameter for a single term model, for which we show that the LOOMSE can be analytically computed without actually splitting the data set, leading to a very simple parameter estimation method. We then integrate the new results within the coordinate descent optimization algorithm to update model parameters one at a time for linear-in-the-parameters models. Consequently a fully automated procedure is achieved without resort to any other validation data set for iterative model evaluation. Illustrative examples are included to demonstrate the effectiveness of the new approaches.
Abstract:
Earth system models (ESMs) are increasing in complexity by incorporating more processes than their predecessors, making them potentially important tools for studying the evolution of climate and associated biogeochemical cycles. However, their coupled behaviour has only recently been examined in any detail, and has yielded a very wide range of outcomes. For example, coupled climate–carbon cycle models that represent land-use change simulate total land carbon stores at 2100 that vary by as much as 600 Pg C, given the same emissions scenario. This large uncertainty is associated with differences in how key processes are simulated in different models, and illustrates the necessity of determining which models are most realistic using rigorous methods of model evaluation. Here we assess the state-of-the-art in evaluation of ESMs, with a particular emphasis on the simulation of the carbon cycle and associated biospheric processes. We examine some of the new advances and remaining uncertainties relating to (i) modern and palaeodata and (ii) metrics for evaluation. We note that the practice of averaging results from many models is unreliable and no substitute for proper evaluation of individual models. We discuss a range of strategies, such as the inclusion of pre-calibration, combined process- and system-level evaluation, and the use of emergent constraints, that can contribute to the development of more robust evaluation schemes. An increasingly data-rich environment offers more opportunities for model evaluation, but also presents a challenge. Improved knowledge of data uncertainties is still necessary to move the field of ESM evaluation away from a "beauty contest" towards the development of useful constraints on model outcomes.
Abstract:
Population modelling is increasingly recognised as a useful tool for pesticide risk assessment. For vertebrates that may ingest pesticides with their food, such as woodpigeon (Columba palumbus), population models that simulate foraging behaviour explicitly can help predict both exposure and population-level impact. Optimal foraging theory is often assumed to explain the individual-level decisions driving distributions of individuals in the field, but it may not adequately predict spatial and temporal characteristics of woodpigeon foraging because of the woodpigeons’ excellent memory, ability to fly long distances, and distinctive flocking behaviour. Here we present an individual-based model (IBM) of the woodpigeon. We used the model to predict distributions of foraging woodpigeons that use one of six alternative foraging strategies: optimal foraging, memory-based foraging and random foraging, each with or without flocking mechanisms. We used pattern-oriented modelling to determine which of the foraging strategies is best able to reproduce observed data patterns. Data used for model evaluation were gathered during a long-term woodpigeon study conducted between 1961 and 2004 and a radiotracking study conducted in 2003 and 2004, both in the UK, and are summarised here as three complex patterns: the distributions of foraging birds between vegetation types during the year, the number of fields visited daily by individuals, and the proportion of fields revisited by them on subsequent days. The model with a memory-based foraging strategy and a flocking mechanism was the only one to reproduce these three data patterns, and the optimal foraging model produced poor matches to all of them. The random foraging strategy reproduced two of the three patterns but was not able to guarantee population persistence. 
We conclude that with the memory-based foraging strategy including a flocking mechanism our model is realistic enough to estimate the potential exposure of woodpigeons to pesticides. We discuss how exposure can be linked to our model, and how the model could be used for risk assessment of pesticides, for example predicting exposure and effects in heterogeneous landscapes planted seasonally with a variety of crops, while accounting for differences in land use between landscapes.
Abstract:
This paper evaluates the current status of global modeling of the organic aerosol (OA) in the troposphere and analyzes the differences between models as well as between models and observations. Thirty-one global chemistry transport models (CTMs) and general circulation models (GCMs) have participated in this intercomparison, in the framework of AeroCom phase II. The simulation of OA varies greatly between models in terms of the magnitude of primary emissions, secondary OA (SOA) formation, the number of OA species used (2 to 62), the complexity of OA parameterizations (gas-particle partitioning, chemical aging, multiphase chemistry, aerosol microphysics), and the OA physical, chemical and optical properties. The diversity of the global OA simulation results has increased since earlier AeroCom experiments, mainly due to the increasing complexity of the SOA parameterization in models, and the implementation of new, highly uncertain, OA sources. Diversity of over one order of magnitude exists in the modeled vertical distribution of OA concentrations that deserves a dedicated future study. Furthermore, although the OA / OC ratio depends on OA sources and atmospheric processing, and is important for model evaluation against OA and OC observations, it is resolved only by a few global models. The median global primary OA (POA) source strength is 56 Tg a−1 (range 34–144 Tg a−1) and the median SOA source strength (natural and anthropogenic) is 19 Tg a−1 (range 13–121 Tg a−1). Among the models that take into account the semi-volatile SOA nature, the median source is calculated to be 51 Tg a−1 (range 16–121 Tg a−1), much larger than the median value of the models that calculate SOA in a more simplistic way (19 Tg a−1; range 13–20 Tg a−1, with one model at 37 Tg a−1). The median atmospheric burden of OA is 1.4 Tg (24 models in the range of 0.6–2.0 Tg and 4 between 2.0 and 3.8 Tg), with a median OA lifetime of 5.4 days (range 3.8–9.6 days). 
In models that reported both OA and sulfate burdens, the median value of the OA/sulfate burden ratio is calculated to be 0.77; 13 models calculate a ratio lower than 1, and 9 models higher than 1. For 26 models that reported OA deposition fluxes, the median wet removal is 70 Tg a−1 (range 28–209 Tg a−1), which is on average 85% of the total OA deposition. Fine aerosol organic carbon (OC) and OA observations from continuous monitoring networks and individual field campaigns have been used for model evaluation. At urban locations, the model–observation comparison indicates missing knowledge on anthropogenic OA sources, both strength and seasonality. The combined model–measurements analysis suggests the existence of increased OA levels during summer due to biogenic SOA formation over large areas of the USA that can be of the same order of magnitude as the POA, even at urban locations, and contribute to the measured urban seasonal pattern. Global models are able to simulate the high secondary character of OA observed in the atmosphere as a result of SOA formation and POA aging, although the amount of OA present in the atmosphere remains largely underestimated, with a mean normalized bias (MNB) equal to −0.62 (−0.51) based on the comparison against OC (OA) urban data of all models at the surface, −0.15 (+0.51) when compared with remote measurements, and −0.30 for marine locations with OC data. The mean temporal correlations across all stations are low when compared with OC (OA) measurements: 0.47 (0.52) for urban stations, 0.39 (0.37) for remote stations, and 0.25 for marine stations with OC data. The combination of high (negative) MNB and higher correlation at urban stations when compared with the low MNB and lower correlation at remote sites suggests that knowledge about the processes that govern aerosol processing, transport and removal, on top of their sources, is important at the remote stations. 
There is no clear change in model skill with increasing model complexity with regard to OC or OA mass concentration. However, the complexity is needed in models in order to distinguish between anthropogenic and natural OA as needed for climate mitigation, and to calculate the impact of OA on climate accurately.
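The mean normalized bias quoted in this abstract is commonly defined as the average of (model - observation) / observation over all paired values. The sketch below uses that common definition, which may differ in detail from the one used in the intercomparison; the function name and the concentrations are hypothetical.

```python
def mean_normalized_bias(modelled, observed):
    """MNB = mean over pairs of (model - observation) / observation."""
    ratios = [(m - o) / o for m, o in zip(modelled, observed)]
    return sum(ratios) / len(ratios)

# Hypothetical surface concentrations (arbitrary units).
observed = [2.0, 4.0, 6.0, 8.0]
modelled = [1.0, 2.0, 3.0, 4.0]  # model simulates half of what is observed

mnb = mean_normalized_bias(modelled, observed)
print(mnb)  # -0.5
```

Under this definition a value of -0.5 means the model simulates, on average, half of the observed amount. Observations equal to zero must be excluded before use because each term divides by the observed value.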
Abstract:
Mechanistic catchment-scale phosphorus models appear to perform poorly where diffuse sources dominate. We investigate the reasons for this for one model, INCA-P, testing model output against 18 months of daily data in a small Scottish catchment. We examine key model processes and provide recommendations for model improvement and simplification. Improvements to the particulate phosphorus simulation are especially needed. The model evaluation procedure is then generalised to provide a checklist for identifying why model performance may be poor or unreliable, incorporating calibration, data, structural and conceptual challenges. There needs to be greater recognition that current models struggle to produce positive Nash–Sutcliffe statistics in agricultural catchments when evaluated against daily data. Phosphorus modelling is difficult, but models are not as useless as this might suggest. We found a combination of correlation coefficients, bias, a comparison of distributions and a visual assessment of time series a better means of identifying realistic simulations.
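The remark about positive Nash–Sutcliffe statistics is easier to interpret with the standard definition in hand: one minus the ratio of the squared model error to the variance of the observations about their mean. The sketch below uses that standard form together with a simple mean bias; the function names and the daily values are hypothetical.

```python
def nash_sutcliffe(simulated, observed):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    obs_mean = sum(observed) / len(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - error / spread

def mean_bias(simulated, observed):
    """Average of simulated minus observed values."""
    return sum(s - o for s, o in zip(simulated, observed)) / len(observed)

# Hypothetical daily concentrations (arbitrary units).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
perfect = list(observed)
climatology = [3.0] * 5                   # always predicts the observed mean
mistimed = [5.0, 4.0, 3.0, 2.0, 1.0]      # right values, wrong days

nse_perfect = nash_sutcliffe(perfect, observed)
nse_climatology = nash_sutcliffe(climatology, observed)
nse_mistimed = nash_sutcliffe(mistimed, observed)
bias_mistimed = mean_bias(mistimed, observed)
print(nse_perfect, nse_climatology, nse_mistimed, bias_mistimed)
```

A score of zero means the model is no better than predicting the observed mean, and negative scores mean it is worse. The mistimed series scores -3 even though its bias is zero and its distribution matches the observations exactly, which shows why a single statistic can hide what a model does well.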