86 results for “Teachers evaluation performance”
Abstract:
The solar and longwave environmental irradiance geometry (SOLWEIG) model simulates spatial variations of 3-D radiation fluxes and mean radiant temperature (Tmrt) as well as shadow patterns in complex urban settings. In this paper, a new vegetation scheme is included in SOLWEIG and evaluated. The new shadow casting algorithm for complex vegetation structures makes it possible to obtain continuous images of shadow patterns and sky view factors taking both buildings and vegetation into account. For the calculation of 3-D radiation fluxes and Tmrt, SOLWEIG requires only a limited number of inputs, such as global shortwave radiation, air temperature, relative humidity, geographical information (latitude, longitude and elevation) and urban geometry represented by high-resolution ground and building digital elevation models (DEM). Trees and bushes are represented by separate DEMs. The model is evaluated using 5 days of integral radiation measurements at two sites within a square surrounded by low-rise buildings and vegetation in Göteborg, Sweden (57°N). There is good agreement between modelled and observed values of Tmrt, with an overall correspondence of R² = 0.91 (p < 0.01, RMSE = 3.1 K). A small overestimation of Tmrt is found at locations shadowed by vegetation. Given this good performance, a number of suggestions for future development are identified for applications including human comfort, building design, planning and evaluation of instrument exposure.
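As an illustration of the evaluation statistics quoted above (not part of SOLWEIG itself), the Python sketch below computes R² and RMSE for modelled versus observed Tmrt; the arrays are hypothetical placeholders.

```python
import numpy as np

# Minimal sketch: the R^2 and RMSE statistics quoted in the abstract, computed for
# modelled vs. observed mean radiant temperature. Values below are hypothetical.
t_mrt_obs = np.array([25.3, 31.2, 44.8, 52.1, 38.6])   # observed Tmrt (degC), placeholder
t_mrt_mod = np.array([26.1, 30.5, 47.0, 54.3, 37.9])   # modelled Tmrt (degC), placeholder

rmse = np.sqrt(np.mean((t_mrt_mod - t_mrt_obs) ** 2))   # temperature differences in degC equal K
r = np.corrcoef(t_mrt_obs, t_mrt_mod)[0, 1]
r_squared = r ** 2

print(f"R^2 = {r_squared:.2f}, RMSE = {rmse:.1f} K")
```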
Abstract:
We describe here the development and evaluation of an Earth system model suitable for centennial-scale climate prediction. The principal new components added to the physical climate model are the terrestrial and ocean ecosystems and gas-phase tropospheric chemistry, along with their coupled interactions. The individual Earth system components are described briefly and the relevant interactions between the components are explained. Because the multiple interactions could lead to unstable feedbacks, we go through a careful process of model spin-up to ensure that all components are stable and the interactions balanced. This spun-up configuration is evaluated against observed data for the Earth system components and is generally found to perform very satisfactorily. The reason for the evaluation phase is that the model is to be used for the core climate simulations carried out by the Met Office Hadley Centre for the Coupled Model Intercomparison Project (CMIP5), so it is essential that the addition of the extra complexity does not detract substantially from its climate performance. Localised changes in some specific meteorological variables can be identified, but the impacts on the overall simulation of present-day climate are slight. This model is proving valuable both for climate predictions and for investigating the strengths of biogeochemical feedbacks.
Abstract:
Urbanization, the expansion of built-up areas, is an important yet less-studied aspect of land use/land cover change in climate science. To date, most global climate models used to evaluate effects of land use/land cover change on climate do not include an urban parameterization. Here, the authors describe the formulation and evaluation of a parameterization of urban areas that is incorporated into the Community Land Model, the land surface component of the Community Climate System Model. The model is designed to be simple enough to be compatible with structural and computational constraints of a land surface model coupled to a global climate model, yet complex enough to explore physically based processes known to be important in determining urban climatology. The city representation is based upon the “urban canyon” concept, which consists of roofs, sunlit and shaded walls, and canyon floor. The canyon floor is divided into pervious (e.g., residential lawns, parks) and impervious (e.g., roads, parking lots, sidewalks) fractions. Trapping of longwave radiation by canyon surfaces, and absorption and reflection of solar radiation, are determined by accounting for multiple reflections. Separate energy balances and surface temperatures are determined for each canyon facet. A one-dimensional heat conduction equation is solved numerically for a 10-layer column to determine conduction fluxes into and out of canyon surfaces. Model performance is evaluated against measured fluxes and temperatures from two urban sites. Results indicate the model does a reasonable job of simulating the energy balance of cities.
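The 10-layer conduction calculation described above can be illustrated with a simple explicit finite-difference scheme. The sketch below is not the Community Land Model urban code; the material properties, layer thickness, time step and boundary temperatures are all assumed values.

```python
import numpy as np

# Minimal sketch: explicit finite-difference solution of the 1-D heat conduction
# equation dT/dt = alpha * d2T/dz2 through a 10-layer column, illustrating how a
# conduction flux into a canyon facet can be diagnosed. All numbers are hypothetical.
n_layers = 10
dz = 0.02                      # layer thickness (m), assumed
alpha = 7e-7                   # thermal diffusivity (m2 s-1), assumed (concrete-like)
k = 1.5                        # thermal conductivity (W m-1 K-1), assumed
dt = 60.0                      # time step (s); satisfies dt <= dz**2 / (2 * alpha) for stability

T = np.full(n_layers, 288.0)   # initial column temperature (K)
T_surface = 300.0              # prescribed surface (canyon-facet) temperature (K)
T_inner = 288.0                # prescribed inner boundary temperature (K)

for step in range(3600):       # integrate 60 hypothetical hours
    T_padded = np.concatenate(([T_surface], T, [T_inner]))
    d2T = T_padded[2:] - 2.0 * T_padded[1:-1] + T_padded[:-2]
    T = T + alpha * dt / dz ** 2 * d2T

# Conduction flux into the column at the surface (W m-2), positive downward
G = -k * (T[0] - T_surface) / dz
print(f"Surface conduction flux: {G:.1f} W m-2")
```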
Abstract:
Providing probabilistic forecasts using Ensemble Prediction Systems has become increasingly popular in both the meteorological and hydrological communities. Compared to conventional deterministic forecasts, probabilistic forecasts may provide more reliable forecasts of a few hours to a number of days ahead, and hence are regarded as better tools for taking uncertainties into consideration and hedging against weather risks. It is essential to evaluate the performance of raw ensemble forecasts and their potential value in forecasting extreme hydro-meteorological events. This study evaluates ECMWF's medium-range ensemble forecasts of precipitation over the period 2008/01/01-2012/09/30 on a selected mid-latitude large-scale river basin, the Huai river basin (ca. 270,000 km²) in central-east China. The evaluation unit is the sub-basin, in order to consider forecast performance in a hydrologically relevant way. The study finds that forecast performance varies with sub-basin properties, between flooding and non-flooding seasons, and with the forecast properties of aggregated time steps and lead times. Although the study does not evaluate any hydrological applications of the ensemble precipitation forecasts, its results have direct implications for hydrological forecasting should these ensemble precipitation forecasts be employed in hydrology.
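One common way to score such ensemble precipitation forecasts is the Brier score for exceedance of a rainfall threshold; the Python sketch below is an illustration only and not necessarily among the measures used in the study. The threshold, ensemble members and observations are hypothetical.

```python
import numpy as np

# Minimal sketch: Brier score for the ensemble-derived probability that sub-basin
# precipitation exceeds a threshold. All values below are hypothetical.
threshold = 25.0                                  # mm/day, assumed "heavy rain" threshold
ens = np.array([[3.0, 10.0, 28.0, 31.0, 5.0],     # rows: forecasts, columns: ensemble members
                [22.0, 30.0, 41.0, 27.0, 35.0]])
obs = np.array([12.0, 33.0])                      # observed sub-basin precipitation (mm/day)

p_fcst = (ens > threshold).mean(axis=1)           # forecast probability of exceedance
o = (obs > threshold).astype(float)               # observed binary outcome
brier = np.mean((p_fcst - o) ** 2)                # 0 is perfect, 1 is worst
print(f"Brier score = {brier:.3f}")
```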
Abstract:
As the calibration and evaluation of flood inundation models are a prerequisite for their successful application, there is a clear need to ensure that the performance measures that quantify how well models match the available observations are fit for purpose. This paper evaluates the binary pattern performance measures that are frequently used to compare flood inundation models with observations of flood extent. This evaluation considers whether these measures are able to calibrate and evaluate model predictions in a credible and consistent way, i.e. identifying the underlying model behaviour for a number of different purposes such as comparing models of floods of different magnitudes or on different catchments. Through theoretical examples, it is shown that the binary pattern measures are not consistent for floods of different sizes, such that for the same vertical error in water level, a model of a flood of large magnitude appears to perform better than a model of a smaller magnitude flood. Further, the commonly used Critical Success Index (usually referred to as F<2>) is biased in favour of overprediction of the flood extent, and is also biased towards correctly predicting areas of the domain with smaller topographic gradients. Consequently, it is recommended that future studies consider carefully the implications of reporting conclusions using these performance measures. Additionally, future research should consider whether a more robust and consistent analysis could be achieved by using elevation comparison methods instead.
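For reference, the binary pattern measure discussed above can be written F = A / (A + B + C), where A is the area wet in both model and observation, B the area wet in the model only, and C the area wet in the observation only. The short Python sketch below evaluates it on two hypothetical binary flood-extent grids.

```python
import numpy as np

# Minimal sketch: the Critical Success Index / fit statistic F for binary flood maps.
# The two flood-extent grids below are hypothetical.
model_wet = np.array([[1, 1, 0],
                      [1, 0, 0],
                      [1, 1, 0]], dtype=bool)
obs_wet   = np.array([[1, 0, 0],
                      [1, 1, 0],
                      [1, 0, 0]], dtype=bool)

A = np.sum(model_wet & obs_wet)      # correctly predicted wet cells
B = np.sum(model_wet & ~obs_wet)     # overprediction (wet in model only)
C = np.sum(~model_wet & obs_wet)     # underprediction (wet in observation only)

F = A / (A + B + C)                  # 1.0 indicates a perfect match
print(f"F = {F:.2f}")
```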
Abstract:
With a wide range of applications benefiting from dense networks of air temperature observations, but with limitations of cost, existing siting guidelines and risk of damage to sensors, new methods are required to gain a high-resolution understanding of the spatio-temporal patterns of urban meteorological phenomena such as the urban heat island, or to meet precision farming needs. With the launch of a new generation of low-cost sensors it is possible to deploy a network to monitor air temperature at finer spatial resolutions. Here we investigate the Aginova Sentinel Micro (ASM) sensor with a bespoke radiation shield (together < US$150) which can provide secure near-real-time air temperature data to a server utilising existing (or user-deployed) Wireless Fidelity (Wi-Fi) networks. This makes it ideally suited for deployment where wireless communications readily exist, notably urban areas. Assessment of the performance of the ASM relative to traceable standards in a water bath and atmospheric chamber shows it to have good measurement accuracy, with mean errors < ±0.22 °C between -25 and 30 °C and a time constant in ambient air of 110 ± 15 s. Subsequent field tests within the bespoke shield also showed excellent performance (root-mean-square error = 0.13 °C) over a range of meteorological conditions relative to a traceable operational UK Met Office platinum resistance thermometer. These results indicate that the ASM and bespoke shield are more than fit-for-purpose for dense network deployment in urban areas at relatively low cost compared to existing observation techniques.
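To make the quoted time constant concrete, the sketch below (a minimal illustration, not the authors' processing code) integrates a first-order sensor response with tau = 110 s through a hypothetical step change in air temperature.

```python
import numpy as np

# Minimal sketch: first-order sensor response T' = (T_air - T_sensor) / tau, showing
# that a sensor with tau ~= 110 s reaches ~63% of a step change after one time constant.
tau = 110.0                             # sensor time constant (s), from the abstract
dt = 1.0                                # integration step (s)
t = np.arange(0, 600, dt)

T_air = np.where(t < 60, 10.0, 15.0)    # hypothetical 5 degC step at t = 60 s
T_sensor = np.empty_like(t)
T_sensor[0] = T_air[0]
for i in range(1, len(t)):
    # forward Euler integration of the first-order lag
    T_sensor[i] = T_sensor[i - 1] + dt * (T_air[i - 1] - T_sensor[i - 1]) / tau

print(f"Sensor reading 110 s after the step: {T_sensor[int((60 + 110) / dt)]:.2f} degC")
```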
Abstract:
Urban land surface models (LSM) are commonly evaluated for short periods (a few weeks to months) because of limited observational data. This makes it difficult to distinguish the impact of initial conditions on model performance or to consider the response of a model to a range of possible atmospheric conditions. Drawing on results from the first urban LSM comparison, these two issues are considered. Assessment shows that the initial soil moisture has a substantial impact on model performance. Models initialised with soils that are too dry are not able to adjust their surface sensible and latent heat fluxes to realistic values until there is sufficient rainfall. Models initialised with soils that are too wet are not able to restrict their evaporation appropriately for periods in excess of a year. This has implications for short-term evaluation studies and implies the need for soil moisture measurements to improve data assimilation and model initialisation. In contrast, initial conditions influencing the thermal storage have a much shorter adjustment timescale compared to soil moisture. Most models partition too much of the radiative energy at the surface into the sensible heat flux, at the probable expense of the net storage heat flux.
Abstract:
Many of the next generation of global climate models will include aerosol schemes which explicitly simulate the microphysical processes that determine the particle size distribution. These models enable aerosol optical properties and cloud condensation nuclei (CCN) concentrations to be determined by fundamental aerosol processes, which should lead to a more physically based simulation of aerosol direct and indirect radiative forcings. This study examines the global variation in particle size distribution simulated by 12 global aerosol microphysics models to quantify model diversity and to identify any common biases against observations. Evaluation against size distribution measurements from a new European network of aerosol supersites shows that the mean model agrees quite well with the observations at many sites on the annual mean, but there are some seasonal biases common to many sites. In particular, at many of these European sites, the accumulation mode number concentration is biased low during winter and Aitken mode concentrations tend to be overestimated in winter and underestimated in summer. At high northern latitudes, the models strongly underpredict Aitken and accumulation particle concentrations compared to the measurements, consistent with previous studies that have highlighted the poor performance of global aerosol models in the Arctic. In the marine boundary layer, the models capture the observed meridional variation in the size distribution, which is dominated by the Aitken mode at high latitudes, with an increasing concentration of accumulation particles with decreasing latitude. Considering vertical profiles, the models reproduce the observed peak in total particle concentrations in the upper troposphere due to new particle formation, although modelled peak concentrations tend to be biased high over Europe. Overall, the multi-model-mean data set simulates the global variation of the particle size distribution with a good degree of skill, suggesting that most of the individual global aerosol microphysics models are performing well, although the large model diversity indicates that some models are in poor agreement with the observations. Further work is required to better constrain size-resolved primary and secondary particle number sources, and an improved understanding of nucleation and growth (e.g. the role of nitrate and secondary organics) will improve the fidelity of simulated particle size distributions.
Abstract:
Algorithms for computer-aided diagnosis of dementia based on structural MRI have demonstrated high performance in the literature, but are difficult to compare as different data sets and methodology were used for evaluation. In addition, it is unclear how the algorithms would perform on previously unseen data, and thus, how they would perform in clinical practice when there is no real opportunity to adapt the algorithm to the data at hand. To address these comparability, generalizability and clinical applicability issues, we organized a grand challenge that aimed to objectively compare algorithms based on a clinically representative multi-center data set. Using clinical practice as the starting point, the goal was to reproduce the clinical diagnosis. Therefore, we evaluated algorithms for multi-class classification of three diagnostic groups: patients with probable Alzheimer's disease, patients with mild cognitive impairment and healthy controls. The diagnosis based on clinical criteria was used as reference standard, as it was the best available reference despite its known limitations. For evaluation, a previously unseen test set was used consisting of 354 T1-weighted MRI scans with the diagnoses blinded. Fifteen research teams participated with a total of 29 algorithms. The algorithms were trained on a small training set (n = 30) and optionally on data from other sources (e.g., the Alzheimer's Disease Neuroimaging Initiative, the Australian Imaging Biomarkers and Lifestyle flagship study of aging). The best performing algorithm yielded an accuracy of 63.0% and an area under the receiver-operating-characteristic curve (AUC) of 78.8%. In general, the best performances were achieved using feature extraction based on voxel-based morphometry or a combination of features that included volume, cortical thickness, shape and intensity. The challenge is open for new submissions via the web-based framework: http://caddementia.grand-challenge.org.
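A minimal sketch of the two headline metrics, multi-class accuracy and AUC, is given below. It is not any team's entry; the labels and predicted probabilities are hypothetical, and it uses scikit-learn's one-vs-rest multi-class AUC, which may differ from the exact AUC definition used by the challenge.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Minimal sketch: accuracy and multi-class AUC for three diagnostic groups.
# All labels and probabilities below are hypothetical.
classes = ["AD", "MCI", "CN"]               # Alzheimer's disease, mild cognitive impairment, controls
y_true = np.array([0, 1, 2, 0, 2, 1])       # true diagnoses (indices into classes)
y_prob = np.array([[0.7, 0.2, 0.1],         # predicted probability per class, rows sum to 1
                   [0.2, 0.5, 0.3],
                   [0.1, 0.3, 0.6],
                   [0.4, 0.4, 0.2],
                   [0.2, 0.2, 0.6],
                   [0.3, 0.6, 0.1]])

y_pred = y_prob.argmax(axis=1)
acc = accuracy_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_prob, multi_class="ovr")   # one-vs-rest, macro-averaged

print(f"accuracy = {acc:.1%}, AUC = {auc:.1%}")
```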
Abstract:
The assessment of chess players is an increasingly attractive opportunity and an unfortunate necessity. The chess community needs to limit potential reputational damage by inhibiting cheating and unjustified accusations of cheating: there has been a recent rise in both. A number of counter-intuitive discoveries have been made by benchmarking the intrinsic merit of players’ moves: these call for further investigation. Is Capablanca actually, objectively the most accurate World Champion? Has Elo rating inflation not taken place? Stimulated by FIDE/ACP, we revisit the fundamentals of the subject to advance a framework suitable for improved standards of computational experiment and more precise results. Other domains look to chess as the demonstrator of good practice, including the rating of professionals making high-value decisions under pressure, personnel evaluation by Multichoice Assessment and the organization of crowd-sourcing in citizen science projects. The ‘3P’ themes of performance, prediction and profiling pervade all these domains.
Abstract:
There is increasing recognition that agricultural landscapes meet multiple societal needs and demands beyond provision of economic and environmental goods and services. Accordingly, there have been significant calls for the inclusion of societal, amenity and cultural values in agri-environmental landscape indicators to assist policy makers in monitoring the wider impacts of land-based policies. However, capturing the amenity and cultural values that rural agrarian areas provide, by use of such indicators, presents significant challenges. The EU social awareness of landscape indicator represents a new class of generalized social indicator using a top-down methodology to capture the social dimensions of landscape without reference to the specific structural and cultural characteristics of individual landscapes. This paper reviews this indicator in the context of existing agri-environmental indicators and their differing design concepts. Using a stakeholder consultation approach in five case study regions, the potential and limitations of the indicator are evaluated, with a particular focus on its perceived meaning, utility and performance in the context of different user groups and at different geographical scales. This analysis supplements previous EU-wide assessments through regional-scale assessment of the limitations and potentialities of the indicator and the need for further data collection. The evaluation finds that the perceived meaning of the indicator does not vary with scale, but, in common with all mapped indicators, the usefulness of the indicator to different user groups does change with scale of presentation. The indicator is viewed as most useful when presented at the scale of governance at which end users operate. The relevance of the different sub-components of the indicator is also found to vary across regions.
Abstract:
Ground-based remote-sensing observations from Atmospheric Radiation Measurement (ARM) and Cloud-Net sites are used to evaluate the clouds predicted by a weather forecasting and climate model. By evaluating the cloud predictions using separate measures for the errors in frequency of occurrence, amount when present, and timing, we provide a detailed assessment of the model performance, which is relevant to weather and climate time-scales. Importantly, this methodology will be of great use when attempting to develop a cloud parametrization scheme, as it provides a clearer picture of the current deficiencies in the predicted clouds. Using the Met Office Unified Model, it is shown that when cloud fractions produced by a diagnostic and a prognostic cloud scheme are compared, the prognostic cloud scheme shows improvements to the biases in frequency of occurrence of low, medium and high cloud and to the frequency distributions of cloud amount when cloud is present. The mean cloud profiles are generally improved, although it is shown that in some cases the diagnostic scheme produced misleadingly good mean profiles as a result of compensating errors in frequency of occurrence and amount when present. Some biases remain when using the prognostic scheme, notably the underprediction of mean ice cloud fraction due to the amount when present being too low, and the overprediction of mean liquid cloud fraction due to the frequency of occurrence being too high.
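The decomposition into frequency of occurrence and amount when present can be illustrated with the short Python sketch below; the cloud-fraction series and the presence threshold are hypothetical, and this is not the evaluation code used in the study.

```python
import numpy as np

# Minimal sketch: splitting a cloud-fraction comparison into (i) frequency of occurrence
# and (ii) amount when present. Series and threshold below are hypothetical.
cf_obs   = np.array([0.0, 0.2, 0.0, 0.7, 0.9, 0.0, 0.4, 0.0])   # observed cloud fraction
cf_model = np.array([0.1, 0.0, 0.0, 0.5, 0.8, 0.2, 0.3, 0.0])   # modelled cloud fraction

def freq_and_amount(cf, threshold=0.05):
    """Return (frequency of occurrence, mean amount when present) for a series."""
    present = cf > threshold
    freq = present.mean()
    amount = cf[present].mean() if present.any() else 0.0
    return freq, amount

for name, cf in [("obs", cf_obs), ("model", cf_model)]:
    f, a = freq_and_amount(cf)
    print(f"{name}: frequency of occurrence = {f:.2f}, amount when present = {a:.2f}")
```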
Abstract:
We report a straightforward methodology for the fabrication of high-temperature thermoelectric (TE) modules using commercially available solder alloys and metal barriers. This methodology employs standard and accessible facilities that are simple to implement in any laboratory. A TE module formed by nine couples of n-type YbxCo4Sb12 and p-type CexFe3CoSb12 state-of-the-art skutterudite materials was fabricated. The physical properties of the synthesized skutterudites were determined, and the module power output, internal resistance, and thermocycling stability were evaluated in air. At a temperature difference of 365 K, the module provides a volume power density of more than 1.5 W cm⁻³. However, thermocycling showed an increase of the internal module resistance and degradation in performance with the number of cycles when the device is operated at a hot-side temperature higher than 573 K. This may be attributed to oxidation of the skutterudite thermoelements.
Abstract:
Recent growth in brain-computer interface (BCI) research has increased pressure to report improved performance. However, different research groups report performance in different ways. Hence, it is essential that evaluation procedures are valid and reported in sufficient detail. In this chapter we give an overview of available performance measures such as classification accuracy, Cohen’s kappa, information transfer rate (ITR), and written symbol rate. We show how to distinguish results from chance level using confidence intervals for accuracy or kappa. Furthermore, we point out common pitfalls when moving from offline to online analysis and provide a guide on how to conduct statistical tests on BCI results.
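The sketch below illustrates three of the measures listed above: accuracy with a normal-approximation confidence interval against chance level, a simplified Cohen's kappa (assuming balanced classes and chance-level expected agreement), and the Wolpaw information transfer rate. All trial counts and timings are hypothetical.

```python
import numpy as np

# Minimal sketch: accuracy vs. chance, simplified kappa, and Wolpaw ITR for a BCI run.
# Counts and durations below are hypothetical; the ITR formula assumes 0 < p < 1.
n_classes = 4          # number of BCI classes
n_trials = 120
n_correct = 78
trial_duration = 4.0   # seconds per trial, assumed

p = n_correct / n_trials
half_width = 1.96 * np.sqrt(p * (1 - p) / n_trials)   # 95% normal-approximation CI
chance = 1.0 / n_classes

# Simplified Cohen's kappa against chance (balanced classes, chance-level expected agreement)
kappa = (p - chance) / (1 - chance)

# Wolpaw ITR in bits per trial, then bits per minute
bits = np.log2(n_classes) + p * np.log2(p) + (1 - p) * np.log2((1 - p) / (n_classes - 1))
itr_per_min = bits * 60.0 / trial_duration

print(f"accuracy = {p:.2f} +/- {half_width:.2f} (chance = {chance:.2f})")
print(f"kappa = {kappa:.2f}, ITR = {itr_per_min:.1f} bits/min")
```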
Abstract:
The evaluation of forecast performance plays a central role both in the interpretation and use of forecast systems and in their development. Different evaluation measures (scores) are available, often quantifying different characteristics of forecast performance. The properties of several proper scores for probabilistic forecast evaluation are contrasted and then used to interpret decadal probability hindcasts of global mean temperature. The Continuous Ranked Probability Score (CRPS), Proper Linear (PL) score, and IJ Good’s logarithmic score (also referred to as Ignorance) are compared; although information from all three may be useful, the logarithmic score has an immediate interpretation and is not insensitive to forecast busts. Neither CRPS nor PL is local; this is shown to produce counter-intuitive evaluations by CRPS. Benchmark forecasts from empirical models like Dynamic Climatology place the scores in context. Comparing scores for forecast systems based on physical models (in this case HadCM3, from the CMIP5 decadal archive) against such benchmarks is more informative than comparing systems based on similar physical simulation models with each other. It is shown that a forecast system based on HadCM3 outperforms Dynamic Climatology in decadal global mean temperature hindcasts; Dynamic Climatology previously outperformed a forecast system based upon HadGEM2, and reasons for these results are suggested. Forecasts of aggregate data (5-year means of global mean temperature) are, of course, narrower than forecasts of annual averages due to the suppression of variance; while the average “distance” between the forecasts and a target may be expected to decrease, little if any discernible improvement in probabilistic skill is achieved.
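As an illustration of two of the scores discussed (not the hindcast evaluation itself), the sketch below computes the Ignorance score under a Gaussian fitted to a hypothetical ensemble, together with a sample-based CRPS estimate.

```python
import numpy as np

# Minimal sketch: Ignorance (logarithmic) score and sample-based CRPS for a single
# ensemble forecast of global mean temperature anomaly. All values are hypothetical,
# and the Gaussian fit to the ensemble is an assumed forecast distribution.
ensemble = np.array([0.42, 0.55, 0.48, 0.61, 0.38, 0.51])   # forecast anomalies (K)
outcome = 0.45                                              # verifying observation (K)

# Ignorance: -log2 of the forecast density at the outcome (Gaussian fitted to the ensemble)
mu, sigma = ensemble.mean(), ensemble.std(ddof=1)
density = np.exp(-0.5 * ((outcome - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
ignorance = -np.log2(density)

# CRPS from the ensemble members (standard sample estimator)
crps = np.mean(np.abs(ensemble - outcome)) - 0.5 * np.mean(
    np.abs(ensemble[:, None] - ensemble[None, :]))

print(f"Ignorance = {ignorance:.2f} bits, CRPS = {crps:.3f} K")
```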