901 results for distinguishability metrics
Abstract:
This review of recent developments starts with the publication of Harold van der Heijden's Study Database Edition IV, John Nunn's second trilogy on the endgame, and a range of endgame tables (EGTs) to the DTC, DTZ and DTZ50 metrics. It then summarises data-mining work by Eiko Bleicher and Guy Haworth in 2010. This used CQL and pgn2fen to find some 3,000 EGT-faulted studies in the database above, and the Type A (value-critical) and Type B-DTM (DTM-depth-critical) zugzwangs in the mainlines of those studies. The same technique was used to mine Chessbase's BIG DATABASE 2010 to identify Type A/B zugzwangs, and to identify the pattern of value-concession and DTM-depth concession in sub-7-man play.
Abstract:
This paper presents the results of the crowd image analysis challenge of the Winter PETS 2009 workshop. The evaluation is carried out using a selection of the metrics developed in the Video Analysis and Content Extraction (VACE) program and the CLassification of Events, Activities, and Relationships (CLEAR) consortium [13]. The evaluation highlights the detection and tracking performance of the authors’ systems in areas such as precision, accuracy and robustness. The performance is also compared to the PETS 2009 submitted results.
Abstract:
This paper presents the results of the crowd image analysis challenge of the PETS2010 workshop. The evaluation was carried out using a selection of the metrics developed in the Video Analysis and Content Extraction (VACE) program and the CLassification of Events, Activities, and Relationships (CLEAR) consortium. The PETS 2010 evaluation was performed using new ground truth created from each independent two-dimensional view. In addition, the performance of the submissions to PETS 2009 and Winter-PETS 2009 was evaluated and included in the results. The evaluation highlights the detection and tracking performance of the authors’ systems in areas such as precision, accuracy and robustness.
Abstract:
The stratospheric climate and variability from simulations of sixteen chemistry‐climate models are evaluated. On average the polar night jet is well reproduced, though its variability is less well reproduced, with a large spread between models. Polar temperature biases are less than 5 K except in the Southern Hemisphere (SH) lower stratosphere in spring. The accumulated area of low temperatures responsible for polar stratospheric cloud formation is accurately reproduced for the Antarctic but underestimated for the Arctic. The shape and position of the polar vortex are well simulated, as is the tropical upwelling in the lower stratosphere. There is a wide model spread in the frequency of major sudden stratospheric warmings (SSWs), late biases in the breakup of the SH vortex, and a weak annual cycle in the zonal wind in the tropical upper stratosphere. Quantitatively, “metrics” indicate a wide spread in model performance for most diagnostics with systematic biases in many, and poorer performance in the SH than in the Northern Hemisphere (NH). Correlations were found in the SH between errors in the final warming, polar temperatures, the leading mode of variability, and jet strength, and in the NH between errors in polar temperatures, frequency of major SSWs, and jet strength. Models with a stronger QBO have stronger tropical upwelling and a colder NH vortex. Both the qualitative and quantitative analyses indicate a number of common and long‐standing model problems, particularly related to the simulation of the SH and stratospheric variability.
Abstract:
Recent research has suggested that forecast evaluation on the basis of standard statistical loss functions could prefer models which are sub-optimal when used in a practical setting. This paper explores a number of statistical models for predicting the daily volatility of several key UK financial time series. The out-of-sample forecasting performance of various linear and GARCH-type models of volatility is compared with forecasts derived from a multivariate approach. The forecasts are evaluated using traditional metrics, such as mean squared error, and also by how adequately they perform in a modern risk management setting. We find that the relative accuracies of the various methods are highly sensitive to the measure used to evaluate them. Such results have implications for any econometric time series forecasts which are subsequently employed in financial decision-making.
Abstract:
We describe the HadGEM2 family of climate configurations of the Met Office Unified Model, MetUM. The concept of a model "family" comprises a range of specific model configurations incorporating different levels of complexity but with a common physical framework. The HadGEM2 family of configurations includes atmosphere and ocean components, with and without a vertical extension to include a well-resolved stratosphere, and an Earth-System (ES) component which includes dynamic vegetation, ocean biology and atmospheric chemistry. The HadGEM2 physical model includes improvements designed to address specific systematic errors encountered in the previous climate configuration, HadGEM1, namely Northern Hemisphere continental temperature biases and tropical sea surface temperature biases and poor variability. Targeting these biases was crucial in order that the ES configuration could represent important biogeochemical climate feedbacks. Detailed descriptions and evaluations of particular HadGEM2 family members are included in a number of other publications, and the discussion here is limited to a summary of the overall performance using a set of model metrics which compare the way in which the various configurations simulate present-day climate and its variability.
Abstract:
There is a rising demand for the quantitative performance evaluation of automated video surveillance. To advance research in this area, it is essential that comparisons in detection and tracking approaches may be drawn and improvements in existing methods can be measured. There are a number of challenges related to the proper evaluation of motion segmentation, tracking, event recognition, and other components of a video surveillance system that are unique to the video surveillance community. These include the volume of data that must be evaluated, the difficulty in obtaining ground truth data, the definition of appropriate metrics, and achieving meaningful comparison of diverse systems. This chapter provides descriptions of useful benchmark datasets and their availability to the computer vision community. It outlines some ground truth and evaluation techniques, and provides links to useful resources. It concludes by discussing the future direction for benchmark datasets and their associated processes.
Abstract:
A new tropopause definition involving a flow-dependent blending of the traditional thermal tropopause with one based on potential vorticity has been developed and applied to the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalyses (ERA), ERA-40 and ERA-Interim. Global and regional trends in tropopause characteristics for annual and solsticial seasonal means are presented here, with emphasis on significant results for the newer ERA-Interim data for 1989-2007. The global-mean tropopause is rising at a rate of 47 m decade−1, with pressure falling at 1.0 hPa decade−1, and temperature falling at 0.18 K decade−1. The Antarctic tropopause shows decreasing heights, warming, and increasing westerly winds. The Arctic tropopause also shows a warming, but with decreasing westerly winds. In the tropics the trends are small, but at the latitudes of the sub-tropical jets they are almost double the global values. It is found that these changes are mainly concentrated in the eastern hemisphere. Previous and new metrics for the rate of broadening of the tropics, based on both height and wind, give trends in the range 0.9° decade−1 to 2.2° decade−1. For ERA-40 the global height and pressure trends for the period 1979-2001 are similar: 39 m decade−1 and −0.8 hPa decade−1. These values are smaller than those found from the thermal tropopause definition with this data set, as was used in most previous studies.
Abstract:
This paper identifies some significant gaps in our knowledge of the configuration and performance of the property asset management sector. It is argued that, as many leading academic property researchers have focussed on financial vehicles and modelling, in-depth analysis of property assets and their management has been neglected. In terms of potential for future in-depth research, three key broad preliminary research themes or questions are identified. First, how do the active management opportunities presented, costs of management and the key management tasks vary with market conditions, asset type and life-cycle stage? Second, how is property asset management delivered and what are the main costs and benefits of different models of procurement? Finally, what are the appropriate metrics for measuring the performance of different property managers and approaches to property management? It is concluded that the lack of published materials addressing these issues has implications for educating property students.
Abstract:
This spreadsheet contains key data about that part of the endgame of Western Chess for which Endgame Tables (EGTs) have been generated by computer. It is derived from the EGT work since 1969 of Thomas Ströhlein, Ken Thompson, Christopher Wirth, Eugene Nalimov, Marc Bourzutschky, John Tamplin and Yakov Konoval. The data includes %s of wins, draws and losses (wtm and btm), the maximum and average depths of win under various metrics (DTC = Depth to Conversion, DTM = Depth to Mate, DTZ = Depth to Conversion or Pawn-push), and examples of positions of maximum depth. It is essentially about sub-7-man Chess but is updated as news comes in of 7-man EGT computations.
Abstract:
1. Species-based indices are frequently employed as surrogates for wider biodiversity health and measures of environmental condition. Species selection is crucial in determining an indicator's metric value and hence the validity of the interpretation of ecosystem condition and function it provides, yet an objective process to identify appropriate indicator species is frequently lacking. 2. An effective indicator needs to (i) be representative, reflecting the status of wider biodiversity; (ii) be reactive, acting as an early-warning system for detrimental changes in environmental conditions; (iii) respond to change in a predictable way. We present an objective, niche-based approach for species selection, founded on a coarse categorisation of species' niche space and key resource requirements, which ensures the resultant indicator has these key attributes. 3. We use UK farmland birds as a case study to demonstrate this approach, identifying an optimal indicator set containing 12 species. In contrast to the 19 species included in the farmland bird index (FBI), a key UK biodiversity indicator that contributes to one of the UK Government's headline indicators of sustainability, the niche space occupied by these species fully encompasses that occupied by the wider community of 62 species. 4. We demonstrate that the response of these 12 species to land-use change is a strong correlate to that of the wider farmland bird community. Furthermore, the temporal dynamics of the index based on their population trends closely matches the population dynamics of the wider community. However, in both analyses, the magnitude of the change in our indicator was significantly greater, allowing this indicator to act as an early-warning system.
5. Ecological indicators are embedded in environmental management, sustainable development and biodiversity conservation policy and practice where they act as metrics against which progress towards national, regional and global targets can be measured. Adopting this niche-based approach for objective selection of indicator species will facilitate the development of sensitive and representative indices for a range of taxonomic groups, habitats and spatial scales.
Abstract:
The estimation of prediction quality is important because without quality measures, it is difficult to determine the usefulness of a prediction. Currently, methods for ligand binding site residue predictions are assessed in the function prediction category of the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment, utilizing the Matthews Correlation Coefficient (MCC) and Binding-site Distance Test (BDT) metrics. However, the assessment of ligand binding site predictions using such metrics requires the availability of solved structures with bound ligands. Thus, we have developed a ligand binding site quality assessment tool, FunFOLDQA, which utilizes protein feature analysis to predict ligand binding site quality prior to the experimental solution of the protein structures and their ligand interactions. The FunFOLDQA feature scores were combined using simple linear combinations, multiple linear regression and a neural network. The neural network produced significantly better results for correlations to both the MCC and BDT scores, according to Kendall’s τ, Spearman’s ρ and Pearson’s r correlation coefficients, when tested on both the CASP8 and CASP9 datasets. The neural network also produced the largest Area Under the Curve score (AUC) when Receiver Operating Characteristic (ROC) analysis was undertaken for the CASP8 dataset. Furthermore, the FunFOLDQA algorithm incorporating the neural network is shown to add value to FunFOLD when both methods are employed in combination. This results in a statistically significant improvement over all of the best server methods, the FunFOLD method (6.43%), and one of the top manual groups (FN293) tested on the CASP8 dataset. The FunFOLDQA method was also found to be competitive with the top server methods when tested on the CASP9 dataset.
To the best of our knowledge, FunFOLDQA is the first attempt to develop a method that can be used to assess ligand binding site prediction quality, in the absence of experimental data.
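As a rough illustration of the MCC metric named in this abstract, the sketch below scores a set of predicted binding-site residues against an observed set. The function names, the residue numbering and the convention of returning 0.0 when the coefficient is undefined are assumptions made for this example; they are not taken from FunFOLDQA or from the CASP assessment code.

```python
import math

def confusion_counts(predicted, observed, all_residues):
    """Count TP, FP, TN, FN for residue-level binding-site predictions."""
    tp = len(predicted & observed)                 # predicted and truly binding
    fp = len(predicted - observed)                 # predicted but not binding
    fn = len(observed - predicted)                 # binding but missed
    tn = len(all_residues - predicted - observed)  # correctly left out
    return tp, fp, tn, fn

def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Hypothetical 10-residue protein with made-up residue indices.
residues = set(range(1, 11))
predicted_site = {1, 2, 3}
observed_site = {2, 3, 4}
score = matthews_corrcoef(*confusion_counts(predicted_site, observed_site, residues))
```

The score ranges from −1 (complete disagreement) through 0 (no better than chance) to +1 (perfect agreement), which is why it suits binding-site problems where non-binding residues vastly outnumber binding ones.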
Abstract:
Vegetation distribution and state have been measured since 1981 by the AVHRR (Advanced Very High Resolution Radiometer) instrument through satellite remote sensing. In this study a correction method is applied to the Pathfinder NDVI (Normalized Difference Vegetation Index) data to create a continuous European vegetation phenology dataset of a 10-day temporal and 0.1° spatial resolution; additionally, land surface parameters for use in biosphere–atmosphere modelling are derived. The analysis of time-series from this dataset reveals, for the years 1982–2001, strong seasonal and interannual variability in European land surface vegetation state. Phenological metrics indicate a late and short growing season for the years 1985–1987, in addition to early and prolonged activity in the years 1989, 1990, 1994 and 1995. These variations are in close agreement with findings from phenological measurements at the surface; spring phenology is also shown to correlate particularly well with anomalies in winter temperature and winter North Atlantic Oscillation (NAO) index. Nevertheless, phenological metrics, which display considerable regional differences, could only be determined for vegetation with a seasonal behaviour. Trends in the phenological phases reveal a general shift to earlier (−0.54 days year−1) and prolonged (0.96 days year−1) growing periods which are statistically significant, especially for central Europe.
Abstract:
The estimation of the long-term wind resource at a prospective site based on a relatively short on-site measurement campaign is an indispensable task in the development of a commercial wind farm. The typical industry approach is based on the measure-correlate-predict (MCP) method where a relational model between the site wind velocity data and the data obtained from a suitable reference site is built from concurrent records. In a subsequent step, a long-term prediction for the prospective site is obtained from a combination of the relational model and the historic reference data. In the present paper, a systematic study is presented where three new MCP models, together with two published reference models (a simple linear regression and the variance ratio method), have been evaluated based on concurrent synthetic wind speed time series for two sites, simulating the prospective and the reference site. The synthetic method has the advantage of generating time series with the desired statistical properties, including Weibull scale and shape factors, required to evaluate the five methods under all plausible conditions. In this work, first a systematic discussion of the statistical fundamentals behind MCP methods is provided and three new models, one based on a nonlinear regression and two (termed kernel methods) derived from the use of conditional probability density functions, are proposed. All models are evaluated by using five metrics under a wide range of values of the correlation coefficient, the Weibull scale, and the Weibull shape factor. Only one of all models, a kernel method based on bivariate Weibull probability functions, is capable of accurately predicting all performance metrics studied.
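The two reference MCP models mentioned in this abstract can be sketched in a few lines. The version below assumes the usual textbook forms: ordinary least squares for the linear regression, and a line through the two means with slope equal to the ratio of standard deviations for the variance ratio method. Function names and the sample wind speeds are made up for illustration, and the paper's three new models are not reproduced.

```python
from statistics import mean, stdev

def fit_linear_regression(ref, site):
    """Least-squares line site = slope * ref + intercept from concurrent records."""
    mx, my = mean(ref), mean(site)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ref, site))
    sxx = sum((x - mx) ** 2 for x in ref)
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_variance_ratio(ref, site):
    """Line through the means whose slope is the ratio of standard deviations."""
    slope = stdev(site) / stdev(ref)
    return slope, mean(site) - slope * mean(ref)

def predict(model, ref_long_term):
    """Apply a fitted (slope, intercept) model to the long-term reference record."""
    slope, intercept = model
    return [slope * x + intercept for x in ref_long_term]

# Made-up concurrent wind speeds in m/s at the reference and prospective sites.
ref_concurrent = [4.0, 6.0, 8.0, 10.0]
site_concurrent = [5.0, 6.0, 9.0, 10.0]
lr_model = fit_linear_regression(ref_concurrent, site_concurrent)
vr_model = fit_variance_ratio(ref_concurrent, site_concurrent)
```

On the concurrent period the variance ratio prediction has the same mean and standard deviation as the site data, whereas least squares shrinks the predicted variance whenever the correlation is below one.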
Abstract:
Decadal predictions have a high profile in the climate science community and beyond, yet very little is known about their skill. Nor is there any agreed protocol for estimating their skill. This paper proposes a sound and coordinated framework for verification of decadal hindcast experiments. The framework is illustrated for decadal hindcasts tailored to meet the requirements and specifications of CMIP5 (Coupled Model Intercomparison Project phase 5). The chosen metrics address key questions about the information content in initialized decadal hindcasts. These questions are: (1) Do the initial conditions in the hindcasts lead to more accurate predictions of the climate, compared to un-initialized climate change projections? and (2) Is the prediction model’s ensemble spread an appropriate representation of forecast uncertainty on average? The first question is addressed through deterministic metrics that compare the initialized and uninitialized hindcasts. The second question is addressed through a probabilistic metric applied to the initialized hindcasts and comparing different ways to ascribe forecast uncertainty. Verification is advocated at smoothed regional scales that can illuminate broad areas of predictability, as well as at the grid scale, since many users of the decadal prediction experiments who feed the climate data into applications or decision models will use the data at grid scale, or downscale it to even higher resolution. An overall statement on skill of CMIP5 decadal hindcasts is not the aim of this paper. The results presented are only illustrative of the framework, which would enable such studies. 
However, broad conclusions that are beginning to emerge from the CMIP5 results include (1) Most predictability at the interannual-to-decadal scale, relative to climatological averages, comes from external forcing, particularly for temperature; (2) though moderate, additional skill is added by the initial conditions over what is imparted by external forcing alone; however, the impact of initialization may result in overall worse predictions in some regions than provided by uninitialized climate change projections; (3) limited hindcast records and the dearth of climate-quality observational data impede our ability to quantify expected skill as well as model biases; and (4) as is common to seasonal-to-interannual model predictions, the spread of the ensemble members is not necessarily a good representation of forecast uncertainty. The authors recommend that this framework be adopted to serve as a starting point to compare prediction quality across prediction systems. The framework can provide a baseline against which future improvements can be quantified. The framework also provides guidance on the use of these model predictions, which differ in fundamental ways from the climate change projections that much of the community has become familiar with, including adjustment of mean and conditional biases, and consideration of how to best approach forecast uncertainty.
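The first verification question, whether initialization improves accuracy over uninitialized projections, can be illustrated with a mean-squared-error skill score. The abstract does not name its deterministic metrics, so the choice of this score is an assumption, as are the function names and the made-up anomaly values below.

```python
def mse(forecast, observed):
    """Mean squared error between a forecast series and observations."""
    return sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)

def mse_skill_score(initialized, uninitialized, observed):
    """1 - MSE(initialized) / MSE(uninitialized); positive favours initialization."""
    reference = mse(uninitialized, observed)
    if reference == 0:
        raise ValueError("reference hindcast is perfect; skill score undefined")
    return 1.0 - mse(initialized, observed) / reference

# Made-up temperature anomalies (K) for three hindcast start dates.
observed = [0.1, 0.2, 0.3]
initialized = [0.1, 0.2, 0.4]
uninitialized = [0.3, 0.2, 0.1]
skill = mse_skill_score(initialized, uninitialized, observed)
```

A score of 1 means the initialized hindcasts are perfect, 0 means no gain over the uninitialized projections, and negative values mean initialization made the predictions worse, which matches conclusion (2) that some regions can degrade.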