22 results for Score metric

in CentAUR: Central Archive University of Reading - UK


Relevance:

20.00%

Abstract:

Across Europe, elevated phosphorus (P) concentrations in lowland rivers have made them particularly susceptible to eutrophication. This is compounded in southern and central UK by increasing pressures on water resources, which may be further enhanced by the potential effects of climate change. The EU Water Framework Directive requires an integrated approach to water resources management at the catchment scale and highlights the need for modelling tools that can distinguish relative contributions from multiple nutrient sources and are consistent with the information content of the available data. Two such models are introduced and evaluated within a stochastic framework using daily flow and total phosphorus concentrations recorded in a clay catchment typical of many areas of the lowland UK. Both models disaggregate empirical annual load estimates, derived from land use data, as a function of surface/near surface runoff, generated using a simple conceptual rainfall-runoff model. Estimates of the daily load from agricultural land, together with those from baseflow and point sources, feed into an in-stream routing algorithm. The first model assumes constant concentrations in runoff via surface/near surface pathways and incorporates an additional P store in the river-bed sediments, depleted above a critical discharge, to explicitly simulate resuspension. The second model, which is simpler, simulates P concentrations as a function of surface/near surface runoff, thus emphasising the influence of non-point source loads during flow peaks and the mixing of baseflow and point sources during low flows. The temporal consistency of parameter estimates, and thus the suitability of each approach, is assessed dynamically using a new method based on Monte Carlo analysis. (c) 2004 Elsevier B.V. All rights reserved.
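The second, simpler model reduces essentially to flow-weighted mixing of a runoff-dominated diffuse signal with baseflow and point-source inputs. A minimal sketch of that mixing logic follows; the function name, concentrations and loads are illustrative assumptions, not the paper's calibrated model.

```python
def tp_concentration(q_runoff, q_base, point_load, c_runoff=0.8, c_base=0.05):
    """Illustrative two-component mixing: total P concentration (mg/l) as a
    flow-weighted blend of surface/near-surface runoff, baseflow, and a
    constant point-source load (g/s). All parameter values are made up."""
    q_total = q_runoff + q_base                   # total discharge (m3/s)
    load = c_runoff * q_runoff + c_base * q_base  # diffuse loads (mg/l * m3/s = g/s)
    return (load + point_load) / q_total          # g/s / m3/s = mg/l

# During flow peaks q_runoff dominates, so concentration tends to c_runoff;
# at low flows the baseflow/point-source mixture dominates instead.
print(tp_concentration(q_runoff=5.0, q_base=0.5, point_load=0.1))
print(tp_concentration(q_runoff=0.05, q_base=0.5, point_load=0.1))
```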

Relevance:

20.00%

Abstract:

We propose a novel method for scoring the accuracy of protein binding site predictions – the Binding-site Distance Test (BDT) score. Recently, the Matthews Correlation Coefficient (MCC) has been used to evaluate binding site predictions, both by developers of new methods and by the assessors for the community-wide prediction experiment – CASP8. Whilst being a rigorous scoring method, the MCC does not take into account the actual 3D distance of the predicted residues from the observed binding site. Thus, an incorrectly predicted site that is nevertheless close to the observed binding site will obtain an identical score to the same number of nonbinding residues predicted at random. The MCC is also somewhat affected by the subjectivity of determining observed binding residues and the ambiguity of choosing distance cutoffs. By contrast, the BDT method produces continuous scores ranging between 0 and 1, relating to the distance between the predicted and observed residues. Residues predicted close to the binding site score higher than those more distant, providing a better reflection of the true accuracy of predictions. The CASP8 function predictions were evaluated using both the MCC and BDT methods and the scores were compared. The BDT scores were found to correlate strongly with the MCC scores whilst also being less susceptible to the subjectivity of defining binding residues. We therefore suggest that this new simple score is a potentially more robust method for future evaluations of protein-ligand binding site predictions.
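The abstract does not give the BDT formula, but the idea of rewarding near-misses by distance can be sketched as follows. The linear decay, the 10 Å cutoff, and the function name are assumptions for illustration; the published BDT may differ in detail, for example in how it treats observed residues that were never predicted.

```python
import numpy as np

def bdt_like_score(pred_coords, obs_coords, d0=10.0):
    """Hedged sketch of a distance-weighted binding-site score in [0, 1].
    Each predicted residue contributes 1 at zero distance from the nearest
    observed binding residue, decaying linearly to 0 at cutoff d0 (angstroms).
    Unlike the MCC, a prediction that just misses the site still earns
    partial credit instead of scoring like a random guess."""
    pred = np.asarray(pred_coords, float)   # shape (n_pred, 3)
    obs = np.asarray(obs_coords, float)     # shape (n_obs, 3)
    # distance from each predicted residue to its nearest observed residue
    d = np.sqrt(((pred[:, None, :] - obs[None, :, :]) ** 2).sum(-1)).min(axis=1)
    return float(np.clip(1.0 - d / d0, 0.0, 1.0).mean())
```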

Relevance:

20.00%

Abstract:

In this paper a robust method is developed for the analysis of data consisting of repeated binary observations taken at up to three fixed time points on each subject. The primary objective is to compare outcomes at the last time point, using earlier observations to predict this for subjects with incomplete records. A score test is derived. The method is developed for application to sequential clinical trials, as at interim analyses there will be many incomplete records occurring in non-informative patterns. Motivation for the methodology comes from experience with clinical trials in stroke and head injury, and data from one such trial is used to illustrate the approach. Extensions to more than three time points and to allow for stratification are discussed. Copyright © 2005 John Wiley & Sons, Ltd.
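For readers unfamiliar with score tests, the basic building block, a score (Z) test comparing final-time-point response rates between two arms, looks like the sketch below. The paper's statistic additionally borrows information from earlier visits for patients with incomplete records, which this minimal version omits.

```python
import numpy as np
from scipy.stats import norm

def score_test_proportions(x1, n1, x0, n0):
    """Plain score test for a treatment difference in binary response rates:
    x1 responders out of n1 on treatment, x0 out of n0 on control."""
    p = (x1 + x0) / (n1 + n0)              # pooled response rate under H0
    u = x1 - n1 * p                        # score statistic for the effect
    v = p * (1 - p) * n1 * n0 / (n1 + n0)  # its variance under H0
    z = u / np.sqrt(v)
    return z, 2 * norm.sf(abs(z))          # z and two-sided p-value
```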

Relevance:

20.00%

Abstract:

A score test is developed for binary clinical trial data, which incorporates patient non-compliance while respecting randomization. It is assumed in this paper that compliance is all-or-nothing, in the sense that a patient either accepts all of the treatment assigned as specified in the protocol, or none of it. Direct analytic comparisons of the adjusted test statistic for both the score test and the likelihood ratio test are made with the corresponding test statistics that adhere to the intention-to-treat principle. It is shown that no gain in power is possible over the intention-to-treat analysis, by adjusting for patient non-compliance. Sample size formulae are derived and simulation studies are used to demonstrate that the sample size approximation holds. Copyright © 2003 John Wiley & Sons, Ltd.
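The "no gain in power" finding can be explored empirically. The sketch below estimates the power of the intention-to-treat comparison by simulation under all-or-nothing non-compliance; it illustrates the setting only, with made-up rates and sample sizes, and does not reproduce the paper's adjusted score test or its sample size formulae.

```python
import numpy as np
rng = np.random.default_rng(0)

def itt_power(n=200, p_control=0.3, p_treat=0.5, compliance=0.8, reps=2000):
    """Monte Carlo power of the intention-to-treat test when a fraction of
    the treatment arm takes none of the treatment and so responds like the
    control group (all-or-nothing non-compliance)."""
    hits = 0
    for _ in range(reps):
        comply = rng.random(n) < compliance
        p_arm = np.where(comply, p_treat, p_control)  # non-compliers behave like control
        x1 = (rng.random(n) < p_arm).sum()            # treatment-arm responders
        x0 = (rng.random(n) < p_control).sum()        # control-arm responders
        p = (x1 + x0) / (2 * n)                       # pooled rate under H0
        z = (x1 - x0) / np.sqrt(2 * n * p * (1 - p))
        hits += abs(z) > 1.96
    return hits / reps

print(itt_power())  # power is diluted as compliance falls below 1.0
```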

Relevance:

20.00%

Abstract:

This paper presents an efficient construction algorithm for obtaining sparse kernel density estimates based on a regression approach that directly optimizes model generalization capability. Computational efficiency of the density construction is ensured using an orthogonal forward regression, and the algorithm incrementally minimizes the leave-one-out test score. A local regularization method is incorporated naturally into the density construction process to further enforce sparsity. An additional advantage of the proposed algorithm is that it is fully automatic and the user is not required to specify any criterion to terminate the density construction procedure. This is in contrast to an existing state-of-the-art kernel density estimation method using the support vector machine (SVM), where the user is required to specify some critical algorithm parameter. Several examples are included to demonstrate the ability of the proposed algorithm to effectively construct a very sparse kernel density estimate with comparable accuracy to that of the full-sample optimized Parzen window density estimate. Our experimental results also demonstrate that the proposed algorithm compares favorably with the SVM method, in terms of both test accuracy and sparsity, for constructing kernel density estimates.
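A rough sketch of the forward-selection idea follows. It substitutes a plain greedy search against a validation-set negative log-likelihood for the paper's orthogonal forward regression with leave-one-out score and local regularization, so it illustrates only the incremental construction and the automatic stopping rule, not the actual algorithm.

```python
import numpy as np

def gauss(x, c, h):
    """Gaussian kernels of width h centred at c, evaluated at points x."""
    return np.exp(-0.5 * ((x[:, None] - c[None, :]) / h) ** 2) / (h * np.sqrt(2 * np.pi))

def sparse_kde(train, valid, h=0.3, max_kernels=10):
    """Greedy forward selection of kernel centres from the training sample:
    each step adds the centre that most reduces the validation negative
    log-likelihood, stopping automatically when no candidate improves it."""
    centres, best_nll = [], np.inf
    while len(centres) < max_kernels:
        cand_best, cand_nll = None, best_nll
        for c in train:
            trial = np.array(centres + [c])
            dens = gauss(valid, trial, h).mean(axis=1)   # equal-weight mixture
            nll = -np.log(dens + 1e-300).mean()          # eps guards log(0)
            if nll < cand_nll:
                cand_best, cand_nll = c, nll
        if cand_best is None:
            break                                        # automatic termination
        centres.append(cand_best)
        best_nll = cand_nll
    return np.array(centres)

data = np.random.default_rng(2).normal(size=400)
print(sparse_kde(data[:200], data[200:]))  # typically far fewer than 200 centres
```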

Relevance:

20.00%

Abstract:

There is growing interest, especially for trials in stroke, in combining multiple endpoints in a single clinical evaluation of an experimental treatment. The endpoints might be repeated evaluations of the same characteristic or alternative measures of progress on different scales. Often they will be binary or ordinal, and those are the cases studied here. In this paper we take a direct approach to combining the univariate score statistics for comparing treatments with respect to each endpoint. The correlations between the score statistics are derived and used to allow a valid combined score test to be applied. A sample size formula is deduced and application in sequential designs is discussed. The method is compared with an alternative approach based on generalized estimating equations in an illustrative analysis and replicated simulations, and the advantages and disadvantages of the two approaches are discussed.
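The core of the direct approach can be written compactly: a weighted sum of the standardized univariate score statistics is renormalized by its variance under the null, which depends on the correlations between the statistics. The sketch below assumes equal weights by default and an already-estimated correlation matrix; both are illustrative choices rather than the paper's specific derivation.

```python
import numpy as np

def combined_score_test(z, R, w=None):
    """Combine univariate score statistics z (one per endpoint, each ~N(0,1)
    under H0) into one statistic using their correlation matrix R: the sum
    w'z has variance w'Rw under H0, so the ratio is again standard normal."""
    z = np.asarray(z, float)
    w = np.ones_like(z) if w is None else np.asarray(w, float)
    return float(w @ z / np.sqrt(w @ R @ w))

# Example: three correlated endpoints with an assumed correlation matrix
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
print(combined_score_test([1.8, 2.1, 1.5], R))  # refer to N(0,1) under H0
```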

Relevance:

20.00%

Abstract:

In the forecasting of binary events, verification measures that are “equitable” were defined by Gandin and Murphy to satisfy two requirements: 1) they award all random forecasting systems, including those that always issue the same forecast, the same expected score (typically zero), and 2) they are expressible as the linear weighted sum of the elements of the contingency table, where the weights are independent of the entries in the table, apart from the base rate. The authors demonstrate that the widely used “equitable threat score” (ETS), as well as numerous others, satisfies neither of these requirements and only satisfies the first requirement in the limit of an infinite sample size. Such measures are referred to as “asymptotically equitable.” In the case of ETS, the expected score of a random forecasting system is always positive and only falls below 0.01 when the number of samples is greater than around 30. Two other asymptotically equitable measures are the odds ratio skill score and the symmetric extreme dependency score, which are more strongly inequitable than ETS, particularly for rare events; for example, when the base rate is 2% and the sample size is 1000, random but unbiased forecasting systems yield an expected score of around −0.5, reducing in magnitude to −0.01 or smaller only for sample sizes exceeding 25 000. This presents a problem since these nonlinear measures have other desirable properties, in particular being reliable indicators of skill for rare events (provided that the sample size is large enough). A potential way to reconcile these properties with equitability is to recognize that Gandin and Murphy’s two requirements are independent, and the second can be safely discarded without losing the key advantages of equitability that are embodied in the first. This enables inequitable and asymptotically equitable measures to be scaled to make them equitable, while retaining their nonlinearity and other properties such as being reliable indicators of skill for rare events. It also opens up the possibility of designing new equitable verification measures.
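The finite-sample inequitability of ETS is easy to reproduce by simulation. The sketch below estimates the expected ETS of a random, unbiased forecasting system for a given sample size and base rate; the replication count is an arbitrary choice.

```python
import numpy as np
rng = np.random.default_rng(1)

def expected_ets(n, base_rate, reps=20000):
    """Monte Carlo expected equitable threat score for a random, unbiased
    forecasting system: forecasts and observations are independent
    Bernoulli(base_rate) draws. A truly equitable measure would average 0
    for any n; ETS only approaches 0 as n grows."""
    scores = []
    for _ in range(reps):
        f = rng.random(n) < base_rate   # random forecasts
        o = rng.random(n) < base_rate   # independent observations
        a = np.sum(f & o)               # hits
        b = np.sum(f & ~o)              # false alarms
        c = np.sum(~f & o)              # misses
        a_r = (a + b) * (a + c) / n     # hits expected by chance
        denom = a + b + c - a_r
        if denom > 0:                   # skip degenerate all-empty tables
            scores.append((a - a_r) / denom)
    return np.mean(scores)

print(expected_ets(n=30, base_rate=0.5))  # noticeably above zero for small n
```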

Relevance:

20.00%

Abstract:

Cloud radar and lidar can be used to evaluate the skill of numerical weather prediction models in forecasting the timing and placement of clouds, but care must be taken in choosing the appropriate metric of skill to use due to the non-Gaussian nature of cloud-fraction distributions. We compare the properties of a number of different verification measures and conclude that of existing measures the Log of Odds Ratio is the most suitable for cloud fraction. We also propose a new measure, the Symmetric Extreme Dependency Score, which has very attractive properties, being equitable (for large samples), difficult to hedge and independent of the frequency of occurrence of the quantity being verified. We then use data from five European ground-based sites and seven forecast models, processed using the ‘Cloudnet’ analysis system, to investigate the dependence of forecast skill on cloud fraction threshold (for binary skill scores), height, horizontal scale and (for the Met Office and German Weather Service models) forecast lead time. The models are found to be least skilful at predicting the timing and placement of boundary-layer clouds and most skilful at predicting mid-level clouds, although in the latter case they tend to underestimate mean cloud fraction when cloud is present. It is found that skill decreases approximately inverse-exponentially with forecast lead time, enabling a forecast ‘half-life’ to be estimated. When considering the skill of instantaneous model snapshots, we find typical values ranging between 2.5 and 4.5 days. Copyright © 2009 Royal Meteorological Society
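The half-life estimate follows from fitting an exponential decay to skill as a function of lead time. The sketch below shows the arithmetic on made-up skill values; the Cloudnet results themselves are not reproduced here.

```python
import numpy as np

# Hypothetical skill scores at several lead times (days). If skill decays as
# S(t) = S0 * exp(-t / tau), the forecast half-life is tau * ln 2.
lead_days = np.array([0.5, 1, 2, 3, 4, 5])
skill = np.array([0.42, 0.38, 0.30, 0.25, 0.20, 0.17])  # illustrative values

# A linear fit of log(skill) against lead time gives -1/tau as the slope
slope, intercept = np.polyfit(lead_days, np.log(skill), 1)
tau = -1.0 / slope
print(f"forecast half-life ~ {tau * np.log(2):.1f} days")
```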

Relevance:

20.00%

Abstract:

Unless the benefits to society of measures to protect and improve the welfare of animals are made transparent by means of their valuation they are likely to go unrecognised and cannot easily be weighed against the costs of such measures as required, for example, by policy-makers. A simple single-measure scoring system, based on the Welfare Quality® index, is used, together with a choice experiment economic valuation method, to estimate the value that people place on improvements to the welfare of different farm animal species measured on a continuous (0-100) scale. Results from using the method on a survey sample of some 300 people show that it is able to elicit apparently credible values. The survey found that 96% of respondents thought that we have a moral obligation to safeguard the welfare of animals and that over 72% were concerned about the way farm animals are treated. Estimated mean annual willingness to pay for meat from animals whose welfare was improved by just one point on the scale was £5.24 for beef cattle, £4.57 for pigs and £5.10 for meat chickens. Further development of the method is required to capture the total economic value of animal welfare benefits. Despite this, the method is considered a practical means for obtaining economic values that can be used in the cost-benefit appraisal of policy measures intended to improve the welfare of animals.

Relevance:

20.00%

Abstract:

Proper scoring rules provide a useful means to evaluate probabilistic forecasts. Independently of scoring rules, it has been argued that reliability and resolution are desirable forecast attributes. The mathematical expectation value of the score allows for a decomposition into reliability- and resolution-related terms, demonstrating a relationship between scoring rules and reliability/resolution. A similar decomposition holds for the empirical (i.e. sample average) score over an archive of forecast–observation pairs. This empirical decomposition, though, provides an overly optimistic estimate of the potential score (i.e. the optimum score which could be obtained through recalibration), showing that a forecast assessment based solely on the empirical resolution and reliability terms will be misleading. The differences between the theoretical and empirical decompositions are investigated, and specific recommendations are given on how to obtain better estimators of reliability and resolution in the case of the Brier and Ignorance scoring rules.
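For the Brier score, the empirical (sample-average) decomposition mentioned here is the familiar Murphy decomposition, sketched below with binned forecasts. The bin count and binning scheme are illustrative choices, and, as the abstract warns, the resulting reliability and resolution terms are biased estimators for finite samples.

```python
import numpy as np

def brier_decomposition(p, o, bins=10):
    """Empirical Murphy decomposition of the Brier score into
    reliability - resolution + uncertainty, binning forecast probabilities p
    against binary outcomes o. The identity is exact only if forecasts are
    constant within each bin; otherwise it holds approximately."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    obar = o.mean()                                   # climatological base rate
    edges = np.linspace(0, 1, bins + 1)
    idx = np.clip(np.digitize(p, edges[1:-1]), 0, bins - 1)
    rel = res = 0.0
    for k in range(bins):
        m = idx == k
        if m.any():
            w = m.mean()                              # fraction of cases in bin
            rel += w * (p[m].mean() - o[m].mean()) ** 2
            res += w * (o[m].mean() - obar) ** 2
    unc = obar * (1 - obar)
    return rel, res, unc                              # Brier score ~ rel - res + unc
```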

Relevance:

20.00%

Abstract:

The continuous ranked probability score (CRPS) is a frequently used scoring rule. In contrast with many other scoring rules, the CRPS evaluates cumulative distribution functions. An ensemble of forecasts can easily be converted into a piecewise constant cumulative distribution function with steps at the ensemble members. This renders the CRPS a convenient scoring rule for the evaluation of ‘raw’ ensembles, obviating the need for sophisticated ensemble model output statistics or dressing methods prior to evaluation. In this article, a relation between the CRPS and the quantile score is established. The evaluation of ‘raw’ ensembles using the CRPS is discussed in this light. It is shown that latent in this evaluation is an interpretation of the ensemble as quantiles but with non-uniform levels. This needs to be taken into account if the ensemble is evaluated further, for example with rank histograms.
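Treating the ensemble as a piecewise-constant empirical CDF, the CRPS can be computed directly from the members via the standard kernel identity, as sketched below. This is a generic estimator, not the article's quantile-score construction.

```python
import numpy as np

def crps_ensemble(ens, y):
    """CRPS of the empirical CDF defined by ensemble members `ens` for a
    single observation y, via the identity
    CRPS = E|X - y| - 0.5 * E|X - X'|, with X, X' drawn from the ensemble."""
    x = np.asarray(ens, float)
    term1 = np.abs(x - y).mean()                      # mean distance to obs
    term2 = np.abs(x[:, None] - x[None, :]).mean()    # mean pairwise spread
    return term1 - 0.5 * term2

print(crps_ensemble([0.1, 0.4, 0.7, 1.2], y=0.5))
```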