992 results for Continuous ranked probability score
Abstract:
The continuous ranked probability score (CRPS) is a frequently used scoring rule. In contrast with many other scoring rules, the CRPS evaluates cumulative distribution functions. An ensemble of forecasts can easily be converted into a piecewise constant cumulative distribution function with steps at the ensemble members. This renders the CRPS a convenient scoring rule for the evaluation of ‘raw’ ensembles, obviating the need for sophisticated ensemble model output statistics or dressing methods prior to evaluation. In this article, a relation between the CRPS score and the quantile score is established. The evaluation of ‘raw’ ensembles using the CRPS is discussed in this light. It is shown that latent in this evaluation is an interpretation of the ensemble as quantiles but with non-uniform levels. This needs to be taken into account if the ensemble is evaluated further, for example with rank histograms.
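As a concrete sketch of the conversion described above, the CRPS of a raw ensemble against a scalar observation can be computed from the energy-form identity CRPS = E|X − y| − 0.5·E|X − X′|, where X and X′ are independent draws from the ensemble distribution; the function name and interface below are illustrative, not from the article:

```python
def crps_ensemble(members, obs):
    """CRPS of an ensemble forecast, treating the ensemble as a
    piecewise-constant CDF with a step of 1/m at each member.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'|, with X, X'
    independent draws from the ensemble distribution."""
    m = len(members)
    term1 = sum(abs(x - obs) for x in members) / m
    term2 = sum(abs(x - y) for x in members for y in members) / (2 * m * m)
    return term1 - term2
```

For a one-member ensemble this reduces to the absolute error, which is one reason the CRPS is often viewed as a probabilistic generalization of the mean absolute error.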
Abstract:
The skill of a forecast can be assessed by comparing the relative proximity of both the forecast and a benchmark to the observations. Example benchmarks include climatology or a naïve forecast. Hydrological ensemble prediction systems (HEPS) are currently transforming the hydrological forecasting environment, but in this new field there is little information to guide researchers and operational forecasters on how benchmarks can best be used to evaluate their probabilistic forecasts. In this study, it is shown that the calculated forecast skill can vary depending on the benchmark selected, and that the selection of a benchmark for determining forecasting system skill is sensitive to a number of hydrological and system factors. A benchmark intercomparison experiment is then undertaken using the continuous ranked probability score (CRPS), a reference forecasting system and a suite of 23 different methods to derive benchmarks. The benchmarks are assessed within the operational set-up of the European Flood Awareness System (EFAS) to determine those that are 'toughest to beat' and so give the most robust discrimination of forecast skill, particularly for the spatial average fields that EFAS relies upon. Evaluating against an observed discharge proxy, the benchmark that has the most utility for EFAS, and that best avoids naïve skill across different hydrological situations, is found to be meteorological persistency. This benchmark uses the latest meteorological observations of precipitation and temperature to drive the hydrological model. Hydrological long term average benchmarks, which are currently used in EFAS, are very easily beaten by the forecasting system, and their use produces much naïve skill. When decomposed into seasons, the advanced meteorological benchmarks, which make use of meteorological observations from the past 20 years at the same calendar date, have the most skill discrimination.
They are also good at discriminating skill in low flows and for all catchment sizes. Simpler meteorological benchmarks are particularly useful for high flows. Recommendations for EFAS are to move to routine use of meteorological persistency, an advanced meteorological benchmark and a simple meteorological benchmark in order to provide a robust evaluation of forecast skill. This work provides the first comprehensive evidence on how benchmarks can be used in the evaluation of skill in probabilistic hydrological forecasts and which benchmarks are most useful for skill discrimination and avoidance of naïve skill in a large scale HEPS. It is recommended that all HEPS use the evidence and methodology provided here to evaluate which benchmarks to employ, so that forecasters can trust their skill evaluation and have confidence that their forecasts are indeed better.
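The benchmark comparisons above rest on a skill score relating the system's mean CRPS to that of the chosen benchmark; a minimal sketch (names ours, not from the study) is:

```python
def crpss(crps_system, crps_benchmark):
    """CRPS-based skill score: 1 is perfect, 0 means no better than the
    benchmark, and negative values mean worse than the benchmark.
    Inputs are mean CRPS values over the evaluation period."""
    return 1.0 - crps_system / crps_benchmark
```

An easily beaten benchmark inflates this number: the larger the benchmark's mean CRPS, the closer the score sits to 1 regardless of the system's real quality, which is exactly the "naïve skill" the abstract warns about.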
Abstract:
The evaluation of forecast performance plays a central role both in the interpretation and use of forecast systems and in their development. Different evaluation measures (scores) are available, often quantifying different characteristics of forecast performance. The properties of several proper scores for probabilistic forecast evaluation are contrasted and then used to interpret decadal probability hindcasts of global mean temperature. The Continuous Ranked Probability Score (CRPS), Proper Linear (PL) score, and I. J. Good's logarithmic score (also referred to as Ignorance) are compared; although information from all three may be useful, the logarithmic score has an immediate interpretation and is sensitive to forecast busts. Neither CRPS nor PL is local; this is shown to produce counterintuitive evaluations by CRPS. Benchmark forecasts from empirical models like Dynamic Climatology place the scores in context. Comparing scores for forecast systems based on physical models (in this case HadCM3, from the CMIP5 decadal archive) against such benchmarks is more informative than internally comparing systems based on similar physical simulation models with each other. It is shown that a forecast system based on HadCM3 outperforms Dynamic Climatology in decadal global mean temperature hindcasts; Dynamic Climatology previously outperformed a forecast system based upon HadGEM2, and reasons for these results are suggested. Forecasts of aggregate data (5-year means of global mean temperature) are, of course, narrower than forecasts of annual averages due to the suppression of variance; while the average "distance" between the forecasts and a target may be expected to decrease, little if any discernible improvement in probabilistic skill is achieved.
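The contrast drawn above between the CRPS and the Ignorance score can be made concrete for a Gaussian forecast. The closed-form CRPS used here is the standard expression for a normal predictive distribution; the function names are ours:

```python
import math

def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of a normal forecast N(mu, sigma^2):
    sigma * (z*(2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi)),
    with z the standardized error, phi/Phi the standard normal pdf/cdf."""
    z = (y - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * Phi - 1.0) + 2.0 * phi - 1.0 / math.sqrt(math.pi))

def ignorance(mu, sigma, y):
    """Ignorance (logarithmic) score in bits: -log2 of the predictive
    density at the verifying observation."""
    z = (y - mu) / sigma
    density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return -math.log2(density)
```

For an observation far in the tail, Ignorance grows quadratically with the standardized error while the CRPS grows only linearly, which is the sense in which the logarithmic score penalizes forecast busts more heavily.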
Abstract:
We have considered a Bayesian approach for the nonlinear regression model by replacing the normal distribution on the error term by some skewed distributions, which account for both skewness and heavy tails or skewness alone. The type of data considered in this paper concerns repeated measurements taken in time on a set of individuals. Such multiple observations on the same individual generally produce serially correlated outcomes. Thus, additionally, our model allows for a correlation between observations made on the same individual. We have illustrated the procedure using a data set to study the growth curves of a clinic measurement of a group of pregnant women from an obstetrics clinic in Santiago, Chile. Parameter estimation and prediction were carried out using appropriate posterior simulation schemes based on Markov chain Monte Carlo methods. Besides the deviance information criterion (DIC) and the conditional predictive ordinate (CPO), we suggest the use of proper scoring rules based on the posterior predictive distribution for comparing models. For our data set, all these criteria chose the skew-t model as the best model for the errors. The DIC and CPO criteria are also validated, for the model proposed here, through a simulation study. As a conclusion of this study, the DIC criterion is not trustworthy for this kind of complex model.
Abstract:
PURPOSE. To assess whether baseline Glaucoma Probability Score (GPS; HRT-3; Heidelberg Engineering, Dossenheim, Germany) results are predictive of progression in patients with suspected glaucoma. The GPS is a new feature of the confocal scanning laser ophthalmoscope that generates an operator-independent, three-dimensional model of the optic nerve head and gives a score for the probability that this model is consistent with glaucomatous damage. METHODS. The study included 223 patients with suspected glaucoma during an average follow-up of 63.3 months. Included subjects had a suspect optic disc appearance and/or elevated intraocular pressure, but normal visual fields. Conversion was defined as development of either repeatable abnormal visual fields or glaucomatous deterioration in the appearance of the optic disc during the study period. The association between baseline GPS and conversion was investigated by Cox regression models. RESULTS. Fifty-four (24.2%) eyes converted. In multivariate models, both higher values of global GPS and subjective stereophotograph assessment (larger cup-disc ratio and glaucomatous grading) were predictive of conversion: adjusted hazard ratios (95% CI): 1.31 (1.15-1.50) per 0.1 higher global GPS, 1.34 (1.12-1.62) per 0.1 higher CDR, and 2.34 (1.22-4.47) for abnormal grading, respectively. No significant differences (P > 0.05 for all comparisons) were found between the c-index values (equivalent to the area under the ROC curve) for the multivariate models (0.732, 0.705, and 0.699, respectively). CONCLUSIONS. GPS values were predictive of conversion in our population of patients with suspected glaucoma. Further, they performed as well as subjective assessment of the optic disc. These results suggest that the GPS could potentially replace stereophotographs as a tool for estimating the likelihood of conversion to glaucoma.
Abstract:
Reliability analysis of probabilistic forecasts, in particular through the rank histogram or Talagrand diagram, is revisited. Two shortcomings are pointed out: Firstly, a uniform rank histogram is but a necessary condition for reliability. Secondly, if the forecast is assumed to be reliable, an indication is needed of how far a histogram is expected to deviate from uniformity merely due to randomness. Concerning the first shortcoming, it is suggested that forecasts be grouped or stratified along suitable criteria, and that reliability is analyzed individually for each forecast stratum. A reliable forecast should have uniform histograms for all individual forecast strata, not only for all forecasts as a whole. As to the second shortcoming, instead of the observed frequencies, the probability of the observed frequency is plotted, providing an indication of the likelihood of the result under the hypothesis that the forecast is reliable. Furthermore, a goodness-of-fit statistic is discussed which is essentially the reliability term of the Ignorance score. The discussed tools are applied to medium range forecasts for 2 m-temperature anomalies at several locations and lead times. The forecasts are stratified along the expected ranked probability score. Those forecasts which feature a high expected score turn out to be particularly unreliable.
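A minimal sketch of the rank histogram discussed above (names ours; ties between members and the observation are ignored for simplicity):

```python
def rank_of_obs(members, obs):
    """Rank of the observation within the ensemble (1 .. m+1):
    1 plus the number of members strictly below the observation."""
    return 1 + sum(1 for x in members if x < obs)

def rank_histogram(forecasts, observations, m):
    """Counts of observation ranks over many forecast cases; a reliable
    m-member ensemble should give roughly uniform counts over the
    m+1 bins."""
    counts = [0] * (m + 1)
    for members, obs in zip(forecasts, observations):
        counts[rank_of_obs(members, obs) - 1] += 1
    return counts
```

The stratification proposed in the abstract then amounts to building one such histogram per forecast stratum and checking each for uniformity, rather than a single histogram over all forecasts.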
Abstract:
Spatial prediction of hourly rainfall via radar calibration is addressed. The change of support problem (COSP), arising when the spatial supports of different data sources do not coincide, is faced in a non-Gaussian setting; in fact, hourly rainfall in the Emilia-Romagna region, in Italy, is characterized by an abundance of zero values and right-skewness of the distribution of positive amounts. Rain gauge direct measurements on sparsely distributed locations and hourly cumulated radar grids are provided by the ARPA-SIMC Emilia-Romagna. We propose a three-stage Bayesian hierarchical model for radar calibration, exploiting rain gauges as reference measure. Rain probability and amounts are modeled via linear relationships with radar in the log scale; spatially correlated Gaussian effects capture the residual information. We employ a probit link for rainfall probability and a Gamma distribution for rainfall positive amounts; the two steps are joined via a two-part semicontinuous model. Three model specifications differently addressing COSP are presented; in particular, a stochastic weighting of all radar pixels, driven by a latent Gaussian process defined on the grid, is employed. Estimation is performed via MCMC procedures implemented in C, linked to R software. Communication and evaluation of probabilistic, point and interval predictions are investigated. A non-randomized PIT histogram is proposed for correctly assessing calibration and coverage of two-part semicontinuous models. Predictions obtained with the different model specifications are evaluated via graphical tools (Reliability Plot, Sharpness Histogram, PIT Histogram, Brier Score Plot and Quantile Decomposition Plot), proper scoring rules (Brier Score, Continuous Rank Probability Score) and consistent scoring functions (Root Mean Square Error and Mean Absolute Error addressing the predictive mean and median, respectively).
Calibration is reached and the inclusion of neighbouring information slightly improves predictions. All specifications outperform a benchmark model with uncorrelated effects, confirming the relevance of spatial correlation for modeling rainfall probability and accumulation.
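Of the proper scoring rules listed above, the Brier score for the rain-occurrence part of the model is the simplest to state; a minimal sketch (names ours, not from the paper):

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities of rain
    occurrence and binary outcomes (1 = rain, 0 = no rain); lower is
    better, 0 is perfect."""
    n = len(probs)
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / n
```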
Abstract:
PURPOSE. We previously demonstrated that most eyes have regionally variable extensions of Bruch's membrane (BM) inside the clinically identified disc margin (DM) that are clinically and photographically invisible. We studied the impact of these findings on DM- and BM opening (BMO)-derived neuroretinal rim parameters. METHODS. Disc stereo-photography and spectral domain optical coherence tomography (SD-OCT, 24 radial B-scans centered on the optic nerve head) were performed on 30 glaucoma patients and 10 age-matched controls. Photographs were colocalized to SD-OCT data such that the DM and BMO could be visualized in each B-scan. Three parameters were computed: (1) DM-horizontal rim width (HRW), the distance between the DM and internal limiting membrane (ILM) along the DM reference plane; (2) BMO-HRW, the distance between BMO and ILM along the BMO reference plane; and (3) BMO-minimum rim width (MRW), the minimum distance between BMO and ILM. Rank-order correlations of sectors ranked by rim width and spatial concordance measured as angular distances between equivalently ranked sectors were derived. RESULTS. The average DM position was external to BMO in all quadrants, except inferotemporally. There were significant sectoral differences among all three rim parameters. DM-HRW and BMO-HRW sector ranks were better correlated (median rho = 0.84) than DM-HRW and BMO-MRW (median rho = 0.55), or BMO-HRW and BMO-MRW (median rho = 0.60) ranks. Sectors with the narrowest BMO-MRW were infrequently the same as those with the narrowest DM-HRW or BMO-HRW. CONCLUSIONS. BMO-MRW quantifies the neuroretinal rim from a true anatomical outer border and accounts for its variable trajectory at the point of measurement. (Invest Ophthalmol Vis Sci. 2012;53:1852-1860) DOI:10.1167/iovs.11-9309
Abstract:
In previous papers, the type-I intermittent phenomenon with continuous reinjection probability density (RPD) has been extensively studied. However, in this paper type-I intermittency considering a discontinuous RPD function in one-dimensional maps is analyzed. To carry out the present study, the analytic approximation presented by del Río and Elaskar (Int. J. Bifurc. Chaos 20:1185-1191, 2010) and Elaskar et al. (Physica A. 390:2759-2768, 2011) is extended to consider discontinuous RPD functions. The results of this analysis show that the characteristic relation only depends on the position of the lower bound of reinjection (LBR); therefore, for the LBR below the tangent point, the relation ⟨l⟩ ∼ ε^(−1/2), where ε is the control parameter, remains robust regardless of the form of the RPD, although the average length of the laminar phases ⟨l⟩ can change. Finally, the study of discontinuous RPD for type-I intermittency which occurs in a three-wave truncation model for the derivative nonlinear Schrödinger equation is presented. In all tests the theoretical results agree well with the numerical data.
Abstract:
In this study, a multi-model ensemble was implemented and verified, following one of the research priorities of the Subseasonal to Seasonal Prediction Project (S2S). A linear regression was applied to a set of ensemble forecasts over past dates, produced by the monthly forecasting systems of CNR-ISAC and ECMWF-IFS. Each of these contains one control member and four perturbed members. The variables chosen for the analysis are the 500 hPa geopotential height, the 850 hPa temperature and the 2-metre temperature; the spatial grid has a 1° × 1° lat-lon resolution, and the winters from 1990 to 2010 were used. The ERA-Interim reanalyses are used both to build the regression and to validate the results, by means of non-probabilistic estimators such as the root mean square error (RMSE) and the anomaly correlation. Subsequently, Model Output Statistics (MOS) and Direct Model Output (DMO) techniques are applied to the multi-model ensemble to obtain probabilistic forecasts of the weekly mean 2-metre temperature anomalies. The MOS methods used are logistic regression and non-homogeneous Gaussian regression, while the DMO methods are democratic voting and the Tukey plotting position. These techniques are also applied to the individual models, so as to make comparisons based on probabilistic estimators such as the ranked probability skill score, the discrete ranked probability skill score and the reliability diagram. Both types of estimators show that the multi-model performs better than the individual models. Moreover, the highest values of the probabilistic estimators are obtained using a logistic regression on the ensemble mean alone. By applying the regression to datasets of reduced size, we built a learning curve showing that increasing the number of dates in the training phase would not yield further improvements.
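Of the techniques named above, democratic voting is the simplest DMO method: the forecast probability of an event is just the fraction of ensemble members in which it occurs. A minimal sketch (interface ours, not from the study):

```python
def democratic_voting(members, threshold):
    """Direct-model-output probability that a variable (e.g. the weekly
    mean 2 m temperature anomaly) exceeds a threshold: the fraction of
    ensemble members that exceed it."""
    return sum(1 for x in members if x > threshold) / len(members)
```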
Abstract:
Purpose: To compare the ability of subjective assessment of the optic nerve head (ONH) and retinal nerve fiber layer (RNFL) by general ophthalmologists and by a glaucoma expert with objective measurements by optical coherence tomography (Stratus OCT; Carl Zeiss Meditec Inc), confocal scanning laser ophthalmoscopy (HRT III; Heidelberg Engineering, Heidelberg, Germany), and scanning laser polarimetry (GDx enhanced corneal compensation; Carl Zeiss Meditec Inc, Dublin, CA) in discriminating glaucomatous and normal eyes. Methods: Sixty-one glaucomatous and 57 normal eyes of 118 subjects were included in the study. Three independent general ophthalmologists and 1 glaucoma expert evaluated ONH stereophotographs. Receiver operating characteristic curves were constructed for each imaging technique and sensitivity at fixed specificity was estimated. Comparisons of areas under these curves (aROCs) and agreement (kappa) were determined between stereophoto grading and the best parameter from each technique. Results: The best parameter from each technique showed a larger aROC (Stratus OCT RNFL = 0.92; Stratus OCT ONH vertical integrated area = 0.86; Stratus OCT macular thickness = 0.82; GDx enhanced corneal compensation = 0.91; HRT3 global cup-to-disc ratio = 0.83; HRT3 glaucoma probability score numeric area score = 0.83) compared with stereophotograph grading by general ophthalmologists (0.80) in separating glaucomatous and normal eyes. Glaucoma expert stereophoto grading provided an equal or larger aROC (0.92) than the best parameter of each computerized imaging device.
Stereophotos evaluated by a glaucoma expert showed better agreement with the best parameter of each quantitative imaging technique in classifying eyes as either glaucomatous or normal, compared with stereophoto grading by general ophthalmologists. The combination of subjective assessment of the optic disc by general ophthalmologists with objective RNFL parameters improved identification of glaucoma patients in a larger proportion than the combination of these objective parameters with subjective assessment of the optic disc by a glaucoma expert (29.5% vs. 19.7%, respectively). Conclusions: The diagnostic ability of all imaging techniques showed better performance than subjective assessment of the ONH by general ophthalmologists, but not by a glaucoma expert. Objective RNFL measurements may provide improvement in glaucoma detection when combined with subjective assessment of the optic disc by general ophthalmologists or by a glaucoma expert.
Abstract:
PURPOSE. To evaluate the effect of disease severity and optic disc size on the diagnostic accuracies of optic nerve head (ONH), retinal nerve fiber layer (RNFL), and macular parameters with RTVue (Optovue, Fremont, CA) spectral domain optical coherence tomography (SDOCT) in glaucoma. METHODS. 110 eyes of 62 normal subjects and 193 eyes of 136 glaucoma patients from the Diagnostic Innovations in Glaucoma Study underwent ONH, RNFL, and macular imaging with RTVue. Severity of glaucoma was based on visual field index (VFI) values from standard automated perimetry. Optic disc size was based on disc area measurement using the Heidelberg Retina Tomograph II (Heidelberg Engineering, Dossenheim, Germany). The influence of disease severity and disc size on the diagnostic accuracy of RTVue was evaluated by receiver operating characteristic (ROC) and logistic regression models. RESULTS. Areas under the ROC curve (AUC) of all scanning areas increased (P < 0.05) as disease severity increased. For a VFI value of 99%, indicating early damage, AUCs for rim area, average RNFL thickness, and ganglion cell complex-root mean square were 0.693, 0.799, and 0.779, respectively. For a VFI of 70%, indicating severe damage, corresponding AUCs were 0.828, 0.985, and 0.992, respectively. Optic disc size did not influence the AUCs of any of the SDOCT scanning protocols of RTVue (P > 0.05). Sensitivity of the rim area increased and specificity decreased in large optic discs. CONCLUSIONS. Diagnostic accuracies of RTVue scanning protocols for glaucoma were significantly influenced by disease severity. Sensitivity of the rim area increased in large optic discs at the expense of specificity. (Invest Ophthalmol Vis Sci. 2011;92:1290-1296) DOI:10.1167/iovs.10-5516
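The area under the ROC curve reported throughout these studies equals the Mann-Whitney probability that a randomly chosen diseased eye receives a higher test score than a randomly chosen normal eye; a minimal sketch (names ours):

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    probability that a randomly chosen positive case scores higher than
    a randomly chosen negative case, with ties counting one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```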
Abstract:
BACKGROUND: The availability of the P. falciparum genome has led to novel ways to identify potential vaccine candidates. A new approach for antigen discovery based on the bioinformatic selection of heptad repeat motifs corresponding to alpha-helical coiled coil structures yielded promising results. To elucidate the question about the relationship between the coiled coil motifs and their sequence conservation, we have assessed the extent of polymorphism in putative alpha-helical coiled coil domains in culture strains, in natural populations and in the single nucleotide polymorphism data available at PlasmoDB. METHODOLOGY/PRINCIPAL FINDINGS: 14 alpha-helical coiled coil domains were selected based on preclinical experimental evaluation. They were tested by PCR amplification and sequencing of different P. falciparum culture strains and field isolates. We found that only 3 out of 14 alpha-helical coiled coils showed point mutations and/or length polymorphisms. Based on promising immunological results 5 of these peptides were selected for further analysis. Direct sequencing of field samples from Papua New Guinea and Tanzania showed that 3 out of these 5 peptides were completely conserved. An in silico analysis of polymorphism was performed for all 166 putative alpha-helical coiled coil domains originally identified in the P. falciparum genome. We found that 82% (137/166) of these peptides were conserved, and for one peptide only the detected SNPs decreased substantially the probability score for alpha-helical coiled coil formation. More SNPs were found in arrays of almost perfect tandem repeats. In summary, the coiled coil structure prediction was rarely modified by SNPs. The analysis revealed a number of peptides with strictly conserved alpha-helical coiled coil motifs. CONCLUSION/SIGNIFICANCE: We conclude that the selection of alpha-helical coiled coil structural motifs is a valuable approach to identify potential vaccine targets showing a high degree of conservation.
Abstract:
A new Bayesian algorithm for retrieving surface rain rate from Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI) over the ocean is presented, along with validations against estimates from the TRMM Precipitation Radar (PR). The Bayesian approach offers a rigorous basis for optimally combining multichannel observations with prior knowledge. While other rain-rate algorithms have been published that are based at least partly on Bayesian reasoning, this is believed to be the first self-contained algorithm that fully exploits Bayes’s theorem to yield not just a single rain rate, but rather a continuous posterior probability distribution of rain rate. To advance the understanding of theoretical benefits of the Bayesian approach, sensitivity analyses have been conducted based on two synthetic datasets for which the “true” conditional and prior distribution are known. Results demonstrate that even when the prior and conditional likelihoods are specified perfectly, biased retrievals may occur at high rain rates. This bias is not the result of a defect of the Bayesian formalism, but rather represents the expected outcome when the physical constraint imposed by the radiometric observations is weak owing to saturation effects. It is also suggested that both the choice of the estimators and the prior information are crucial to the retrieval. In addition, the performance of the Bayesian algorithm herein is found to be comparable to that of other benchmark algorithms in real-world applications, while having the additional advantage of providing a complete continuous posterior probability distribution of surface rain rate.
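The heart of such a Bayesian retrieval, producing a full posterior over rain rates rather than a single value, can be sketched on a discretized grid of rain rates; the discretization and names below are illustrative, not the algorithm's own:

```python
def posterior(prior, likelihood):
    """Bayes' theorem on a discretized grid of rain rates: the posterior
    is proportional to prior times likelihood, normalized to sum to 1."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def posterior_mean(rates, post):
    """One possible point estimator read off the full posterior."""
    return sum(r * p for r, p in zip(rates, post))
```

The abstract's point about the choice of estimator corresponds to reading different summaries off this posterior, for example its mean versus its mode.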
Abstract:
Preparing for episodes with risks of anomalous weather a month to a year ahead is an important challenge for governments, non-governmental organisations, and private companies and is dependent on the availability of reliable forecasts. The majority of operational seasonal forecasts are made using process-based dynamical models, which are complex, computationally challenging and prone to biases. Empirical forecast approaches built on statistical models to represent physical processes offer an alternative to dynamical systems and can provide either a benchmark for comparison or independent supplementary forecasts. Here, we present a simple empirical system based on multiple linear regression for producing probabilistic forecasts of seasonal surface air temperature and precipitation across the globe. The global CO2-equivalent concentration is taken as the primary predictor; subsequent predictors, including large-scale modes of variability in the climate system and local-scale information, are selected on the basis of their physical relationship with the predictand. The focus given to the climate change signal as a source of skill and the probabilistic nature of the forecasts produced constitute a novel approach to global empirical prediction. Hindcasts for the period 1961–2013 are validated against observations using deterministic (correlation of seasonal means) and probabilistic (continuous ranked probability skill scores) metrics. Good skill is found in many regions, particularly for surface air temperature and most notably in much of Europe during the spring and summer seasons. For precipitation, skill is generally limited to regions with known El Niño–Southern Oscillation (ENSO) teleconnections. The system is used in a quasi-operational framework to generate empirical seasonal forecasts on a monthly basis.
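A minimal sketch of the regression step underlying such an empirical system, reduced to a single predictor, with the residual spread defining a Gaussian predictive distribution around the fitted line (the interface is ours; the actual system selects multiple physically motivated predictors):

```python
import math

def fit_linear(x, y):
    """Ordinary least squares for a single predictor, returning the
    slope, intercept, and residual standard deviation; the latter
    defines a Gaussian predictive distribution around the fit."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return slope, intercept, sigma
```

A probabilistic hindcast for a new predictor value x is then the normal distribution N(intercept + slope*x, sigma^2), which can be scored against observations with the CRPS, as in the validation described above.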