An object of class 'reliabilitydiag' contains the observations, the original forecasts, and the recalibrated forecasts obtained by isotonic regression. The function summary.reliabilitydiag calculates quantitative measures of predictive performance, miscalibration, discrimination, and uncertainty for each prediction method, relating the original forecasts to their recalibrated version.

# S3 method for reliabilitydiag
summary(object, ..., score = "brier")

Arguments

object

an object inheriting from the class 'reliabilitydiag'.

...

further arguments to be passed to or from methods.

score

currently, either "brier" or a vectorized scoring function of the form function(observation, prediction).
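
For illustration only: the default score = "brier" corresponds to passing the squared-error function explicitly, and any custom function must accept the observation and prediction vectors in that order (this sketch assumes a fitted 'reliabilitydiag' object r as constructed in the Examples below).

summary(r, score = function(observation, prediction) (prediction - observation)^2)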

Value

A 'summary.reliabilitydiag' object, which is also a tibble (see tibble::tibble()) with columns:

forecast: the name of the prediction method.
mean_score: the mean score of the original forecast values.
miscalibration: a measure of miscalibration (how reliable is the prediction method?), smaller is better.
discrimination: a measure of discrimination (how variable are the recalibrated predictions?), larger is better.
uncertainty: the mean score of a constant prediction at the value of the average observation.

Details

Predictive performance is measured by the mean score of the original forecast values, denoted by \(S\).

Uncertainty, denoted by \(UNC\), is the mean score of a constant prediction at the value of the average observation. It is the highest possible mean score of a calibrated prediction method.

Discrimination, denoted by \(DSC\), is \(UNC\) minus the mean score of the PAV-recalibrated forecast values. A small value indicates low information content (a weak signal) in the original forecast values.

Miscalibration, denoted by \(MCB\), is \(S\) minus the mean score of the PAV-recalibrated forecast values. A high value indicates that the predictive performance of the prediction method can be improved by recalibration.

These measures are related by the following equation: $$S = MCB - DSC + UNC.$$ Score decompositions of this type have been studied extensively, but the optimality of the PAV solution ensures that \(MCB\) is nonnegative regardless of the chosen (admissible) scoring function, a property unique to PAV-recalibration.
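
As a rough illustration (not the package's internal code), the decomposition can be reproduced by hand for the Brier score, using stats::isoreg as a stand-in for PAV-recalibration; with ties among the forecast values the exact pooling may differ from the reliabilitydiag implementation.

set.seed(1)
x <- runif(100)                 # hypothetical probability forecasts
y <- rbinom(100, 1, x)          # binary observations
brier <- function(y, x) (x - y)^2
fit <- stats::isoreg(x, y)      # isotonic (PAV) regression of y on x
x_rc <- fit$yf[order(order(x))] # recalibrated forecasts in original order
S <- mean(brier(y, x))          # mean score of the original forecasts
S_rc <- mean(brier(y, x_rc))    # mean score of the recalibrated forecasts
UNC <- mean(brier(y, mean(y)))  # constant prediction at the average observation
MCB <- S - S_rc                 # miscalibration
DSC <- UNC - S_rc               # discrimination
all.equal(S, MCB - DSC + UNC)   # the decomposition identity holds exactly
#> [1] TRUE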

When deviating from the Brier score as the performance metric, make sure to choose a proper scoring rule for binary events or, equivalently, a scoring function on the outcome space {0, 1} that is consistent for the expectation functional.
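
For instance, the logarithmic score is another proper scoring rule for binary events and can be supplied as a vectorized function; this sketch assumes forecast values strictly between 0 and 1 and a 'reliabilitydiag' object r as in the Examples below.

log_score <- function(y, x) -(y * log(x) + (1 - y) * log(1 - x))
summary(r, score = log_score)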

Examples

data("precip_Niamey_2016", package = "reliabilitydiag")
r <- reliabilitydiag(
  precip_Niamey_2016[c("Logistic", "EMOS", "ENS", "EPC")],
  y = precip_Niamey_2016$obs,
  region.level = NA
)
summary(r)
#> 'brier' score decomposition (see also ?summary.reliabilitydiag)
#> # A tibble: 4 × 5
#>   forecast mean_score miscalibration discrimination uncertainty
#>   <chr>         <dbl>          <dbl>          <dbl>       <dbl>
#> 1 Logistic      0.206         0.0171         0.0555       0.244
#> 2 EMOS          0.232         0.0183         0.0305       0.244
#> 3 ENS           0.266         0.0661         0.0441       0.244
#> 4 EPC           0.234         0.0223         0.0323       0.244
summary(r, score = function(y, x) (x - y)^2)
#> 'function(y, x) (x - y)^2' score decomposition (see also ?summary.reliabilitydiag)
#> # A tibble: 4 × 5
#>   forecast mean_score miscalibration discrimination uncertainty
#>   <chr>         <dbl>          <dbl>          <dbl>       <dbl>
#> 1 Logistic      0.206         0.0171         0.0555       0.244
#> 2 EMOS          0.232         0.0183         0.0305       0.244
#> 3 ENS           0.266         0.0661         0.0441       0.244
#> 4 EPC           0.234         0.0223         0.0323       0.244