Example data set of binary observations and probability forecasts

The forecasts X01 to X10 are generated so that their discrimination ability decreases steadily from X01 to X10. In addition, X01 and X06 are "calibrated", X02 and X07 are "underconfident", X03 and X08 are "overconfident", X04 and X09 exhibit "negative bias", and X05 and X10 exhibit "positive bias".

Usage

ex_binary

Format

A data frame with 1,000 rows and 11 columns, generated as described in 'Details':

y: observations
X01: forecasts, full information, calibrated: $a = 1$, $b = 1$
X02: forecasts, less information than X01, underconfident: $a = 1/4$, $b = 1/4$
X03: forecasts, less information than X02, overconfident: $a = 4$, $b = 4$
X04: forecasts, less information than X03, negative bias: $a = 5/3$, $b = 3/5$
X05: forecasts, less information than X04, positive bias: $a = 3/5$, $b = 5/3$
X06: forecasts, less information than X05, calibrated: $a = 1$, $b = 1$
X07: forecasts, less information than X06, underconfident: $a = 1/4$, $b = 1/4$
X08: forecasts, less information than X07, overconfident: $a = 4$, $b = 4$
X09: forecasts, less information than X08, negative bias: $a = 5/3$, $b = 3/5$
X10: forecasts, least information, positive bias: $a = 3/5$, $b = 5/3$
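
As an illustration of how the shape parameters relate to the labels above, the following sketch applies pbeta() to a few probabilities. The grid `p` and the object name `transformed` are assumptions made for this example and are not part of the data set.

```r
# Illustrative grid of calibrated probabilities (an assumption of this sketch)
p <- c(0.1, 0.3, 0.5, 0.7, 0.9)

# Beta CDF transforms with the shape parameters listed for X01 to X05
transformed <- data.frame(
  p              = p,
  calibrated     = pbeta(p, 1, 1),      # identity: CDF of the uniform distribution
  underconfident = pbeta(p, 1/4, 1/4),  # pulled towards 0.5
  overconfident  = pbeta(p, 4, 4),      # pushed towards 0 and 1
  negative_bias  = pbeta(p, 5/3, 3/5),  # convex CDF: values shifted downwards
  positive_bias  = pbeta(p, 3/5, 5/3)   # concave CDF: values shifted upwards
)
print(round(transformed, 3))
```

With $a = b$ the transform is symmetric around 0.5, so only the spread of the forecasts changes. With $a \neq b$ every forecast is shifted in the same direction, which produces the bias.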

Details

The observations are generated from a Bernoulli distribution, where the success probability is determined by ten sources of information. That is, the probability is given by $$p = \Phi\left(\sum_{i = 1}^{10} Z_i\right),$$ where $Z_i$, $i = 1, \ldots, 10$, are independent standard Gaussian random variables, and $\Phi$ denotes the cumulative distribution function of the standard Gaussian distribution.
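
The observation step can be sketched in a few lines of base R. The seed and the object names `Z`, `p`, and `y` are assumptions of this example, so the result will not reproduce the shipped data set.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible
n <- 1000    # number of observations, matching 'Format'

# n x 10 matrix of independent standard Gaussian information sources Z_1, ..., Z_10
Z <- matrix(rnorm(n * 10), nrow = n, ncol = 10)

# Success probability p = Phi(Z_1 + ... + Z_10)
p <- pnorm(rowSums(Z))

# Bernoulli observations with success probability p
y <- rbinom(n, size = 1, prob = p)
```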

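The corresponding forecasts are named in decreasing order of access to these latent Gaussian variables (that is, information content). In a first step, calibrated forecasts are generated by $p[j] = \Phi(\frac{1}{\sqrt{j}}\sum_{i = j}^{10} Z_i)$, so that forecast $j$ uses only $Z_j, \ldots, Z_{10}$. The factor $1/\sqrt{j}$ accounts for the $j - 1$ unobserved variables, whose sum has variance $j - 1$, and makes $p[j]$ the conditional event probability given the available information. Subsequently, these probabilities are perturbed to introduce miscalibration using the cumulative distribution function $F$ of the beta distribution, yielding the final forecasts $$X[j] = F(p[j]; a, b),$$ where $a$ and $b$ are the positive shape parameters listed in 'Format' (see pbeta()).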
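
The two steps can be combined into a minimal end-to-end sketch. It assumes a $1/\sqrt{j}$ scaling in the first step, which is the scaling under which $p[j]$ equals the conditional event probability given $Z_j, \ldots, Z_{10}$, and it recycles the five $(a, b)$ pairs listed for X01 to X05 for X06 to X10. The seed, the helper `calibrated()`, and the object names are hypothetical, so the output will differ from the shipped data set.

```r
set.seed(1)  # arbitrary seed, for reproducibility of the sketch only
n <- 1000

# Latent information sources and observations, as described in 'Details'
Z <- matrix(rnorm(n * 10), nrow = n, ncol = 10)
y <- rbinom(n, size = 1, prob = pnorm(rowSums(Z)))

# Step 1: calibrated forecast with access to Z_j, ..., Z_10 only.
# The 1 / sqrt(j) scaling is an assumption of this sketch (see the lead-in).
calibrated <- function(j) {
  pnorm(rowSums(Z[, j:10, drop = FALSE]) / sqrt(j))
}

# Step 2: beta CDF perturbation. Shape parameters for X01 to X05,
# recycled for X06 to X10 (an assumption of this sketch).
a <- rep(c(1, 1/4, 4, 5/3, 3/5), times = 2)
b <- rep(c(1, 1/4, 4, 3/5, 5/3), times = 2)

X <- sapply(1:10, function(j) pbeta(calibrated(j), a[j], b[j]))
colnames(X) <- sprintf("X%02d", 1:10)

# Assemble a data frame with the same layout as ex_binary
sim <- data.frame(y = y, X)
str(sim)
```

Because X01 uses all ten variables and the identity transform ($a = b = 1$), it coincides with the true success probability. Each later forecast drops one more variable and therefore discriminates less well.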