RelDiag — Attributes Diagram

Igor Kröner, Henning W. Rust, Tim Kruschke, Madlen Fischer, Uwe Ulbrich
Institut für Meteorologie, Freie Universität Berlin
Andreas Dobler
Potsdam-Institut für Klimafolgenforschung

Version from April 9, 2015
Please note that this documentation is still under construction. Comments and feedback are highly appreciated; please e-mail the authors.

1 Introduction

A reliability diagram condenses the evaluation of probabilistic predictions of a dichotomous event into a single plot. It shows the full joint distribution of the probability forecasts as well as the relative frequency of observation of the binary predictand (e.g. dry – rain or below threshold – above threshold). Compared to scalar quantities, such as the Brier Skill Score (BSS), the reliability diagram allows the diagnosis of particular strengths and weaknesses in a verification data set.

BS = \frac{1}{n} \sum_{k=1}^{n} (y_k - o_k)^2 \qquad (1)

The most common scalar accuracy measure used in the verification of probabilistic forecasts is the Brier Score (Equation 1). The BS is 0 for perfect forecasts and increases up to 1 for poor forecasts. To make a statement about reliability, a decomposition of the BS into three terms is shown in Equation 2

BS = \frac{1}{n} \sum_{i=1}^{I} N_i (y_i - \bar{o}_i)^2 - \frac{1}{n} \sum_{i=1}^{I} N_i (\bar{o}_i - \bar{o})^2 + \bar{o}(1 - \bar{o}) \qquad (2)

with n events, I unique forecasts – or bins – containing N_i forecasts each, \bar{o}_i = p(o \mid y_i) the observed frequency conditional on the forecast probability, and \bar{o} = \frac{1}{n} \sum_{k=1}^{n} o_k the climatological occurrence rate of the event.

The first term of Equation 2 is the “reliability” of the forecast. It is calculated as a weighted average of the squared differences between the forecast probabilities y_i and the relative frequencies of the observed event \bar{o}_i. It is interpretable as a measure of how close the forecast probabilities are to the corresponding observed frequencies. In general, the lower the better, as the term quantifies the violation of reliability. However, a low reliability term is only of value if the second term – the “resolution” – is high. This term measures how much the observed frequencies, conditional on the forecast probabilities, differ from the climatological occurrence rate of the event. The third term, the “uncertainty”, measures the inherent uncertainty of the event. It depends only on the observed climatology and is therefore maximal when the event occurs 50% of the time and zero if the event never or always occurs.
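The decomposition above can be sketched in a few lines of Python (an illustrative sketch, not the RelDiag code; the function name and array layout are assumptions):

```python
import numpy as np

def brier_decomposition(y, o):
    """y: forecast probabilities, o: binary observations (0/1).

    Returns BS, reliability, resolution and uncertainty as in
    Equations 1 and 2, treating each unique forecast value as a bin.
    """
    y = np.asarray(y, dtype=float)
    o = np.asarray(o, dtype=float)
    n = len(y)
    bs = np.mean((y - o) ** 2)            # Equation 1
    o_bar = o.mean()                      # climatological occurrence rate
    rel = res = 0.0
    for yi in np.unique(y):               # loop over the I unique forecasts
        mask = y == yi
        n_i = mask.sum()                  # N_i forecasts in this bin
        o_i = o[mask].mean()              # observed frequency in this bin
        rel += n_i * (yi - o_i) ** 2
        res += n_i * (o_i - o_bar) ** 2
    return bs, rel / n, res / n, o_bar * (1.0 - o_bar)

y = [0.2, 0.2, 0.8, 0.8, 0.8]
o = [0, 1, 1, 1, 0]
bs, rel, res, unc = brier_decomposition(y, o)
# The decomposition is exact: BS = reliability - resolution + uncertainty
assert abs(bs - (rel - res + unc)) < 1e-12
```

For this small example the uncertainty is 0.6 · (1 − 0.6) = 0.24, and the three terms recombine exactly to the Brier score of 0.28.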

2 The diagram

RelDiag takes advantage of the R package ”verification”, developed at NCAR, to plot attributes diagrams. Attributes diagrams (?) are reliability diagrams extended with useful information such as the no-resolution and no-skill lines. In the following, each element of the diagram is briefly described and discussed.


Figure 1: Prototype (12 ORA-S4, 5 GECCO2), mean number of frost days above median, lead years 2-5, 20CR Ensemble Mean as reference


1:1 line The diagonal, or 1:1, line indicates perfect reliability, as the observed frequencies of the event equal the forecast probabilities.

”no skill” line The no-skill line is defined via the Brier Skill Score (BSS) with the climatological prediction as reference. The single point of the climatological prediction is located at the intersection of the 1:1, no-resolution and no-skill lines. Since the forecast and the observed relative frequency are both equal to the climatological probability, the climatological forecast has perfect reliability but zero resolution (?). Thus the Brier score (Equation 2) of the climatological forecast equals the uncertainty. As the BSS is defined as BSS = 1 − BS/BS_ref, the BSS with the climatological forecast as reference results in

BSS = \frac{\text{Resolution} - \text{Reliability}}{\text{Uncertainty}} \,. \qquad (3)

”no resolution” line The no-resolution line is the horizontal line at the climatological frequency \bar{o}; for points on this line the observed frequency does not depend on the forecast probability, so the resolution term vanishes.


Figure 2: Resolution

grey shaded area Points in the grey shaded area, bounded by the no-skill line and the horizontal climatology line, contribute positively to the Brier skill score with a climatological forecast as reference. To enlarge this area, a bias correction suggested by ? has been applied. The plotted hyperbola is estimated from the bias-corrected decomposition of the BS, i.e. from the bias-corrected reliability and resolution terms. The hyperbola function is

\bar{o}_k = \frac{y_k^2 - \alpha}{2 y_k - \beta} \qquad (4)

refinement diagram The refinement diagram in the bottom right corner delivers important information about the distribution of the predictions, showing how frequently each forecast probability occurs. From the refinement, a statement about the sharpness of the forecast can be made.

uncertainty Even though the uncertainty is not explicitly plotted, it can be determined directly from the attributes diagram. Its magnitude is given by the area of the rectangle in the top left, as well as in the bottom right, enclosed by the horizontal (no-resolution) and vertical climatology lines.

the slope The blue line in Figure 1 represents a weighted linear regression through the reliability points. Its slope can be used as a key indicator of the usefulness of the probabilistic prediction, combining reliability and resolution.
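Such a weighted regression can be sketched as follows (a minimal illustration, assuming each reliability point (y_i, \bar{o}_i) is weighted by its bin count N_i via `numpy.polyfit`; the exact weighting RelDiag uses is not specified here, and the data values are invented):

```python
import numpy as np

# Hypothetical reliability points: bin forecast probabilities y_i,
# observed frequencies o_i, and forecast counts N_i per bin.
y_i = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
o_i = np.array([0.15, 0.25, 0.55, 0.65, 0.85])
n_i = np.array([40, 30, 20, 30, 40])

# polyfit minimises sum((w * residual)^2), so sqrt(N_i) weights give a
# least-squares fit weighted by the bin counts N_i.
slope, intercept = np.polyfit(y_i, o_i, deg=1, w=np.sqrt(n_i))

# A slope close to 1 indicates the reliability points lie near the
# 1:1 line; slopes well below 1 hint at overconfident forecasts.
```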

3 RelDiag parameters

The RelDiag plug-in offers the possibility to plot an attributes diagram: an extended reliability diagram that compares the forecast probability to the observed relative frequency.

There are two ways to run the tool: first, with your own prepared prediction and observation data; second, via the “leadtimeselector” extension.

3.1 Data

RelDiag works with your own data as well as with data retrieved from the MiKlip system using the “leadtimeselector”. If you want to use the leadtimeselector, it needs to be set to “True”.

  • LEADTIMESELECTOR (mandatory, True or False)

3.1.1 Own data
  • mandatory if leadtimeselector=False

Here, LEADTIMESELECTOR needs to be set to ’False’. As input, one NetCDF file per ensemble member is required, declared with a prefix (FILEHEAD) and containing the variable as a time series for the lead time and time period of interest. The same applies to the reference data (REFHEAD).

3.1.2 leadtimeselector

See the “leadtimeselector” documentation.

  • mandatory if leadtimeselector=True

If LEADTIMESELECTOR is set to ’True’, a connection to the same-named CES plug-in is established. Access to the MiKlip database is then enabled by setting the parameters VARIABLE, MODEL, PROJECT, INSTITUTE, PRODUCT, ENSEMBLES, LEADTIMES, TIME FREQUENCY, DECADALS and OBSERVATION.
To guarantee the stability of RelDiag, this is not a direct link but a checked-out version of the leadtimeselector tool. This does not affect how RelDiag works, only its further development. With the ongoing progress of the leadtimeselector, further capabilities in RelDiag are expected.

3.2 Lonlatbox

  • LONLATBOX (mandatory, default -180,180,-90,90) For each ensemble member and the reference, a field mean is calculated over the given region.

3.3 Threshold

  • THRESHOLD (mandatory)

The threshold is calculated with a CDO-based syntax [?], separately for model and reference. For example, the CDO command for a percentile-based threshold could be runpctl,50,TSTEPS IFILE for the median. TSTEPS stands for the total number of time steps in the file used for the calculation. If the string IFILE is used, the model threshold is estimated with respect to all ensemble members of the prediction. Setting the threshold to a constant value is done by const,VALUE,IFILE, with VALUE as a user-defined value.

The dichotomous event is defined by a “greater than” comparison against this threshold.
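Conceptually, this “greater than” event definition turns the ensemble into forecast probabilities, which can be sketched as follows (a hypothetical Python illustration with invented numbers; RelDiag itself performs this step via CDO, and the array layout is an assumption):

```python
import numpy as np

# Hypothetical ensemble: 4 members x 3 time steps of some variable.
ens = np.array([[1.0, 3.0, 5.0],
                [2.0, 4.0, 1.0],
                [3.0, 2.0, 6.0],
                [0.0, 5.0, 4.0]])
threshold = 2.5

event = ens > threshold       # binary event per member and time step
y = event.mean(axis=0)        # forecast probability per time step

# e.g. at the first time step 1 of 4 members exceeds 2.5,
# giving a forecast probability of 0.25
```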

3.4 Binning

  • BINS (mandatory, integer, default = 0)

Should probabilities be binned or treated as unique predictions? The larger the ensemble size, the more important binning becomes. No binning is done if BINS is set to 0 (default).
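As an illustration, binning could look like this (a sketch under the assumption of BINS equal-width bins represented by their centres; the exact binning rule in RelDiag may differ):

```python
import numpy as np

def bin_probabilities(y, bins):
    """Map forecast probabilities onto the centres of `bins`
    equal-width bins; bins=0 means no binning (the default)."""
    y = np.asarray(y, dtype=float)
    if bins == 0:
        return y
    edges = np.linspace(0.0, 1.0, bins + 1)
    # digitize returns the index of the right edge; shift and clip so
    # that y == 1.0 still falls into the last bin.
    idx = np.clip(np.digitize(y, edges) - 1, 0, bins - 1)
    centres = (edges[:-1] + edges[1:]) / 2.0
    return centres[idx]

y = [0.03, 0.48, 0.52, 0.97]
binned = bin_probabilities(y, 10)   # 10 bins of width 0.1
```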

3.5 Refinement

  • REFINEMENT (mandatory, histogram, pointsize, both (default) or numbers)

This option defines a graphical parameter for how the refinement distribution is displayed.

4 Workflow

5 Things to consider

  • Up to now, only three-dimensional fields (lon x lat x time) are supported. Some kind of level selector will be integrated soon.
  • There is no explicit bias correction included! The bias is removed implicitly by the separately calculated thresholds for model and reference. For thresholds obtained by averaging (timmean, etc.), normally distributed data is needed to remove the bias completely. It is therefore advised to use percentile-based thresholds.
  • Model drifts are neglected at the moment.
  • At the moment there are two further development stages. One takes spatial pooling into account to increase the sample size. The other aims to analyse the reliability on a grid, based on the slope of the reliability curve in the attributes diagram (?).


Figure 3: Structure tree of RelDiag