Calibration

{{Banner|WIP}}

Calibration describes the statistical consistency between a probabilistic forecast and the observed values.<ref>[https://psycnet.apa.org/record/2011-26535-000 Gneiting, T., Balabdaoui, F. and Raftery, A. E. (2007). Probabilistic forecasts, calibration and sharpness. ''Journal of the Royal Statistical Society: Series B (Statistical Methodology)'', 69: 243–268.] https://doi.org/10.1111/j.1467-9868.2007.00587.x</ref> One can distinguish several forms of calibration, most importantly probabilistic, marginal, and exceedance calibration. Among these, probabilistic calibration is by far the most commonly used.

== Probabilistic calibration ==
Probabilistic calibration refers to the tendency of the events a forecaster predicts to occur at approximately the frequency given by their stated probabilities. For example, a forecaster who forecasts 10 events at 40% each and sees 4 of those events ultimately occur exhibits good calibration. If 3 or 5 of the events occur, the forecaster may still be reasonably calibrated and merely have been slightly unlucky; greater deviations indicate poorer calibration. Accurately judging a forecaster's calibration requires the resolution of many forecasts across the spectrum of probabilities.
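To make this concrete, here is a minimal sketch (in Python with NumPy; the forecast and outcome arrays are invented purely for illustration) that bins a set of binary forecasts by stated probability and compares each bin's average prediction with the observed frequency of the corresponding events:

<syntaxhighlight lang="python">
# Minimal sketch: compare stated probabilities with observed frequencies.
# The forecasts and outcomes below are made-up illustrative data.
import numpy as np

forecasts = np.array([0.1, 0.4, 0.4, 0.4, 0.4, 0.7, 0.7, 0.9, 0.9, 0.9])
outcomes  = np.array([0,   0,   1,   0,   1,   1,   0,   1,   1,   1])

edges = np.linspace(0.0, 1.0, 6)               # five equal-width probability bins
bin_ids = np.digitize(forecasts, edges[1:-1])  # bin index for each forecast

for b in range(len(edges) - 1):
    in_bin = bin_ids == b
    if in_bin.any():
        predicted = forecasts[in_bin].mean()   # average stated probability
        observed = outcomes[in_bin].mean()     # fraction of events that occurred
        print(f"bin {b}: predicted {predicted:.2f}, "
              f"observed {observed:.2f}, n = {in_bin.sum()}")
</syntaxhighlight>

For a well-calibrated forecaster, the predicted and observed values in each bin should roughly agree, with small samples allowing for some deviation.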


== Assessing Calibration ==
One common approach to assessing calibration visually is the ''calibration plot''. Calibration plots are, roughly, vertical box-and-whisker diagrams showing the distribution of resolution frequencies at each predicted probability across a forecaster's track record. For example:


[[File:Metaculus Calibration Plot.png|frameless|center|700px]]

Here we can see a clear correlation between Metaculus' predictions and the resolutions. Moreover, the error is not systematic (i.e., the boxes are not consistently above or below the dotted "perfect calibration" line), which means a simple linear correction could not improve upon Metaculus' forecasts.
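A simplified version of such a plot (a single point per probability bin rather than a full box-and-whisker distribution) can be sketched with Matplotlib, again using invented forecast data rather than Metaculus' actual track record:

<syntaxhighlight lang="python">
# Minimal sketch of a calibration plot: observed frequency per probability bin,
# with a dotted diagonal marking perfect calibration. Illustrative data only.
import numpy as np
import matplotlib.pyplot as plt

forecasts = np.array([0.1, 0.4, 0.4, 0.4, 0.4, 0.7, 0.7, 0.9, 0.9, 0.9])
outcomes  = np.array([0,   0,   1,   0,   1,   1,   0,   1,   1,   1])

edges = np.linspace(0.0, 1.0, 6)
centers = (edges[:-1] + edges[1:]) / 2
bin_ids = np.digitize(forecasts, edges[1:-1])

# Observed resolution frequency in each bin (NaN for empty bins).
observed = [outcomes[bin_ids == b].mean() if (bin_ids == b).any() else np.nan
            for b in range(len(centers))]

plt.plot([0, 1], [0, 1], "k:", label="perfect calibration")
plt.scatter(centers, observed, label="observed frequency")
plt.xlabel("predicted probability")
plt.ylabel("observed resolution frequency")
plt.legend()
plt.show()
</syntaxhighlight>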


== References ==
<references />

<comments />

[[Category:Stub]]
