Aggregation of Binary Predictions

Revision as of 18:55, 11 July 2022

The author would welcome help with this article.

Combining predictions from several forecasters or models consistently improves forecast accuracy. The appropriate method may differ depending on whether the forecasts are for a binary (yes/no) or a non-binary target. This page deals only with aggregating predictions for binary outcomes.

Untrained aggregation methods

Untrained methods do not rely on learning any parameters or weights from the data, but instead aggregate the forecasts using a simple function.

Mean Forecast

The simplest approach is to just take the mean of all existing probability forecasts.

Median Forecast

Instead of the mean, one can also take the median. The median is more robust to outlier forecasts, but may use information less efficiently than the mean.
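
The difference in robustness can be illustrated with a small sketch. The forecast values below are made up for illustration; the last one plays the role of an outlier.

```python
from statistics import fmean, median

# Hypothetical probability forecasts for one yes/no question.
# Four forecasters roughly agree; the fifth is an outlier.
forecasts = [0.70, 0.72, 0.75, 0.78, 0.05]

mean_forecast = fmean(forecasts)      # pulled down noticeably by the outlier
median_forecast = median(forecasts)   # stays close to the majority view

print(f"mean:   {mean_forecast:.2f}")
print(f"median: {median_forecast:.2f}")
```

In this example the single low forecast drags the mean well below the cluster around 0.7, while the median remains inside it.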

Geometric mean of odds

Instead of taking the mean of the probability forecasts, it may be better to convert the probabilities to odds, take the geometric mean of these odds, and convert the result back to a probability^[1].
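
The three steps (probabilities to odds, geometric mean, odds back to a probability) can be sketched as follows. The function name and the example forecasts are illustrative assumptions, and the sketch assumes every forecast lies strictly between 0 and 1, since the odds are undefined or zero at the extremes.

```python
import math

def geometric_mean_of_odds(probabilities):
    """Pool probability forecasts via the geometric mean of their odds."""
    if not probabilities:
        raise ValueError("need at least one forecast")
    if any(p <= 0.0 or p >= 1.0 for p in probabilities):
        raise ValueError("forecasts must lie strictly between 0 and 1")
    # The mean of the log odds, exponentiated, equals the geometric mean of the odds.
    mean_log_odds = sum(math.log(p / (1.0 - p)) for p in probabilities) / len(probabilities)
    pooled_odds = math.exp(mean_log_odds)
    # Convert the pooled odds back to a probability.
    return pooled_odds / (1.0 + pooled_odds)

# Hypothetical forecasts: odds of 1:1 and 9:1 have a geometric mean of 3:1.
forecasts = [0.5, 0.9]
pooled = geometric_mean_of_odds(forecasts)
print(f"geometric mean of odds: {pooled:.3f}")
print(f"arithmetic mean:        {sum(forecasts) / len(forecasts):.3f}")
```

For these two forecasts the pooled odds are 3:1, which corresponds to a probability of about 0.75, compared with 0.70 for the plain mean.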

Trained aggregation methods

Trained aggregation methods learn parameters, such as weights for different forecasters, from the data. Many of these approaches use a concept called "extremizing", which shifts the aggregated probability towards one of the extremes (0 or 1).

Skew Adjusted Extremized Mean (Sk-E-Mean)

The Skew-Adjusted Extremized Mean (Sk-E-Mean) was proposed by Ben Powell^[2].

Extremized Mean Log Odds (EMLO)

Extremized Mean Log Odds (EMLO) was proposed by Jaime Sevilla^[3].
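
As a rough illustration of extremizing on the log-odds scale, the sketch below averages the log odds of the forecasts and multiplies the result by a factor before converting back to a probability. This is a minimal sketch under assumptions, not the referenced implementation: the function name, the parameter `factor`, and the value 2.0 are hypothetical, and in a trained method the factor would be fitted to data rather than chosen by hand.

```python
import math

def extremized_mean_log_odds(probabilities, factor):
    """Average the log odds, scale by `factor`, and map back to a probability.

    `factor` is a hypothetical extremizing parameter: 1.0 leaves the mean
    log odds unchanged, values above 1.0 push the result away from 0.5.
    """
    if not probabilities:
        raise ValueError("need at least one forecast")
    if any(p <= 0.0 or p >= 1.0 for p in probabilities):
        raise ValueError("forecasts must lie strictly between 0 and 1")
    mean_log_odds = sum(math.log(p / (1.0 - p)) for p in probabilities) / len(probabilities)
    # Logistic function: inverse of the log-odds transform.
    return 1.0 / (1.0 + math.exp(-factor * mean_log_odds))

forecasts = [0.5, 0.9]  # hypothetical forecasts
print(f"factor 1.0: {extremized_mean_log_odds(forecasts, 1.0):.3f}")
print(f"factor 2.0: {extremized_mean_log_odds(forecasts, 2.0):.3f}")
```

With a factor of 1.0 the result coincides with the geometric mean of odds; a larger factor moves the aggregate further towards whichever extreme it already leans to.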

The Metaculus Prediction Algorithm

The current Metaculus Prediction weights forecasts by the reputation of the forecaster and recency of the forecast (in terms of the overall question lifetime).
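
The exact weighting formula is not given here, so the following is only a generic weighted-average sketch and should not be read as the Metaculus algorithm. The reputation and recency scores, and the choice to multiply them into a single weight, are hypothetical assumptions made purely for illustration.

```python
def weighted_mean_forecast(probabilities, weights):
    """Weighted arithmetic mean of probability forecasts."""
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per forecast")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(p * w for p, w in zip(probabilities, weights)) / total_weight

# Hypothetical inputs: each forecast carries a made-up reputation score and
# a made-up recency score (higher means more recent).
forecasts = [0.2, 0.8]
reputation = [1.0, 1.5]
recency = [1.0, 2.0]

# Assumption for illustration only: combine the two scores by multiplication.
weights = [rep * rec for rep, rec in zip(reputation, recency)]

aggregate = weighted_mean_forecast(forecasts, weights)
print(f"weighted forecast: {aggregate:.2f}")
```

Here the second forecast receives three times the weight of the first, so the aggregate lands closer to 0.8 than the unweighted mean of 0.5 would.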

Performance of existing aggregation approaches

In an evaluation performed by Sevilla, EMLO performed best on Metaculus data among a number of established approaches. Powell's algorithm is state of the art on Good Judgment data.

References

[1] https://forum.effectivealtruism.org/posts/sMjcjnnpoAQCcedL2/when-pooling-forecasts-use-the-geometric-mean-of-odds

[2] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4004029

[3] https://forum.effectivealtruism.org/s/hjiBqAJNKhfJFq7kf/p/biL94PKfeHmgHY6qe
