Aggregation of Binary Predictions

From Forecasting Wiki
Latest revision as of 06:09, 19 July 2022

The author would be happy to receive help with this article.

Combining predictions from several forecasters or models consistently improves the accuracy of forecasts. Methods differ depending on whether forecasts are for a binary (yes/no) or a non-binary target. This page deals only with aggregating predictions for binary outcomes.

Untrained aggregation methods

Untrained methods do not rely on learning any parameters or weights from the data, but instead aggregate the forecasts using a simple function.

Mean Forecast

The simplest approach is to just take the mean of all existing probability forecasts.

Median Forecast

Instead of the mean, one can also take the median. The median is more robust to outlier forecasts, but may use information less efficiently than the mean.

Geometric mean of odds

Instead of taking the mean of the probability forecasts, it may be better to convert the probabilities to odds first, take the geometric mean of these odds, and then convert back to a probability[1].
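The three untrained methods above can be sketched as follows (the five example forecasts are arbitrary):

```python
import statistics

# Probability forecasts from five forecasters for one binary question.
forecasts = [0.6, 0.7, 0.8, 0.65, 0.9]

# Mean forecast: arithmetic mean of the probabilities.
mean_forecast = sum(forecasts) / len(forecasts)

# Median forecast: more robust to outlier forecasts than the mean.
median_forecast = statistics.median(forecasts)

# Geometric mean of odds: probabilities -> odds, geometric mean, -> probability.
odds = [p / (1 - p) for p in forecasts]
geo_mean_odds = statistics.geometric_mean(odds)
geo_forecast = geo_mean_odds / (1 + geo_mean_odds)
```

For these forecasts the mean is 0.73, the median 0.7, and the geometric mean of odds about 0.75.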

Trained aggregation methods

Trained aggregation methods learn parameters, such as weights for different forecasters, from the data. Many of these approaches use a concept called "extremizing", which shifts the aggregate probability towards one of the extremes (0 or 1).

Logistic Regression

The outcome is modeled as Bernoulli(p), where logit(p) is a linear combination of the logit(p_i) of each forecaster i. By default one can expect the weights to sum to roughly 1 (approximately unbiased), and more predictive forecasters to receive larger coefficients. Since this is a regression model, however, each coefficient measures predictiveness conditional on every other forecast: if two forecasters make identical predictions, one of them can get zero weight, since the other is doing all the work (a phenomenon known as "collinearity").
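As an illustration (a minimal sketch, not any production implementation), the following fits such weights by plain gradient descent on synthetic data, where one simulated forecaster observes the true log-odds with less noise than the other; the more informative forecaster should receive the larger coefficient:

```python
import math
import random

random.seed(0)

# Synthetic data (illustrative): a latent log-odds "signal" drives each outcome.
# Forecaster A observes the signal with little noise, forecaster B with much
# more, so A's forecasts are more predictive.
n = 1000
data = []
for _ in range(n):
    s = random.gauss(0.0, 2.0)                      # true log-odds of the event
    y = 1.0 if random.random() < 1 / (1 + math.exp(-s)) else 0.0
    logit_a = s + random.gauss(0.0, 0.5)            # forecaster A's logit(p_a)
    logit_b = s + random.gauss(0.0, 2.0)            # forecaster B's logit(p_b)
    data.append((logit_a, logit_b, y))

# Fit logit(p) = w_a * logit(p_a) + w_b * logit(p_b) by gradient descent
# on the log loss (no intercept, no regularization).
w_a = w_b = 0.0
for _ in range(1500):
    grad_a = grad_b = 0.0
    for a, b, y in data:
        p = 1 / (1 + math.exp(-(w_a * a + w_b * b)))
        grad_a += (p - y) * a
        grad_b += (p - y) * b
    w_a -= 0.1 * grad_a / n
    w_b -= 0.1 * grad_b / n
```

With this setup the fitted w_a clearly exceeds w_b; replacing forecaster B with an exact copy of A would instead split the weight arbitrarily between the two columns (the collinearity phenomenon described above).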

Skew Adjusted Extremized Mean (Sk-E-Mean)

Proposed by Ben Powell: Skew-Adjusted Extremized Mean (Sk-E Mean)[2].

Extremized Mean Log Odds (EMLO)

Proposed by Jaime Sevilla: Extremized Mean Log Odds[3].
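A minimal sketch of the idea, assuming the common form of log-odds extremizing: average the forecasts in log-odds space, multiply by an extremization factor d, and convert back to a probability. The value d = 1.5 below is purely illustrative; in practice the factor is fitted on training data.

```python
import math

def emlo(probs, d=1.5):
    """Extremized mean log odds: mean of logit(p_i), scaled by a factor d > 1.
    d = 1.5 is an illustrative default, not a fitted value."""
    log_odds = [math.log(p / (1 - p)) for p in probs]
    extremized = d * sum(log_odds) / len(log_odds)
    return 1 / (1 + math.exp(-extremized))
```

For forecasts [0.6, 0.7, 0.8] the plain mean of log odds gives about 0.71, which extremizing with d = 1.5 pushes out to about 0.79.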

The Metaculus Prediction Algorithm

The current Metaculus Prediction weights forecasts by the reputation of the forecaster and the recency of the forecast (relative to the overall question lifetime).
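The exact weighting scheme is not spelled out above, so the following is only a hypothetical sketch of the general shape: each forecast gets a weight equal to the forecaster's reputation times a recency term. The exponential decay, the `horizon` parameter, and the function name are all assumptions for illustration, not the actual Metaculus formula.

```python
import math

def weighted_forecast(forecasts, reputations, times, now, horizon):
    # Hypothetical weights: reputation times an exponential recency decay
    # over the question lifetime ("horizon"). NOT the actual Metaculus formula.
    weights = [
        rep * math.exp(-(now - t) / horizon)
        for rep, t in zip(reputations, times)
    ]
    return sum(w * p for w, p in zip(weights, forecasts)) / sum(weights)
```

Under this sketch, a fresh forecast of 0.9 dominates a stale forecast of 0.5 made two question-lifetimes earlier, pulling the aggregate to about 0.85.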

Performance of existing aggregation approaches

In an evaluation performed by Sevilla, EMLO gave the best performance on Metaculus data among a number of established approaches, while Powell's algorithm is state of the art on Good Judgment data.


References

2. Ben Powell, Skew-Adjusted Extremized Mean (Sk-E Mean): https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4004029