Aggregation of Binary Predictions: Difference between revisions

From Forecasting Wiki
 
 
{{Banner|help wanted}}


Combining predictions from several forecasters or models consistently improves forecast accuracy. Methods differ depending on whether forecasts are for a binary (yes/no) or a non-binary target. This page deals only with aggregating predictions for binary outcomes.

== Untrained aggregation methods ==
Untrained methods do not learn any parameters or weights from the data; they aggregate the forecasts with a fixed, simple function.

=== Mean Forecast ===
The simplest approach is to take the mean of all submitted probability forecasts.

=== Median Forecast ===
Instead of the mean, one can also take the median. The median is more robust to outlier forecasts, but may use information less efficiently than the mean.
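As a sketch of the difference between the two pooling rules (the forecast values below are illustrative, not from any dataset mentioned in this article):

```python
# Untrained pooling: mean vs. median of probability forecasts.
from statistics import mean, median

forecasts = [0.6, 0.65, 0.7, 0.99]  # one outlier forecaster

mean_pool = mean(forecasts)      # pulled upward by the outlier
median_pool = median(forecasts)  # robust to the outlier

print(mean_pool, median_pool)
```

Here the mean (0.735) is dragged toward the single extreme forecast, while the median (0.675) stays close to the majority view.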

=== Geometric mean of odds ===
Instead of taking the mean of the probability forecasts, it may be better to convert the probabilities to odds, take the geometric mean of those odds, and convert the result back to a probability<ref>https://forum.effectivealtruism.org/posts/sMjcjnnpoAQCcedL2/when-pooling-forecasts-use-the-geometric-mean-of-odds</ref>.
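A minimal sketch of this pooling rule (the function name and the input probabilities are illustrative):

```python
# Pool probability forecasts via the geometric mean of their odds.
import math

def pool_geo_mean_odds(probs):
    """Convert probabilities to odds, take the geometric mean
    of the odds, then convert back to a probability."""
    odds = [p / (1 - p) for p in probs]
    geo = math.exp(sum(math.log(o) for o in odds) / len(odds))
    return geo / (1 + geo)

# Odds of 1 and 4 have geometric mean 2, giving probability 2/3.
print(pool_geo_mean_odds([0.5, 0.8]))
```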

== Trained aggregation methods ==
Trained aggregation methods learn parameters, such as weights for different forecasters, from the data. Many of these approaches use a technique called "extremizing", which shifts the pooled probability towards one of the extremes (0 or 1).
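One common form of extremizing, shown here as an assumption rather than any particular method from this article, scales the log odds by a factor d > 1, pushing the probability toward 0 or 1:

```python
# Extremize a pooled probability by scaling its log odds.
import math

def extremize(p, d=2.0):
    """Scale log odds of p by d; d > 1 pushes p toward 0 or 1,
    and p = 0.5 is left unchanged."""
    log_odds = math.log(p / (1 - p))
    return 1 / (1 + math.exp(-d * log_odds))

print(extremize(0.7, d=2.0))  # more extreme than 0.7
print(extremize(0.5, d=2.0))  # 0.5 is a fixed point
```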

=== Logistic Regression ===
The outcome is modeled as Bernoulli(p), where ''logit(p)'' is a linear combination of the ''logit(p_i)'' of each forecaster ''i''. By default, one can expect the weights to sum to roughly 1 (approximately unbiased) and more predictive forecasters to receive larger coefficients. However, since this is a regression model, each coefficient measures predictiveness conditional on every other forecast: for example, if two forecasters are identical, one of them may get zero weight because the other is doing all the work (a phenomenon known as "collinearity").
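A sketch of this idea with synthetic data (all numbers below are made up for illustration; the fit uses plain gradient descent on the log loss rather than any particular library):

```python
# Trained aggregation: regress outcomes on forecaster log odds.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
true_logit = rng.normal(0.0, 2.0, n)              # latent log odds per event
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

# Two forecasters report noisy versions of the true log odds;
# forecaster 0 is considerably more accurate than forecaster 1.
X = np.column_stack([
    true_logit + rng.normal(0.0, 0.5, n),
    true_logit + rng.normal(0.0, 3.0, n),
])

# Fit the weights by gradient descent on the logistic log loss.
w = np.zeros(2)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / n

print(w)  # the more accurate forecaster gets the larger weight
```

On this synthetic data the first coefficient comes out clearly larger than the second, matching the intuition that the regression rewards conditional predictiveness.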

=== Skew Adjusted Extremized Mean (Sk-E-Mean) ===
Proposed by Ben Powell: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4004029 Skew-Adjusted Extremized Mean (Sk-E Mean)]<ref>https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4004029</ref>.

=== Extremized Mean Log Odds (EMLO) ===
Proposed by Jaime Sevilla: [https://forum.effectivealtruism.org/s/hjiBqAJNKhfJFq7kf/p/biL94PKfeHmgHY6qe Extremized Mean Log Odds]<ref>https://forum.effectivealtruism.org/s/hjiBqAJNKhfJFq7kf/p/biL94PKfeHmgHY6qe</ref>.

=== The Metaculus Prediction Algorithm ===
The current Metaculus Prediction weights each forecast by the reputation of the forecaster and the recency of the forecast relative to the question's overall lifetime.

== Performance of existing aggregation approaches ==
In an evaluation performed by Sevilla, EMLO gave the best performance on Metaculus data among a number of established approaches, while Powell's algorithm is state of the art on Good Judgment data.




== References ==
<references />

<comments />


[[Category:Forecast_Aggregation]]

Latest revision as of 06:09, 19 July 2022
