Aggregation of Binary Predictions

Combining predictions from several forecasters or models tends to improve forecast accuracy. The appropriate method depends on whether the forecasts target a binary (yes/no) outcome or a non-binary one. This page deals only with aggregating predictions for binary outcomes.

Untrained aggregation methods
Untrained methods do not learn any parameters or weights from data; instead, they combine the forecasts with a fixed, simple function.

Mean Forecast
The simplest approach is to take the arithmetic mean of all available probability forecasts.
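A minimal Python sketch of the mean forecast, assuming the forecasts arrive as a list of probabilities in [0, 1]. The function name `mean_forecast` is hypothetical.

```python
from typing import Sequence


def mean_forecast(probs: Sequence[float]) -> float:
    """Arithmetic mean of probability forecasts (each assumed to lie in [0, 1])."""
    if not probs:
        raise ValueError("need at least one forecast")
    return sum(probs) / len(probs)


print(mean_forecast([0.6, 0.7, 0.8]))  # about 0.7
```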

Median Forecast
Instead of the mean, one can also take the median. The median is more robust to outlier forecasts, but may use information less efficiently than the mean.
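The robustness to outliers can be seen in a small sketch using Python's standard `statistics.median`. The forecasts below are made-up numbers with one extreme outlier, and `median_forecast` is a hypothetical name.

```python
from statistics import median
from typing import Sequence


def median_forecast(probs: Sequence[float]) -> float:
    """Median of probability forecasts; with an even count, the two middle values are averaged."""
    return median(probs)


# Hypothetical forecasts: three similar ones plus a single outlier at 0.01.
forecasts = [0.60, 0.62, 0.65, 0.01]
print(median_forecast(forecasts))       # about 0.61, barely affected by the outlier
print(sum(forecasts) / len(forecasts))  # about 0.47, pulled down by the outlier
```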

Geometric mean of odds
Instead of averaging the probability forecasts directly, it may be better to convert each probability p to odds p / (1 - p), take the geometric mean of these odds, and then convert the result back to a probability.
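A minimal sketch of the geometric mean of odds, assuming every probability lies strictly between 0 and 1 (a forecast of exactly 0 or 1 has zero or infinite odds and would need clipping first). It uses the identity that the geometric mean of the odds equals the exponential of the mean log odds. The function name `geo_mean_odds` is hypothetical.

```python
import math
from typing import Sequence


def geo_mean_odds(probs: Sequence[float]) -> float:
    """Aggregate probabilities via the geometric mean of their odds p / (1 - p).

    Every p must satisfy 0 < p < 1.
    """
    # Geometric mean of odds = exp(mean of log odds); summing logs avoids
    # overflow from multiplying many odds together.
    mean_log_odds = sum(math.log(p / (1 - p)) for p in probs) / len(probs)
    odds = math.exp(mean_log_odds)
    return odds / (1 + odds)  # convert the aggregate odds back to a probability


print(geo_mean_odds([0.5, 0.9]))  # about 0.75: odds 1 and 9 have geometric mean 3
```

In this example the arithmetic mean of the same two forecasts is 0.7, so the geometric mean of odds lands further from 0.5.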

Trained aggregation methods
Trained aggregation methods learn parameters from past data, for example a separate weight for each forecaster. Many of these approaches use a concept called "extremizing", which pushes the aggregate probability away from 0.5 toward the nearer extreme (0 or 1).
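One common way to extremize, assumed in this sketch, is to multiply the log odds of an aggregate probability p by a factor d > 1, which is equivalent to computing p^d / (p^d + (1 - p)^d). Other extremizing transforms exist. The factor name `d` and the function name `extremize` are our own; a trained method would fit d on past resolved questions.

```python
import math


def extremize(p: float, d: float) -> float:
    """Scale the log odds of p by d and map the result back to a probability.

    d > 1 pushes p away from 0.5 toward the nearer extreme,
    d = 1 returns p unchanged, and 0 < d < 1 pulls p toward 0.5.
    Requires 0 < p < 1.
    """
    log_odds = math.log(p / (1 - p))
    return 1 / (1 + math.exp(-d * log_odds))


print(extremize(0.8, 2.0))  # about 0.941, i.e. 16/17
print(extremize(0.5, 2.0))  # 0.5: a forecast of exactly 0.5 has no direction to move toward
```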

Logistic Regression
The outcome is modeled as Bernoulli(p), where logit(p) is a linear combination of logit(p_i), the log odds of each forecaster i's probability. By default, one would expect the weights to sum to roughly 1 (an approximately unbiased combination) and more predictive forecasters to receive larger coefficients. However, because this is a regression model, each coefficient measures a forecaster's predictiveness conditional on all the other forecasts. For example, if two forecasters always give identical forecasts, one of them could receive zero weight because the other already does all the work (a phenomenon known as "collinearity").
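The model above can be sketched in plain Python with batch gradient descent on the mean log loss. This is an illustration under stated assumptions: there is no intercept and no regularization, the learning rate and step count are arbitrary, and the forecasting history is made up. In practice a library logistic regression (for example scikit-learn's, with the intercept disabled) would be used instead. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def fit_logistic_weights(forecasts: Sequence[Sequence[float]],
                         outcomes: Sequence[int],
                         lr: float = 0.1,
                         steps: int = 2000) -> List[float]:
    """Fit one weight per forecaster so that logit(P(yes)) = sum_i w[i] * logit(p_i).

    forecasts[q][i] is forecaster i's probability on question q, strictly in (0, 1);
    outcomes[q] is 1 if question q resolved yes, else 0.
    Assumption: no intercept, no regularization, plain gradient descent.
    """
    X = [[logit(p) for p in row] for row in forecasts]
    n, k = len(X), len(X[0])
    w = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for x, y in zip(X, outcomes):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for i in range(k):
                grad[i] += err * x[i] / n  # gradient of the mean log loss
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def predict(w: Sequence[float], row: Sequence[float]) -> float:
    return sigmoid(sum(wi * logit(p) for wi, p in zip(w, row)))


def log_loss(preds: Sequence[float], outcomes: Sequence[int]) -> float:
    return -sum(math.log(q) if y else math.log(1 - q)
                for q, y in zip(preds, outcomes)) / len(outcomes)


# Hypothetical history: two forecasters, six resolved questions.
forecasts = [[0.7, 0.6], [0.2, 0.3], [0.9, 0.6], [0.4, 0.5], [0.8, 0.7], [0.7, 0.6]]
outcomes = [1, 0, 1, 0, 1, 0]
w = fit_logistic_weights(forecasts, outcomes)
print(w, sum(w))
```

When all forecasters agree on a probability q, the model outputs sigmoid(sum(w) * logit(q)), so a fitted weight sum above 1 means the model extremizes unanimous forecasts.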

Skew-Adjusted Extremized Mean (Sk-E-Mean)
Proposed by Ben Powell: Skew-Adjusted Extremized Mean (Sk-E-Mean).

Extremized Mean Log Odds (EMLO)
Proposed by Jaime Sevilla: Extremized Mean Log Odds.
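As the name suggests, the method averages the forecasters' log odds and then extremizes the result. The sketch below assumes the extremizing step multiplies the mean log odds by a factor d chosen to minimize log loss on past resolved questions, here by a simple grid search; Sevilla's exact fitting procedure may differ. With d = 1 the result equals the geometric mean of odds. Function names, the grid, and the example data are hypothetical.

```python
import math
from typing import Optional, Sequence


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def emlo(probs: Sequence[float], d: float) -> float:
    """Mean of the forecasters' log odds, multiplied by an extremizing factor d."""
    mean_log_odds = sum(logit(p) for p in probs) / len(probs)
    return 1 / (1 + math.exp(-d * mean_log_odds))


def fit_d(questions: Sequence[Sequence[float]],
          outcomes: Sequence[int],
          grid: Optional[Sequence[float]] = None) -> float:
    """Pick d from a grid by minimizing mean log loss on resolved questions (assumed procedure)."""
    if grid is None:
        grid = [0.5 + 0.05 * j for j in range(51)]  # 0.5 to 3.0
    best_d, best_loss = grid[0], float("inf")
    for d in grid:
        loss = 0.0
        for probs, y in zip(questions, outcomes):
            q = emlo(probs, d)
            loss -= math.log(q) if y == 1 else math.log(1 - q)
        loss /= len(outcomes)
        if loss < best_loss:
            best_d, best_loss = d, loss
    return best_d


# Hypothetical, underconfident crowd: every question resolved the way it leaned.
questions = [[0.6, 0.7, 0.8], [0.3, 0.2, 0.4], [0.7, 0.6, 0.75], [0.35, 0.25, 0.3]]
outcomes = [1, 0, 1, 0]
d = fit_d(questions, outcomes)
print(d, emlo([0.6, 0.7, 0.8], d))
```

Because every question in this toy history resolved in the direction the crowd leaned, the fitted d runs to the top of the grid; with real data, d settles where the gain from confident hits balances the cost of confident misses.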

The Metaculus Prediction Algorithm
The current Metaculus Prediction weights each forecast by the reputation of the forecaster and by the recency of the forecast, where recency is measured relative to the question's overall lifetime.
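The exact formula is not given here, so the following is only an illustrative sketch of reputation- and recency-weighted averaging, not the actual Metaculus Prediction. The reputation scores, the exponential recency decay, the `decay` rate, and the function name are all hypothetical; only the idea of measuring recency as a fraction of the question's lifetime comes from the text.

```python
import math
from typing import Sequence, Tuple


def weighted_aggregate(forecasts: Sequence[Tuple[float, float, float]],
                       now: float,
                       lifetime: float,
                       decay: float = 3.0) -> float:
    """Weighted mean of probabilities (hypothetical weighting scheme).

    forecasts: (probability, reputation, time_made) tuples, where reputation is a
    hypothetical non-negative score and time_made uses the same units as now.
    """
    total_w = 0.0
    total = 0.0
    for p, reputation, t in forecasts:
        age = (now - t) / lifetime  # 0 = just made, 1 = as old as the question
        w = reputation * math.exp(-decay * age)  # assumed exponential recency decay
        total_w += w
        total += w * p
    if total_w == 0:
        raise ValueError("all weights are zero")
    return total / total_w


# An older 0.2 forecast and a fresh 0.8 forecast from equally reputable forecasters.
print(weighted_aggregate([(0.2, 1.0, 0.0), (0.8, 1.0, 10.0)], now=10.0, lifetime=10.0))
```

Here the fresh forecast dominates, so the aggregate ends up well above the plain mean of 0.5.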

Performance of existing aggregation approaches
In an evaluation by Sevilla, EMLO performed best on Metaculus data among a number of established aggregation approaches. Powell's algorithm is state of the art on Good Judgment data.