Aggregation of Non-Binary Predictions

Combining predictions from several forecasters or models consistently improves the accuracy of forecasts. The appropriate methods depend on whether the forecasts are for a binary (yes/no) target or a non-binary target (such as world GDP in year X). This page deals only with aggregating probabilistic forecasts (i.e. predictive distributions) for non-binary outcomes. Aggregation techniques for binary forecasts can be found here.

Ways of combining probability distributions


Probability distributions can be combined based either on their probability density functions (PDFs) or on their cumulative distribution functions (CDFs). Usually, forecasts are combined using the CDF.

Vertical combinations of several CDFs
A vertical combination combines the individual CDFs along the probability axis: for each possible value of the target, the cumulative probabilities assigned by the individual forecasts are combined. The result is a mixture distribution of the individual forecasts. If the ensemble CDF is a linear combination (a weighted average) of the individual forecast CDFs, the ensemble is called a linear pool.
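As a minimal sketch of a linear pool, the snippet below averages member CDFs at a given value. The normal member forecasts, the equal default weights and the function name `linear_pool_cdf` are assumptions chosen for illustration, not part of any particular library.

```python
from statistics import NormalDist

def linear_pool_cdf(x, members, weights=None):
    """CDF of a linear pool at x: the weighted average of the member CDFs."""
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = [1.0 / len(members)] * len(members)
    return sum(w * m.cdf(x) for w, m in zip(weights, members))

# Two hypothetical member forecasts for the same target.
members = [NormalDist(mu=-1.0, sigma=1.0), NormalDist(mu=1.0, sigma=1.0)]

pooled_at_zero = linear_pool_cdf(0.0, members)
pooled_weighted = linear_pool_cdf(0.0, members, weights=[0.8, 0.2])
print(pooled_at_zero, pooled_weighted)
```

Because the weights are non-negative and sum to one, the pooled function is itself a valid CDF (non-decreasing, running from 0 to 1).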

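One known issue with linear pools is that they are not calibrated even when all individual member forecasts are calibrated (see Theorem 3.1(c) of the linked paper). Instead, a linear pool of distinct calibrated forecasts is systematically over-dispersed, i.e. under-confident: mixing distributions with different locations spreads the pooled distribution out. Linear pools may, of course, be well calibrated in instances where member forecasts are over-confident, because the added spread can then offset the members' over-confidence.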
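The calibration of a linear pool can be checked in a small simulation using the probability integral transform (PIT), i.e. the forecast CDF evaluated at the observed outcome. For a calibrated forecast the PIT values are uniform on [0, 1], so their variance is 1/12. The data-generating process below (an outcome driven by two signals, each seen by one forecaster) is an assumption made purely for illustration.

```python
import random
from statistics import NormalDist, fmean

random.seed(0)
n = 20000
pit_single, pit_pool = [], []

for _ in range(n):
    # Assumed setup: y = x1 + x2 + noise, all standard normal.
    x1 = random.gauss(0.0, 1.0)
    x2 = random.gauss(0.0, 1.0)
    y = x1 + x2 + random.gauss(0.0, 1.0)
    # Each forecaster sees one signal; the unseen part has variance 2,
    # so each forecast N(signal, sqrt(2)) is calibrated.
    f1 = NormalDist(x1, 2 ** 0.5)
    f2 = NormalDist(x2, 2 ** 0.5)
    u1, u2 = f1.cdf(y), f2.cdf(y)
    pit_single.append(u1)
    pit_pool.append(0.5 * (u1 + u2))  # PIT of the equally weighted linear pool

def variance(values):
    m = fmean(values)
    return fmean([(v - m) ** 2 for v in values])

var_single = variance(pit_single)
var_pool = variance(pit_pool)
print(var_single, var_pool, 1 / 12)
```

In this sketch the PIT variance of a single forecaster should sit close to 1/12, while the pooled PIT variance comes out clearly smaller. PIT values that pile up around 0.5 mean that outcomes fall in the centre of the pooled distribution too often, which is the signature of a forecast that is too wide.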

Horizontal combinations of several CDFs
A horizontal combination combines the individual CDFs along the value axis: for each probability level, the corresponding quantiles of the individual forecasts are combined. Such quantile ensembles are often used, for example, in epidemiology. Unlike for linear pools, there is no theoretical argument stating that a quantile ensemble must be miscalibrated if all member forecasts are well calibrated.
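A quantile ensemble can be sketched by evaluating each member's quantile function at a set of probability levels and combining the results level by level. The normal members, the chosen levels and the name `quantile_ensemble` are illustrative assumptions; the `combine` argument allows swapping the mean for the median.

```python
from statistics import NormalDist, fmean, median

def quantile_ensemble(p, members, combine=fmean):
    """Combine the member quantiles at probability level p (mean by default)."""
    return combine([m.inv_cdf(p) for m in members])

# Two hypothetical member forecasts with different locations and spreads.
members = [NormalDist(-1.0, 1.0), NormalDist(1.0, 2.0)]
levels = [0.05, 0.25, 0.5, 0.75, 0.95]

ensemble_quantiles = [quantile_ensemble(p, members) for p in levels]
median_quantiles = [quantile_ensemble(p, members, combine=median) for p in levels]
print(ensemble_quantiles)
```

For normal members, averaging quantiles with equal weights yields another normal distribution whose mean and standard deviation are the averages of the members' means and standard deviations. In contrast to a linear pool, the spread between the member locations therefore does not widen the ensemble.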

Combinations of PDFs
When combining forecasts based on their PDFs, only a vertical combination is sensible. When combining using the mean, it does not matter whether we combine the PDFs or the CDFs, because integration is linear: the integral of a (weighted) sum of densities is the same as the (weighted) sum of their integrals. For combinations based on, e.g., the median of the cumulative or non-cumulative density at a given point, differences may occur (although these will typically not be very large).
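The following sketch checks the mean equivalence numerically and shows where the median behaves differently. The three normal members are an assumption and are deliberately placed far apart so that the effect is easy to see; `integrate` is a simple trapezoidal rule written for this example.

```python
from statistics import NormalDist, fmean, median

# Hypothetical, widely separated member forecasts.
members = [NormalDist(-2.0, 1.0), NormalDist(0.0, 1.0), NormalDist(2.0, 1.0)]

def integrate(f, lo, hi, n=8000):
    """Trapezoidal rule for f on [lo, hi] using n equal steps."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

def mean_pdf(x):
    return fmean([m.pdf(x) for m in members])

def median_pdf(x):
    return median([m.pdf(x) for m in members])

# Mean: integrating the averaged PDF up to x matches the averaged CDF at x.
x = 1.0
cdf_from_mean_pdf = integrate(mean_pdf, -12.0, x)
mean_of_cdfs = fmean([m.cdf(x) for m in members])

# Median: the pointwise median of the PDFs need not integrate to one.
median_pdf_mass = integrate(median_pdf, -12.0, 12.0)
print(cdf_from_mean_pdf, mean_of_cdfs, median_pdf_mass)
```

With these members the pointwise median of the densities carries a total mass of roughly 0.59, so it would have to be renormalised before it can be used as a density, whereas the pointwise median of the CDFs is already a valid CDF. Members that overlap more strongly give a smaller gap.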

The forecast combination puzzle
Empirically, it is very difficult to improve on unweighted ensembles by estimating weights for individual forecasters from the data.