Log score

The Log score (also sometimes referred to as surprisal) is a strictly proper scoring rule used to evaluate a forecast in light of the actually observed outcome. It was first proposed by I. J. Good in 1952 to evaluate binary forecasts. The score is commonly used to score predictions and to evaluate how well a distribution captures the observed data, for example in Bayesian statistics. The log score is the only strictly proper scoring rule that is local, meaning that the score depends only on the probability (or probability density) assigned to the actually observed outcome, rather than on the entire predictive distribution.

Definition
The log score is usually computed as the negative logarithm of the predictive density evaluated at the observed value $$y$$, i.e.

$$\text{log score}(y) = -\log f(y)$$,

where $$f$$ is the predictive probability density function. Usually the natural logarithm is used, but the log score remains strictly proper for any logarithm base greater than 1.

In the formulation presented above, the score is negatively oriented, meaning that smaller values are better. Sometimes the sign of the log score is reversed and it is simply given as the log predictive density. In that case, larger values are better.
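The definition can be illustrated with a short sketch. The following Python snippet (the Gaussian forecast and the observed value are hypothetical example numbers, not from the text) computes the negatively oriented log score of a normal predictive density at an observed value:

```python
import math

def log_score(density, y):
    """Negatively oriented log score: -log of the predictive density at y."""
    return -math.log(density(y))

def normal_pdf(mu, sigma):
    """Density of a normal predictive distribution N(mu, sigma^2)."""
    def f(x):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
    return f

# Hypothetical forecast N(0, 1), observed outcome y = 0.5
score = log_score(normal_pdf(0.0, 1.0), 0.5)
```

Smaller values of `score` indicate a better forecast; a density placing more mass near the observed value yields a lower score.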

The log score is applicable to binary outcomes as well as discrete or continuous outcomes. In the case of binary outcomes, the formula above simplifies to

$$\text{log score}(y) = -\log P(y)$$,

where $$P(y)$$ is the probability assigned to the binary outcome $$y$$. If, for example, a forecaster assigned a 70% probability to team A winning a soccer match, then the resulting log score would be $$-\log 0.7 \approx 0.36 $$ if team A wins and $$-\log 0.3 \approx 1.20 $$ if team A loses.
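The soccer example above can be reproduced in a few lines of Python (using the natural logarithm, as in the text):

```python
import math

p_win = 0.7  # probability the forecaster assigned to team A winning

score_if_win = -math.log(p_win)       # team A wins:  -log 0.7 ≈ 0.36
score_if_loss = -math.log(1 - p_win)  # team A loses: -log 0.3 ≈ 1.20
```

The forecaster is rewarded with a low score when the favored outcome occurs and penalized with a higher score when it does not.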

Locality
The log score is a local scoring rule, meaning that the score depends only on the probability (or probability density) assigned to the actually observed value. The score therefore does not depend on the probability (or probability density) assigned to values that were not observed. This is in contrast to so-called global proper scoring rules, which take the entire predictive distribution into account.
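Locality can be demonstrated with a small sketch (the two forecasts and outcome labels are hypothetical). Two predictive distributions that assign the same probability to the observed outcome receive identical log scores, no matter how they differ elsewhere:

```python
import math

# Two hypothetical forecasts over the outcomes of a match.
# They agree on "win" but differ on "draw" and "loss".
forecast_a = {"win": 0.5, "draw": 0.3, "loss": 0.2}
forecast_b = {"win": 0.5, "draw": 0.1, "loss": 0.4}

observed = "win"
score_a = -math.log(forecast_a[observed])
score_b = -math.log(forecast_b[observed])
# score_a == score_b: the score ignores probabilities of unobserved outcomes
```

A global scoring rule such as the continuous ranked probability score would, in general, distinguish between the two forecasts.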

Penalization of Over- and Underconfidence
The log score penalizes overconfidence (i.e. a forecast that is too certain) more strongly than underconfidence and therefore incentivizes forecasters to err on the side of caution.
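This asymmetry can be made concrete with a binary-forecast sketch (the probabilities 0.7 and 0.99 are hypothetical example values). Raising a forecast from 70% to 99% improves the score only slightly when the event occurs, but worsens it dramatically when it does not:

```python
import math

cautious, confident = 0.7, 0.99

# Reward for extra confidence when the forecast turns out right (small)
gain_if_right = -math.log(cautious) - (-math.log(confident))

# Extra penalty for that confidence when the forecast turns out wrong (large)
loss_if_wrong = -math.log(1 - confident) - (-math.log(1 - cautious))
```

Since `loss_if_wrong` far exceeds `gain_if_right`, a forecaster maximizing their expected log score has little to gain and much to lose from overstating their certainty; as the stated probability approaches 1, the penalty for a miss grows without bound.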