Chapter 9 of Kahneman et al. (2021) discusses how predictions made by humans can be less accurate than predictions made using statistical models. Part of the chapter describes research by Goldberg (1970) and subsequent authors showing that models of human predictions can out-perform the humans on which those models are based.

For example, suppose I’m asked to make predictions in a range of contexts $i\in\{1,2,\ldots,n\}$. My goal is to use some contextual data $x_i\in\mathbb{R}^k$ to predict the value of a context-specific outcome $y_i$. I generate predictions $\bar{y}_i=y_i+u_i$, where the $u_i$ are context-specific errors. The accuracy of my predictions can be measured via their mean squared error (MSE) $\frac{1}{n}\sum_{i=1}^n(\bar{y}_i-y_i)^2=\frac{1}{n}\sum_{i=1}^nu_i^2$, where a lower MSE implies higher accuracy.

Another way to generate predictions could be to posit a linear model $y_i=\theta x_i+\epsilon_i$, where $\theta$ is a row vector of coefficients and the $\epsilon_i$ are random errors. But I don’t know the true outcomes $y_i$ (hence needing to predict them), so I can’t just use ordinary least squares (OLS) to estimate $\theta$. Instead, Goldberg (1970) suggests replacing this linear model with $\bar{y}_i=\beta x_i+\varepsilon_i$, where $\beta$ is a (possibly different) vector of coefficients and the $\varepsilon_i$ are (possibly different) random errors. This second model describes the linearized relationship between my (possibly incorrect) predictions $\bar{y}_i$ and the data $x_i$ on which those predictions are based. Since I know my predictions $\bar{y}_i$, I can use OLS to obtain an estimate $\hat\beta$ of $\beta$ and produce a set of “modeled predictions” $\hat{y}_i=\hat\beta x_i$. The difference between the $\bar{y}_i$ and $\hat{y}_i$ is that the latter ignore the non-linearities in my method for generating predictions. Intuitively, the $\hat{y}_i$ represent what I would predict using a simple, linear formula; my predictions $\bar{y}_i$ may be generated using a formula that is much more complex, or may not be generated using a formula at all.
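
A minimal sketch of the modeling step, written in plain Python so it needs no third-party packages. It regresses the raw predictions $\bar{y}_i$ on the data $x_i$ by OLS and returns the modeled predictions $\hat{y}_i=\hat\beta x_i$. The helper names `fit_ols`, `modeled_predictions`, and `mse` are hypothetical, and solving the normal equations directly is a simplification assumed here for brevity; a library least-squares routine would usually be preferred. No intercept is included, matching the model $\bar{y}_i=\beta x_i+\varepsilon_i$ as written.

```python
def fit_ols(X, ybar):
    """Return OLS coefficients beta_hat from regressing ybar on the rows of X.

    Solves the normal equations (X'X) beta = X'ybar by Gauss-Jordan
    elimination with partial pivoting. No intercept is added; append a
    column of ones to each row of X if one is wanted.
    """
    k = len(X[0])
    # Build the augmented matrix [X'X | X'ybar].
    A = [
        [sum(row[a] * row[b] for row in X) for b in range(k)]
        + [sum(row[a] * t for row, t in zip(X, ybar))]
        for a in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]


def modeled_predictions(X, ybar):
    """Replace raw predictions ybar with linearized ones, yhat_i = beta_hat . x_i."""
    beta_hat = fit_ols(X, ybar)
    return [sum(b * v for b, v in zip(beta_hat, row)) for row in X]


def mse(pred, y):
    """Mean squared error of predictions pred against outcomes y."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)
```

Here each row of `X` holds one $x_i$ and `ybar` holds the raw predictions; once the true $y_i$ are known, `mse` can score both the raw and the modeled predictions against them.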

So, how do my raw predictions $\bar{y}_i$ and their modeled counterparts $\hat{y}_i$ compare? The chart below plots the $\bar{y}_i$ and $\hat{y}_i$ against the true values $y_i$ when

  1. the $x_i$ and $u_i$ are iid standard normal, and
  2. $y_i=(x_i+z_i)/\sqrt{2}$ with $z_i$ iid standard normal.

The modeled predictions are far more accurate: they have an MSE of 0.22, whereas my raw predictions have an MSE of 0.76. In this case, the true relationship between the $y_i$ and $x_i$ is linear, and so a linear model of my predictions is well-placed to out-perform those predictions.
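
A simulation of this kind can be sketched as below. It draws scalar $x_i$, $u_i$, and $z_i$ as iid standard normal, sets $y_i=(x_i+z_i)/\sqrt{2}$ and $\bar{y}_i=y_i+u_i$, fits $\hat\beta$ by OLS without an intercept (the data have zero means), and compares the two MSEs. The function name `simulate`, the sample size, and the seed are illustrative assumptions, since the chart's are not stated here; the printed MSEs vary with $n$ and the random draws, so they need not match the figures quoted above.

```python
import math
import random


def simulate(n, seed=0):
    """Simulate the example and return (raw MSE, modeled MSE).

    Assumes scalar x_i, u_i, z_i iid standard normal,
    y_i = (x_i + z_i) / sqrt(2), and raw predictions ybar_i = y_i + u_i.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [(xi + zi) / math.sqrt(2) for xi, zi in zip(x, z)]
    ybar = [yi + ui for yi, ui in zip(y, u)]

    # OLS of ybar on x without an intercept, then the modeled predictions.
    beta_hat = sum(xi * bi for xi, bi in zip(x, ybar)) / sum(xi * xi for xi in x)
    yhat = [beta_hat * xi for xi in x]

    raw_mse = sum((b - t) ** 2 for b, t in zip(ybar, y)) / n
    modeled_mse = sum((h - t) ** 2 for h, t in zip(yhat, y)) / n
    return raw_mse, modeled_mse


raw, modeled = simulate(n=1000, seed=42)
print(f"raw MSE: {raw:.2f}, modeled MSE: {modeled:.2f}")
```

Changing the scaling of `y`, or letting `u` depend on `x`, is a quick way to see how the comparison shifts.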

However, modeling predictions does not always improve their accuracy. For example, suppose the contextual data $x_i$ are scalars, and the $x_i$, $y_i$, and $u_i$ have zero means. Then the MSE of the modeled predictions turns out to be $\frac{1}{n}\sum_{i=1}^n(\hat{y}_i-y_i)^2=\sigma_y^2+\rho_{ux}^2\sigma_u^2-\rho_{xy}^2\sigma_y^2$, where $\sigma_y^2$ and $\sigma_u^2$ are the variances of the $y_i$ and $u_i$, $\rho_{ux}$ is the correlation of the $u_i$ and $x_i$, and $\rho_{xy}$ is the correlation of the $x_i$ and $y_i$. Consequently, replacing my raw predictions $\bar{y}_i$ with their modeled counterparts $\hat{y}_i$ leads to an accuracy improvement if and only if $\sigma_y^2(1-\rho_{xy}^2)<\sigma_u^2(1-\rho_{ux}^2)$. This condition holds in the example plotted above: both $\sigma_u^2$ and $\sigma_y^2$ equal unity, but $\rho_{xy}=0.69$ is much larger in absolute value than $\rho_{ux}=0.09$. In general, the condition is most likely to hold when

  1. $\sigma_u^2$ is larger than $\sigma_y^2$ (i.e., my raw predictions are relatively noisy);
  2. $|\rho_{xy}|$ is large (i.e., the relationship between the $y_i$ and $x_i$ is approximately linear and deterministic); and
  3. $|\rho_{ux}|$ is small (i.e., the errors $u_i$ in my raw predictions are relatively uncorrelated with the $x_i$).
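
The MSE expression for the scalar case follows from a few lines of algebra. The derivation below treats variances, covariances, and correlations as second moments taken about zero (matching the zero-mean assumption) and assumes the OLS regression of $\bar{y}_i$ on $x_i$ is fitted without an intercept; the covariance notation $\sigma_{xy}$ and $\sigma_{xu}$ is introduced here for convenience. Since $\bar{y}_i=y_i+u_i$, OLS gives $\hat\beta=(\sigma_{xy}+\sigma_{xu})/\sigma_x^2$, and so

$$
\begin{aligned}
\frac{1}{n}\sum_{i=1}^n(\hat{y}_i-y_i)^2
&=\hat\beta^2\sigma_x^2-2\hat\beta\sigma_{xy}+\sigma_y^2\\
&=\frac{(\sigma_{xy}+\sigma_{xu})^2-2(\sigma_{xy}+\sigma_{xu})\sigma_{xy}}{\sigma_x^2}+\sigma_y^2\\
&=\frac{\sigma_{xu}^2-\sigma_{xy}^2}{\sigma_x^2}+\sigma_y^2\\
&=\sigma_y^2+\rho_{ux}^2\sigma_u^2-\rho_{xy}^2\sigma_y^2,
\end{aligned}
$$

where the last step uses $\sigma_{xu}=\rho_{ux}\sigma_x\sigma_u$ and $\sigma_{xy}=\rho_{xy}\sigma_x\sigma_y$. The raw predictions have MSE $\frac{1}{n}\sum_{i=1}^nu_i^2=\sigma_u^2$, so the modeled predictions are more accurate exactly when $\sigma_y^2+\rho_{ux}^2\sigma_u^2-\rho_{xy}^2\sigma_y^2<\sigma_u^2$, which rearranges to $\sigma_y^2(1-\rho_{xy}^2)<\sigma_u^2(1-\rho_{ux}^2)$.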

Intuitively, if the outcomes $y_i$ are a linear function of the $x_i$ (i.e., if $|\rho_{xy}|=1$), then linearizing my predictions improves their accuracy by removing non-linear errors. On the other hand, if my prediction errors $u_i$ are a linear function of the $x_i$ (i.e., if $|\rho_{ux}|=1$), then linearizing my predictions cannot improve their accuracy because there are no non-linear errors to remove.
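
The two limiting cases can be checked numerically. The sketch below assumes scalar data, an OLS fit without an intercept, and variances and correlations computed as second moments about zero (consistent with the zero-mean assumption); under those choices the MSE formula holds exactly in-sample, not just approximately. The helper names `moments` and `compare`, and the particular data-generating choices in the two scenarios, are hypothetical illustrations.

```python
import math
import random


def moments(a, b):
    """Second moment about zero: (1/n) * sum(a_i * b_i)."""
    return sum(ai * bi for ai, bi in zip(a, b)) / len(a)


def compare(x, y, u):
    """Return (raw MSE, modeled MSE, formula MSE) for raw predictions y + u."""
    ybar = [yi + ui for yi, ui in zip(y, u)]
    beta_hat = moments(x, ybar) / moments(x, x)  # OLS without an intercept
    yhat = [beta_hat * xi for xi in x]
    raw = moments(u, u)  # (1/n) sum (ybar_i - y_i)^2 = (1/n) sum u_i^2
    modeled = sum((h - t) ** 2 for h, t in zip(yhat, y)) / len(y)

    var_x, var_y, var_u = moments(x, x), moments(y, y), moments(u, u)
    rho_ux = moments(u, x) / math.sqrt(var_u * var_x)
    rho_xy = moments(x, y) / math.sqrt(var_x * var_y)
    formula = var_y + rho_ux**2 * var_u - rho_xy**2 * var_y
    return raw, modeled, formula


rng = random.Random(0)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]

# |rho_xy| = 1: y is an exact linear function of x; u is independent noise.
raw, modeled, formula = compare(x, [2 * xi for xi in x], noise)
print(f"|rho_xy| = 1: raw {raw:.3f}, modeled {modeled:.3f}, formula {formula:.3f}")

# |rho_ux| = 1: y is unrelated to x; the errors u are an exact linear function of x.
raw, modeled, formula = compare(x, noise, [0.5 * xi for xi in x])
print(f"|rho_ux| = 1: raw {raw:.3f}, modeled {modeled:.3f}, formula {formula:.3f}")
```

In the first case the modeled MSE should come out far below the raw MSE; in the second it should come out above it, since the difference between the two is $\sigma_y^2(1-\rho_{xy}^2)\ge0$ whenever $|\rho_{ux}|=1$.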