Ordinary and total least squares

Suppose $X$ and $Y$ are random variables with $Y = β X + u$, where $u$ has zero mean and zero correlation with $X$. The coefficient $β$ can be estimated by collecting data $(Y_{i}, X_{i})_{i = 1}^{n}$ and regressing the $Y_{i}$ on the $X_{i}$. Now suppose our data collection procedure is flawed: instead of observing $X_{i}$, we observe $Z_{i} = X_{i} + v_{i}$, where the $v_{i}$ are iid with zero mean and zero correlation with the $X_{i}$ and the $u_{i}$. Then the ordinary least squares (OLS) estimate ${\hat{β}}_{OLS}$ of $β$ obtained by regressing the $Y_{i}$ on the $Z_{i}$ suffers from attenuation bias: $\begin{aligned} \underset{n \to \infty}{plim} {\hat{β}}_{OLS} & = \frac{Cov (Y, Z)}{Var (Z)} \\ & = \frac{Cov (β X + u, X + v)}{Var (X + v)} \\ & = \frac{β Var (X)}{Var (X) + Var (v)} \\ & = \frac{β}{1 + Var (v) / Var (X)}, \end{aligned}$ and so $| {\hat{β}}_{OLS} | < | β |$ asymptotically whenever $Var (v) > 0$ and $β \neq 0$. Intuitively, the measurement errors $v_{i}$ spread out the independent variable, flattening the fitted regression line.
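
A quick simulation can make the attenuation formula concrete. The sketch below assumes normally distributed $X$, $u$, and $v$ with $β = 1$ and $Var (X) = Var (v) = 1$, so the formula predicts a limit of $0.5$. The sample size, seed, and helper name `ols_slope` are illustrative choices, not part of the derivation.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from an OLS regression of y on x (with an intercept)."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (xc @ xc))

rng = np.random.default_rng(0)           # arbitrary seed
n = 200_000                              # large n, to approximate the plim
beta, var_x, var_u, var_v = 1.0, 1.0, 1.0, 1.0   # assumed parameter values

x = rng.normal(0.0, np.sqrt(var_x), n)   # true regressor
u = rng.normal(0.0, np.sqrt(var_u), n)   # regression error
v = rng.normal(0.0, np.sqrt(var_v), n)   # measurement error
y = beta * x + u
z = x + v                                # what we actually observe

beta_clean = ols_slope(x, y)             # regress Y on the true X
beta_noisy = ols_slope(z, y)             # regress Y on the mismeasured Z
beta_limit = beta / (1 + var_v / var_x)  # attenuation formula

print(f"OLS on X: {beta_clean:.3f}")
print(f"OLS on Z: {beta_noisy:.3f}  (formula: {beta_limit:.3f})")
```

Raising `var_v` relative to `var_x` should push the second estimate further toward zero.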

One way to reduce attenuation bias is to replace OLS with total least squares (TLS), which accounts for noise in both the dependent and independent variables. As a demonstration, the chart below compares the OLS and TLS lines of best fit through some randomly generated data $(Y_{i}, Z_{i})_{i = 1}^{n}$ with $β = 1$. The OLS estimate ${\hat{β}}_{OLS} = 0.43$ minimizes the sum of squared vertical deviations of the data from the fitted line. In contrast, the TLS estimate ${\hat{β}}_{TLS} = 0.95$ minimizes the sum of squared perpendicular deviations of the data from the fitted line. For these data, the TLS estimate is unbiased because $u$ and $v$ have the same variance.
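
One common way to compute a TLS line is through the singular value decomposition: the best-fit line's normal vector is the right-singular vector with the smallest singular value. The sketch below uses that approach, which is not necessarily how the chart was produced. The data are freshly simulated under assumed values ($β = 1$, unit variances, an arbitrary seed), so the estimates will not match the 0.43 and 0.95 reported above.

```python
import numpy as np

def ols_slope(z, y):
    """Slope minimizing squared vertical deviations."""
    zc, yc = z - z.mean(), y - y.mean()
    return float(zc @ yc / (zc @ zc))

def tls_slope(z, y):
    """Slope minimizing squared perpendicular deviations.

    The right-singular vector with the smallest singular value is the
    normal (a, b) to the fitted line a*z + b*y = 0, so the slope is -a/b.
    Assumes the fitted line is not vertical (b != 0).
    """
    centered = np.column_stack([z - z.mean(), y - y.mean()])
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    a, b = vt[-1]
    return float(-a / b)

rng = np.random.default_rng(1)       # arbitrary seed
n = 100_000                          # assumed sample size
x = rng.standard_normal(n)
y = 1.0 * x + rng.standard_normal(n) # beta = 1, Var(u) = 1
z = x + rng.standard_normal(n)       # Var(v) = 1

beta_ols = ols_slope(z, y)
beta_tls = tls_slope(z, y)
print(f"OLS: {beta_ols:.3f}, TLS: {beta_tls:.3f}")
```

With equal error variances, the TLS slope should land near the true $β = 1$ while the OLS slope should land near $0.5$.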

However, if $u$ and $v$ have different variances then the TLS estimate of $β$ is biased. I demonstrate this phenomenon in the chart below, which compares the OLS and TLS estimates of $β = 1$ for varying $Var (u)$ and $Var (v)$ when $X$ is standard normal. I plot the bias $E [\hat{β} - β]$ and mean squared error $E [(\hat{β} - β)^{2}]$ of each estimate $\hat{β} \in \{ {\hat{β}}_{OLS}, {\hat{β}}_{TLS} \}$, obtained by simulating the data-generating process 100 times for each $(Var (u), Var (v))$ pair.
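
A simulation of this kind might look like the sketch below. The 100 replications, $β = 1$, and standard normal $X$ come from the description above. The per-sample size `n = 200`, the variance grid, the normal errors, and the seed are assumptions, since the text does not state them.

```python
import numpy as np

def ols_slope(z, y):
    zc, yc = z - z.mean(), y - y.mean()
    return float(zc @ yc / (zc @ zc))

def tls_slope(z, y):
    # Closed-form orthogonal-regression slope; assumes the sample
    # covariance of z and y is nonzero.
    zc, yc = z - z.mean(), y - y.mean()
    s_zz, s_yy, s_zy = zc @ zc, yc @ yc, zc @ yc
    d = s_yy - s_zz
    return float((d + np.sqrt(d * d + 4 * s_zy**2)) / (2 * s_zy))

def simulate(var_u, var_v, beta=1.0, n=200, reps=100, seed=0):
    """Return bias and MSE of the OLS and TLS slopes over `reps` samples."""
    rng = np.random.default_rng(seed)
    estimates = {"OLS": [], "TLS": []}
    for _ in range(reps):
        x = rng.standard_normal(n)
        y = beta * x + rng.normal(0.0, np.sqrt(var_u), n)
        z = x + rng.normal(0.0, np.sqrt(var_v), n)
        estimates["OLS"].append(ols_slope(z, y))
        estimates["TLS"].append(tls_slope(z, y))
    summary = {}
    for name, values in estimates.items():
        err = np.asarray(values) - beta
        summary[name] = {"bias": float(err.mean()),
                         "mse": float((err**2).mean())}
    return summary

grid = (0.5, 1.0, 2.0)  # assumed variance values
results = {(vu, vv): simulate(vu, vv) for vu in grid for vv in grid}

for (vu, vv), res in results.items():
    print(f"Var(u)={vu}, Var(v)={vv}: "
          f"OLS bias {res['OLS']['bias']:+.2f}, mse {res['OLS']['mse']:.2f}; "
          f"TLS bias {res['TLS']['bias']:+.2f}, mse {res['TLS']['mse']:.2f}")
```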

If $Var (u) > Var (v)$ then the TLS estimate ${\hat{β}}_{TLS}$ is biased upward because the data are relatively stretched vertically; if $Var (u) < Var (v)$ then ${\hat{β}}_{TLS}$ is biased downward because the data are relatively stretched horizontally. The OLS estimate is biased downward whenever $Var (v) > 0$ due to attenuation. The TLS estimate is less biased and has smaller mean squared error than the OLS estimate when $Var (u) < Var (v)$, suggesting that TLS generates “better” estimates than OLS when the measurement errors $v_{i}$ are relatively large.
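
The direction of the TLS bias can also be checked without simulation, by plugging population moments into the closed-form orthogonal-regression slope. The sketch below assumes $X$, $u$, and $v$ are mutually uncorrelated and $β \neq 0$, so that $Var (Z) = Var (X) + Var (v)$, $Var (Y) = β^{2} Var (X) + Var (u)$, and $Cov (Y, Z) = β Var (X)$. The function names are illustrative.

```python
import math

def tls_limit(beta, var_x, var_u, var_v):
    """Large-sample TLS slope implied by the population moments."""
    s_zz = var_x + var_v              # Var(Z)
    s_yy = beta**2 * var_x + var_u    # Var(Y)
    s_zy = beta * var_x               # Cov(Y, Z), nonzero when beta != 0
    d = s_yy - s_zz
    return (d + math.sqrt(d * d + 4 * s_zy**2)) / (2 * s_zy)

def ols_limit(beta, var_x, var_v):
    """Large-sample OLS slope from the attenuation formula."""
    return beta / (1 + var_v / var_x)

for var_u, var_v in [(1.0, 1.0), (2.0, 0.5), (0.5, 2.0)]:
    print(f"Var(u)={var_u}, Var(v)={var_v}: "
          f"TLS -> {tls_limit(1.0, 1.0, var_u, var_v):.3f}, "
          f"OLS -> {ols_limit(1.0, 1.0, var_v):.3f}")
```

Under these assumptions the TLS limit equals $β$ when the two variances match, exceeds it when $Var (u) > Var (v)$, and falls short of it when $Var (u) < Var (v)$.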

One problem with TLS estimates is that they depend on the units in which variables are measured. For example, suppose $Y_{i}$ is person $i$'s weight and $Z_{i}$ is their height. If I measure $Y_{i}$ in pounds, generate a TLS estimate ${\hat{β}}_{TLS}$, use this estimate to predict the weight in pounds of someone six feet tall, and then convert my prediction to kilograms, I get a different result than if I had measured $Y_{i}$ in kilograms initially. This unit-dependence arises because rescaling the dependent variable affects each perpendicular deviation differently.
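
The unit-dependence is easy to reproduce on made-up data. In the sketch below, the height and weight distributions are hypothetical numbers chosen only for illustration, height is assumed to be recorded in inches (so six feet is 72), and `tls_fit` is an illustrative helper that fits a TLS line with an intercept.

```python
import numpy as np

LB_TO_KG = 0.45359237  # kilograms per pound

def tls_fit(x, y):
    """Return (slope, intercept) of the TLS line through the data."""
    centered = np.column_stack([x - x.mean(), y - y.mean()])
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    a, b = vt[-1]                 # normal vector to the fitted line
    slope = -a / b
    return float(slope), float(y.mean() - slope * x.mean())

rng = np.random.default_rng(0)    # arbitrary seed
n = 2_000
# Hypothetical data: heights in inches, weights in pounds.
height_in = rng.normal(67.0, 3.6, n)
weight_lb = 150.0 + 3.3 * (height_in - 67.0) + rng.normal(0.0, 20.0, n)
weight_kg = weight_lb * LB_TO_KG

slope_lb, icpt_lb = tls_fit(height_in, weight_lb)
slope_kg, icpt_kg = tls_fit(height_in, weight_kg)

# Predict for someone six feet (72 inches) tall, two ways.
pred_via_lb = (icpt_lb + slope_lb * 72.0) * LB_TO_KG  # fit in lb, convert
pred_direct_kg = icpt_kg + slope_kg * 72.0            # fit in kg directly

print(f"fit in lb, then convert: {pred_via_lb:.2f} kg")
print(f"fit in kg directly:      {pred_direct_kg:.2f} kg")
```

The two printed predictions should differ, and the size of the gap depends on how the spread of the weights compares with the spread of the heights in each unit system.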

In contrast, OLS-based predictions do not depend on the units in which I measure $Y_{i}$. Rescaling the dependent variable multiplies each vertical deviation by the same constant, so the coefficient that minimizes the sum of squared deviations is simply rescaled by that constant and the converted predictions coincide.
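
The same check for OLS, again on hypothetical height and weight data with height assumed to be in inches, shows the two routes agreeing up to floating-point error. The helper name `ols_fit` and all numeric values are illustrative.

```python
import numpy as np

LB_TO_KG = 0.45359237  # kilograms per pound

def ols_fit(x, y):
    """Return (slope, intercept) of the OLS regression of y on x."""
    xc, yc = x - x.mean(), y - y.mean()
    slope = xc @ yc / (xc @ xc)
    return float(slope), float(y.mean() - slope * x.mean())

rng = np.random.default_rng(0)    # arbitrary seed
n = 2_000
# Hypothetical data: heights in inches, weights in pounds.
height_in = rng.normal(67.0, 3.6, n)
weight_lb = 150.0 + 3.3 * (height_in - 67.0) + rng.normal(0.0, 20.0, n)
weight_kg = weight_lb * LB_TO_KG

slope_lb, icpt_lb = ols_fit(height_in, weight_lb)
slope_kg, icpt_kg = ols_fit(height_in, weight_kg)

pred_via_lb = (icpt_lb + slope_lb * 72.0) * LB_TO_KG  # fit in lb, convert
pred_direct_kg = icpt_kg + slope_kg * 72.0            # fit in kg directly

print(f"slope in kg / slope in lb: {slope_kg / slope_lb:.8f}")
print(f"fit in lb, then convert: {pred_via_lb:.4f} kg")
print(f"fit in kg directly:      {pred_direct_kg:.4f} kg")
```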