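Suppose $X$ and $Y$ are random variables with $Y=\beta X+u$, where $u$ has zero mean and zero correlation with $X$. The coefficient $\beta$ can be estimated by collecting data $(Y_i,X_i)_{i=1}^n$ and regressing the $Y_i$ on the $X_i$. Now suppose our data collection procedure is flawed: instead of observing $X_i$, we observe $Z_i=X_i+v_i$, where the $v_i$ are iid with zero mean and zero correlation with the $X_i$ and the $u_i$. Then the ordinary least squares (OLS) estimate $\hat\beta_{\text{OLS}}$ of $\beta$ obtained by regressing the $Y_i$ on the $Z_i$ suffers from attenuation bias:

$$\operatorname*{plim}_{n\to\infty}\hat\beta_{\text{OLS}}=\frac{\operatorname{Cov}(Y,Z)}{\operatorname{Var}(Z)}=\frac{\operatorname{Cov}(\beta X+u,X+v)}{\operatorname{Var}(X+v)}=\frac{\beta\operatorname{Var}(X)}{\operatorname{Var}(X)+\operatorname{Var}(v)}=\frac{\beta}{1+\operatorname{Var}(v)/\operatorname{Var}(X)},$$

and so $|\hat\beta_{\text{OLS}}|<|\beta|$ asymptotically whenever $\operatorname{Var}(v)>0$ and $\beta\neq 0$. Intuitively, the measurement errors $v_i$ spread out the independent variable, flattening the fitted regression line.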
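The limit above can be checked numerically. The following is a minimal sketch, assuming normally distributed $X$, $u$, and $v$ with unit variances and $\beta=1$; these values and the helper name `ols_slope` are illustrative choices, not part of the derivation.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from an OLS regression of y on x (with an intercept)."""
    cov = np.cov(x, y)
    return cov[0, 1] / cov[0, 0]

rng = np.random.default_rng(0)
n = 100_000
beta, var_x, var_u, var_v = 1.0, 1.0, 1.0, 1.0  # assumed values

x = rng.normal(0.0, np.sqrt(var_x), n)  # true regressor
u = rng.normal(0.0, np.sqrt(var_u), n)  # regression error
v = rng.normal(0.0, np.sqrt(var_v), n)  # measurement error
y = beta * x + u
z = x + v                               # mismeasured regressor

slope_on_x = ols_slope(x, y)            # uses the true regressor
slope_on_z = ols_slope(z, y)            # uses the noisy regressor
attenuated_limit = beta / (1 + var_v / var_x)

print(f"slope using X: {slope_on_x:.3f}")
print(f"slope using Z: {slope_on_z:.3f}")
print(f"predicted limit beta / (1 + Var(v)/Var(X)): {attenuated_limit:.3f}")
```

With these assumed variances the predicted limit is $\beta/2$, so the slope on $Z$ should sit close to half the slope on $X$.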

One way to reduce attenuation bias is to replace OLS with total least squares (TLS), which accounts for noise in the dependent and independent variables. As a demonstration, the chart below compares the OLS and TLS lines of best fit through some randomly generated data $(Y_i,Z_i)_{i=1}^n$ with $\beta=1$. The OLS estimate $\hat\beta_{\text{OLS}}=0.43$ minimizes the sum of squared vertical deviations of the data from the fitted line. In contrast, the TLS estimate $\hat\beta_{\text{TLS}}=0.95$ minimizes the sum of squared perpendicular deviations of the data from the fitted line. For these data, the TLS estimate is unbiased because $u$ and $v$ have the same variance.
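To make the two fitting criteria concrete, here is a minimal sketch that computes both slopes on simulated data. It assumes standard normal $X$, $u$, and $v$ with $\beta=1$ and a large sample, so it does not reproduce the chart's data or its estimates. The function names are illustrative, and the TLS slope is obtained from the singular value decomposition of the centered data, which is one common way to compute it.

```python
import numpy as np

def ols_slope(z, y):
    """Minimize squared vertical deviations: Cov(y, z) / Var(z)."""
    zc, yc = z - z.mean(), y - y.mean()
    return (zc @ yc) / (zc @ zc)

def tls_slope(z, y):
    """Minimize squared perpendicular deviations via the SVD."""
    centered = np.column_stack([z - z.mean(), y - y.mean()])
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    a, b = vt[-1]   # direction of least variance = normal to the fitted line
    return -a / b   # the line a*z + b*y = 0 has slope -a/b

rng = np.random.default_rng(1)
n = 50_000                                     # assumed sample size
beta = 1.0
x = rng.standard_normal(n)
y = beta * x + rng.standard_normal(n)          # Var(u) = 1 (assumed)
z = x + rng.standard_normal(n)                 # Var(v) = 1 (assumed)

b_ols = ols_slope(z, y)
b_tls = tls_slope(z, y)
print(f"OLS slope: {b_ols:.3f}")
print(f"TLS slope: {b_tls:.3f}")
```

Because $u$ and $v$ share the same variance in this setup, the TLS slope should land near $\beta$ while the OLS slope stays attenuated.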

However, if $u$ and $v$ have different variances then the TLS estimate of $\beta$ is biased. I demonstrate this phenomenon in the chart below, which compares the OLS and TLS estimates of $\beta=1$ for varying $\operatorname{Var}(u)$ and $\operatorname{Var}(v)$ when $X$ is standard normal. I plot the bias $\mathrm{E}[\hat\beta-\beta]$ and mean squared error $\mathrm{E}[(\hat\beta-\beta)^2]$ of each estimate $\hat\beta\in\{\hat\beta_{\text{OLS}},\hat\beta_{\text{TLS}}\}$, obtained by simulating the data-generating process 100 times for each $(\operatorname{Var}(u),\operatorname{Var}(v))$ pair.
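A simulation of this kind can be sketched as follows. The sample size `n=200`, the variance grid, the normal error distributions, and the function names are all assumptions made for illustration; only $\beta=1$, standard normal $X$, and the 100 replications come from the description above. The TLS slope uses the closed-form orthogonal-regression formula based on centered second moments.

```python
import numpy as np

def ols_slope(z, y):
    zc, yc = z - z.mean(), y - y.mean()
    return (zc @ yc) / (zc @ zc)

def tls_slope(z, y):
    # Closed-form orthogonal-regression slope from centered second moments.
    zc, yc = z - z.mean(), y - y.mean()
    s_zz, s_yy, s_zy = zc @ zc, yc @ yc, zc @ yc
    d = s_yy - s_zz
    return (d + np.sqrt(d**2 + 4 * s_zy**2)) / (2 * s_zy)

def simulate(var_u, var_v, beta=1.0, n=200, reps=100, seed=0):
    """Estimate bias and MSE of both slopes over `reps` simulated samples."""
    rng = np.random.default_rng(seed)
    ols, tls = np.empty(reps), np.empty(reps)
    for r in range(reps):
        x = rng.standard_normal(n)
        y = beta * x + rng.normal(0.0, np.sqrt(var_u), n)
        z = x + rng.normal(0.0, np.sqrt(var_v), n)
        ols[r], tls[r] = ols_slope(z, y), tls_slope(z, y)
    return {
        "ols_bias": np.mean(ols - beta),
        "tls_bias": np.mean(tls - beta),
        "ols_mse": np.mean((ols - beta) ** 2),
        "tls_mse": np.mean((tls - beta) ** 2),
    }

results = {}
for var_u in (0.25, 1.0):        # assumed grid
    for var_v in (0.25, 1.0):
        res = simulate(var_u, var_v)
        results[(var_u, var_v)] = res
        print(f"Var(u)={var_u}, Var(v)={var_v}: "
              f"OLS bias={res['ols_bias']:+.3f}, TLS bias={res['tls_bias']:+.3f}, "
              f"OLS MSE={res['ols_mse']:.3f}, TLS MSE={res['tls_mse']:.3f}")
```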

If $\operatorname{Var}(u)>\operatorname{Var}(v)$ then the TLS estimate $\hat\beta_{\text{TLS}}$ is biased upward because the data are relatively stretched vertically; if $\operatorname{Var}(u)<\operatorname{Var}(v)$ then $\hat\beta_{\text{TLS}}$ is biased downward because the data are relatively stretched horizontally. The OLS estimate is biased downward whenever $\operatorname{Var}(v)>0$ due to attenuation. The TLS estimate is less biased and has smaller mean squared error than the OLS estimate when $\operatorname{Var}(u)<\operatorname{Var}(v)$, suggesting that TLS generates “better” estimates than OLS when the measurement errors $v_i$ are relatively large.
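The direction of the TLS bias can also be read off the estimator's probability limit. The following is a sketch, assuming additionally that $u$ and $v$ are uncorrelated with each other, and writing $s_{YY}$, $s_{ZZ}$, and $s_{ZY}$ (notation introduced here for illustration) for the sample variances and covariance of the data. The slope of the line minimizing squared perpendicular deviations is

$$\hat\beta_{\text{TLS}}=\frac{s_{YY}-s_{ZZ}+\sqrt{(s_{YY}-s_{ZZ})^2+4s_{ZY}^2}}{2s_{ZY}}.$$

With $\beta=1$, the sample moments converge to $s_{YY}\to\operatorname{Var}(X)+\operatorname{Var}(u)$, $s_{ZZ}\to\operatorname{Var}(X)+\operatorname{Var}(v)$, and $s_{ZY}\to\operatorname{Var}(X)$, so

$$\operatorname*{plim}_{n\to\infty}\hat\beta_{\text{TLS}}=\frac{d+\sqrt{d^2+4\operatorname{Var}(X)^2}}{2\operatorname{Var}(X)},\qquad d=\operatorname{Var}(u)-\operatorname{Var}(v).$$

This limit equals one when $d=0$ and is increasing in $d$, so it exceeds one when $\operatorname{Var}(u)>\operatorname{Var}(v)$ and falls below one when $\operatorname{Var}(u)<\operatorname{Var}(v)$.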

One problem with TLS estimates is that they depend on the units in which variables are measured. For example, suppose $Y_i$ is person $i$'s weight and $Z_i$ is their height. If I measure $Y_i$ in pounds, generate a TLS estimate $\hat\beta_{\text{TLS}}$, use this estimate to predict the weight in pounds of someone six feet tall, and then convert my prediction to kilograms, I get a different result than if I had measured $Y_i$ in kilograms initially. This unit-dependence arises because rescaling the dependent variable affects each perpendicular deviation differently.
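The unit-dependence is easy to see on synthetic data. The sketch below is illustrative only: the heights and weights are simulated rather than real measurements, heights are assumed to be recorded in inches (so six feet is 72), and the parameter values and the helper name `tls_fit` are assumptions.

```python
import numpy as np

LB_PER_KG = 2.20462  # pounds per kilogram

def tls_fit(z, y):
    """Return (intercept, slope) of the orthogonal-regression line of y on z."""
    zc, yc = z - z.mean(), y - y.mean()
    s_zz, s_yy, s_zy = zc @ zc, yc @ yc, zc @ yc
    d = s_yy - s_zz
    slope = (d + np.sqrt(d**2 + 4 * s_zy**2)) / (2 * s_zy)
    return y.mean() - slope * z.mean(), slope

rng = np.random.default_rng(2)
n = 500
height_in = rng.normal(67.0, 3.0, n)                      # synthetic heights (inches)
weight_lb = 5.0 * height_in - 175.0 + rng.normal(0.0, 20.0, n)  # synthetic weights (pounds)
weight_kg = weight_lb / LB_PER_KG                         # same weights in kilograms

a_lb, b_lb = tls_fit(height_in, weight_lb)
a_kg, b_kg = tls_fit(height_in, weight_kg)

six_feet = 72.0
pred_via_lb = (a_lb + b_lb * six_feet) / LB_PER_KG  # fit in pounds, convert to kg
pred_via_kg = a_kg + b_kg * six_feet                # fit directly in kg

print(f"fit in pounds, converted: {pred_via_lb:.2f} kg")
print(f"fit in kilograms:         {pred_via_kg:.2f} kg")
```

The two printed predictions differ even though they describe the same person and the same data, because the perpendicular distances being minimized change when the vertical axis is rescaled.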

In contrast, OLS-based predictions do not depend on the units in which I measure $Y_i$. Rescaling the dependent variable multiplies each vertical deviation by the same constant, so the coefficient that minimizes the sum of squared deviations is rescaled by that same constant and the converted predictions agree.
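Writing out the OLS formulas makes this invariance explicit. As a sketch, assume the regression includes an intercept and let $\tilde Y_i=cY_i$ for a constant $c>0$ (for example, the pounds-to-kilograms factor); the tilde and $\hat\alpha$ notation is introduced here for illustration. The slope and intercept fitted to the rescaled data are

$$\tilde\beta_{\text{OLS}}=\frac{\sum_i(Z_i-\bar Z)(cY_i-c\bar Y)}{\sum_i(Z_i-\bar Z)^2}=c\,\hat\beta_{\text{OLS}},\qquad\tilde\alpha_{\text{OLS}}=c\bar Y-\tilde\beta_{\text{OLS}}\bar Z=c\,\hat\alpha_{\text{OLS}},$$

so the prediction at any height $z$ satisfies

$$\tilde\alpha_{\text{OLS}}+\tilde\beta_{\text{OLS}}z=c\,(\hat\alpha_{\text{OLS}}+\hat\beta_{\text{OLS}}z).$$

Fitting in the new units therefore gives exactly the converted version of the prediction made in the old units.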