Linear models cannot be estimated when regressors are perfectly correlated, and their coefficient estimates have large variances when regressors are almost perfectly correlated. But how does the correlation of the estimated coefficients depend on the correlation of the regressors?
To answer this question, suppose I have data (y_i, x_i, z_i)_{i=1}^n generated by the process
\newcommand{\abs}[1]{\lvert#1\rvert} \DeclareMathOperator{\Cor}{Cor} \DeclareMathOperator{\Cov}{Cov} \DeclareMathOperator{\Var}{Var} \renewcommand{\epsilon}{\varepsilon}
y_i = \beta_1 x_i + \beta_2 z_i + \epsilon_i,
where the x_i and z_i are normalized to have zero mean and unit variance, and where the \epsilon_i are iid with zero mean and zero correlation with the x_i and z_i. If the x_i and z_i are not perfectly correlated, then the OLS estimator \hat\beta of the coefficient vector (\beta_1,\beta_2) has variance
\Var(\hat\beta) = \frac{\sigma^2}{n(1-\rho^2)} \begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix},
where \sigma^2 is the variance of the \epsilon_i and \rho is the (empirical) correlation of the x_i and z_i. It follows that
\Cor(\hat\beta_1, \hat\beta_2) = -\Cor(x_i, z_i)
whenever the x_i and z_i are not perfectly correlated. As that correlation grows, the mean slope of the data in the directions spanned by the x_i and z_i approaches (\beta_1+\beta_2), and so the OLS estimates \hat\beta_1 and \hat\beta_2 increasingly “compete” for contributions to their sum: if sampling error leads to one coefficient being over-estimated, then the other must be under-estimated to preserve the sum. This competition is why the correlation of \hat\beta_1 and \hat\beta_2 becomes increasingly negative as the x_i and z_i become more correlated.
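A quick Monte Carlo check makes this concrete. The sketch below (in Python with NumPy; the sample size, target correlation, coefficients, and number of replications are arbitrary illustrative choices) holds one standardized design fixed, re-draws the errors many times, and compares the simulated correlation of \hat\beta_1 and \hat\beta_2 with -\rho.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative values
n, target_rho, sigma = 200, 0.8, 1.0  # sample size, regressor correlation, error sd
beta1, beta2 = 1.0, 2.0               # true coefficients
reps = 10_000                         # number of error draws

# Draw one design (x, z) with roughly the target correlation, then standardize
xz = rng.multivariate_normal([0, 0], [[1, target_rho], [target_rho, 1]], size=n)
xz = (xz - xz.mean(axis=0)) / xz.std(axis=0)
x, z = xz[:, 0], xz[:, 1]
rho = np.corrcoef(x, z)[0, 1]  # empirical correlation of the regressors

# Re-draw the errors many times, holding the design fixed, and re-estimate by OLS
X = np.column_stack([x, z])
estimates = np.empty((reps, 2))
for r in range(reps):
    y = beta1 * x + beta2 * z + rng.normal(0, sigma, size=n)
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"empirical Cor(x_i, z_i):       {rho:.3f}")
print(f"simulated Cor(b1_hat, b2_hat): {np.corrcoef(estimates.T)[0, 1]:.3f}")  # close to -rho
```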
The correlation of the x_i and z_i also determines the precision with which (\beta_1\pm\beta_2) can be estimated. In particular, the expression for \Var(\hat\beta) above implies \Var(\hat\beta_1\pm\hat\beta_2)=\frac{2\sigma^2}{n(1\pm\rho)} for \abs{\rho}<1. As the x_i and z_i become more correlated (i.e., \rho rises), over-estimates of \beta_1 must increasingly coincide with under-estimates of \beta_2, and so the estimate of (\beta_1+\beta_2) becomes more precise because the errors cancel out. Conversely, the estimate of (\beta_1-\beta_2) becomes less precise as \rho rises because the errors in \hat\beta_1 and \hat\beta_2 amplify each other.
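For completeness, this expression follows from the variance matrix above and the usual variance-of-a-sum identity:
\begin{aligned} \Var(\hat\beta_1 \pm \hat\beta_2) &= \Var(\hat\beta_1) + \Var(\hat\beta_2) \pm 2\Cov(\hat\beta_1, \hat\beta_2) \\ &= \frac{\sigma^2}{n(1-\rho^2)}\left(1 + 1 \mp 2\rho\right) \\ &= \frac{2\sigma^2(1 \mp \rho)}{n(1-\rho^2)} \\ &= \frac{2\sigma^2}{n(1 \pm \rho)}. \end{aligned}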
One application of this relationship between \Var(\hat\beta_1\pm\hat\beta_2) and \rho is to experimental design. Suppose I want to estimate the effect of receiving two treatments—say, doses of a single vaccine—on some outcome of interest. The x_i and z_i indicate whether individual i receives each dose, the coefficients \beta_1 and \beta_2 are the average treatment effects (ATEs) of receiving each dose, and the sum (\beta_1+\beta_2) is the ATE of receiving both doses. The most precise estimate of (\beta_1+\beta_2) obtains when the treatments are perfectly positively correlated: that is, when people receive either zero or two doses, but no-one receives only one. Intuitively, I learn more about the effect of receiving two doses from people who receive both than from people who receive only one, so the most informative experiment cannot have anyone who receives a single dose.
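A simulation sketch of this comparison (Python with NumPy; the effect sizes, sample size, and number of replications are made-up values, and the binary dose indicators are not standardized as above, so the constants differ from the formula but the precision comparison is the point): under a "coupled" design everyone receives zero or two doses, so (\beta_1+\beta_2) is the slope on a single both-doses indicator, while under an independent-assignment design it is the sum of the two estimated coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up values for illustration
n, sigma = 400, 1.0      # sample size and outcome noise
beta1, beta2 = 0.5, 0.3  # ATEs of the first and second dose
reps = 5_000             # number of simulated experiments

sum_hat = {"coupled": [], "independent": []}
for r in range(reps):
    eps = rng.normal(0, sigma, size=n)

    # Coupled design: each person gets both doses or neither (x = z)
    both = rng.integers(0, 2, size=n)
    y = beta1 * both + beta2 * both + eps
    X = np.column_stack([np.ones(n), both])  # intercept + both-doses indicator
    sum_hat["coupled"].append(np.linalg.lstsq(X, y, rcond=None)[0][1])

    # Independent design: each dose assigned independently (correlation near zero)
    x = rng.integers(0, 2, size=n)
    z = rng.integers(0, 2, size=n)
    y = beta1 * x + beta2 * z + eps
    X = np.column_stack([np.ones(n), x, z])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    sum_hat["independent"].append(b[1] + b[2])

for design, draws in sum_hat.items():
    print(f"{design:>11}: sd of estimated (beta_1 + beta_2) = {np.std(draws):.4f}")
# The coupled design gives the smaller standard deviation (by a factor of about sqrt(2) here).
```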
On the other hand, suppose I want to compare the effects of two distinct treatments—say, doses of different vaccines—on my outcome of interest. Then I want to estimate (\beta_1-\beta_2), which I can do most precisely when the treatments are perfectly negatively correlated: that is, when people receive one type of vaccine or the other, but no-one receives both. Intuitively, I learn more about the vaccines’ relative effects from people who receive only one type than from people who receive both types, because the two vaccines’ individual effects are hard to disentangle in someone who receives both.
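To see both trade-offs side by side, one can tabulate the closed-form variances from above across values of \rho (a small sketch; the error variance \sigma^2 = 1 and sample size n = 100 are arbitrary):

```python
sigma2, n = 1.0, 100  # arbitrary error variance and sample size
print(" rho   Var(b1+b2)   Var(b1-b2)")
for rho in [-0.9, -0.5, 0.0, 0.5, 0.9]:
    var_sum = 2 * sigma2 / (n * (1 + rho))   # variance of the estimated sum
    var_diff = 2 * sigma2 / (n * (1 - rho))  # variance of the estimated difference
    print(f"{rho:4.1f}   {var_sum:10.4f}   {var_diff:10.4f}")
# The sum is estimated most precisely as rho approaches 1, the difference as rho approaches -1.
```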
Thanks to Lautaro Chittaro for inspiring this post and commenting on a draft.