Linear models cannot be estimated when regressors are perfectly correlated, and their coefficient estimates have large variances when regressors are almost perfectly correlated. But how does the correlation between coefficient estimates depend on the correlation between regressors?
To answer this question, suppose I have data $(y_i, x_i, z_i)_{i=1}^n$ generated by the process $y_i = \beta_1 x_i + \beta_2 z_i + \varepsilon_i$, where the $x_i$ and $z_i$ are normalized to have zero mean and unit variance, and where the $\varepsilon_i$ are iid with zero mean and zero correlation with the $x_i$ and $z_i$. If the $x_i$ and $z_i$ are not perfectly correlated then the OLS estimator $\hat\beta$ of the coefficient vector $(\beta_1, \beta_2)$ has variance
$$\mathrm{Var}(\hat\beta) = \frac{\sigma^2}{n(1 - \rho^2)} \begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix},$$
where $\sigma^2$ is the variance of the $\varepsilon_i$, and where $\rho$ is the (empirical) correlation of the $x_i$ and $z_i$. It follows that $\mathrm{Cor}(\hat\beta_1, \hat\beta_2) = -\mathrm{Cor}(x_i, z_i)$ whenever the $x_i$ and $z_i$ are not perfectly correlated. As their correlation grows, the mean slope of the data in the directions spanned by the $x_i$ and $z_i$ approaches $(\beta_1 + \beta_2)$, and so the OLS estimates $\hat\beta_1$ and $\hat\beta_2$ increasingly "compete" for contributions to their sum: if sampling error leads to one coefficient being over-estimated then the other coefficient must be under-estimated to preserve the sum. This competition drives the correlation of $\hat\beta_1$ and $\hat\beta_2$ toward $-1$ as the $x_i$ and $z_i$ become more correlated.
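To make this concrete, here is a minimal simulation sketch in Python, with illustrative values for $\beta$, $\sigma$, and $n$ (none of which come from the derivation above). It fixes one correlated design, re-draws the errors many times, and checks that the sampled coefficient estimates have correlation close to $-\rho$ and covariance close to the matrix above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 100, 1.0
beta = np.array([2.0, -1.0])  # illustrative true coefficients

# One fixed design with correlated regressors, normalized to zero
# mean and unit variance as in the model above.
X = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=n)
X = (X - X.mean(axis=0)) / X.std(axis=0)
rho = np.corrcoef(X.T)[0, 1]  # empirical correlation of the x_i and z_i

# Re-draw the errors many times and re-estimate beta each time.
draws = np.array([
    np.linalg.lstsq(X, X @ beta + sigma * rng.standard_normal(n), rcond=None)[0]
    for _ in range(10_000)
])

print(np.corrcoef(draws.T)[0, 1], -rho)               # nearly equal
print(np.cov(draws.T) * n * (1 - rho**2) / sigma**2)  # ~ [[1, -rho], [-rho, 1]]
```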
The correlation of the $x_i$ and $z_i$ also determines the precision with which $(\beta_1 \pm \beta_2)$ can be estimated. In particular, the expression for $\mathrm{Var}(\hat\beta)$ above implies
$$\mathrm{Var}(\hat\beta_1 \pm \hat\beta_2) = \mathrm{Var}(\hat\beta_1) + \mathrm{Var}(\hat\beta_2) \pm 2\,\mathrm{Cov}(\hat\beta_1, \hat\beta_2) = \frac{2\sigma^2}{n(1 \pm \rho)}$$
for $|\rho| < 1$. As the $x_i$ and $z_i$ become more correlated (i.e., as $\rho$ rises), over-estimates of $\beta_1$ must increasingly coincide with under-estimates of $\beta_2$, and so the estimate of $(\beta_1 + \beta_2)$ becomes more precise because the errors cancel out. Conversely, the estimate of $(\beta_1 - \beta_2)$ becomes less precise as $\rho$ rises because the errors in $\hat\beta_1$ and $\hat\beta_2$ amplify each other.
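Extending the same sketch (again with illustrative parameter values) confirms this tradeoff: as $\rho$ rises, the simulated variance of $\hat\beta_1 + \hat\beta_2$ falls while that of $\hat\beta_1 - \hat\beta_2$ rises, tracking $2\sigma^2/(n(1 \pm \rho))$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 200, 1.0
beta = np.array([1.0, 1.0])  # illustrative true coefficients

for target in [0.0, 0.5, 0.9]:
    # Fix a design with (approximate) correlation `target`, normalized
    # as in the model above.
    X = rng.multivariate_normal([0, 0], [[1, target], [target, 1]], size=n)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    rho = np.corrcoef(X.T)[0, 1]

    draws = np.array([
        np.linalg.lstsq(X, X @ beta + sigma * rng.standard_normal(n), rcond=None)[0]
        for _ in range(10_000)
    ])
    s, d = draws.sum(axis=1), draws[:, 0] - draws[:, 1]
    print(f"rho = {rho:+.2f}: "
          f"Var(sum) = {s.var():.4f} vs {2 * sigma**2 / (n * (1 + rho)):.4f}, "
          f"Var(diff) = {d.var():.4f} vs {2 * sigma**2 / (n * (1 - rho)):.4f}")
```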
One application of this relationship between $\mathrm{Var}(\hat\beta_1 \pm \hat\beta_2)$ and $\rho$ is to experimental design. Suppose I want to estimate the effect of receiving two treatments—say, doses of a single vaccine—on some outcome of interest. The $x_i$ and $z_i$ indicate whether individual $i$ receives each dose, the coefficients $\beta_1$ and $\beta_2$ are the average treatment effects (ATEs) of receiving each dose, and the sum $(\beta_1 + \beta_2)$ is the ATE of receiving both doses. The most precise estimate of $(\beta_1 + \beta_2)$ obtains when the treatments are perfectly positively correlated: that is, when people receive either zero or two doses, but no-one receives only one. Intuitively, I learn more about the effect of receiving two doses from people who receive both than from people who receive only one, so the most informative experiment cannot have anyone who receives a single dose.
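As a rough check of this design logic, here is a sketch that swaps the normalized regressors above for binary dose indicators plus an intercept (so the formulas no longer apply exactly), with illustrative effect sizes. It compares the sampling spread of the estimated total effect when doses are assigned all-or-nothing versus independently:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, b1, b2 = 200, 1.0, 1.0, 0.5  # illustrative values

def estimate_total(reps=10_000):
    """SDs of the estimated total effect under each design."""
    both, indep = [], []
    for _ in range(reps):
        # Design A: each person gets zero or two doses (x = z). Only the
        # total effect is identified, via a regression on the common
        # dose indicator.
        d = rng.integers(0, 2, n)
        y = (b1 + b2) * d + sigma * rng.standard_normal(n)
        A = np.column_stack([np.ones(n), d])
        both.append(np.linalg.lstsq(A, y, rcond=None)[0][1])

        # Design B: the two doses are assigned independently, and the
        # total effect is the sum of the two estimated coefficients.
        x, z = rng.integers(0, 2, n), rng.integers(0, 2, n)
        y = b1 * x + b2 * z + sigma * rng.standard_normal(n)
        A = np.column_stack([np.ones(n), x, z])
        coef = np.linalg.lstsq(A, y, rcond=None)[0]
        indep.append(coef[1] + coef[2])
    return np.std(both), np.std(indep)

print(estimate_total())  # design A gives the smaller SD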
On the other hand, suppose I want to compare the effects of two distinct treatments—say, doses of different vaccines—on my outcome of interest. Then I want to estimate $(\beta_1 - \beta_2)$, which I can do most precisely when the treatments are perfectly negatively correlated: that is, when people receive one type of vaccine or the other, but no-one receives both. Intuitively, I learn more about the vaccines' relative effects from people who receive only one type than from people who receive both, because receiving both confounds one vaccine's effect with the other's.
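A mirror-image sketch, under the same assumptions as above (binary indicators, an intercept, illustrative effect sizes), compares estimates of the difference in effects when each person receives exactly one vaccine versus when the two are assigned independently:

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma, b1, b2 = 200, 1.0, 1.0, 0.5  # illustrative values

def estimate_difference(reps=10_000):
    """SDs of the estimated difference in effects under each design."""
    split, indep = [], []
    for _ in range(reps):
        # Design A: each person gets exactly one vaccine (z = 1 - x),
        # so y = b2 + (b1 - b2) * x + error, and the slope on x
        # estimates the difference in effects.
        x = rng.integers(0, 2, n)
        y = b1 * x + b2 * (1 - x) + sigma * rng.standard_normal(n)
        A = np.column_stack([np.ones(n), x])
        split.append(np.linalg.lstsq(A, y, rcond=None)[0][1])

        # Design B: the two vaccines are assigned independently, and the
        # difference is taken between the two estimated coefficients.
        x, z = rng.integers(0, 2, n), rng.integers(0, 2, n)
        y = b1 * x + b2 * z + sigma * rng.standard_normal(n)
        A = np.column_stack([np.ones(n), x, z])
        coef = np.linalg.lstsq(A, y, rcond=None)[0]
        indep.append(coef[1] - coef[2])
    return np.std(split), np.std(indep)

print(estimate_difference())  # design A gives the smaller SD
```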
Thanks to Lautaro Chittaro for inspiring this post and commenting on a draft.