Let $$[n]\equiv\{1,2,\ldots,n\}$$ be a set of individuals. Suppose I have data $$\{(y_{ij},x_{ij}):i,j\in[n]\ \text{with}\ i<j\}$$ on pairs in $$[n]$$ generated by the process $$\renewcommand{\epsilon}{\varepsilon} y_{ij}=x_{ij}\beta+\epsilon_{ij},$$ where $$x_{ij}$$ is a row vector of pair $$\{i,j\}$$'s characteristics, $$\beta$$ is a vector of coefficients to be estimated, and $$\epsilon_{ij}$$ is a random error term with zero mean and zero correlation with the $$x_{ij}$$. For example, $$[n]$$ could be the nodes in a network, $$x_{ij}$$ the dimensions along which nodes $$i$$ and $$j$$ interact, and $$y_{ij}$$ the outcome of such interaction.

We can rewrite the data-generating process (DGP) in matrix form as $$y=X\beta+\epsilon,$$ where $$y$$ is the vector of outcomes, $$X$$ is the design matrix, and $$\epsilon$$ is the vector of errors. Here $$X$$ has $$N\equiv\frac{n(n-1)}{2}$$ rows, each corresponding to an (unordered) pair of individuals in $$[n]$$. Since the $$x_{ij}$$ and $$\epsilon_{ij}$$ are uncorrelated, the ordinary least squares estimator $$\hat\beta=(X^T\!X)^{-1}X^T\!y$$ of $$\beta$$ is unbiased. However, $$\hat\beta$$ may not be efficient because the errors $$\epsilon_{ij}$$ may be correlated. For example, if $$\epsilon_{ij}=u_i+u_j+v_{ij}$$ with $$u_i$$, $$u_j$$, and $$v_{ij}$$ independent then $$\DeclareMathOperator{\Cov}{Cov} \DeclareMathOperator{\Var}{Var} \Cov(\epsilon_{ij},\epsilon_{jk})=\Var(u_j).$$ Intuitively, the pairs $$\{i,j\}$$ and $$\{j,k\}$$ are linked through individual $$j$$, and so any errors specific to that individual affect the errors for both pairs. Consequently, the homoskedastic estimator $$\widehat{\Var}_{\text{Hom.}}(\hat\beta)=\hat\sigma^2(X^T\!X)^{-1}$$ with $$\hat\sigma^2=\frac{1}{N}\sum_{ij}\hat\epsilon_{ij}^2$$ and $$\hat\epsilon_{ij}=y_{ij}-x_{ij}\hat\beta$$ will typically under-estimate the variance in $$\hat\beta$$ by failing to account for linked pairs having dependent errors.
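As a quick numerical sanity check (my own, assuming the component structure just described), we can simulate many draws of two errors sharing individual $$j$$ and confirm that their covariance equals $$\Var(u_j)=1$$:

```python
import numpy as np

# With eps_ij = u_i + u_j + v_ij and all components iid standard normal,
# Cov(eps_ij, eps_jk) = Var(u_j) = 1 because only u_j is shared.
rng = np.random.default_rng(0)
reps = 200_000
u = rng.standard_normal((reps, 3))   # u_i, u_j, u_k
v = rng.standard_normal((reps, 2))   # v_ij, v_jk
eps_ij = u[:, 0] + u[:, 1] + v[:, 0]
eps_jk = u[:, 1] + u[:, 2] + v[:, 1]
cov = np.cov(eps_ij, eps_jk)[0, 1]
print(cov)  # close to 1 = Var(u_j)
```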

So, how can we account for such dependence? Consider the “sandwich” form $$\Var(\hat\beta)=BMB$$ of the (co)variance matrix for $$\hat\beta$$, where $$B=(X^T\!X)^{-1}$$ is the “bread” matrix and $$M=X^T\!VX$$ is the “meat” matrix with $$V=\Var(\epsilon)$$ the error (co)variance matrix. We need to estimate $$M$$ because we don’t observe the $$\epsilon_{ij}$$. Indexing pairs by $$p$$, the homoskedastic estimator defined above uses \begin{align} \hat{M}_{\text{Hom.}} &= \hat\sigma^2X^T\!X \\ &= \hat\sigma^2\sum_{p=1}^Nx_p^T\!x_p, \end{align} which assumes all errors have equal variance. In contrast, White (1980) suggests using \begin{align} \hat{M}_{\text{White}} &= X^T\!\mathrm{diag}\left(\hat\epsilon_p^2\right)X \\ &= \sum_{p=1}^N\hat\epsilon_p^2x_p^T\!x_p, \end{align} which allows for unequal error variances (heteroskedasticity). But neither $$\hat{M}_{\text{Hom.}}$$ nor $$\hat{M}_{\text{White}}$$ allows for dyadic dependence among the errors. To that end, Aronow et al. (2017) suggest augmenting White’s estimator via \begin{align} \hat{M}_{\text{Aronow}} &= \hat{M}_{\text{White}}+\sum_{p=1}^N\sum_{q\in\mathcal{D}(p)}\hat\epsilon_p\hat\epsilon_qx_p^T\!x_q, \end{align} where $$\mathcal{D}(p)$$ is the set of pairs $$q\not=p$$ linked to $$p$$ by a shared individual. We can express $$\hat{M}_{\text{Aronow}}$$ in matrix form as $$\hat{M}_{\text{Aronow}}=X^T\!\left(D\odot\hat\epsilon\hat\epsilon^T\!\right)X,$$ where $$D=(d_{pq})$$ is the dyadic dependence matrix with $$d_{pq}=\begin{cases} 1 & \text{if}\ p=q\ \text{or pairs}\ p\ \text{and}\ q\ \text{are linked}\\ 0 & \text{otherwise}, \end{cases}$$ and where $$\odot$$ denotes element-wise multiplication. Aronow et al. show that, under mild conditions, $$B\hat{M}_{\text{Aronow}}B$$ is a consistent estimator for $$\Var(\hat\beta)$$ when the data exhibit dyadic dependence.1
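To make the matrix form concrete, here is a sketch of $$\hat{M}_{\text{Aronow}}$$ on toy data (my own implementation, not the authors’ code; the diagonal of $$D$$ recovers the White term, and the off-diagonal entries add the linked-pair cross terms):

```python
import numpy as np
from itertools import combinations

def dyadic_meat(X, e, pairs):
    """M_Aronow = X' (D * ee') X, where D marks dyads sharing an
    individual (including the diagonal p = q, i.e. the White term)."""
    N = len(pairs)
    D = np.array([[1.0 if set(pairs[p]) & set(pairs[q]) else 0.0
                   for q in range(N)] for p in range(N)])
    return X.T @ ((D * np.outer(e, e)) @ X)

# Toy data: n = 4 individuals -> N = 6 dyads, 2 regressors.
rng = np.random.default_rng(1)
pairs = list(combinations(range(4), 2))
X = rng.standard_normal((6, 2))
e = rng.standard_normal(6)   # stand-in residuals
M = dyadic_meat(X, e, pairs)

# Cross-check against the summation form: White term plus linked cross terms.
M_sum = sum(e[p] * e[q] * np.outer(X[p], X[q])
            for p in range(6) for q in range(6)
            if set(pairs[p]) & set(pairs[q]))
assert np.allclose(M, M_sum)
```

Because $$D$$ and $$\hat\epsilon\hat\epsilon^T$$ are both symmetric, the resulting meat matrix is symmetric as well.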

To see Aronow et al.’s estimator in action, suppose the DGP is given by the system \begin{align} y_{ij} &= \beta x_{ij}+\epsilon_{ij} \\ x_{ij} &= z_i+z_j \\ \epsilon_{ij} &= u_i+u_j+v_{ij}, \end{align} where $$z_i$$, $$z_j$$, $$u_i$$, $$u_j$$ and $$v_{ij}$$ are iid standard normal, and $$\beta=1$$ is the (scalar) coefficient to be estimated. Both the $$x_{ij}$$ and the $$\epsilon_{ij}$$ exhibit dyadic dependence, so we expect the homoskedastic and White estimators to under-estimate the true variance in $$\hat\beta$$. Indeed, the box plots below show that Aronow et al.’s estimator is less biased than the homoskedastic and White estimators, and gets more accurate as the number of individuals $$n$$ grows.

Aronow et al.’s estimator can also be applied to generalized linear models. For example, suppose $$y_{ij}=\begin{cases} 1 & \text{if nodes}\ i\ \text{and}\ j\ \text{are adjacent} \\ 0 & \text{otherwise} \end{cases}$$ is an indicator for the event in which nodes $$i$$ and $$j$$ are adjacent in a network. We can model the link formation process as $$\Pr(y_{ij}=1)=\Lambda^{-1}(x_{ij}\beta+\epsilon_{ij}),$$ where $$\Lambda(x)\equiv\log(x/(1-x))$$ is the logit link function. The logistic regression estimate $$\hat\beta$$ of $$\beta$$ reveals how the observable characteristics $$x_{ij}$$ of nodes $$i$$ and $$j$$ determine their probability of being adjacent. We can estimate the variance of $$\hat\beta$$ consistently by letting $$\hat{P}_{ij}=\Lambda^{-1}(x_{ij}\hat\beta)$$ be the predicted probability for pair $$\{i,j\}$$, replacing the bread matrix $$B=(X^T\!X)^{-1}$$ with $$\hat{B}=\left(X^T\mathrm{diag}\left(\hat{P}_{ij}\left(1-\hat{P}_{ij}\right)\right)X\right)^{-1},$$ and computing $$\hat{B}\hat{M}_{\text{Aronow}}\hat{B}$$.

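As a rough numerical check of the linear case (a single simulated draw of the DGP above with n = 40; my own sketch, not the authors’ code), we can compute all three variance estimates and verify that the dyadic-robust one is the largest:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n = 40
pairs = list(combinations(range(n), 2))
N = len(pairs)                                # n(n-1)/2 = 780 dyads
i_idx = np.array([p[0] for p in pairs])
j_idx = np.array([p[1] for p in pairs])

z = rng.standard_normal(n)
u = rng.standard_normal(n)
v = rng.standard_normal(N)
x = z[i_idx] + z[j_idx]                       # regressor with dyadic dependence
eps = u[i_idx] + u[j_idx] + v                 # error with dyadic dependence
y = 1.0 * x + eps                             # true beta = 1

X = x[:, None]
B = np.linalg.inv(X.T @ X)                    # "bread"
beta_hat = (B @ (X.T @ y))[0]
e = y - beta_hat * x                          # residuals

# D: dyads p and q linked by a shared individual (diagonal included).
D = ((i_idx[:, None] == i_idx[None, :]) | (i_idx[:, None] == j_idx[None, :]) |
     (j_idx[:, None] == i_idx[None, :]) | (j_idx[:, None] == j_idx[None, :]))

var_hom = (e @ e / N) * B[0, 0]
var_white = (B @ (X.T @ (e[:, None] ** 2 * X)) @ B)[0, 0]
var_aronow = (B @ (X.T @ ((D * np.outer(e, e)) @ X)) @ B)[0, 0]
print(var_hom, var_white, var_aronow)         # expect var_aronow to be largest
```

Because both $$x_{ij}$$ and $$\epsilon_{ij}$$ are positively correlated across linked dyads, the cross terms in the Aronow et al. correction are positive on average, which is why the naive estimators under-state the variance here.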
My co-authors and I use this approach in “Research Funding and Collaboration”: we estimate how grant proposal outcomes determine the probability with which pairs of researchers co-author, and we compare $$\hat\sigma^2\hat{B}$$ and $$\hat{B}\hat{M}_{\text{Aronow}}\hat{B}$$ to show that our inferences are robust to dyadic dependence.
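The GLM bread matrix described above is straightforward to compute. Here is a minimal sketch (synthetic data and hypothetical coefficients, not our paper’s code):

```python
import numpy as np

def logistic_bread(X, beta_hat):
    """B-hat = (X' diag(P(1-P)) X)^{-1}, where P = inverse-logit(X beta-hat)
    are the predicted link probabilities."""
    P = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))   # Lambda^{-1}, the logistic
    W = P * (1.0 - P)                           # Bernoulli variance weights
    return np.linalg.inv(X.T @ (W[:, None] * X))

# Hypothetical example: 100 dyads, 2 regressors, assumed coefficients.
rng = np.random.default_rng(3)
X = rng.standard_normal((100, 2))
B_hat = logistic_bread(X, np.array([0.5, -0.25]))
# Combine with the dyadic meat matrix as B_hat @ M_Aronow @ B_hat.
```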

1. Fafchamps and Gubert (2007) describe a variance estimator similar to Aronow et al.’s but do not establish its consistency. ↩︎