Let $N$ be a set of individuals. Suppose I have data $\{(y_{ij}, x_{ij})\}$ on pairs $\{i, j\}$ in $N$ generated by the process

$$y_{ij} = x_{ij}\beta + \epsilon_{ij},$$

where $x_{ij}$ is a row vector of pair $\{i, j\}$'s characteristics, $\beta$ is a vector of coefficients to be estimated, and $\epsilon_{ij}$ is a random error term with zero mean and zero correlation with the $x_{ij}$. For example, $N$ could be the nodes in a network, $x_{ij}$ the dimensions along which nodes $i$ and $j$ interact, and $y_{ij}$ the outcome of such interaction.
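We can rewrite the data-generating process (DGP) in matrix form as

$$y = X\beta + \epsilon,$$

where $y$ is the vector of outcomes, $X$ is the design matrix, and $\epsilon$ is the vector of errors. Each row of $X$ corresponds to an unordered pair of individuals in $N$. Since the $x_{ij}$ and $\epsilon_{ij}$ are uncorrelated, the ordinary least squares (OLS) estimator $\hat\beta = (X'X)^{-1}X'y$ of $\beta$ is unbiased. However, $\hat\beta$ may not be efficient because the errors may be correlated. For example, if $\epsilon_{ij} = u_i + u_j + v_{ij}$ with $u_i$, $u_j$, and $v_{ij}$ independent then

$$\operatorname{Cov}(\epsilon_{ij}, \epsilon_{ik}) = \operatorname{Var}(u_i)$$

whenever $j \neq k$. Intuitively, the pairs $\{i, j\}$ and $\{i, k\}$ are linked through individual $i$, and so any errors specific to that individual affect the errors for both pairs. Consequently, the homoskedastic estimator $\hat{V}_{\text{homo}} = \hat\sigma^2 (X'X)^{-1}$, with $\hat\sigma^2$ the error variance estimated from the residuals $\hat\epsilon = y - X\hat\beta$, will typically under-estimate the variance of $\hat\beta$ by failing to account for linked pairs having dependent errors.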
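The covariance between linked pairs can be worked out in one step. The derivation below is a sketch that assumes an additive error structure, $\epsilon_{ij} = u_i + u_j + v_{ij}$, with all components mutually independent; the names $u$ and $v$ are labels chosen here for illustration. For distinct individuals $i$, $j$, and $k$:

$$
\begin{aligned}
\operatorname{Cov}(\epsilon_{ij}, \epsilon_{ik})
&= \operatorname{Cov}(u_i + u_j + v_{ij},\; u_i + u_k + v_{ik}) \\
&= \operatorname{Cov}(u_i, u_i) \\
&= \operatorname{Var}(u_i).
\end{aligned}
$$

Every other cross term vanishes by independence. By the same argument, two pairs with no individual in common have zero covariance, so under this assumed structure the only non-zero off-diagonal entries of the error covariance matrix belong to pairs that share an individual.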
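So, how can we account for such dependence? Consider the “sandwich” form of the (co)variance matrix for $\hat\beta$,

$$\operatorname{Var}(\hat\beta) = B M B,$$

where $B = (X'X)^{-1}$ is the “bread” matrix and $M = X'\Omega X$ is the “meat” matrix with $\Omega = \operatorname{Var}(\epsilon)$ the error (co)variance matrix. We need to estimate $\Omega$ because we don’t observe the $\epsilon_{ij}$. Indexing pairs by $p$ and writing $\hat\epsilon_p$ for the OLS residual of pair $p$, the homoskedastic estimator defined above uses

$$\hat\Omega^{\text{homo}}_{pq} = \begin{cases} \hat\sigma^2 & \text{if } p = q \\ 0 & \text{otherwise,} \end{cases}$$

which assumes all errors have equal variance $\hat\sigma^2$. In contrast, White (1980) suggests using

$$\hat\Omega^{\text{White}}_{pq} = \begin{cases} \hat\epsilon_p^2 & \text{if } p = q \\ 0 & \text{otherwise,} \end{cases}$$

which allows for unequal error variances (heteroskedasticity). But neither $\hat\Omega^{\text{homo}}$ nor $\hat\Omega^{\text{White}}$ allows for dyadic dependence among the errors. To that end, Aronow et al. (2017) suggest augmenting White’s estimator via

$$\hat\Omega^{\text{dyad}}_{pq} = \begin{cases} \hat\epsilon_p \hat\epsilon_q & \text{if } p = q \text{ or } q \in S_p \\ 0 & \text{otherwise,} \end{cases}$$

where $S_p$ is the set of pairs linked to $p$ by a shared individual. We can express $\hat\Omega^{\text{dyad}}$ in matrix form as

$$\hat\Omega^{\text{dyad}} = (\hat\epsilon \hat\epsilon') \odot D,$$

where $D$ is the dyadic dependence matrix with $D_{pq} = 1$ if $p = q$ or $q \in S_p$ and $D_{pq} = 0$ otherwise, and where $\odot$ denotes element-wise multiplication. Aronow et al. show that, under mild conditions, $\hat{V}_{\text{dyad}} = B X' \hat\Omega^{\text{dyad}} X B$ is a consistent estimator for $\operatorname{Var}(\hat\beta)$ when the data exhibit dyadic dependence.[1]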
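The three estimators differ only in the matrix placed between the two slices of bread, which a few lines of NumPy make concrete. The following is a minimal sketch, not a reference implementation: the function name `dyadic_variances`, the `pairs` argument, and the degrees-of-freedom correction in the homoskedastic variance are assumptions made for illustration.

```python
import itertools
import numpy as np

def dyadic_variances(X, y, pairs):
    """OLS fit plus homoskedastic, White, and dyadic-robust variance estimates.

    X: (P, k) design matrix; y: (P,) outcomes;
    pairs: P tuples (i, j) naming the two individuals behind each row.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    P, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta

    # Dyadic dependence matrix: D[p, q] = 1 when pairs p and q share an
    # individual. A pair shares both members with itself, so diag(D) = 1.
    members = np.asarray(pairs)
    D = (members[:, None, :, None] == members[None, :, None, :]).any(axis=(2, 3))
    D = D.astype(float)

    # Assumption: the error variance uses a P - k degrees-of-freedom correction.
    sigma2 = resid @ resid / (P - k)
    omega_homo = sigma2 * np.eye(P)
    omega_white = np.diag(resid**2)
    omega_dyad = np.outer(resid, resid) * D  # element-wise product

    def sandwich(omega):
        return bread @ X.T @ omega @ X @ bread

    return beta, sandwich(omega_homo), sandwich(omega_white), sandwich(omega_dyad), D

# Toy data: all unordered pairs among four individuals, one regressor.
rng = np.random.default_rng(0)
pairs = list(itertools.combinations(range(4), 2))
X = rng.normal(size=(len(pairs), 1))
y = 2.0 * X[:, 0] + rng.normal(size=len(pairs))
beta, V_homo, V_white, V_dyad, D = dyadic_variances(X, y, pairs)
```

Building $D$ as a dense matrix keeps the code close to the formula but costs memory quadratic in the number of pairs, so larger networks would call for a sparse representation. The dyadic estimate is also not guaranteed to be positive semi-definite in small samples.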
To see Aronow et al.’s estimator in action, suppose the DGP is given by a system in which the regressor $x_{ij}$ and the error $\epsilon_{ij}$ are built from iid standard normal components, and $\beta$ is the (scalar) coefficient to be estimated. Both the $x_{ij}$ and the $\epsilon_{ij}$ exhibit dyadic dependence, so we expect the homoskedastic and White estimators to under-estimate the true variance of $\hat\beta$. Indeed, the box plots below show that Aronow et al.’s estimator is less biased than the homoskedastic and White estimators, and gets more accurate as the number of individuals grows.
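A small Monte Carlo exercise illustrates the comparison. The sketch below does not reproduce the simulation behind the box plots: it assumes a hypothetical DGP in which $x_{ij} = a_i + a_j$ and $\epsilon_{ij} = u_i + u_j + v_{ij}$ with $a_i$, $u_i$, and $v_{ij}$ iid standard normal, fits a single coefficient without an intercept, and benchmarks each estimator against the empirical variance of $\hat\beta$ across replications. The function name and defaults are also illustrative.

```python
import itertools
import numpy as np

def simulate(n=20, reps=200, beta=1.0, seed=0):
    """Ratio of mean estimated variance to empirical variance of beta-hat."""
    rng = np.random.default_rng(seed)
    pairs = np.array(list(itertools.combinations(range(n), 2)))
    i, j = pairs[:, 0], pairs[:, 1]
    P = len(pairs)
    # D[p, q] = 1 when pairs p and q share an individual (including p == q).
    D = (pairs[:, None, :, None] == pairs[None, :, None, :]).any(axis=(2, 3))
    D = D.astype(float)

    betas, homo, white, dyad = [], [], [], []
    for _ in range(reps):
        a, u = rng.normal(size=n), rng.normal(size=n)
        v = rng.normal(size=P)
        x = a[i] + a[j]          # hypothetical regressor with dyadic dependence
        eps = u[i] + u[j] + v    # hypothetical error with dyadic dependence
        y = beta * x + eps

        sxx = x @ x              # scalar X'X, so the bread is 1 / sxx
        b = (x @ y) / sxx
        r = y - b * x            # residuals
        s = x * r                # per-pair contributions x_p * resid_p
        betas.append(b)
        homo.append((r @ r / (P - 1)) / sxx)
        white.append((s @ s) / sxx**2)
        dyad.append((s @ D @ s) / sxx**2)

    true_var = np.var(betas, ddof=1)
    estimates = {"homo": homo, "white": white, "dyad": dyad}
    return {name: np.mean(vals) / true_var for name, vals in estimates.items()}

ratios = simulate()
```

A ratio near one means the estimator is roughly unbiased for the sampling variance under this assumed DGP, while a ratio well below one signals under-estimation. Calling `simulate` with larger `n` shows how each ratio changes as the number of individuals grows.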
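Aronow et al.’s estimator can also be applied to generalized linear models. For example, suppose $y_{ij}$ is an indicator for the event in which nodes $i$ and $j$ are adjacent in a network. We can model the link formation process as

$$\operatorname{logit}\bigl(\Pr(y_{ij} = 1 \mid x_{ij})\bigr) = x_{ij}\beta,$$

where $\operatorname{logit}(\pi) = \log(\pi / (1 - \pi))$ is the logit link function. The logistic regression estimate $\hat\beta$ of $\beta$ reveals how the observable characteristics of nodes $i$ and $j$ determine their probability of being adjacent. We can estimate the variance of $\hat\beta$ consistently by letting $\hat\pi_p$ be the predicted probability for pair $p$, replacing the bread matrix with

$$B = (X' W X)^{-1}, \quad W = \operatorname{diag}\bigl(\hat\pi_p (1 - \hat\pi_p)\bigr),$$

and computing $\hat{V}_{\text{dyad}} = B X' \hat\Omega^{\text{dyad}} X B$ with residuals $\hat\epsilon_p = y_p - \hat\pi_p$. My co-authors and I use this approach in “Research Funding and Collaboration”: we estimate how grant proposal outcomes determine the probability with which pairs of researchers co-author, and we compare variance estimates with and without the dyadic correction to show that our inferences are robust to dyadic dependence.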
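The logistic case can be sketched the same way. The code below is an illustration under stated assumptions, not the code used in the paper: it fits the logit model with a hand-rolled Newton-Raphson loop, takes residuals to be $y_p$ minus the predicted probability, and uses the inverse of the weighted cross-product $X'WX$ as the bread. The function name, iteration count, and simulated network are hypothetical.

```python
import itertools
import numpy as np

def logit_dyadic_variance(X, y, pairs, iters=25):
    """Logit fit by Newton-Raphson with White and dyadic-robust variances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (prob * (1.0 - prob))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - prob))

    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    # Bread: inverse of X'WX with W = diag(prob * (1 - prob)).
    bread = np.linalg.inv(X.T @ (X * (prob * (1.0 - prob))[:, None]))
    scores = X * (y - prob)[:, None]  # row p is x_p times its residual

    # D[p, q] = 1 when pairs p and q share an individual (including p == q).
    members = np.asarray(pairs)
    D = (members[:, None, :, None] == members[None, :, None, :]).any(axis=(2, 3))
    D = D.astype(float)

    # X' (resid resid' * D) X equals scores' D scores.
    V_white = bread @ (scores.T @ scores) @ bread
    V_dyad = bread @ (scores.T @ D @ scores) @ bread
    return beta, V_white, V_dyad

# Hypothetical network: links are less likely between dissimilar nodes.
rng = np.random.default_rng(1)
n = 30
pairs = list(itertools.combinations(range(n), 2))
a = rng.normal(size=n)
dist = np.array([abs(a[i] - a[j]) for i, j in pairs])
X = np.column_stack([np.ones(len(pairs)), dist])
link_prob = 1.0 / (1.0 + np.exp(-(0.5 - 1.0 * dist)))
y = (rng.uniform(size=len(pairs)) < link_prob).astype(float)

beta_hat, V_white, V_dyad = logit_dyadic_variance(X, y, pairs)
se_white = np.sqrt(np.diag(V_white))
```

Comparing the diagonal of `V_white` with that of `V_dyad` indicates how much the dyadic correction changes the standard errors. Because the dyadic estimate is not guaranteed to be positive semi-definite, its diagonal should be checked before taking square roots.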
[1] Fafchamps and Gubert (2007) describe a variance estimator similar to that of Aronow et al. but do not establish its consistency.