This is a technical follow-up to a previous post on assortative mixing in networks. In a footnote, I claimed that Newman’s (2003) assortativity coefficient equals the Pearson correlation coefficient when there are two possible node types. This post proves that claim.
Notation
Consider an undirected network in which each node has a type belonging to a (finite) set . The assortativity coefficient is defined as where is the proportion of edges joining nodes of type to nodes of type , and where is the proportion of edges incident with nodes of type . The Pearson correlation of adjacent nodes’ types is given by where and are the types of nodes and , and where (co)variances are computed with respect to the frequency at which nodes of type and are adjacent in .
Proof
Let with . I show that the correlation coefficient and assortativity coefficient can be expressed as the same function of and , implying .
Consider . It can be understood by presenting the mixing matrix in tabular form:
The first two columns enumerate the possible type pairs and the third column stores the proportion of adjacent node pairs with each type pair. This third column defines the joint distribution of types across adjacent nodes. Thus equals the correlation of the first two columns, weighted by the third column. (Here since is undirected.) Now has mean and second moment and similar calculations reveal and . Thus has variance and similarly . We can simplify this expression for the variance by noticing that which implies and therefore We next express the covariance in terms of and . Now because . It follows that where the last line uses the fact that . Putting everything together, we have a function of and .
Now consider . Its numerator equals and its denominator equals where and both use the fact that . Thus the same function of and , and so as claimed.
Writing in terms of and makes it easy to check the boundary cases: if there are no within-type edges then and so ; if there are no between-type edges then and so .
Appendix: Constructing the mixing matrix
The proof relies on noticing that , which comes from undirectedness of the network and from how the mixing matrix is constructed. I often forget this construction, so here’s a simple algorithm: Consider some type pair . Look at the edges beginning at type nodes and count how many end at type nodes. Call this count . Do the same for all type pairs to obtain a matrix of edge counts. Divide the entries in by their sum to obtain .