This is a technical follow-up to a previous post on assortative mixing in networks. In a footnote, I claimed that Newman’s (2003) assortativity coefficient equals the Pearson correlation coefficient when there are two possible node types. This post proves that claim.

Notation

Consider an undirected network N in which each node has a type belonging to a (finite) set T. The assortativity coefficient is defined as r=tTxtttTyt21tTyt2, where xst is the proportion of edges joining nodes of type s to nodes of type t, and where yt=sTxst is the proportion of edges incident with nodes of type t. The Pearson correlation of adjacent nodes’ types is given by ρ=Cov(ti,tj)Var(ti)Var(tj), where tiT and tjT are the types of nodes i and j, and where (co)variances are computed with respect to the frequency at which nodes of type ti and tj are adjacent in N.

Proof

Let T={a,b}R with ab. I show that the correlation coefficient ρ and assortativity coefficient r can be expressed as the same function of ya and xab, implying ρ=r.

Consider ρ. It can be understood by presenting the mixing matrix X=(xst) in tabular form:

ti tj xtitj
a a xaa
a b xab
b a xba
b b xbb

The first two columns enumerate the possible type pairs (ti,tj) and the third column stores the proportion of adjacent node pairs (i,j) with each type pair. This third column defines the joint distribution of types across adjacent nodes. Thus ρ equals the correlation of the first two columns, weighted by the third column. (Here xab=xba since N is undirected.) Now ti has mean E[ti]=xaaa+xaba+xbab+xbbb=yaa+ybb and second moment E[ti2]=xaaa2+xaba2+xbab2+xbbb2=yaa2+ybb2, and similar calculations reveal E[tj]=E[ti] and E[tj2]=E[ti2]. Thus ti has variance Var(ti)=E[ti2]E[ti]2=yaa2+ybb2(yaa+ybb)2=ya(1ya)a2+yb(1yb)b22yaybab and similarly Var(tj)=Var(ti). We can simplify this expression for the variance by noticing that xaa+xab+xba+xbb=1, which implies yb=xab+xbb=1xaaxba=1ya and therefore Var(ti)=ya(1ya)a2+(1ya)yab22ya(1ya)ab=ya(1ya)(ab)2. We next express the covariance Cov(ti,tj)=E[titj]E[ti]E[tj] in terms of ya and xab. Now E[titj]=xaaa2+xabab+xbaab+xbbb2=(yaxab)a2+2xabab+(ybxab)b2=yaa2+ybb2xab(ab)2 because xab=xba. It follows that Cov(ti,tj)=yaa2+ybb2xab(ab)2(yaa+ybb)2=ya(1ya)a2+yb(1yb)b22yaybabxab(ab)2=ya(1ya)(ab)2xab(ab)2, where the last line uses the fact that yb=1ya. Putting everything together, we have ρ=Cov(ti,tj)Var(ti)Var(tj)=ya(1ya)xabya(1ya), a function of ya and xab.

Now consider r. Its numerator equals tTxtttTyt2=xaa+xbbya2yb2=(yaxab)+(ybxab)ya2yb2=ya(1ya)+yb(1yb)2xab=2ya(1ya)2xab and its denominator equals 1tTyt2=1ya2yb2=1ya2(1ya)2=2ya(1ya), where and both use the fact that yb=1ya. Thus r=ya(1ya)xabya(1ya), the same function of ya and xab, and so ρ=r as claimed.

Writing ρ=r in terms of ya and xab makes it easy to check the boundary cases: if there are no within-type edges then ya=xab=1/2 and so ρ=r=1; if there are no between-type edges then xab=0 and so ρ=r=1.

Appendix: Constructing the mixing matrix

The proof relies on noticing that xab=xba, which comes from undirectedness of the network N and from how the mixing matrix X is constructed. I often forget this construction, so here’s a simple algorithm: Consider some type pair (s,t). Look at the edges beginning at type s nodes and count how many end at type t nodes. Call this count mst. Do the same for all type pairs to obtain a matrix M=(mst) of edge counts. Divide the entries in M by their sum to obtain X.