This is a technical follow-up to a previous post on assortative mixing in networks. In a footnote, I claimed that Newman’s (2003) assortativity coefficient equals the Pearson correlation coefficient when there are two possible node types. This post proves that claim.

Notation

Consider an undirected network $N$ in which each node has a type belonging to a finite set $T$. The assortativity coefficient is defined as
$$r = \frac{\sum_{t \in T} x_{tt} - \sum_{t \in T} y_t^2}{1 - \sum_{t \in T} y_t^2},$$
where $x_{st}$ is the proportion of edges joining nodes of type $s$ to nodes of type $t$, and where $y_t = \sum_{s \in T} x_{st}$ is the proportion of edges incident with nodes of type $t$. The Pearson correlation of adjacent nodes' types is given by
$$\rho = \frac{\mathrm{Cov}(t_i, t_j)}{\sqrt{\mathrm{Var}(t_i)\,\mathrm{Var}(t_j)}},$$
where $t_i \in T$ and $t_j \in T$ are the types of nodes $i$ and $j$, and where (co)variances are computed with respect to the frequency at which nodes of type $t_i$ and $t_j$ are adjacent in $N$.

Proof

Let $T = \{a, b\} \subset \mathbb{R}$ with $a \neq b$. I show that the correlation coefficient $\rho$ and assortativity coefficient $r$ can be expressed as the same function of $y_a$ and $x_{ab}$, implying $\rho = r$.

Consider $\rho$. It can be understood by presenting the mixing matrix $X = (x_{st})$ in tabular form:

$t_i$   $t_j$   $x_{t_i t_j}$
$a$     $a$     $x_{aa}$
$a$     $b$     $x_{ab}$
$b$     $a$     $x_{ba}$
$b$     $b$     $x_{bb}$

The first two columns enumerate the possible type pairs $(t_i, t_j)$ and the third column stores the proportion of adjacent node pairs $(i, j)$ with each type pair. This third column defines the joint distribution of types across adjacent nodes. Thus $\rho$ equals the correlation of the first two columns, weighted by the third column. (Here $x_{ab} = x_{ba}$ since $N$ is undirected.)

Now $t_i$ has mean
$$\mathrm{E}[t_i] = x_{aa} a + x_{ab} a + x_{ba} b + x_{bb} b = y_a a + y_b b$$
and second moment
$$\mathrm{E}[t_i^2] = x_{aa} a^2 + x_{ab} a^2 + x_{ba} b^2 + x_{bb} b^2 = y_a a^2 + y_b b^2,$$
and similar calculations reveal $\mathrm{E}[t_j] = \mathrm{E}[t_i]$ and $\mathrm{E}[t_j^2] = \mathrm{E}[t_i^2]$. Thus $t_i$ has variance
$$\mathrm{Var}(t_i) = \mathrm{E}[t_i^2] - \mathrm{E}[t_i]^2 = y_a a^2 + y_b b^2 - (y_a a + y_b b)^2 = y_a(1 - y_a) a^2 + y_b(1 - y_b) b^2 - 2 y_a y_b ab,$$
and similarly $\mathrm{Var}(t_j) = \mathrm{Var}(t_i)$. We can simplify this expression for the variance by noticing that $x_{aa} + x_{ab} + x_{ba} + x_{bb} = 1$, which implies
$$y_b = x_{ab} + x_{bb} = 1 - x_{aa} - x_{ba} = 1 - y_a$$
and therefore
$$\mathrm{Var}(t_i) = y_a(1 - y_a) a^2 + (1 - y_a) y_a b^2 - 2 y_a (1 - y_a) ab = y_a(1 - y_a)(a - b)^2.$$

We next express the covariance $\mathrm{Cov}(t_i, t_j) = \mathrm{E}[t_i t_j] - \mathrm{E}[t_i]\,\mathrm{E}[t_j]$ in terms of $y_a$ and $x_{ab}$. Now
$$\mathrm{E}[t_i t_j] = x_{aa} a^2 + x_{ab} ab + x_{ba} ab + x_{bb} b^2 = (y_a - x_{ab}) a^2 + 2 x_{ab} ab + (y_b - x_{ab}) b^2 = y_a a^2 + y_b b^2 - x_{ab}(a - b)^2$$
because $x_{ab} = x_{ba}$. It follows that
$$\mathrm{Cov}(t_i, t_j) = y_a a^2 + y_b b^2 - x_{ab}(a - b)^2 - (y_a a + y_b b)^2 = y_a(1 - y_a) a^2 + y_b(1 - y_b) b^2 - 2 y_a y_b ab - x_{ab}(a - b)^2 = y_a(1 - y_a)(a - b)^2 - x_{ab}(a - b)^2,$$
where the last equality uses the fact that $y_b = 1 - y_a$. Putting everything together, we have
$$\rho = \frac{\mathrm{Cov}(t_i, t_j)}{\sqrt{\mathrm{Var}(t_i)\,\mathrm{Var}(t_j)}} = \frac{y_a(1 - y_a) - x_{ab}}{y_a(1 - y_a)},$$
a function of $y_a$ and $x_{ab}$.
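As a sanity check on this derivation, the moments and the closed form for $\rho$ can be compared numerically. The sketch below (plain Python, with arbitrary made-up values for $a$, $b$, and the mixing matrix) computes $\rho$ directly from the joint distribution and via the closed form:

```python
# Sanity check: rho computed from the joint distribution of (t_i, t_j)
# versus the closed form derived above. All numeric values are arbitrary.
a, b = 2.0, 5.0                      # numeric type labels, a != b
x_aa, x_ab, x_bb = 0.3, 0.2, 0.3     # mixing proportions; x_ba = x_ab
X = {(a, a): x_aa, (a, b): x_ab, (b, a): x_ab, (b, b): x_bb}

# Moments with respect to the joint distribution X.
E_ti = sum(p * s for (s, t), p in X.items())
E_ti2 = sum(p * s**2 for (s, t), p in X.items())
E_titj = sum(p * s * t for (s, t), p in X.items())
var = E_ti2 - E_ti**2
cov = E_titj - E_ti**2               # E[t_j] = E[t_i] by symmetry
rho = cov / var

# Closed form in terms of y_a and x_ab.
y_a = x_aa + x_ab
rho_closed = (y_a * (1 - y_a) - x_ab) / (y_a * (1 - y_a))

print(rho, rho_closed)               # the two values agree
```

Changing $a$ and $b$ leaves $\rho$ unchanged, as it must: the factor $(a - b)^2$ cancels between covariance and variance.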

Now consider $r$. Its numerator equals
$$\sum_{t \in T} x_{tt} - \sum_{t \in T} y_t^2 = x_{aa} + x_{bb} - y_a^2 - y_b^2 = (y_a - x_{ab}) + (y_b - x_{ab}) - y_a^2 - y_b^2 = y_a(1 - y_a) + y_b(1 - y_b) - 2 x_{ab} = 2 y_a(1 - y_a) - 2 x_{ab}$$
and its denominator equals
$$1 - \sum_{t \in T} y_t^2 = 1 - y_a^2 - y_b^2 = 1 - y_a^2 - (1 - y_a)^2 = 2 y_a(1 - y_a),$$
where both simplifications use the fact that $y_b = 1 - y_a$. Thus
$$r = \frac{y_a(1 - y_a) - x_{ab}}{y_a(1 - y_a)},$$
the same function of $y_a$ and $x_{ab}$, and so $\rho = r$ as claimed.
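The same agreement can be checked for $r$ computed straight from its definition. A minimal sketch, assuming NumPy and a randomly generated symmetric mixing matrix (the helper name `newman_r` is my own):

```python
import numpy as np

rng = np.random.default_rng(42)

def newman_r(X):
    """Assortativity r = (tr X - sum_t y_t^2) / (1 - sum_t y_t^2)."""
    y = X.sum(axis=0)                # y_t = sum_s x_st
    s = np.sum(y**2)
    return (np.trace(X) - s) / (1 - s)

# Random symmetric 2x2 mixing matrix whose entries sum to 1.
A = rng.random((2, 2))
X = (A + A.T) / (2 * A.sum())

y_a, x_ab = X[:, 0].sum(), X[0, 1]
r_closed = (y_a * (1 - y_a) - x_ab) / (y_a * (1 - y_a))
print(newman_r(X), r_closed)         # the two values agree
```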

Writing $\rho = r$ in terms of $y_a$ and $x_{ab}$ makes it easy to check the boundary cases: if there are no within-type edges then $y_a = x_{ab} = 1/2$ and so $\rho = r = -1$; if there are no between-type edges then $x_{ab} = 0$ and so $\rho = r = 1$.
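The boundary cases are also easy to confirm numerically with the shared closed form (a throwaway helper written purely for illustration):

```python
def assortativity(y_a, x_ab):
    # Shared closed form for rho and r with two types.
    return (y_a * (1 - y_a) - x_ab) / (y_a * (1 - y_a))

print(assortativity(0.5, 0.5))  # no within-type edges: -1.0
print(assortativity(0.3, 0.0))  # no between-type edges: 1.0
```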

Appendix: Constructing the mixing matrix

The proof relies on noticing that $x_{ab} = x_{ba}$, which comes from the undirectedness of the network $N$ and from how the mixing matrix $X$ is constructed. I often forget this construction, so here's a simple algorithm: Consider some type pair $(s, t)$. Look at the edges beginning at type $s$ nodes and count how many end at type $t$ nodes. Call this count $m_{st}$. Do the same for all type pairs to obtain a matrix $M = (m_{st})$ of edge counts. Divide the entries in $M$ by their sum to obtain $X$.
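The algorithm above can be sketched as a short function. Since $N$ is undirected, each edge "begins" at both of its endpoints, so every edge is counted once in each direction, and that is exactly what makes $X$ symmetric. (The function name and the edge-list/type-dict representation are my own choices, not from the construction itself.)

```python
from collections import Counter
from itertools import product

def mixing_matrix(edges, node_type):
    """Mixing proportions x_st from an undirected edge list.

    edges: iterable of node pairs (i, j), one entry per undirected edge.
    node_type: dict mapping each node to its type.
    """
    counts = Counter()
    for i, j in edges:
        # An undirected edge begins at both endpoints, so count it
        # in both directions; this makes the matrix symmetric.
        counts[(node_type[i], node_type[j])] += 1
        counts[(node_type[j], node_type[i])] += 1
    total = sum(counts.values())
    types = sorted(set(node_type.values()))
    return {(s, t): counts[(s, t)] / total for s, t in product(types, repeat=2)}

# Example: a triangle with two type-a nodes and one type-b node.
X = mixing_matrix([(1, 2), (2, 3), (1, 3)], {1: "a", 2: "a", 3: "b"})
print(X)   # x_ab == x_ba, as the proof requires
```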