Let be a network with nodes, each of which has a “type” belonging to some set . We say that is “assortatively mixed” if nodes tend to have the same types as their neighbors. For example, if is a social network and is a set of interests, then assortative mixing could arise because friends tend to share interests.
How can we measure the extent of assortative mixing in ? Newman (2003) suggests the “assortativity coefficient” where is the proportion of edges joining nodes of type to nodes of type , and where is the proportion of edges incident with nodes of type . The coefficient varies between -1 and 1, and takes larger values when is more assortatively mixed. We say that is “positively sorted” if and “negatively sorted” if .
We can interpret by thinking about the “mixing matrix” . The numerator of equals the sum of diagonal entries of minus what that sum would be if the distributions of entries across rows and columns were independent. The denominator of is a normalizing constant ensuring . Thus indexes the frequency of within-type edges in relative to the frequency we would expect in a random network with the same proportion of edges incident with each type.
As an example, suppose is a realization of the planted partition model with nodes of type 1, nodes of type 2, and some proportion of edges joining nodes of type to nodes of type . Then has assortativity coefficient which equals -1 if and (i.e., there are no within-type edges), and equals 1 if and (i.e., there are no between-type edges). If then which converges to zero from below as becomes large. Intuitively, if then within-type and between-type edges occur at the same rate, but the network is slightly negatively sorted because there are slightly fewer potential within-type edges than potential between-type edges.
If then which converges to as becomes large. The figure below demonstrates this case with . The network on the left has edge frequencies and assortativity coefficient ; the network on the right has edge frequencies and assortativity coefficient . Both networks are drawn so that adjacent nodes are closer together. Nodes in the positively sorted network tend to have neighbors with the same type, while nodes in the negatively sorted network tend to have neighbors with a different type.
The assortativity coefficient can be used when is a set of categorical types. In contrast, if is set of scalar quantities then we can measure the extent of assortative mixing via the Pearson correlation coefficient where and are the “types” of nodes and , and where (co)variances are computed with respect to the frequency at which nodes of type and are adjacent in the network.1 To see how this works, let be the adjacency matrix for and let be the “weighting matrix” with entries where denotes the sum of elements in . Then the vector of node types has mean where is the vector of row sums Intuitively, describes the probability mass function for the (marginal) distribution of node types. Treating and as draws from this distribution, we have and similarly where is the matrix with principal diagonal equal to and off-diagonal entries equal to zero. Then For example, if the nodes in are arranged such that then and so —that is, if all adjacent nodes have the same scalar type then the coefficient obtains its maximum value of unity.
One common use of the correlation coefficient is to measure assortativity with respect to nodes’ degrees (see, e.g., Newman, 2002). For example, the left-hand network in the figure above has : although nodes are sorted strongly by color, they are approximately unsorted by degree because the planted partition model from which the network is generated has no mechanism for connecting high-degree nodes. Performing a degree-preserving randomization of the network changes its assortativity with respect to nodes’ degrees by changing the joint distribution of those degrees across node pairs: