Correlation and concordance

Let $X = (X_{1}, X_{2})$ be a random vector in $\mathbb{R}^{2}$. Two realizations $x$ and $x'$ of $X$ form a concordant pair if $(x_{2}' - x_{2})$ and $(x_{1}' - x_{1})$ have the same sign. What’s the probability of sampling a concordant pair when $X$ is bivariate normal?

For example, suppose $X_{1}$ and $X_{2}$ have zero means, unit variances, and a correlation of $ρ$. The scatter plots below show 100 realizations of $(X_{1}, X_{2})$ when $ρ \in \{-0.5, 0, 0.5\}$. These realizations contain $\binom{100}{2} = 4{,}950$ pairs, of which 36% are concordant when $ρ = -0.5$. This percentage rises to 48% when $ρ = 0$ and to 71% when $ρ = 0.5$. Increasing $ρ$ makes concordance more likely because it makes the relationship between $X_{1}$ and $X_{2}$ more positive: a larger $x_{1}$ then tends to come with a larger $x_{2}$.
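The sampling experiment described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the scatter plots: the helper names `sample_bivariate_normal` and `concordance_rate` and the seed are assumptions, so the printed rates will differ from the 36%, 48%, and 71% quoted above.

```python
import math
import random
from itertools import combinations


def sample_bivariate_normal(n, rho, rng):
    """Draw n realizations of (X1, X2) with zero means, unit variances, and correlation rho."""
    points = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # Given X1 = x1, X2 is normal with mean rho * x1 and variance 1 - rho^2.
        x2 = rho * x1 + math.sqrt(1.0 - rho**2) * noise
        points.append((x1, x2))
    return points


def concordance_rate(points):
    """Share of pairs whose differences in each coordinate have the same sign."""
    pairs = list(combinations(points, 2))
    concordant = sum(
        1 for (a1, a2), (b1, b2) in pairs if (b1 - a1) * (b2 - a2) > 0
    )
    return concordant / len(pairs)


rng = random.Random(0)  # arbitrary seed (an assumption, not the one behind the plots)
for rho in (-0.5, 0.0, 0.5):
    points = sample_bivariate_normal(100, rho, rng)
    print(f"rho = {rho:+.1f}: concordance rate = {concordance_rate(points):.2f}")
```

Changing the seed changes the estimated rates, which is the sampling variation that an analytical derivation avoids.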

Different samples give different concordance rates due to sampling variation. We can remove this variation by deriving the concordance rate analytically. To begin, suppose $X$ has mean $E[X] = (μ_{1}, μ_{2})$ and covariance matrix

$$\mathrm{Var}(X) = \begin{bmatrix} σ_{1}^{2} & ρ σ_{1} σ_{2} \\ ρ σ_{1} σ_{2} & σ_{2}^{2} \end{bmatrix}.$$

Then $X_{2} \mid X_{1}$ is normal with mean

$$E[X_{2} \mid X_{1}] = μ_{2} + \frac{ρ σ_{2}}{σ_{1}} (X_{1} - μ_{1})$$

and variance

$$\mathrm{Var}(X_{2} \mid X_{1}) = (1 - ρ^{2}) σ_{2}^{2}.$$

So for any two independent realizations $x$ and $x'$ of $X$ we can write

$$x_{2}' - x_{2} = \frac{ρ σ_{2}}{σ_{1}} (x_{1}' - x_{1}) + ε$$

with $ε \sim N(0, 2 (1 - ρ^{2}) σ_{2}^{2})$ independent of $(x_{1}' - x_{1})$. Now $x_{1}' - x_{1} \sim N(0, 2 σ_{1}^{2})$, and so

$$z \equiv \frac{x_{1}' - x_{1}}{σ_{1} \sqrt{2}}$$

is standard normal and exceeds zero if and only if $x_{1}' > x_{1}$. Letting $f$ and $ϕ$ be the density functions for $ε$ and $z$ then gives

$$\begin{aligned} \Pr(x_{2}' > x_{2} \text{ and } x_{1}' > x_{1}) & = \Pr(\sqrt{2} ρ σ_{2} z + ε > 0 \text{ and } z > 0) \\ & = \int_{0}^{\infty} \left( \int_{-\sqrt{2} ρ σ_{2} z}^{\infty} f(ε) \, dε \right) ϕ(z) \, dz \\ & \overset{\star}{=} \int_{0}^{\infty} \left( \int_{\frac{-ρ z}{\sqrt{1 - ρ^{2}}}}^{\infty} ϕ(w) \, dw \right) ϕ(z) \, dz \\ & = \int_{0}^{\infty} \left( 1 - Φ\left( \frac{-ρ z}{\sqrt{1 - ρ^{2}}} \right) \right) ϕ(z) \, dz \\ & \overset{\star\star}{=} \frac{1}{2} - \int_{0}^{\infty} Φ\left( \frac{-ρ z}{\sqrt{1 - ρ^{2}}} \right) ϕ(z) \, dz, \end{aligned}$$

where $Φ$ is the standard normal CDF, where $\star$ uses the change of variables $w \equiv \frac{ε}{σ_{2} \sqrt{2 (1 - ρ^{2})}}$, and where $\star\star$ uses the symmetry of $ϕ$ about $z = 0$, which implies $\int_{0}^{\infty} ϕ(z) \, dz = 1/2$.
But $f$ and $ϕ$ are symmetric about zero, which implies

$$\Pr(x_{2}' > x_{2} \text{ and } x_{1}' > x_{1}) = \Pr(x_{2}' < x_{2} \text{ and } x_{1}' < x_{1}),$$

and therefore

$$\begin{aligned} C(ρ) & \equiv \Pr(x \text{ and } x' \text{ are concordant}) \\ & = \Pr(x_{2}' > x_{2} \text{ and } x_{1}' > x_{1}) + \Pr(x_{2}' < x_{2} \text{ and } x_{1}' < x_{1}) \\ & = 1 - 2 \int_{0}^{\infty} Φ\left( \frac{-ρ z}{\sqrt{1 - ρ^{2}}} \right) ϕ(z) \, dz. \end{aligned}$$

The concordance rate $C(ρ)$ depends on the correlation $ρ$ of $X_{1}$ and $X_{2}$, but not on their means or variances. It has value $C(0) = 0.5$ when $ρ = 0$ because the $Φ$ term then equals the constant $Φ(0) = 0.5$, so the integral equals $0.5 \times 0.5 = 0.25$. Intuitively, if $X_{1}$ and $X_{2}$ are uncorrelated then we can’t use $(x_{1}' - x_{1})$ to predict $(x_{2}' - x_{2})$, which is equally likely to be positive or negative. Whereas if $|ρ| = 1$ then $(x_{1}' - x_{1})$ predicts $(x_{2}' - x_{2})$ perfectly, and so $\lim_{ρ \to 1} C(ρ) = 1$ and $\lim_{ρ \to -1} C(ρ) = 0$.

The chart below confirms that the concordance rate $C(ρ)$ grows with $ρ$. It also shows that $C(ρ) + C(-ρ) = 1$. Thus, for example, we have $C(-0.5) = 1/3$ and $C(0.5) = 2/3$. These values remove the sampling error from the estimates 0.36 and 0.71 obtained using the 100 realizations above.
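The integral expression for $C(ρ)$ is easy to check numerically. The sketch below approximates it with a midpoint rule using only Python's standard library. The function names, the truncation of the upper limit at 10, and the number of steps are assumptions chosen for illustration, not part of the derivation.

```python
import math


def std_normal_pdf(z):
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Standard normal CDF, Phi(x), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def concordance(rho, upper=10.0, steps=20_000):
    """Approximate C(rho) = 1 - 2 * int_0^inf Phi(-rho z / sqrt(1 - rho^2)) phi(z) dz.

    Uses a midpoint rule on [0, upper]; phi(z) is negligible beyond z = 10.
    Requires |rho| < 1 so that the square root is positive.
    """
    scale = -rho / math.sqrt(1.0 - rho**2)
    h = upper / steps
    integral = 0.0
    for i in range(steps):
        z = (i + 0.5) * h  # midpoint of the i-th subinterval
        integral += std_normal_cdf(scale * z) * std_normal_pdf(z)
    return 1.0 - 2.0 * integral * h


for rho in (-0.5, 0.0, 0.5):
    print(f"C({rho:+.1f}) = {concordance(rho):.4f}")
```

Under these assumptions the printed values should be close to $1/3$, $1/2$, and $2/3$, and `concordance(rho) + concordance(-rho)` should be close to 1 for any `rho` strictly between $-1$ and $1$.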