A measure of the association between a binary variable, X, taking values 0 and 1, and a continuous random variable, Y. If it is assumed that for each value of X the distribution of Y is normal, with different means but the same variance, then an appropriate measure is the point biserial correlation coefficient. This is estimated from a sample as $r_{pb}$ ($-1 \le r_{pb} \le 1$), given by

$$r_{pb} = \frac{\bar{y}_1 - \bar{y}_0}{s}\sqrt{\frac{np(1-p)}{n-1}},$$

where $\bar{y}_1$ and $\bar{y}_0$ are the mean Y-values corresponding to X = 1 and X = 0 respectively, $s^2$ is the sample variance (using the n−1 divisor) of the combined set of n Y-values, and p is the proportion of X values equal to 1.
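As an illustration, the point biserial estimate can be sketched in a few lines of Python. The sketch assumes the estimator is Pearson's product-moment correlation applied to the 0/1 coding of X, which reduces to $(\bar{y}_1 - \bar{y}_0)\sqrt{np(1-p)/(n-1)}\,/\,s$ when $s$ uses the n−1 divisor. The function name `point_biserial` and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def point_biserial(x, y):
    """Point biserial correlation between binary x (0/1) and continuous y.

    Assumes both groups are non-empty and y is not constant.
    """
    n = len(y)
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]  # Y-values where X = 1
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]  # Y-values where X = 0
    p = len(y1) / n                               # proportion of X equal to 1
    s = stdev(y)                                  # combined sample, n-1 divisor
    return (mean(y1) - mean(y0)) / s * sqrt(n * p * (1 - p) / (n - 1))


# Hypothetical data: three observations in each group.
x = [0, 0, 0, 1, 1, 1]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
r_pb = point_biserial(x, y)
print(round(r_pb, 4))  # 0.8783
```

Because the formula is algebraically the ordinary correlation between the 0/1 values and Y, the result always lies between −1 and 1.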
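If it can be assumed that X is a dichotomous representation of an underlying continuous random variable, W, with W and Y having a bivariate normal distribution, then an appropriate measure is the biserial correlation coefficient. This is estimated as $r_b$, given by

$$r_b = \frac{(\bar{y}_1 - \bar{y}_0)\,p(1-p)}{u\,s},$$

where $\bar{y}_1$, $\bar{y}_0$, and p are as defined for the point biserial coefficient, $s$ is the sample standard deviation of the combined Y-values,

$$u = \frac{1}{\sqrt{2\pi}}\exp\left(-\tfrac{1}{2}h^2\right),$$

and h is the value defined by $P(Z \ge h) = p$, for a standard normal variable Z.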
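A minimal Python sketch of the biserial estimate follows. It assumes the commonly cited form $r_b = (\bar{y}_1 - \bar{y}_0)\,p(1-p)/(u\,s)$, where $u$ is the standard normal density evaluated at h and $s$ is the standard deviation of the combined Y-values with the n−1 divisor. The function name `biserial` and the sample data are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def biserial(x, y):
    """Biserial correlation between dichotomised x (0/1) and continuous y.

    Assumes both groups are non-empty, so that 0 < p < 1.
    """
    n = len(y)
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]  # Y-values where X = 1
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]  # Y-values where X = 0
    p = len(y1) / n                               # proportion of X equal to 1
    s = stdev(y)                                  # combined sample, n-1 divisor
    z = NormalDist()                              # standard normal, mean 0, sd 1
    h = z.inv_cdf(1 - p)                          # solves P(Z >= h) = p
    u = z.pdf(h)                                  # normal density (ordinate) at h
    return (mean(y1) - mean(y0)) * p * (1 - p) / (u * s)


# Hypothetical data: two of the six observations have X = 1.
x = [0, 0, 0, 0, 1, 1]
y = [1.0, 2.0, 3.0, 5.0, 4.0, 6.0]
r_b = biserial(x, y)
print(round(r_b, 3))
```

Unlike the point biserial coefficient, a sample value computed this way is not guaranteed to lie between −1 and 1, which can happen when the bivariate normal assumption fits the data poorly.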