In linear discriminant analysis, there are two classes and their distributions are assumed to be Gaussian with the same covariance. The means of the two distributions are $\mu_0$ and $\mu_1$ and the shared covariance is $\Sigma$. The maximum likelihood estimates of the conditional means are simply the empirical means

$$\hat{\mu}_c = \frac{1}{N_c} \sum_{i \in C_c} x_i, \qquad c \in \{0, 1\},$$

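and the maximum likelihood estimate of the shared covariance matrix is the empirical “within-class” covariance matrix

$$\hat{\Sigma}_W = \sum_{c} \frac{N_c}{N} \hat{\Sigma}_c, \qquad \hat{\Sigma}_c = \frac{1}{N_c} \sum_{i \in C_c} (x_i - \hat{\mu}_c)(x_i - \hat{\mu}_c)^T.$$

Here $C_c$ is the set of examples in class $c$, $N_c$ is the number of examples in class $c$, $N = N_0 + N_1$ is the total number of examples, and $\hat{\Sigma}_c$ is the empirical covariance of class $c$. The within-class covariance matrix can also be expressed

$$\hat{\Sigma}_W = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T - \sum_{c} \frac{N_c}{N} \hat{\mu}_c \hat{\mu}_c^T.$$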

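The unsupervised covariance is

$$\hat{\Sigma} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{\mu})(x_i - \hat{\mu})^T = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T - \hat{\mu} \hat{\mu}^T,$$

where $\hat{\mu}$ is the mean of all examples. Combining this with the within-class covariance $\hat{\Sigma}_W$ gives

$$\hat{\Sigma} = \hat{\Sigma}_W + \sum_{c} \frac{N_c}{N} \hat{\mu}_c \hat{\mu}_c^T - \hat{\mu} \hat{\mu}^T,$$

and then, substituting $\hat{\mu} = \frac{N_0}{N} \hat{\mu}_0 + \frac{N_1}{N} \hat{\mu}_1$ and collecting terms,

$$\hat{\Sigma} = \hat{\Sigma}_W + \frac{N_0 N_1}{N^2} (\hat{\mu}_1 - \hat{\mu}_0)(\hat{\mu}_1 - \hat{\mu}_0)^T.$$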
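
The identity relating the unsupervised covariance to the within-class covariance plus a rank-one term in the difference of class means can be checked numerically. The sketch below uses NumPy on synthetic data; the class sizes, means, random seed, and the helper name `ml_cov` are illustrative assumptions, not part of the derivation.

```python
import numpy as np

# Synthetic two-class data; sizes, means, and seed are illustrative assumptions.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(40, 3))
X1 = rng.normal(loc=[1.0, -0.5, 2.0], scale=1.0, size=(60, 3))
X = np.vstack([X0, X1])
N0, N1, N = len(X0), len(X1), len(X)


def ml_cov(A):
    """Maximum likelihood covariance: divide by the number of rows, not n - 1."""
    centered = A - A.mean(axis=0)
    return centered.T @ centered / len(A)


mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
sigma_w = (N0 * ml_cov(X0) + N1 * ml_cov(X1)) / N  # within-class covariance
sigma = ml_cov(X)                                   # unsupervised covariance

d = mu1 - mu0
rank_one = (N0 * N1 / N**2) * np.outer(d, d)        # rank-one correction term

decomposition_holds = np.allclose(sigma, sigma_w + rank_one)
print(decomposition_holds)
```

The check relies on the maximum likelihood (divide by $N$) normalization for every covariance; with the unbiased $N - 1$ normalization the coefficients of the identity would differ.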

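The system of equations $\hat{\Sigma} w = \hat{\mu}_1 - \hat{\mu}_0$, which defines the LDA weight vector under the unsupervised covariance $\hat{\Sigma}$, has a special form: its matrix is the within-class covariance plus a scalar multiple of the outer product of the right-hand side with itself. If $w$ is a solution to the square system of equations $A w = b$, then for any scalar $\alpha$

$$(A + \alpha b b^T) w = b + \alpha b (b^T w) = (1 + \alpha b^T w) b.$$

This means that, provided $1 + \alpha b^T w \neq 0$, a solution to $(A + \alpha b b^T) w' = b$ is

$$w' = \frac{w}{1 + \alpha b^T w}.$$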
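
As a sanity check on the rank-one argument, the following minimal sketch solves a small system directly and compares it with the rescaled solution. The matrix `A`, vector `b`, and scalar `alpha` are arbitrary illustrative values; `A` is built to be symmetric positive definite so that both systems are solvable.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4.0 * np.eye(4)  # symmetric positive definite, hence invertible
b = rng.normal(size=4)
alpha = 0.7                     # arbitrary positive scalar (illustrative)

w = np.linalg.solve(A, b)       # solves A w = b
scale = 1.0 + alpha * (b @ w)   # the factor 1 + alpha * b^T w
w_rescaled = w / scale          # claimed solution of the updated system

# Solve the rank-one-updated system directly for comparison.
w_direct = np.linalg.solve(A + alpha * np.outer(b, b), b)

solutions_match = np.allclose(w_rescaled, w_direct)
print(solutions_match)
```

Because `A` is positive definite and `alpha` is positive here, `scale` is strictly greater than one, so the division is always safe in this example.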

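This allows us to draw the conclusion that

$$\hat{\Sigma}^{-1} (\hat{\mu}_1 - \hat{\mu}_0) \propto \hat{\Sigma}_W^{-1} (\hat{\mu}_1 - \hat{\mu}_0),$$

with a positive constant of proportionality, and therefore the LDA solutions using the within-class covariance $\hat{\Sigma}_W$ and the unsupervised covariance $\hat{\Sigma}$ are equivalent: the two weight vectors point in the same direction.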
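
The conclusion can also be illustrated end to end. The sketch below, again on synthetic data with illustrative parameters, computes the LDA weight vector with each covariance estimate and checks that one is a positive multiple of the other, with the multiple predicted by the rank-one argument. The variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two Gaussian classes sharing one covariance; all parameters are illustrative.
L = np.array([[1.0, 0.0], [0.8, 0.5]])
X0 = rng.normal(size=(50, 2)) @ L.T
X1 = rng.normal(size=(80, 2)) @ L.T + np.array([2.0, 1.0])
X = np.vstack([X0, X1])
N0, N1, N = len(X0), len(X1), len(X)

d = X1.mean(axis=0) - X0.mean(axis=0)  # difference of empirical class means

# bias=True selects the maximum likelihood (divide by n) normalization.
cov0 = np.cov(X0, rowvar=False, bias=True)
cov1 = np.cov(X1, rowvar=False, bias=True)
sigma_w = (N0 * cov0 + N1 * cov1) / N        # within-class covariance
sigma = np.cov(X, rowvar=False, bias=True)   # unsupervised covariance

w_within = np.linalg.solve(sigma_w, d)       # LDA direction, within-class
w_total = np.linalg.solve(sigma, d)          # LDA direction, unsupervised

alpha = N0 * N1 / N**2
predicted_ratio = 1.0 / (1.0 + alpha * (d @ w_within))
same_direction = np.allclose(w_total, predicted_ratio * w_within)
print(same_direction)
```

Only the length of the weight vector changes between the two estimates, so any decision rule that depends on the direction of the projection alone is unaffected.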