In linear discriminant analysis (LDA), there are two classes whose distributions are assumed to be Gaussian with the same covariance. The means of the two distributions are $\mu_{1}$ and $\mu_{2}$ and the shared covariance is $\Sigma$. The maximum likelihood estimates of the conditional means are simply the empirical class means

$$\bar{x}_{k} = \frac{1}{n_{k}} \sum_{x \in \mathcal{X}_{k}} x, \qquad k = 1, 2,$$

and the maximum likelihood estimate of the shared covariance matrix is the empirical “within-class” covariance matrix $S$,

$$S = \frac{1}{n} \sum_{k=1}^{2} \sum_{x \in \mathcal{X}_{k}} (x - \bar{x}_{k})(x - \bar{x}_{k})^{T}.$$

Here $\mathcal{X}_{k}$ is the set of examples in class $k$, $n_{k} = |\mathcal{X}_{k}|$ is the number of examples in each class, $n = n_{1} + n_{2}$ is the total number of examples, and $S_{k}$ is the empirical covariance of each class. The within-class covariance matrix can also be expressed

$$S = \frac{n_{1}}{n} S_{1} + \frac{n_{2}}{n} S_{2}.$$

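As a quick numerical sanity check, the following sketch (using NumPy, with arbitrary synthetic data and class sizes of my choosing) verifies that the pooled within-class covariance equals the weighted combination of the per-class covariances:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic classes; sizes, dimension, and means are arbitrary choices.
X1 = rng.normal(size=(30, 3))
X2 = rng.normal(loc=1.0, size=(50, 3))
n1, n2 = len(X1), len(X2)
n = n1 + n2

# Per-class empirical covariances S_k (MLE convention: divide by n_k, i.e. bias=True).
S1 = np.cov(X1, rowvar=False, bias=True)
S2 = np.cov(X2, rowvar=False, bias=True)

def centered(Xk):
    return Xk - Xk.mean(axis=0)

# Within-class covariance S: pooled squared deviations from the class means.
S = (centered(X1).T @ centered(X1) + centered(X2).T @ centered(X2)) / n

# Matches the weighted combination (n1/n) S1 + (n2/n) S2.
assert np.allclose(S, (n1 / n) * S1 + (n2 / n) * S2)
```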
The unsupervised covariance, computed by ignoring the class labels entirely, is

$$T = \frac{1}{n} \sum_{i=1}^{n} (x_{i} - \bar{x})(x_{i} - \bar{x})^{T},$$

where $\bar{x}$ is the mean of all examples. Combining these gives

$$T = S + \frac{n_{1}}{n} (\bar{x}_{1} - \bar{x})(\bar{x}_{1} - \bar{x})^{T} + \frac{n_{2}}{n} (\bar{x}_{2} - \bar{x})(\bar{x}_{2} - \bar{x})^{T},$$

and then with some manipulation

$$T = S + \frac{n_{1} n_{2}}{n^{2}} (\bar{x}_{1} - \bar{x}_{2})(\bar{x}_{1} - \bar{x}_{2})^{T}.$$

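This rank-one decomposition of the unsupervised covariance can be checked numerically. The sketch below (synthetic data, arbitrary sizes and dimension) compares the total covariance against the within-class covariance plus the rank-one mean-difference term:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic two-class data; all sizes here are arbitrary illustrative choices.
X1 = rng.normal(size=(40, 4))
X2 = rng.normal(loc=2.0, size=(25, 4))
n1, n2 = len(X1), len(X2)
n = n1 + n2
X = np.vstack([X1, X2])

delta = X1.mean(axis=0) - X2.mean(axis=0)

def scatter(A):
    # Empirical covariance of A about its own mean, MLE convention.
    C = A - A.mean(axis=0)
    return C.T @ C / len(A)

# Within-class covariance S and unsupervised (total) covariance T.
S = (n1 * scatter(X1) + n2 * scatter(X2)) / n
T = scatter(X)

# The rank-one decomposition: T = S + (n1 n2 / n^2) delta delta^T.
assert np.allclose(T, S + (n1 * n2 / n**2) * np.outer(delta, delta))
```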
The system of equations $S w = \bar{x}_{1} - \bar{x}_{2}$ has a special form. If $x$ is a solution to the square system of equations $A x = b$, then for any scalar $\alpha$

$$(A + \alpha b b^{T}) x = A x + \alpha b (b^{T} x) = (1 + \alpha\, b^{T} x)\, b.$$

This means that a solution to $(A + \alpha b b^{T}) z = b$ is

$$z = \frac{x}{1 + \alpha\, b^{T} x}.$$

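This rescaling identity is easy to verify numerically. The sketch below (an arbitrary positive definite $A$, random $b$, and an arbitrary scalar $\alpha$) checks that the rescaled solution satisfies the rank-one-updated system:

```python
import numpy as np

rng = np.random.default_rng(2)

# A generic well-conditioned system Ax = b (all values are arbitrary).
M = rng.normal(size=(5, 5))
A = M @ M.T + 5.0 * np.eye(5)   # symmetric positive definite
b = rng.normal(size=5)
alpha = 0.7                     # any scalar; chosen arbitrarily here

x = np.linalg.solve(A, b)

# Claimed solution of (A + alpha b b^T) z = b: rescale x by 1 / (1 + alpha b^T x).
z = x / (1.0 + alpha * (b @ x))
assert np.allclose((A + alpha * np.outer(b, b)) @ z, b)
```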
Applying this with $A = S$, $b = \bar{x}_{1} - \bar{x}_{2}$, and $\alpha = \frac{n_{1} n_{2}}{n^{2}}$ turns the within-class covariance into the unsupervised covariance, and we conclude that

$$T^{-1} (\bar{x}_{1} - \bar{x}_{2}) \propto S^{-1} (\bar{x}_{1} - \bar{x}_{2}),$$

and therefore the LDA solutions computed with the within-class covariance and with the unsupervised covariance are equivalent: they differ only by a positive scalar and define the same discriminant direction.
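Putting the whole argument together, the sketch below (synthetic two-class data, arbitrary sizes and dimension) solves the LDA system once with the within-class covariance and once with the unsupervised covariance, and confirms the two directions coincide after normalization:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic two-class problem; sizes, dimension, and shift are arbitrary.
X1 = rng.normal(size=(60, 3))
X2 = rng.normal(loc=1.5, size=(40, 3))
n1, n2 = len(X1), len(X2)
n = n1 + n2
X = np.vstack([X1, X2])

delta = X1.mean(axis=0) - X2.mean(axis=0)

def mle_cov(A):
    C = A - A.mean(axis=0)
    return C.T @ C / len(A)

S = (n1 * mle_cov(X1) + n2 * mle_cov(X2)) / n   # within-class covariance
T = mle_cov(X)                                  # unsupervised covariance

w_S = np.linalg.solve(S, delta)
w_T = np.linalg.solve(T, delta)

# The two LDA directions agree after normalization (same direction and sign,
# since the rescaling factor 1 + alpha b^T x is positive for positive definite S).
assert np.allclose(w_S / np.linalg.norm(w_S), w_T / np.linalg.norm(w_T))
```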