In linear discriminant analysis (LDA) with two classes, the class-conditional distributions are assumed to be Gaussian with the same covariance. The means of the two distributions are $\mu_{1}$ and $\mu_{2}$ and the shared covariance is $\Sigma$. The maximum likelihood estimates of the conditional means are simply the empirical means

$\mu_{k} = \bar{x}_{k} = \frac{1}{n_{k}} \sum_{i \in \mathcal{X}_{k}} x_{i}$

and the maximum likelihood estimate of the shared covariance matrix is the empirical “within-class” covariance matrix

$\Sigma = S_{W} = \frac{1}{n} \sum_{k \in \{1, 2\}} \sum_{i \in \mathcal{X}_{k}} (x_{i} - \bar{x}_{k}) (x_{i} - \bar{x}_{k})^{T} = \frac{1}{n} (n_{1} S_{1} + n_{2} S_{2}) .$

Here $\mathcal{X}_{k}$ is the set of examples in class $k$, $n_{k} = |\mathcal{X}_{k}|$ is the number of examples in class $k$, $n = n_{1} + n_{2}$ is the total number of examples, and $S_{k}$ is the empirical covariance of class $k$. The within-class covariance matrix can also be expressed
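As a quick numerical sketch of these estimates (the synthetic data and variable names here are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic Gaussian classes with a shared covariance (toy data).
X1 = rng.normal(size=(40, 3)) + np.array([1.0, 0.0, -1.0])
X2 = rng.normal(size=(60, 3)) + np.array([-1.0, 2.0, 0.0])

n1, n2 = len(X1), len(X2)
n = n1 + n2

# Maximum likelihood estimates of the class means: the empirical means.
xbar1 = X1.mean(axis=0)
xbar2 = X2.mean(axis=0)

# Within-class covariance: outer products of centered examples,
# summed over both classes and normalized by n.
S_W = ((X1 - xbar1).T @ (X1 - xbar1)
       + (X2 - xbar2).T @ (X2 - xbar2)) / n

# Equivalently, (n1 S1 + n2 S2) / n with the per-class ML covariances.
S1 = np.cov(X1.T, ddof=0)
S2 = np.cov(X2.T, ddof=0)
assert np.allclose(S_W, (n1 * S1 + n2 * S2) / n)
```

Note `ddof=0` in `np.cov`, which gives the maximum likelihood normalization by $n_{k}$ rather than the unbiased $n_{k} - 1$.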

$S_{W} = \frac{1}{n} \sum_{i = 1}^{n} x_{i} x_{i}^{T} - \frac{n_{1}}{n} \bar{x}_{1} \bar{x}_{1}^{T} - \frac{n_{2}}{n} \bar{x}_{2} \bar{x}_{2}^{T} .$

The unsupervised covariance is

$S = \frac{1}{n} \sum_{i = 1}^{n} (x_{i} - \bar{x}) (x_{i} - \bar{x})^{T} = \frac{1}{n} \sum_{i = 1}^{n} x_{i} x_{i}^{T} - \bar{x} \bar{x}^{T}$

where $\bar{x}$ is the mean of all examples. Combining these gives

$\frac{1}{n} \sum_{i = 1}^{n} x_{i} x_{i}^{T} = S_{W} + \frac{n_{1}}{n} \bar{x}_{1} \bar{x}_{1}^{T} + \frac{n_{2}}{n} \bar{x}_{2} \bar{x}_{2}^{T} = S + \bar{x} \bar{x}^{T}$
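Both decompositions of the raw second moment can be checked numerically; this is a small sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(4)
X1 = rng.normal(size=(25, 3))
X2 = rng.normal(size=(35, 3)) + 1.5
X = np.vstack([X1, X2])
n1, n2, n = len(X1), len(X2), len(X)
xbar1, xbar2, xbar = X1.mean(0), X2.mean(0), X.mean(0)

second_moment = X.T @ X / n  # (1/n) sum_i x_i x_i^T

S_W = ((X1 - xbar1).T @ (X1 - xbar1)
       + (X2 - xbar2).T @ (X2 - xbar2)) / n
S = (X - xbar).T @ (X - xbar) / n

# Decomposition via the within-class covariance and class means...
assert np.allclose(second_moment,
                   S_W + (n1 / n) * np.outer(xbar1, xbar1)
                       + (n2 / n) * np.outer(xbar2, xbar2))
# ...and via the unsupervised covariance and overall mean.
assert np.allclose(second_moment, S + np.outer(xbar, xbar))
```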

and then, multiplying through by $n^{2}$ and using $n \bar{x} = n_{1} \bar{x}_{1} + n_{2} \bar{x}_{2}$,

\begin{align} n^{2} S_{W} + n_{1} (n_{1} + n_{2}) \bar{x}_{1} \bar{x}_{1}^{T} + n_{2} (n_{1} + n_{2}) \bar{x}_{2} \bar{x}_{2}^{T} & = n^{2} S + (n_{1} \bar{x}_{1} + n_{2} \bar{x}_{2}) (n_{1} \bar{x}_{1} + n_{2} \bar{x}_{2})^{T} \\ n^{2} S_{W} + n_{1} n_{2} \bar{x}_{1} \bar{x}_{1}^{T} + n_{1} n_{2} \bar{x}_{2} \bar{x}_{2}^{T} & = n^{2} S + n_{1} n_{2} \bar{x}_{1} \bar{x}_{2}^{T} + n_{1} n_{2} \bar{x}_{2} \bar{x}_{1}^{T} \\ S_{W} + \frac{n_{1} n_{2}}{n^{2}} (\bar{x}_{1} - \bar{x}_{2}) (\bar{x}_{1} - \bar{x}_{2})^{T} & = S . \end{align}
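The final identity, $S = S_{W} + \frac{n_{1} n_{2}}{n^{2}} (\bar{x}_{1} - \bar{x}_{2}) (\bar{x}_{1} - \bar{x}_{2})^{T}$, is easy to verify on random data (a sketch; the data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(size=(30, 4))
X2 = rng.normal(size=(50, 4)) + 2.0
X = np.vstack([X1, X2])
n1, n2, n = len(X1), len(X2), len(X)

xbar1, xbar2, xbar = X1.mean(0), X2.mean(0), X.mean(0)
S = (X - xbar).T @ (X - xbar) / n            # unsupervised covariance
S_W = ((X1 - xbar1).T @ (X1 - xbar1)
       + (X2 - xbar2).T @ (X2 - xbar2)) / n  # within-class covariance

d = xbar1 - xbar2
# S is a rank-one update of S_W by the outer product of the mean difference.
assert np.allclose(S, S_W + (n1 * n2 / n**2) * np.outer(d, d))
```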

The system of equations $S w = \bar{x}_{1} - \bar{x}_{2}$ has a special form: by the identity above, the matrix $S$ is $S_{W}$ plus a scalar multiple of the outer product of the right-hand side with itself. In general, if $x$ is a solution to the square system of equations $A x = b$, then for any scalar $\alpha$

$(A + \alpha b b^{T}) x = b + \alpha b b^{T} x = (1 + \alpha b^{T} x) b .$

This means that, provided $1 + \alpha b^{T} x \neq 0$, a solution to $(A + \alpha b b^{T}) z = b$ is

$z = \frac{1}{1 + \alpha b^{T} x} x .$
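This rank-one update formula can be checked directly (a sketch with an arbitrary well-conditioned matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5)) + 5 * np.eye(5)  # well-conditioned square matrix
b = rng.normal(size=5)
alpha = 0.7

x = np.linalg.solve(A, b)        # solves A x = b
z = x / (1 + alpha * b @ x)      # claimed solution of (A + alpha b b^T) z = b
assert np.allclose((A + alpha * np.outer(b, b)) @ z, b)
```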

Applying this with $A = S_{W}$, $b = \bar{x}_{1} - \bar{x}_{2}$, and $\alpha = n_{1} n_{2} / n^{2}$ gives

$S^{-1} (\bar{x}_{1} - \bar{x}_{2}) \propto S_{W}^{-1} (\bar{x}_{1} - \bar{x}_{2})$

and therefore the LDA solutions using the within-class covariance and the unsupervised covariance are equivalent up to scale, which does not change the discriminant direction.
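The proportionality of the two discriminant directions can be confirmed numerically; the sketch below (synthetic data, names mine) checks that their cosine similarity is 1:

```python
import numpy as np

rng = np.random.default_rng(3)
X1 = rng.normal(size=(30, 4)) + 1.0
X2 = rng.normal(size=(45, 4)) - 1.0
X = np.vstack([X1, X2])
n1, n2, n = len(X1), len(X2), len(X)
xbar1, xbar2, xbar = X1.mean(0), X2.mean(0), X.mean(0)

S = (X - xbar).T @ (X - xbar) / n
S_W = ((X1 - xbar1).T @ (X1 - xbar1)
       + (X2 - xbar2).T @ (X2 - xbar2)) / n

d = xbar1 - xbar2
w_S = np.linalg.solve(S, d)      # direction from the unsupervised covariance
w_W = np.linalg.solve(S_W, d)    # direction from the within-class covariance

# The two directions agree up to a positive scalar, so their cosine is 1.
cos = w_S @ w_W / (np.linalg.norm(w_S) * np.linalg.norm(w_W))
assert np.isclose(cos, 1.0)
```

The scalar is positive because $\alpha = n_{1} n_{2} / n^{2} > 0$ and $b^{T} S_{W}^{-1} b > 0$ when $S_{W}$ is positive definite, so the condition $1 + \alpha b^{T} x \neq 0$ is automatically satisfied.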