Processing math: 100%

The \epsilon-insensitive objective is

\min_{w} \frac{1}{2} \|w\|^{2} + C \sum_{i = 1}^{n} \max(0, |w^{T} x_{i} - y_{i}| - \epsilon) \enspace .

An equivalent expression can be formed using a slack variable.

\begin{aligned} \min_{w, \xi} \; & \frac{1}{2} \|w\|^{2} + C \sum_{i = 1}^{n} \xi_{i} \\ \text{s.t.} \; & \xi_{i} \ge w^{T} x_{i} - y_{i} - \epsilon, \\ & \xi_{i} \ge -w^{T} x_{i} + y_{i} - \epsilon, \\ & \xi_{i} \ge 0 \end{aligned}

Switching to vector notation, where x_{i} are the columns of X, the Lagrangian is

\begin{align} L(w, \xi, \alpha, \beta, \gamma) & = \frac{1}{2} \|w\|^{2} + C 1^{T} \xi + \alpha^{T} (X^{T} w - y - \epsilon 1 - \xi) \\ & \quad + \beta^{T} (-X^{T} w + y - \epsilon 1 - \xi) - \gamma^{T} \xi \enspace . \end{align}

Minimising with respect to the primal variables,

\begin{align} \frac{\partial L}{\partial w} & = w + X \alpha - X \beta = 0, & w & = X (\beta - \alpha) \\ \frac{\partial L}{\partial \xi} & = C 1 - \alpha - \beta - \gamma = 0 \enspace . \end{align}

The dual problem is therefore

\begin{aligned} \max_{\alpha, \beta, \gamma} \; & -\frac{1}{2} (\beta - \alpha)^{T} X^{T} X (\beta - \alpha) + y^{T} (\beta - \alpha) - \epsilon 1^{T} (\beta + \alpha) \\ \text{s.t.} \; & \alpha \ge 0, \; \beta \ge 0, \; \gamma \ge 0, \; \alpha + \beta + \gamma = C 1 \enspace , \end{aligned}

which is equivalent to

\begin{aligned} \min_{\upsilon} \; & \frac{1}{2} \upsilon^{T} X^{T} X \upsilon - y^{T} \upsilon + \epsilon \|\upsilon\|_{1} \\ \text{s.t.} \; & |\upsilon_{i}| \le C \end{aligned}

with the solution given w = X \upsilon.