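Ridge regression is the name given to least-squares regression with squared Euclidean norm regularisation added. Given $n$ example vectors $x_i$ of dimension $m$ with scalar labels $y_i$, the problem is expressed as finding the weight vector $w$ and scalar bias $b$ which minimise the objective function

$$f(w, b) = \frac{1}{2} \sum_{i=1}^{n} \left( w^\top x_i + b - y_i \right)^2 + \frac{\lambda}{2} \|w\|^2,$$

where $\lambda > 0$ is the regularisation parameter.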
Eliminating the bias
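Setting the derivative of $f$ with respect to $b$ to zero yields

$$b = \bar{y} - w^\top \bar{x}, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i,$$

and therefore the problem is to find the minimiser of

$$f(w) = \frac{1}{2} \sum_{i=1}^{n} \left( w^\top (x_i - \bar{x}) - (y_i - \bar{y}) \right)^2 + \frac{\lambda}{2} \|w\|^2.$$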
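From this point on we will assume that the example vectors and the labels have been pre-processed to have zero mean, leading to the simplified form

$$f(w) = \frac{1}{2} \sum_{i=1}^{n} \left( w^\top x_i - y_i \right)^2 + \frac{\lambda}{2} \|w\|^2.$$

Let us introduce the notation that $X$ is an $m \times n$ matrix whose columns are the example vectors $x_i$ and $y$ is a vector comprising the corresponding labels $y_i$, writing the objective as

$$f(w) = \frac{1}{2} \|X^\top w - y\|^2 + \frac{\lambda}{2} \|w\|^2.$$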
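The pre-processing step and the matrix form of the objective can be sketched in NumPy. This is a minimal illustration, assuming the examples are stored as the columns of an $m \times n$ array and that the objective is scaled as $\frac{1}{2}\|X^\top w - y\|^2 + \frac{\lambda}{2}\|w\|^2$; the function names `center_data`, `ridge_objective` and `recover_bias` are hypothetical helpers, not part of any library.

```python
import numpy as np

def center_data(X, y):
    """Subtract the mean example from each column of X and the mean label from y."""
    x_mean = X.mean(axis=1, keepdims=True)   # shape (m, 1): mean over examples
    y_mean = y.mean()
    return X - x_mean, y - y_mean, x_mean.ravel(), y_mean

def ridge_objective(w, X, y, lam):
    """Assumed scaling: 1/2 ||X^T w - y||^2 + lam/2 ||w||^2, for centred data."""
    residual = X.T @ w - y
    return 0.5 * residual @ residual + 0.5 * lam * w @ w

def recover_bias(w, x_mean, y_mean):
    """Bias obtained from df/db = 0: b = y_mean - w^T x_mean."""
    return y_mean - w @ x_mean

# Synthetic data with non-zero means: m = 3 features, n = 20 examples.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 20)) + 5.0
y = rng.normal(size=20) + 2.0
Xc, yc, x_mean, y_mean = center_data(X, y)
```

Once a weight vector has been fitted on the centred data, `recover_bias` gives the bias to use with the original, uncentred examples.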
Solving for the weights in the primal
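The problem above can be re-written as

$$f(w) = \frac{1}{2} w^\top (S + \lambda I) w - w^\top X y + \frac{1}{2} y^\top y,$$

where $S = X X^\top$ is the (unnormalised) covariance matrix. Setting the gradient $(S + \lambda I) w - X y$ to zero, the solution to this unconstrained quadratic program is simply

$$w = (S + \lambda I)^{-1} X y.$$

For $\lambda > 0$ the matrix $S + \lambda I$ is positive definite, so the inverse always exists.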
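As a minimal sketch of the primal solution, the following NumPy code solves the $m \times m$ linear system $(X X^\top + \lambda I) w = X y$ rather than forming the inverse explicitly. It assumes centred data with examples stored as columns of `X`; the function name `ridge_primal` and the synthetic data are illustrative only.

```python
import numpy as np

def ridge_primal(X, y, lam):
    """Solve (X X^T + lam I) w = X y; X has shape (m, n), y has shape (n,)."""
    m = X.shape[0]
    S = X @ X.T                               # (unnormalised) covariance, m x m
    return np.linalg.solve(S + lam * np.eye(m), X @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 50))                  # m = 4 features, n = 50 examples
y = rng.normal(size=50)
lam = 0.1

w = ridge_primal(X, y, lam)

# Gradient of 1/2 ||X^T w - y||^2 + lam/2 ||w||^2; it should vanish at the solution.
grad = X @ (X.T @ w - y) + lam * w
```

Checking that `grad` is numerically zero is a quick way to confirm that the returned `w` is the stationary point of the objective.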
The dual problem
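The problem can be converted into a constrained minimisation problem by introducing the residual vector $r$:

$$\min_{w, r} \; \frac{1}{2} \|r\|^2 + \frac{\lambda}{2} \|w\|^2 \quad \text{subject to} \quad r = X^\top w - y,$$

whose Lagrangian, with multiplier vector $\alpha$, is

$$L(w, r, \alpha) = \frac{1}{2} \|r\|^2 + \frac{\lambda}{2} \|w\|^2 + \alpha^\top (r - X^\top w + y).$$

Setting derivatives with respect to the primal variables to zero, we obtain

$$\frac{\partial L}{\partial w} = \lambda w - X \alpha = 0 \;\Rightarrow\; w = \frac{1}{\lambda} X \alpha, \qquad \frac{\partial L}{\partial r} = r + \alpha = 0 \;\Rightarrow\; r = -\alpha.$$

Making these substitutions to eliminate $w$ and $r$ gives the dual function

$$g(\alpha) = -\frac{1}{2} \alpha^\top \alpha - \frac{1}{2\lambda} \alpha^\top K \alpha + \alpha^\top y,$$

and the dual problem is

$$\max_{\alpha} \; g(\alpha),$$

where $K = X^\top X$ is the kernel matrix. Setting the gradient of $g$ to zero, the solution is $\alpha = \lambda (K + \lambda I)^{-1} y$, and then $w = \frac{1}{\lambda} X \alpha = X (K + \lambda I)^{-1} y$.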
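The dual route can be sketched in the same way. The code below assumes the scaling in which the dual variables satisfy $\alpha = \lambda (K + \lambda I)^{-1} y$ and $w = \frac{1}{\lambda} X \alpha$, with $K = X^\top X$ and examples stored as columns of `X`; under a different scaling of the objective the constants change accordingly. The name `ridge_dual` is hypothetical.

```python
import numpy as np

def ridge_dual(X, y, lam):
    """Return (alpha, w) from the n x n kernel system; X is (m, n), y is (n,)."""
    n = X.shape[1]
    K = X.T @ X                               # linear kernel matrix, n x n
    alpha = lam * np.linalg.solve(K + lam * np.eye(n), y)
    w = X @ alpha / lam                       # map dual variables back to weights
    return alpha, w

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))                  # m = 30 features, n = 8 examples
y = rng.normal(size=8)

alpha, w = ridge_dual(X, y, 0.5)

# With this scaling the stationarity condition r = -alpha says that the
# dual variables are the negated residuals of the fitted model.
residual = X.T @ w - y
```

Note that only inner products between examples enter through `K`, which is what allows the linear kernel to be swapped for another kernel function.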
Primal vs dual
We now have two equivalent solutions, one using the covariance matrix and the other the kernel matrix.
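The equivalence can be checked numerically. The sketch below assumes the two closed forms $w = (X X^\top + \lambda I)^{-1} X y$ and $w = X (X^\top X + \lambda I)^{-1} y$, with examples as columns of `X`, and compares them on synthetic data for both $m < n$ and $m > n$; the helper names are illustrative.

```python
import numpy as np

def solve_primal(X, y, lam):
    """Weights from the m x m covariance system."""
    return np.linalg.solve(X @ X.T + lam * np.eye(X.shape[0]), X @ y)

def solve_dual(X, y, lam):
    """Weights from the n x n kernel system."""
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), y)

rng = np.random.default_rng(2)
agreement = {}
for m, n in [(3, 40), (40, 3)]:               # few features, then few examples
    X = rng.normal(size=(m, n))
    y = rng.normal(size=n)
    w_p = solve_primal(X, y, 0.3)
    w_d = solve_dual(X, y, 0.3)
    agreement[(m, n)] = np.allclose(w_p, w_d)
```

The primal form solves an $m \times m$ system and the dual an $n \times n$ system, so in practice the cheaper choice depends on whether there are fewer features or fewer examples.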