# Strong Duality

• Primal and dual problems

• Strong duality via Slater’s condition

• Geometric interpretation

• Examples

## Primal and dual problems

In this section, we consider a convex optimization problem

$$
p^\ast = \min_x \: f_0(x) ~:~ f_i(x) \le 0, \;\; i = 1, \ldots, m, \quad Ax = b,
$$

where the functions $f_0, f_1, \ldots, f_m$ are convex, and the equality constraints are affine. We denote by $\mathcal{D}$ the domain of the problem (which is the intersection of the domains of all the functions involved), and by $\mathcal{X} \subseteq \mathcal{D}$ its feasible set.

To the problem we associate the Lagrangian $L : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \rightarrow \mathbb{R}$, with values

$$
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \nu^T (Ax - b).
$$

The dual function is $g : \mathbb{R}^m \times \mathbb{R}^p \rightarrow \mathbb{R}$, with values

$$
g(\lambda, \nu) = \min_x \: L(x, \lambda, \nu).
$$

The associated dual problem is

$$
d^\ast = \max_{\lambda \ge 0, \: \nu} \: g(\lambda, \nu).
$$

Note that the dual variables associated with the affine equality constraints (collected in the vector $\nu$) are not constrained in sign.
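As a quick sanity check, here is a minimal numeric sketch on a toy one-dimensional problem of my own choosing (not from the text): minimize $f_0(x) = x^2$ subject to the single constraint $f_1(x) = 1 - x \le 0$.

```python
# Toy illustration (this example is mine, not from the text): take
#   f0(x) = x^2,  one constraint f1(x) = 1 - x <= 0,  no equality constraints.
# The Lagrangian is L(x, lam) = x^2 + lam*(1 - x); minimizing over x at x = lam/2
# gives the dual function g(lam) = lam - lam^2/4.
def g(lam):
    x = lam / 2.0                      # unconstrained minimizer of L(., lam)
    return x * x + lam * (1.0 - x)

p_star = 1.0                           # primal optimum, attained at x = 1

# Weak duality: g(lam) <= p* for every dual-feasible lam >= 0.
grid = [k / 100.0 for k in range(0, 501)]
assert all(g(lam) <= p_star + 1e-12 for lam in grid)

# Strong duality (Slater holds here: x = 2 is strictly feasible): max_lam g(lam) = p*.
d_star = max(g(lam) for lam in grid)
print(d_star)   # 1.0, attained at lam = 2
```

The dual maximum over the grid exactly matches the primal optimal value, as strong duality predicts for this strictly feasible convex problem.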

## Strong duality via Slater’s condition

### Duality gap and strong duality

We have seen how weak duality allows us to form a convex optimization problem that provides a lower bound on the original (primal) problem, even when the latter is non-convex. The duality gap is the non-negative number $p^\ast - d^\ast \ge 0$.

We say that strong duality holds for the above problem if the duality gap is zero: $p^\ast = d^\ast$.

### Slater’s condition

We say that the problem satisfies Slater's condition if it is strictly feasible, that is, if there exists a point $x_0 \in \mathcal{D}$ such that

$$
f_i(x_0) < 0, \;\; i = 1, \ldots, m, \quad A x_0 = b.
$$

We can replace the above by a weak form of Slater's condition, where strict feasibility is not required for those constraint functions $f_i$ that are affine.

We then have the following result.

Theorem: Strong duality via Slater's condition

If the primal problem is convex, and satisfies the weak Slater's condition, then strong duality holds, that is, $p^\ast = d^\ast$.

Note that there are many other similar results that guarantee a zero duality gap. For example:

If $f_0$ is convex quadratic, and the functions $f_1, \ldots, f_m$ are all affine, then the duality gap is always zero, provided one of the primal or dual problems is feasible. In particular, strong duality holds for any feasible linear optimization problem.
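To illustrate this fact, the following sketch (with made-up random data) solves an equality-constrained convex QP through its KKT system and checks that the dual function, evaluated at the optimal multiplier, attains the primal optimal value, i.e. the duality gap is zero.

```python
import numpy as np

# Sketch: equality-constrained convex QP, min (1/2) x^T Q x + c^T x  s.t.  A x = b,
# with made-up data. Strong duality predicts p* = d* here.
rng = np.random.default_rng(1)
n, p = 5, 2
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)                 # positive definite objective matrix
c = rng.standard_normal(n)
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)

# Primal optimum from the KKT system: Q x + c + A^T nu = 0, A x = b.
KKT = np.block([[Q, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(KKT, np.concatenate([-c, b]))
x_opt, nu_opt = sol[:n], sol[n:]
p_star = 0.5 * x_opt @ Q @ x_opt + c @ x_opt

# Dual function: minimizing the Lagrangian over x gives x = -Q^{-1}(c + A^T nu), so
#   g(nu) = -(1/2) (c + A^T nu)^T Q^{-1} (c + A^T nu) - b^T nu.
def g(nu):
    r = c + A.T @ nu
    return -0.5 * r @ np.linalg.solve(Q, r) - b @ nu

# Zero duality gap: the dual value at nu_opt equals p*.
assert np.isclose(g(nu_opt), p_star)
```

The KKT multiplier `nu_opt` is exactly the dual optimum here, so no separate dual solve is needed for the check.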

## Geometric interpretation

Assume that there is only one inequality constraint in the primal problem ($m = 1$), and let

$$
\mathcal{A} = \left\{ (u, t) ~:~ \exists \, x \in \mathcal{D}, \;\; f_1(x) \le u, \;\; f_0(x) \le t \right\}.
$$

The problem is feasible if and only if $\mathcal{A}$ intersects the left-half plane $u \le 0$. Furthermore, we have

$$
p^\ast = \min \left\{ t ~:~ (u, t) \in \mathcal{A}, \;\; u \le 0 \right\},
$$

and

$$
g(\lambda) = \min \left\{ \lambda u + t ~:~ (u, t) \in \mathcal{A} \right\}.
$$

If the minimum is finite, then the inequality $\lambda u + t \ge g(\lambda)$ defines a supporting hyperplane, with slope $-\lambda$, of $\mathcal{A}$; this hyperplane intersects the axis $u = 0$ at the point $(0, g(\lambda))$. (See Figs. 5.3 and 5.4 in BV, p. 233.)

If the problem is convex, then $\mathcal{A}$ is also convex. If Slater's condition holds, then the interior of $\mathcal{A}$ intersects the left-half plane $u < 0$, and strong duality holds. (See Fig. 5.6 in BV, p. 236.)
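This picture can be checked numerically. The sketch below (a toy problem of my own: $f_0(x) = (x-2)^2$, $f_1(x) = x^2 - 1$) samples the image points $(f_1(x), f_0(x))$, which suffice here since the minima over the full set are attained at such points, and verifies both the value of $g(\lambda)$ and the supporting-hyperplane property.

```python
import numpy as np

# Illustrative problem (mine, not from the text): f0(x) = (x-2)^2, f1(x) = x^2 - 1,
# so the feasible set is [-1, 1] and p* = f0(1) = 1.
xs = np.linspace(-4.0, 4.0, 20001)
u, t = xs**2 - 1.0, (xs - 2.0)**2          # sampled points (f1(x), f0(x))

# Dual function by direct minimization over the sampled points.
def g_sampled(lam):
    return np.min(t + lam * u)

# Closed form from minimizing the Lagrangian: x(lam) = 2/(1+lam),
# hence g(lam) = 4*lam/(1+lam) - lam.
def g_exact(lam):
    return 4.0 * lam / (1.0 + lam) - lam

for lam in (0.0, 0.5, 1.0, 2.0):
    assert abs(g_sampled(lam) - g_exact(lam)) < 1e-3
    # The line lam*u + t = g(lam) lies below every sampled point:
    # a supporting hyperplane of slope -lam.
    assert np.all(t + lam * u >= g_exact(lam) - 1e-9)

# Slater holds (x = 0 gives f1 = -1 < 0), and indeed the dual maximum,
# at lam = 1, equals p* = 1: strong duality.
print(g_exact(1.0))
```
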

## Examples

### A convex counterexample

Convexity alone is not enough to guarantee strong duality. Consider for example the convex problem

$$
p^\ast = \min_{x, y} \: e^{-x} ~:~ x^2 / y \le 0,
$$

with variables $x$ and $y$, and domain $\mathcal{D} = \{ (x, y) ~:~ y > 0 \}$. We have $p^\ast = 1$. The Lagrangian is $L(x, y, \lambda) = e^{-x} + \lambda x^2 / y$, and the dual function is

$$
g(\lambda) = \inf_{x, \: y > 0} \left( e^{-x} + \lambda \frac{x^2}{y} \right) = \begin{cases} 0 & \lambda \ge 0, \\ -\infty & \lambda < 0, \end{cases}
$$

so we can write the dual problem as

$$
d^\ast = \max_\lambda \: 0 ~:~ \lambda \ge 0,
$$

with optimal value $d^\ast = 0$. The optimal duality gap is $p^\ast - d^\ast = 1$. In this problem, Slater's condition is not satisfied, since necessarily $x = 0$ for any feasible pair $(x, y)$.
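A short numeric check of this counterexample: the path $x = k$, $y = k^3$ below is one choice (my own) that drives the Lagrangian toward its infimum of $0$, while the primal optimal value stays at $1$.

```python
import math

# The counterexample: minimize exp(-x) subject to x^2/y <= 0, with y > 0.
# Feasibility forces x = 0, so p* = exp(0) = 1.
p_star = 1.0

# Lagrangian at lam = 1: L(x, y, 1) = exp(-x) + x**2 / y.
# Along the path x = k, y = k**3 the Lagrangian tends to 0, confirming g(1) = 0
# (in fact g(lam) = 0 for all lam >= 0), so d* = 0 and the duality gap is 1.
vals = [math.exp(-k) + k**2 / k**3 for k in (1, 10, 100, 1000)]
assert all(v2 < v1 for v1, v2 in zip(vals, vals[1:]))   # decreasing along the path
assert vals[-1] < 1e-2                                   # approaching the infimum 0
print(p_star - 0.0)   # duality gap: 1.0
```
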

### Minimum Euclidean distance problem

The minimum distance to an affine set mentioned in XXX is

$$
p^\ast = \min_x \: \frac{1}{2} \|x\|_2^2 ~:~ A x = b,
$$

where $A \in \mathbb{R}^{p \times n}$, $b \in \mathbb{R}^p$, and we assume $A$ is full row rank. The problem is convex, and satisfies Slater's condition (in fact, strong duality always holds for this convex quadratic problem). Hence, we know that $p^\ast = d^\ast$. This allows us to compute the optimal value of the problem analytically: $p^\ast = d^\ast = \frac{1}{2} b^T (A A^T)^{-1} b$.

We can also find a corresponding optimal point: for every $\nu$, the point $x(\nu) = -A^T \nu$ achieves the minimum in the definition of the dual function $g(\nu)$. Let us set $x^\ast := x(\nu^\ast) = -A^T \nu^\ast = A^T (A A^T)^{-1} b$, where $\nu^\ast = -(A A^T)^{-1} b$ denotes the optimal dual variable. The point $x^\ast$ is optimal for the primal problem. Indeed, it is feasible, since $A x^\ast = b$, and its objective value equals the optimal value: $\frac{1}{2} \|x^\ast\|_2^2 = \frac{1}{2} b^T (A A^T)^{-1} b = p^\ast$. Hence, $x^\ast$ is optimal, as claimed.
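These closed-form expressions are easy to verify numerically. In this sketch (random made-up data), we build the primal point from the dual solution and cross-check it against the minimum-norm solution returned by `numpy.linalg.lstsq`.

```python
import numpy as np

# Sketch with made-up data: A is p x n with full row rank, p < n.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))
b = rng.standard_normal(3)

# Dual solution nu* = -(A A^T)^{-1} b, primal candidate x* = -A^T nu* = A^T (A A^T)^{-1} b.
AAt_inv_b = np.linalg.solve(A @ A.T, b)
x_star = A.T @ AAt_inv_b

# x* is feasible, and its objective matches the optimal value (1/2) b^T (A A^T)^{-1} b.
assert np.allclose(A @ x_star, b)
p_star = 0.5 * b @ AAt_inv_b
assert np.isclose(0.5 * np.dot(x_star, x_star), p_star)

# Cross-check: the minimum-norm solution of A x = b from lstsq agrees with x*.
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x_star, x_ls)
```
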

### Linear optimization problem

Consider the LP in inequality form:

$$
p^\ast = \min_x \: c^T x ~:~ A x \le b,
$$

where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$. Assume that the above problem is feasible, so that strong duality holds. Then the problem can be equivalently written in the dual form, as an LP:

$$
p^\ast = d^\ast = \max_\lambda \: -b^T \lambda ~:~ \lambda \ge 0, \;\; A^T \lambda + c = 0.
$$

The above LP is in standard form, with the number of constraints and variables exchanged.

Duality thus provides another way to convert any LP in inequality form into one in standard form, and vice versa. (The other method is via the introduction of new variables.)
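As a sketch of the primal-dual pair, the following toy LP (made-up data: a box $0 \le x \le 1$ written as inequalities) is solved alongside its standard-form dual with `scipy.optimize.linprog`, and the two optimal values coincide.

```python
import numpy as np
from scipy.optimize import linprog

# Toy inequality-form LP (made-up data): minimize c^T x subject to A x <= b,
# where the constraints describe the box 0 <= x <= 1 and the optimum is (1, 1).
c = np.array([-1.0, -2.0])
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])

primal = linprog(c, A_ub=A, b_ub=b, bounds=(None, None))

# Dual in standard form: maximize -b^T lam  s.t.  A^T lam + c = 0, lam >= 0,
# solved here as the equivalent minimization of b^T lam over the same set.
dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=(0, None))

# Strong duality for feasible LPs: the two optimal values coincide.
assert primal.success and dual.success
assert np.isclose(primal.fun, -dual.fun)
print(primal.fun)   # -3.0
```
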

### Support vector machine classification

Return to the example seen earlier, which involved a binary classification problem. Given data points $x_i \in \mathbb{R}^n$, $i = 1, \ldots, m$, each of which is associated with a label $y_i \in \{-1, 1\}$, the problem is to find a hyperplane that separates, as much as possible, the two classes. Let us denote by $X := [x_1, \ldots, x_m] \in \mathbb{R}^{n \times m}$ the matrix of data points, and by $y \in \{-1, 1\}^m$ the vector of labels.

Ideally, we would like to minimize the number of errors on the training set. This is hard, as it involves a non-convex function. An upper bound on the number of errors is provided by the so-called hinge loss function

$$
\sum_{i=1}^m \max\left(0, \; 1 - y_i (w^T x_i + b)\right).
$$

We'd also like to control the robustness of the resulting linear classifier, and at the same time guarantee uniqueness. It turns out that these objectives can be achieved via the following problem:

$$
\min_{w, b} \: C \sum_{i=1}^m \max\left(0, \; 1 - y_i (w^T x_i + b)\right) + \frac{1}{2} \|w\|_2^2,
$$

where $C > 0$ is a parameter that controls the trade-off between robustness and performance on the training set (a greater $C$ encourages performance at the expense of robustness).

The above can be written as a QP, by introducing slack variables $v \in \mathbb{R}^m$:

$$
\min_{w, b, v} \: \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^m v_i ~:~ v_i \ge 1 - y_i (w^T x_i + b), \;\; v_i \ge 0, \;\; i = 1, \ldots, m,
$$

or, more compactly:

$$
\min_{w, b, v} \: \frac{1}{2} \|w\|_2^2 + C v^T \mathbf{1} ~:~ v \ge \mathbf{1} - \operatorname{diag}(y) (X^T w + b \mathbf{1}), \;\; v \ge 0.
$$

The corresponding Lagrangian is

$$
L(w, b, v, \lambda, \mu) = \frac{1}{2} \|w\|_2^2 + C v^T \mathbf{1} + \lambda^T \left( \mathbf{1} - v - \operatorname{diag}(y)(X^T w + b \mathbf{1}) \right) - \mu^T v,
$$

where $\mu \ge 0$ corresponds to the sign constraints on $v$.

The dual function is given by

$$
g(\lambda, \mu) = \min_{w, b, v} \: L(w, b, v, \lambda, \mu).
$$

We can readily solve for $w$ by taking derivatives, which leads to $w(\lambda, \mu) = X \operatorname{diag}(y) \lambda = \sum_{i=1}^m \lambda_i y_i x_i$. Taking derivatives with respect to $b$ yields the constraint $\lambda^T y = 0$, while taking derivatives with respect to $v$ leads to the dual constraint $\lambda + \mu = C \mathbf{1}$. We obtain

$$
g(\lambda, \mu) = \lambda^T \mathbf{1} - \frac{1}{2} \lambda^T \operatorname{diag}(y) X^T X \operatorname{diag}(y) \lambda
$$

whenever $\lambda^T y = 0$ and $\lambda + \mu = C \mathbf{1}$, and $g(\lambda, \mu) = -\infty$ otherwise.

We obtain the dual problem

$$
d^\ast = \max_\lambda \: \lambda^T \mathbf{1} - \frac{1}{2} \lambda^T \operatorname{diag}(y) X^T X \operatorname{diag}(y) \lambda ~:~ 0 \le \lambda \le C \mathbf{1}, \;\; \lambda^T y = 0.
$$

Strong duality holds, since the primal problem is a feasible convex QP.

Note that the result depends only on the so-called kernel matrix $K = X^T X \in \mathbb{R}^{m \times m}$, with entries $K_{ij} = x_i^T x_j$, and the dual problem involves only $m$ variables and $2m + 1$ constraints. Hence, the only dependence on the number of dimensions (features), $n$, is via the required computation of the kernel matrix, that is, on the scalar products $x_i^T x_j$, $1 \le i \le j \le m$. Thus, duality allows a great reduction in the computational effort, compared to solving the original QP in $n + m + 1$ variables and $2m$ constraints. This is known as the "kernel trick".
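To illustrate, the following sketch solves the dual on a tiny made-up dataset with `scipy.optimize.minimize` (SLSQP here, though any QP solver would do), working only with the kernel matrix, and then recovers the primal classifier via $w = \sum_i \lambda_i y_i x_i$.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny separable toy set (made-up): two points per class in R^2.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
C, m = 10.0, len(y)

# Kernel matrix K_ij = x_i^T x_j: the dual data enters only through K.
K = X @ X.T
H = np.outer(y, y) * K

# Dual problem: maximize 1^T lam - (1/2) lam^T H lam,
# subject to 0 <= lam <= C and y^T lam = 0 (negated here for a minimizer).
res = minimize(
    lambda lam: -lam.sum() + 0.5 * lam @ H @ lam,
    x0=np.zeros(m),
    jac=lambda lam: -np.ones(m) + H @ lam,
    bounds=[(0.0, C)] * m,
    constraints=[{"type": "eq", "fun": lambda lam: y @ lam}],
    method="SLSQP",
)
lam = res.x

# Recover the primal solution: w = sum_i lam_i y_i x_i; b from a support vector.
w = X.T @ (lam * y)
sv = np.argmax(lam)                     # a support vector (largest multiplier)
b = y[sv] - w @ X[sv]

# The learned hyperplane separates the training set.
assert np.all(np.sign(X @ w + b) == y)
print(w)   # approximately [1, 0]
```

Note that the dual solve above touches the feature dimension $n$ only when building `K` and when recovering `w` at the end, which is the point of the kernel trick.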

Note also that duality allows us to show that the optimal value of the problem is a convex function of the kernel matrix, which in turn allows us to optimize over it.