# Strong Duality

• Primal and dual problems

• Strong duality via Slater’s condition

• Geometric interpretation

• Examples

## Primal and dual problems

In this section, we consider a convex optimization problem

$$
p^\ast = \min_x \: f_0(x) ~:~ f_i(x) \le 0, \;\; i = 1, \ldots, m, \quad Ax = b,
$$

where the functions $f_0, f_1, \ldots, f_m$ are convex, and the equality constraints are affine. We denote by $\mathcal{D}$ the domain of the problem (which is the intersection of the domains of all the functions involved), and by $\mathcal{X} \subseteq \mathcal{D}$ its feasible set.

To the problem we associate the Lagrangian $L : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \rightarrow \mathbb{R}$, with values

$$
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \nu^T (Ax - b).
$$

The dual function is $g : \mathbb{R}^m \times \mathbb{R}^p \rightarrow \mathbb{R}$, with values

$$
g(\lambda, \nu) = \min_x \: L(x, \lambda, \nu).
$$

The associated dual problem is

$$
d^\ast = \max_{\lambda \ge 0, \: \nu} \: g(\lambda, \nu).
$$

Note that the dual variables associated with the affine equality constraints (collected in the vector $\nu$) are not constrained in sign.
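As a quick sanity check, here is a minimal numeric sketch on a toy one-dimensional problem of my own choosing (not from the text): minimize $f_0(x) = x^2$ subject to the single constraint $f_1(x) = 1 - x \le 0$.

```python
# Toy illustration (this example is mine, not from the text): take
#   f0(x) = x^2,  one constraint f1(x) = 1 - x <= 0,  no equality constraints.
# The Lagrangian is L(x, lam) = x^2 + lam*(1 - x); minimizing over x at x = lam/2
# gives the dual function g(lam) = lam - lam^2/4.
def g(lam):
    x = lam / 2.0                      # unconstrained minimizer of L(., lam)
    return x * x + lam * (1.0 - x)

p_star = 1.0                           # primal optimum, attained at x = 1

# Weak duality: g(lam) <= p* for every dual-feasible lam >= 0.
grid = [k / 100.0 for k in range(0, 501)]
assert all(g(lam) <= p_star + 1e-12 for lam in grid)

# Strong duality (Slater holds here: x = 2 is strictly feasible): max_lam g(lam) = p*.
d_star = max(g(lam) for lam in grid)
print(d_star)   # 1.0, attained at lam = 2
```

The dual maximum over the grid exactly matches the primal optimal value, as strong duality predicts for this strictly feasible convex problem.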

## Strong duality via Slater’s condition

### Duality gap and strong duality

We have seen how weak duality allows us to form a convex optimization problem that provides a lower bound on the original (primal) problem, even when the latter is non-convex. The duality gap is the non-negative number $p^\ast - d^\ast \ge 0$.

We say that strong duality holds for the above problem if the duality gap is zero: $p^\ast = d^\ast$.

### Slater’s condition

We say that the problem satisfies Slater's condition if it is strictly feasible, that is, if there exists a point $x_0 \in \mathcal{D}$ such that

$$
f_i(x_0) < 0, \;\; i = 1, \ldots, m, \quad A x_0 = b.
$$

We can replace the above by a weak form of Slater's condition, where strict feasibility is not required for those constraint functions $f_i$ that are affine.

We then have the following result.

Theorem: Strong duality via Slater's condition

If the primal problem is convex, and satisfies the weak Slater's condition, then strong duality holds, that is, $p^\ast = d^\ast$.

Note that there are many other similar results that guarantee a zero duality gap. For example:

If $f_0$ is convex quadratic, and the functions $f_1, \ldots, f_m$ are all affine, then the duality gap is always zero, provided one of the primal or dual problems is feasible. In particular, strong duality holds for any feasible linear optimization problem.
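To illustrate this fact, the following sketch (with made-up random data) solves an equality-constrained convex QP through its KKT system and checks that the dual function, evaluated at the optimal multiplier, attains the primal optimal value, i.e. the duality gap is zero.

```python
import numpy as np

# Sketch: equality-constrained convex QP, min (1/2) x^T Q x + c^T x  s.t.  A x = b,
# with made-up data. Strong duality predicts p* = d* here.
rng = np.random.default_rng(1)
n, p = 5, 2
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)                 # positive definite objective matrix
c = rng.standard_normal(n)
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)

# Primal optimum from the KKT system: Q x + c + A^T nu = 0, A x = b.
KKT = np.block([[Q, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(KKT, np.concatenate([-c, b]))
x_opt, nu_opt = sol[:n], sol[n:]
p_star = 0.5 * x_opt @ Q @ x_opt + c @ x_opt

# Dual function: minimizing the Lagrangian over x gives x = -Q^{-1}(c + A^T nu), so
#   g(nu) = -(1/2) (c + A^T nu)^T Q^{-1} (c + A^T nu) - b^T nu.
def g(nu):
    r = c + A.T @ nu
    return -0.5 * r @ np.linalg.solve(Q, r) - b @ nu

# Zero duality gap: the dual value at nu_opt equals p*.
assert np.isclose(g(nu_opt), p_star)
```

The KKT multiplier `nu_opt` is exactly the dual optimum here, so no separate dual solve is needed for the check.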

## Geometric interpretation

Assume that there is only one inequality constraint in the primal problem ($m = 1$), and let

$$
\mathcal{A} = \left\{ (u, t) ~:~ \exists \, x \in \mathcal{D}, \;\; f_1(x) \le u, \;\; f_0(x) \le t \right\}.
$$

The problem is feasible if and only if $\mathcal{A}$ intersects the left-half plane $u \le 0$. Furthermore, we have

$$
p^\ast = \min \left\{ t ~:~ (u, t) \in \mathcal{A}, \;\; u \le 0 \right\},
$$

and

$$
g(\lambda) = \min \left\{ \lambda u + t ~:~ (u, t) \in \mathcal{A} \right\}.
$$

If the minimum is finite, then the inequality $\lambda u + t \ge g(\lambda)$ defines a supporting hyperplane, with slope $-\lambda$, of $\mathcal{A}$; this hyperplane intersects the axis $u = 0$ at the point $(0, g(\lambda))$. (See Figs. 5.3 and 5.4 in BV, p. 233.)

If the problem is convex, then $\mathcal{A}$ is also convex. If Slater's condition holds, then the interior of $\mathcal{A}$ intersects the left-half plane $u < 0$, and strong duality holds. (See Fig. 5.6 in BV, p. 236.)
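This picture can be checked numerically. The sketch below (a toy problem of my own: $f_0(x) = (x-2)^2$, $f_1(x) = x^2 - 1$) samples the image points $(f_1(x), f_0(x))$, which suffice here since the minima over the full set are attained at such points, and verifies both the value of $g(\lambda)$ and the supporting-hyperplane property.

```python
import numpy as np

# Illustrative problem (mine, not from the text): f0(x) = (x-2)^2, f1(x) = x^2 - 1,
# so the feasible set is [-1, 1] and p* = f0(1) = 1.
xs = np.linspace(-4.0, 4.0, 20001)
u, t = xs**2 - 1.0, (xs - 2.0)**2          # sampled points (f1(x), f0(x))

# Dual function by direct minimization over the sampled points.
def g_sampled(lam):
    return np.min(t + lam * u)

# Closed form from minimizing the Lagrangian: x(lam) = 2/(1+lam),
# hence g(lam) = 4*lam/(1+lam) - lam.
def g_exact(lam):
    return 4.0 * lam / (1.0 + lam) - lam

for lam in (0.0, 0.5, 1.0, 2.0):
    assert abs(g_sampled(lam) - g_exact(lam)) < 1e-3
    # The line lam*u + t = g(lam) lies below every sampled point:
    # a supporting hyperplane of slope -lam.
    assert np.all(t + lam * u >= g_exact(lam) - 1e-9)

# Slater holds (x = 0 gives f1 = -1 < 0), and indeed the dual maximum,
# at lam = 1, equals p* = 1: strong duality.
print(g_exact(1.0))
```
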

## Examples

### A convex counterexample

Convexity alone is not enough to guarantee strong duality. Consider for example the convex problem

$$
p^\ast = \min_{x, y} \: e^{-x} ~:~ x^2 / y \le 0,
$$

with variables $x$ and $y$, and domain $\mathcal{D} = \{ (x, y) ~:~ y > 0 \}$. We have $p^\ast = 1$. The Lagrangian is $L(x, y, \lambda) = e^{-x} + \lambda x^2 / y$, and the dual function is

$$
g(\lambda) = \inf_{x, \: y > 0} \left( e^{-x} + \lambda \frac{x^2}{y} \right) = \begin{cases} 0 & \lambda \ge 0, \\ -\infty & \lambda < 0, \end{cases}
$$

so we can write the dual problem as

$$
d^\ast = \max_\lambda \: 0 ~:~ \lambda \ge 0,
$$

with optimal value $d^\ast = 0$. The optimal duality gap is $p^\ast - d^\ast = 1$. In this problem, Slater's condition is not satisfied, since necessarily $x = 0$ for any feasible pair $(x, y)$.
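A short numeric check of this counterexample: the path $x = k$, $y = k^3$ below is one choice (my own) that drives the Lagrangian toward its infimum of $0$, while the primal optimal value stays at $1$.

```python
import math

# The counterexample: minimize exp(-x) subject to x^2/y <= 0, with y > 0.
# Feasibility forces x = 0, so p* = exp(0) = 1.
p_star = 1.0

# Lagrangian at lam = 1: L(x, y, 1) = exp(-x) + x**2 / y.
# Along the path x = k, y = k**3 the Lagrangian tends to 0, confirming g(1) = 0
# (in fact g(lam) = 0 for all lam >= 0), so d* = 0 and the duality gap is 1.
vals = [math.exp(-k) + k**2 / k**3 for k in (1, 10, 100, 1000)]
assert all(v2 < v1 for v1, v2 in zip(vals, vals[1:]))   # decreasing along the path
assert vals[-1] < 1e-2                                   # approaching the infimum 0
print(p_star - 0.0)   # duality gap: 1.0
```
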

### Minimum Euclidean distance problem

The minimum distance to an affine set mentioned in XXX is

$$
p^\ast = \min_x \: \frac{1}{2} \|x\|_2^2 ~:~ A x = b,
$$

where $A \in \mathbb{R}^{p \times n}$, $b \in \mathbb{R}^p$, and we assume $A$ is full row rank. The problem is convex, and satisfies Slater's condition (in fact, strong duality always holds for this convex quadratic problem). Hence, we know that $p^\ast = d^\ast$. This allows us to compute the optimal value of the problem analytically: $p^\ast = d^\ast = \frac{1}{2} b^T (A A^T)^{-1} b$.

We can also find a corresponding optimal point: for every $\nu$, the point $x(\nu) = -A^T \nu$ achieves the minimum in the definition of the dual function $g(\nu)$. Let us set $x^\ast := x(\nu^\ast) = -A^T \nu^\ast = A^T (A A^T)^{-1} b$, where $\nu^\ast = -(A A^T)^{-1} b$ denotes the optimal dual variable. The point $x^\ast$ is optimal for the primal problem. Indeed, it is feasible, since $A x^\ast = b$, and its objective value equals the optimal value: $\frac{1}{2} \|x^\ast\|_2^2 = \frac{1}{2} b^T (A A^T)^{-1} b = p^\ast$. Hence, $x^\ast$ is optimal, as claimed.
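These closed-form expressions are easy to verify numerically. In this sketch (random made-up data), we build the primal point from the dual solution and cross-check it against the minimum-norm solution returned by `numpy.linalg.lstsq`.

```python
import numpy as np

# Sketch with made-up data: A is p x n with full row rank, p < n.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))
b = rng.standard_normal(3)

# Dual solution nu* = -(A A^T)^{-1} b, primal candidate x* = -A^T nu* = A^T (A A^T)^{-1} b.
AAt_inv_b = np.linalg.solve(A @ A.T, b)
x_star = A.T @ AAt_inv_b

# x* is feasible, and its objective matches the optimal value (1/2) b^T (A A^T)^{-1} b.
assert np.allclose(A @ x_star, b)
p_star = 0.5 * b @ AAt_inv_b
assert np.isclose(0.5 * np.dot(x_star, x_star), p_star)

# Cross-check: the minimum-norm solution of A x = b from lstsq agrees with x*.
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x_star, x_ls)
```
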

### Linear optimization problem

Consider the LP in inequality form:

$$
p^\ast = \min_x \: c^T x ~:~ A x \le b,
$$

where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$. Assume that the above problem is feasible, so that strong duality holds. Then the problem can be equivalently written in the dual form, as an LP:

$$
p^\ast = d^\ast = \max_\lambda \: -b^T \lambda ~:~ \lambda \ge 0, \;\; A^T \lambda + c = 0.
$$

The above LP is in standard form, with the number of constraints and variables exchanged.

Duality thus provides another way to convert any LP in inequality form into one in standard form, and vice versa. (The other method is via the introduction of new variables.)
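As a sketch of the primal-dual pair, the following toy LP (made-up data: a box $0 \le x \le 1$ written as inequalities) is solved alongside its standard-form dual with `scipy.optimize.linprog`, and the two optimal values coincide.

```python
import numpy as np
from scipy.optimize import linprog

# Toy inequality-form LP (made-up data): minimize c^T x subject to A x <= b,
# where the constraints describe the box 0 <= x <= 1 and the optimum is (1, 1).
c = np.array([-1.0, -2.0])
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])

primal = linprog(c, A_ub=A, b_ub=b, bounds=(None, None))

# Dual in standard form: maximize -b^T lam  s.t.  A^T lam + c = 0, lam >= 0,
# solved here as the equivalent minimization of b^T lam over the same set.
dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=(0, None))

# Strong duality for feasible LPs: the two optimal values coincide.
assert primal.success and dual.success
assert np.isclose(primal.fun, -dual.fun)
print(primal.fun)   # -3.0
```
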

### Support vector machine classification

Return to the example seen earlier, which involved a binary classification problem. Given data points $x_i \in \mathbb{R}^n$, $i = 1, \ldots, m$, each of which is associated with a label $y_i \in \{-1, 1\}$, the problem is to find a hyperplane that separates, as much as possible, the two classes. Let us denote by $X := [x_1, \ldots, x_m] \in \mathbb{R}^{n \times m}$ the matrix of data points, and by $y \in \{-1, 1\}^m$ the vector of labels.

Ideally, we would like to minimize the number of errors on the training set. This is hard, as it involves a non-convex function. An upper bound on the number of errors is provided by the so-called hinge loss function

$$
\sum_{i=1}^m \max\left(0, \; 1 - y_i (w^T x_i + b)\right).
$$

We'd also like to control the robustness of the resulting linear classifier, and at the same time guarantee uniqueness. It turns out that these objectives can be achieved via the following problem:

$$
\min_{w, b} \: C \sum_{i=1}^m \max\left(0, \; 1 - y_i (w^T x_i + b)\right) + \frac{1}{2} \|w\|_2^2,
$$

where $C > 0$ is a parameter that controls the trade-off between robustness and performance on the training set (a greater $C$ encourages performance at the expense of robustness).

The above can be written as a QP, by introducing slack variables $v \in \mathbb{R}^m$:

$$
\min_{w, b, v} \: \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^m v_i ~:~ v_i \ge 1 - y_i (w^T x_i + b), \;\; v_i \ge 0, \;\; i = 1, \ldots, m,
$$

or, more compactly:

$$
\min_{w, b, v} \: \frac{1}{2} \|w\|_2^2 + C v^T \mathbf{1} ~:~ v \ge \mathbf{1} - \operatorname{diag}(y) (X^T w + b \mathbf{1}), \;\; v \ge 0.
$$

The corresponding Lagrangian is

$$
L(w, b, v, \lambda, \mu) = \frac{1}{2} \|w\|_2^2 + C v^T \mathbf{1} + \lambda^T \left( \mathbf{1} - v - \operatorname{diag}(y)(X^T w + b \mathbf{1}) \right) - \mu^T v,
$$

where $\mu \ge 0$ corresponds to the sign constraints on $v$.

The dual function is given by

$$
g(\lambda, \mu) = \min_{w, b, v} \: L(w, b, v, \lambda, \mu).
$$

We can readily solve for $w$ by taking derivatives, which leads to $w(\lambda, \mu) = X \operatorname{diag}(y) \lambda = \sum_{i=1}^m \lambda_i y_i x_i$. Taking derivatives with respect to $b$ yields the constraint $\lambda^T y = 0$, while taking derivatives with respect to $v$ leads to the dual constraint $\lambda + \mu = C \mathbf{1}$. We obtain

$$
g(\lambda, \mu) = \lambda^T \mathbf{1} - \frac{1}{2} \lambda^T \operatorname{diag}(y) X^T X \operatorname{diag}(y) \lambda
$$

whenever $\lambda^T y = 0$ and $\lambda + \mu = C \mathbf{1}$, and $g(\lambda, \mu) = -\infty$ otherwise.

We obtain the dual problem

$$
d^\ast = \max_\lambda \: \lambda^T \mathbf{1} - \frac{1}{2} \lambda^T \operatorname{diag}(y) X^T X \operatorname{diag}(y) \lambda ~:~ 0 \le \lambda \le C \mathbf{1}, \;\; \lambda^T y = 0.
$$

Strong duality holds, since the primal problem is a feasible convex QP.

Note that the result depends only on the so-called kernel matrix $K = X^T X \in \mathbb{R}^{m \times m}$, with entries $K_{ij} = x_i^T x_j$, and the dual problem involves only $m$ variables and $2m + 1$ constraints. Hence, the only dependence on the number of dimensions (features), $n$, is via the required computation of the kernel matrix, that is, on the scalar products $x_i^T x_j$, $1 \le i \le j \le m$. Thus, duality allows a great reduction in the computational effort, compared to solving the original QP in $n + m + 1$ variables and $2m$ constraints. This is known as the "kernel trick".
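To illustrate, the following sketch solves the dual on a tiny made-up dataset with `scipy.optimize.minimize` (SLSQP here, though any QP solver would do), working only with the kernel matrix, and then recovers the primal classifier via $w = \sum_i \lambda_i y_i x_i$.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny separable toy set (made-up): two points per class in R^2.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
C, m = 10.0, len(y)

# Kernel matrix K_ij = x_i^T x_j: the dual data enters only through K.
K = X @ X.T
H = np.outer(y, y) * K

# Dual problem: maximize 1^T lam - (1/2) lam^T H lam,
# subject to 0 <= lam <= C and y^T lam = 0 (negated here for a minimizer).
res = minimize(
    lambda lam: -lam.sum() + 0.5 * lam @ H @ lam,
    x0=np.zeros(m),
    jac=lambda lam: -np.ones(m) + H @ lam,
    bounds=[(0.0, C)] * m,
    constraints=[{"type": "eq", "fun": lambda lam: y @ lam}],
    method="SLSQP",
)
lam = res.x

# Recover the primal solution: w = sum_i lam_i y_i x_i; b from a support vector.
w = X.T @ (lam * y)
sv = np.argmax(lam)                     # a support vector (largest multiplier)
b = y[sv] - w @ X[sv]

# The learned hyperplane separates the training set.
assert np.all(np.sign(X @ w + b) == y)
print(w)   # approximately [1, 0]
```

Note that the dual solve above touches the feature dimension $n$ only when building `K` and when recovering `w` at the end, which is the point of the kernel trick.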

Note also that duality allows us to show that the optimal value of the problem is a convex function of the kernel matrix, which in turn allows us to optimize over it.