A Simple Convex Optimization Algorithm

In this section, we aim to show that convex optimization is a computationally tractable problem, by presenting a simple method for general-purpose convex optimization, the subgradient method.

  • The subgradient method

  • Convergence analysis and complexity

  • Projected subgradient method

The subgradient method

The subgradient method is a simple algorithm for minimizing a non-differentiable convex function, and more generally, for solving convex optimization problems. Its complexity in terms of problem size is very good (each iteration is cheap), but in terms of accuracy, very poor (the algorithm typically requires thousands or millions of iterations). The convergence analysis of the algorithm has an important theoretical consequence: it shows that convex optimization is essentially ‘‘tractable’’, provided we can compute subgradients of the objective and constraint functions efficiently.

The method is part of a large class of methods known as first-order methods, which are based on subgradients or closely related objects. In contrast, second-order methods typically use Hessian information; each iteration is expensive (both in flops and memory), but convergence is obtained very quickly.

Setup

The setup is as follows. Let f: mathbf{R}^n to mathbf{R} be a convex function. We seek to solve the unconstrained problem
 p^ast = min_x : f(x).
To minimize f, the subgradient method described below uses only subgradient information at every step.

Now consider the convex, inequality constrained problem:
 min : f_0(x) ~:~ f_i(x) le 0, ;; i=1,ldots, m.
We can formulate this as an unconstrained problem of minimizing the function h, with
 h(x) = f_0(x) if f_i(x) le 0, ;; i=1,ldots,m, and h(x) = +infty otherwise.
Certainly, h is not differentiable, but it is convex. So we can apply the subgradient method for unconstrained minimization to this function.
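The function h can be transcribed directly; the one-dimensional instance below (minimize f_0(x) = x^2 subject to the single constraint f_1(x) = 1 - x le 0) is an assumed example for illustration, not from the text.

```python
import math

# Assumed 1-D instance: f0(x) = x^2 with one constraint f1(x) = 1 - x <= 0.
def h(x):
    """h equals f0 on the feasible set and +infinity elsewhere."""
    return x**2 if 1 - x <= 0 else math.inf
```

For example, h(2.0) returns 4.0, while h(0.5) (infeasible) returns math.inf; the minimizer of h is the constrained minimizer x = 1.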

Algorithm

The algorithm proceeds as follows:
 x^{(k+1)} = x^{(k)} - alpha _k g^{(k)}, ;; g^{(k)} in partial f(x^{(k)}) .
Here, x^{(k)} is the k-th iterate, g^{(k)} is any subgradient of f at x^{(k)}, and alpha_k>0 is the k-th step size. Thus, at each iteration of the subgradient method, we take a step in the direction of a negative subgradient. Since this is not a descent method (the objective is not guaranteed to decrease at each step), we must keep track of the best solution seen so far, via the values
 p_{rm best} ^{(k)} = min _{1leq i leq k} : f(x^{(i)}) .
We will also denote overline{p} = lim_{krightarrow infty} p_{rm best} ^{(k)}.
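The iteration above can be sketched in a few lines. The example below minimizes f(x) = |x|_1 (an assumed test function, with subgradient sign(x)) and tracks the best value seen so far, as described:

```python
import numpy as np

def subgradient_method(x0, step, n_iter=1000):
    """Minimize f(x) = ||x||_1 by the subgradient method; step(k) gives alpha_k."""
    x = x0.copy()
    p_best = np.sum(np.abs(x))            # best objective value seen so far
    for k in range((1), n_iter + 1):
        g = np.sign(x)                    # a subgradient of ||.||_1 at x
        x = x - step(k) * g               # not a descent method: f may increase
        p_best = min(p_best, np.sum(np.abs(x)))
    return p_best

p = subgradient_method(np.array([3.0, -2.0]), step=lambda k: 1.0 / np.sqrt(k))
```

With the diminishing step sizes alpha_k = 1/sqrt(k) used here, p approaches the optimal value 0.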

Convergence analysis and complexity

To analyze the convergence, we make the following assumptions:

  1. p^ast is finite and attained

  2. There is a constant G such that, for every x, and every g in partial f(x), we have |g|_2 le G.

  3. There is a constant R such that |x^{(1)} - x^star|_2 le R for any minimizer x^star of f.

Now let x^star be any minimizer of f. We have
 begin{array}{rcl}
 |x^{(k+1)} - x^star|_2^2 &=& |x^{(k)} - alpha _k g^{(k)} - x^star|_2^2
 &=& |x^{(k)} - x^star|_2^2 - 2alpha_k g^{(k)T} (x^{(k)}-x^star) + alpha_k^2 |g^{(k)}|_2^2
 &le& |x^{(k)} - x^star|_2^2 - 2alpha_k (f(x^{(k)})-p^star) + alpha_k^2 |g^{(k)}|_2^2 ,
 end{array}
where we have used the subgradient inequality:
 p^star = f(x^star) ge f(x^{(k)}) + g^{(k)T}(x^star-x^{(k)}) .
Applying this recursively, we get
 begin{array}{rcl}
 |x^{(k+1)} - x^star|_2^2 &le & |x^{(1)}-x^star|_2^2 - 2sum_{i=1}^k alpha_i (f(x^{(i)})-p^star) + sum_{i=1}^k alpha_i^2 |g^{(i)}|_2^2
 &le& R^2 - 2sum_{i=1}^k alpha_i (f(x^{(i)})-p^star) + G^2 sum_{i=1}^k alpha_i^2 .
 end{array}
By definition of p_{rm best} ^{(k)}, we have
 sum_{i=1}^k alpha_i (f(x^{(i)})-p^star) ge (p_{rm best} ^{(k)}-p^star) left( sum_{i=1}^k alpha_i right),
which, since |x^{(k+1)} - x^star|_2^2 ge 0, yields
 p_{rm best}^{(k)}-p^star le frac{R^2 + G^2 sum_{i=1}^k alpha_i^2}{2sum_{i=1}^k alpha_i}.

This shows that

  1. For constant step sizes (alpha _k = alpha for every k), the condition G^2alpha/2 le epsilon guarantees that overline{p}-p^star le G^2 alpha /2 le epsilon.

  2. For constant step lengths (alpha _k = gamma /|g^{(k)}|_2, so that each step has length |x^{(k+1)}-x^{(k)}|_2 = gamma), the condition G gamma /2 le epsilon guarantees that overline{p}-p^star le G gamma /2 le epsilon.
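The constant-step-size case can be checked numerically; the values of R, G, and alpha below are assumed for illustration. As k grows, the bound indeed approaches G^2 alpha/2:

```python
# Evaluate the bound (R^2 + G^2 * sum alpha_i^2) / (2 * sum alpha_i) for
# constant steps alpha_i = alpha; R, G, alpha are assumed values.
R, G, alpha = 1.0, 2.0, 0.01
limit = G**2 * alpha / 2                  # the k -> infinity limit of the bound
for k in (100, 100_000):
    bound = (R**2 + G**2 * k * alpha**2) / (2 * k * alpha)
    # bound shrinks toward `limit` as k grows
```

At k = 100 the bound is still dominated by the R^2/(2 k alpha) term; by k = 100,000 it is within a few percent of G^2 alpha/2.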

Complexity

The convergence analysis result above depends on the choice of the step size rule. Which rule is optimal for this bound? The problem of minimizing the upper bound above is convex and symmetric (the bound does not change when we exchange the alpha_i's). Hence, the optimal alpha_i's are all equal at optimum, to a number alpha = (R/G) k^{-1/2}, and the bound becomes RG/sqrt{k}. With this choice of step size rule, the number of iterations needed to achieve epsilon-sub-optimality, as predicted by the analysis, is (RG/epsilon)^2. The dependence on epsilon is thus O(1/epsilon^2), which is very slow. Interior-point methods alluded to earlier typically behave as O(log(1/epsilon)), but they do not apply to general convex problems, only to a special class.
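The claimed optimal choice can also be verified numerically; with assumed values of R, G, and k, the step alpha_i = (R/G) k^{-1/2} makes the bound equal to RG/sqrt{k}:

```python
import math

R, G, k = 2.0, 5.0, 400                       # assumed values for illustration
alpha = (R / G) / math.sqrt(k)                # the optimal constant step size
bound = (R**2 + G**2 * k * alpha**2) / (2 * k * alpha)
# algebraically, bound = R*G/sqrt(k), so epsilon-accuracy needs k = (R*G/epsilon)^2
```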

The above discusses the dependence of complexity on the accuracy epsilon. The complexity depends on problem size as well, and more generally, on the structure of the function involved, via the need to compute subgradients at each step, and also the parameters R and G.

Example: We seek to find a point in a given intersection of closed convex sets,
 C=C_1cap cdots cap C_m subseteq mathbf{R}^n .
We formulate this problem as minimizing f, where
 f(x)=max(mbox{bf dist}(x,C_1), ldots, mbox{bf dist}(x,C_m)) .
Let P_j be the projection operator onto C_j, and let
 f_j(x)=mbox{bf dist}(x,C_j) = |x - P_j(x)|_2
be the corresponding distance function; that is, P_j(x) is the point of C_j closest to x. Then, for x notin C_j, a subgradient of f_j at x is
 g_j (x) = frac {x-P_j (x)} {|x-P_j (x)|_2}.
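A small sketch of this example, assuming two sets in mathbf{R}^2 with closed-form projections (a unit ball and a halfspace, chosen here for illustration). The step size alpha_k = f(x^{(k)}) used below is a Polyak-type step, not one of the rules analyzed above; it is valid here because the optimal value is zero and the subgradients have unit norm.

```python
import numpy as np

def proj_ball(x):                         # projection onto C1 = {|x|_2 <= 1}
    n = np.linalg.norm(x)
    return x if n <= 1 else x / n

def proj_halfspace(x):                    # projection onto C2 = {x[0] >= 0.5}
    return np.array([max(x[0], 0.5), x[1]])

x = np.array([-3.0, 4.0])
for _ in range(100):
    projs = [proj_ball(x), proj_halfspace(x)]
    dists = [np.linalg.norm(x - p) for p in projs]
    j = int(np.argmax(dists))             # f(x) = max_j dist(x, C_j)
    if dists[j] < 1e-9:                   # x is (numerically) in the intersection
        break
    g = (x - projs[j]) / dists[j]         # the subgradient g_j(x) from the text
    x = x - dists[j] * g                  # alpha_k = f(x^{(k)}): a Polyak step
```

With this step size, each iteration projects x onto the farthest set, and the loop terminates at a point of the intersection.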

Projected subgradient method

One extension of the subgradient method for solving constrained optimization problems is the projected subgradient method. We consider the problem
 min_x : f(x) ~:~ x in C,
where C is a convex set, which can be defined by a set of inequality constraints. The projected subgradient method uses the iteration
 x^{(k+1)}=P_C (x^{(k)} - alpha_k g^{(k)}),
where P_C is the projection onto C, and g^{(k)} is any subgradient of f at x^{(k)}.
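A minimal sketch of the projected subgradient iteration, on an assumed problem with a known answer (not from the text): minimize f(x) = |x - c|_2 over the unit ball C. When c lies outside the ball, the optimum is c/|c|_2, with optimal value |c|_2 - 1.

```python
import numpy as np

def proj_C(x):                            # Euclidean projection onto the unit ball
    n = np.linalg.norm(x)
    return x if n <= 1 else x / n

c = np.array([3.0, 4.0])                  # assumed data; the optimum is c/|c|_2
x = np.zeros(2)
best = np.inf
for k in range(1, 1001):
    g = (x - c) / np.linalg.norm(x - c)   # subgradient of |x - c|_2 (for x != c)
    x = proj_C(x - g / np.sqrt(k))        # subgradient step, then project onto C
    best = min(best, np.linalg.norm(x - c))
# best approaches the optimal value |c|_2 - 1
```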

Example: Let A in mathbf{R}^{m times n}, b in mathbf{R}^m, with m le n, A full rank (hence, AA^T succ 0). We consider the problem
 min_x : f(x) ~:~ Ax = b.
The projection of a point z onto the affine set {x ::: Ax=b } is
 P(z) = z - A^T(AA^T )^{-1} (Az-b).
If the initial point x^{(1)} is feasible, every iterate remains feasible, and the projected subgradient iterations become
 x^{(k+1)} = P(x^{(k)} - alpha_k g^{(k)}) = x^{(k)} - alpha_k P_N g^{(k)},
where P_N = I - A^T(AA^T )^{-1} A is the (linear) projection onto the nullspace of A; the second equality uses Ax^{(k)} = b.
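A sketch of this last example, assuming f(x) = |x|_1 and random data A, b for illustration. Starting from a feasible point, each iterate stays feasible because the step is confined to the nullspace of A:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 5))            # m = 2 < n = 5; full rank for generic data
b = rng.standard_normal(2)

P_N = np.eye(5) - A.T @ np.linalg.solve(A @ A.T, A)   # nullspace projector
x = A.T @ np.linalg.solve(A @ A.T, b)      # feasible start: the least-norm solution
best = np.sum(np.abs(x))
for k in range(1, 2001):
    g = np.sign(x)                         # subgradient of f(x) = |x|_1
    x = x - (P_N @ g) / np.sqrt(k)         # the step stays in {x : Ax = b}
    best = min(best, np.sum(np.abs(x)))
```

Every iterate satisfies Ax = b (up to roundoff), and `best` tracks the smallest ell_1 norm found.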