Convex Functions

  • Domain of a function

  • Definition of convexity

  • Alternate characterizations of convexity

  • Operations that preserve convexity

  • Subgradients

Domain of a function

The domain of a function f: \mathbf{R}^n \rightarrow \mathbf{R} is the set \mathbf{dom} f \subseteq \mathbf{R}^n over which f is well-defined; in other words,

 \mathbf{dom} f := \{ x \in \mathbf{R}^n ~:~ -\infty < f(x) < +\infty \}.

Here are some examples:

  1. The function with values f(x)=\log x has domain \mathbf{dom} f=\mathbf{R}_{++}.

  2. The function with values f(X)=\log\det(X) has domain \mathbf{dom} f=\mathcal{S}_{++}^n (the set of positive-definite n \times n symmetric matrices).
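As a quick numerical illustration of these two domains, here is a minimal sketch using numpy (the helper names in_dom_log and in_dom_logdet are ours, not standard):

```python
import numpy as np

def in_dom_log(x):
    # dom f = R_++ for f(x) = log(x)
    return x > 0

def in_dom_logdet(X, tol=1e-10):
    # dom f = S^n_++ for f(X) = log det(X): X must be symmetric
    # with all eigenvalues strictly positive.
    if not np.allclose(X, X.T):
        return False
    return np.linalg.eigvalsh(X).min() > tol

print(in_dom_log(2.0))             # True
print(in_dom_log(-1.0))            # False
print(in_dom_logdet(np.eye(3)))    # True: the identity is positive definite
print(in_dom_logdet(-np.eye(3)))   # False
```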

Definition of convexity

A function f : \mathbf{R}^n \rightarrow \mathbf{R} is convex if

  1. \mathbf{dom} f is convex;

  2. \forall\, x,y\in \mathbf{dom} f and \forall\, \lambda\in [0, 1], f\left(\lambda x + (1-\lambda) y\right) \le \lambda f(x) + (1-\lambda) f(y). Note that the convexity of the domain is required. For example, the function f : \mathbf{R} \rightarrow \mathbf{R} defined as
     f(x) = \left\{ \begin{array}{ll} x & \mbox{if } x \notin [-1,1] \\ +\infty & \mbox{otherwise} \end{array}\right.
    is not convex, although it is linear (hence, convex) on its domain (-\infty,-1)\cup(1,+\infty).

We say that a function f is concave if -f is convex.
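Before the examples, here is a small numerical sanity check of the definition (a sketch, not a proof: it only samples random points and random \lambda, so it can refute convexity but never certify it; the helper name is ours):

```python
import numpy as np

def violates_convexity(f, dim, trials=10_000, seed=0):
    """Search for a random counterexample to the convexity inequality.
    Returns True if a violation is found (so f is not convex)."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        lam = rng.uniform()
        z = lam * x + (1 - lam) * y
        if f(z) > lam * f(x) + (1 - lam) * f(y) + 1e-9:
            return True
    return False

print(violates_convexity(lambda x: np.dot(x, x), dim=3))   # False: ||x||_2^2 is convex
print(violates_convexity(lambda x: np.sum(x**3), dim=1))   # True: x^3 is not convex on R
```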

Here are some examples:

  1. The support function of a given set \mathcal{S} \subseteq \mathbf{R}^n, which is defined as x \rightarrow \max_{u \in \mathcal{S}} \: x^Tu, is convex for any set \mathcal{S}.

  2. The indicator function of a given set \mathcal{S} \subseteq \mathbf{R}^n, defined as
     I_{\mathcal{S}}(x) = \left\{ \begin{array}{ll} 0 & x \in \mathcal{S}, \\ +\infty & x \notin \mathcal{S}, \end{array} \right.
    is convex if and only if \mathcal{S} is convex.

  3. A norm is a convex function that is positively homogeneous (f(\alpha x) = \alpha f(x) for every x\in \mathbf{R}^n and \alpha\ge0) and positive-definite (it is non-negative, and zero if and only if its argument is zero).

  4. The quadratic function f(x) = x^TPx+2q^Tx + r, with P \in \mathcal{S}^n_{++}, is convex. (For a proof, see later.)

  5. The function f : \mathbf{R} \rightarrow \mathbf{R} defined as f(x) = 1/x for x>0 and f(x) = +\infty otherwise is convex.

  6. The function f : \mathbf{R} \rightarrow \mathbf{R} defined as f(x) = x^3 for every x is not convex; but if we modify f to take the value +\infty on the set of non-positive real numbers, then it is convex.
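Examples 5 and 6 involve extended-value functions, which can be modeled numerically with np.inf; a minimal sketch (the helper names are ours) shows the convexity inequality holding even when the right-hand side is infinite:

```python
import numpy as np

def inv_pos(x):
    # Example 5: f(x) = 1/x for x > 0, +inf otherwise -- convex.
    return 1.0 / x if x > 0 else np.inf

def cube_restricted(x):
    # Example 6: x^3 for x > 0, +inf on the non-positive reals -- convex.
    return x**3 if x > 0 else np.inf

# The convexity inequality holds even when one side is +inf:
x, y, lam = -1.0, 2.0, 0.5
z = lam * x + (1 - lam) * y
print(inv_pos(z) <= lam * inv_pos(x) + (1 - lam) * inv_pos(y))  # True: RHS is +inf
print(cube_restricted(z) <= lam * cube_restricted(x) + (1 - lam) * cube_restricted(y))  # True
```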

Alternate characterizations of convexity

Let f : \mathbf{R}^n \rightarrow \mathbf{R}. The following are equivalent conditions for f to be convex.

  1. Epigraph condition: f is convex if and only if its epigraph
     \mathbf{epi} f := \left\{ (x,t) \in \mathbf{R}^{n+1} ~:~ t \ge f(x) \right\}
    is convex. We can use this result to prove, for example, that the largest eigenvalue function \lambda_{\rm max} : \mathcal{S}^n \rightarrow \mathbf{R}, which to a given n \times n symmetric matrix X associates its largest eigenvalue, is convex, since the condition \lambda_{\rm max}(X) \le t is equivalent to the condition that t I - X \in \mathcal{S}_+^n.

  2. Restriction to a line: The function f is convex if and only if its restriction to any line is convex, meaning that for every x_0 \in \mathbf{R}^n and v \in \mathbf{R}^n, the function g(t) := f(x_0+tv) is convex.

For example, the function f(X) = \log \det X is concave. (Prove this as an exercise.) You can also use this to prove that the quadratic function f(x) = x^TPx+2q^Tx + r is convex if and only if P \succeq 0.

  3. First-order condition: If f is differentiable (that is, \mathbf{dom} f is open and the gradient exists everywhere on the domain), then f is convex if and only if
     \forall \: x,y \in \mathbf{dom} f ~:~ f(y) \ge f(x) + \nabla f(x)^T(y-x).
    The geometric interpretation is that the graph of f is bounded below everywhere by any one of its tangents.

  4. Second-order condition: If f is twice differentiable, then it is convex if and only if its Hessian \nabla^2 f is positive semi-definite everywhere. This is perhaps the most commonly known characterization of convexity, although it is often hard to check.

For example, the function f(x,t) = x^Tx/t with domain \{ (x,t) ~:~ t >0\} is convex. (Check this!) Other examples include the log-sum-exp function, f(x) = \log \sum_{i=1}^n \exp x_i, and the quadratic function alluded to above.
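As an illustration of the second-order condition, here is a numerical sketch for the log-sum-exp function; we rely on the fact (easy to derive) that its Hessian has the closed form diag(p) - pp^T, where p is the softmax of x:

```python
import numpy as np

def logsumexp_hessian(x):
    # Hessian of f(x) = log(sum_i exp(x_i)) is diag(p) - p p^T,
    # where p is the softmax of x.
    e = np.exp(x - x.max())      # shift for numerical stability
    p = e / e.sum()
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
for _ in range(5):
    H = logsumexp_hessian(rng.normal(size=4))
    # All eigenvalues nonnegative => Hessian PSD at this point.
    print(np.linalg.eigvalsh(H).min() >= -1e-12)   # True each time
```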

Operations that preserve convexity

  1. The nonnegative weighted sum of convex functions is convex.

  2. The composition with an affine function preserves convexity: if A \in \mathbf{R}^{m \times n}, b \in \mathbf{R}^m and f : \mathbf{R}^m \rightarrow \mathbf{R} is convex, then the function g : \mathbf{R}^n \rightarrow \mathbf{R} with values g(x) = f(Ax+b) is convex.

  3. The pointwise maximum of a family of convex functions is convex: if (f_\alpha)_{\alpha \in \mathcal{A}} is a family of convex functions indexed by \alpha, then the function
     f(x) := \max_{\alpha \in \mathcal{A}} \: f_\alpha(x)
    is convex. For example, the dual norm
     x \rightarrow \max_{y \::\: \|y\| \le 1} \: y^Tx
    is convex, as the maximum of convex (in fact, linear) functions (indexed by the vector y). Another example is the largest singular value of a matrix A: f(A) = \sigma_{\rm max}(A) = \max_{x \::\: \|x\|_2 = 1} \|Ax\|_2. Here, each function (indexed by x \in \mathbf{R}^n) A \rightarrow \|Ax\|_2 is convex, since it is the composition of the Euclidean norm (a convex function) with an affine function A \rightarrow Ax. This can also be used to prove convexity of the function we introduced in lecture 2,
     \|x\|_{1,k} := \sum_{i=1}^k |x|_{[i]} = \max_{u} \: u^T|x| ~:~ \sum_{i=1}^n u_i = k, \;\; u\in\{0,1\}^n,
    where we use the fact that, for any u feasible for the maximization problem, the function x \rightarrow u^T|x| is convex (since u\ge0).

  4. If f is a convex function in x=(y,z), then the function g(y) := \min_z \: f(y,z) is convex. (Note that joint convexity in (y,z) is essential.)

  5. If f is convex, its perspective g(x,t) := t f(x/t), with domain \mathbf{dom} g = \{ (x,t) ~:~ x/t \in \mathbf{dom} f, \: t>0\}, is convex. You can use this to prove convexity of the function f(x,t) = x^Tx/t, with domain \{ (x,t) ~:~ t >0\}.

  6. The composition with another function does not always preserve convexity. However, if the functions g_i : \mathbf{R}^n \rightarrow \mathbf{R}, i=1,\ldots,k, are convex and h : \mathbf{R}^k \rightarrow \mathbf{R} is convex and non-decreasing in each argument, with \mathbf{dom} g_i = \mathbf{R}^n and \mathbf{dom} h = \mathbf{R}^k, then x \rightarrow (h \circ g)(x) := h(g_1(x),\ldots,g_k(x)) is convex.

For example, if the g_i's are convex, then so is \log \sum_i \exp g_i.
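As a small sketch of rules 1-3 at work (the sampled convexity test below can only look for counterexamples, as before), the pointwise maximum of affine functions is convex:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))          # five affine pieces in R^3
b = rng.normal(size=5)

def f(x):
    # Rule 2 (affine composition) + rule 3 (pointwise max):
    # f(x) = max_i (a_i^T x + b_i) is convex.
    return np.max(A @ x + b)

# Sampled convexity inequality: f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)
ok = True
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    lam = rng.uniform()
    ok &= f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-9
print(ok)  # True
```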

Subgradients

Definition

Let f: \mathbf{R}^n \to \mathbf{R} be a convex function. The vector g\in\mathbf{R}^n is a subgradient of f at x if the subgradient inequality
 f(y) \geq f(x) + g^{T}(y-x)
holds for every y. The subdifferential of f at x, denoted \partial f(x), is the set of all such subgradients of f at x.

  1. \partial f(x) is convex, closed, and never empty on the relative interior of the domain of f. (The relative interior of a set is the interior of the set, relative to the smallest affine subspace that contains it.)

  2. If f is differentiable at x, then the subdifferential is a singleton: \partial f(x) = \{ \nabla f(x)\}.

For example, consider f(x) = |x| for x \in \mathbf{R}. We have
 \partial f(x) = \left\{ \begin{array}{ll} \{-1\} & \mbox{if } x <0, \\ {[-1,1]} & \mbox{if } x = 0, \\ \{+1\} & \mbox{if } x >0. \end{array}\right.

Constructing subgradients

One of the most important rules for constructing a subgradient is the following.

Weak rule for point-wise supremum: if the f_\alpha are differentiable and convex functions that depend on a parameter \alpha\in \mathcal{A}, with \mathcal{A} an arbitrary set, then
 f(x) = \sup_{\alpha \in \mathcal{A}} \: f_\alpha(x)
is possibly non-differentiable but convex. If \beta is such that f(x) = f_\beta(x), then a subgradient of f at x is simply any element of \partial f_\beta(x).

Example: maximum eigenvalue. For X=X^T \in \mathbf{R}^{n \times n}, define f(X) = \lambda_{\rm max}(X) to be the largest eigenvalue of X (f is real-valued since X is symmetric). A subgradient of f at X can be found using the following variational (that is, optimization-based) representation of f(X):
 f(X) = \max_{y \::\: \|y\|_2 = 1} \: y^T X y .
Any unit-norm eigenvector y_{\rm max} of X corresponding to the largest eigenvalue achieves the maximum in the above. Hence, by the weak rule above, a subgradient of f at X is given by a gradient of the function X \rightarrow y_{\rm max}^T X y_{\rm max}, which is y_{\rm max}y_{\rm max}^T.