Convex Functions

  • Domain of a function

  • Definition of convexity

  • Alternate characterizations of convexity

  • Operations that preserve convexity

  • Subgradients

Domain of a function

The domain of a function f: \mathbf{R}^n \rightarrow \mathbf{R} is the set \mathbf{dom} f \subseteq \mathbf{R}^n over which f is well-defined; in other words,

 \mathbf{dom} f := \{ x \in \mathbf{R}^n ~:~ -\infty < f(x) < +\infty \}.

Here are some examples:

  1. The function with values f(x)=\log x has domain \mathbf{dom} f=\mathbf{R}_{++}.

  2. The function with values f(X)=\log\det(X) has domain \mathbf{dom} f=\mathcal{S}_{++}^n (the set of positive-definite n \times n symmetric matrices).
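As a quick numerical illustration of these two domains, here is a minimal sketch using numpy (the helper names in_dom_log and in_dom_logdet are ours, not standard):

```python
import numpy as np

def in_dom_log(x):
    # dom f = R_++ for f(x) = log(x)
    return x > 0

def in_dom_logdet(X, tol=1e-10):
    # dom f = S^n_++ for f(X) = log det(X): X must be symmetric
    # with all eigenvalues strictly positive.
    if not np.allclose(X, X.T):
        return False
    return np.linalg.eigvalsh(X).min() > tol

print(in_dom_log(2.0))             # True
print(in_dom_log(-1.0))            # False
print(in_dom_logdet(np.eye(3)))    # True: the identity is positive definite
print(in_dom_logdet(-np.eye(3)))   # False
```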

Definition of convexity

A function f : \mathbf{R}^n \rightarrow \mathbf{R} is convex if

  1. \mathbf{dom} f is convex;

  2. \forall\, x,y\in \mathbf{dom} f and \forall\, \lambda\in [0, 1], f\left(\lambda x + (1-\lambda) y\right) \le \lambda f(x) + (1-\lambda) f(y). Note that the convexity of the domain is required. For example, the function f : \mathbf{R} \rightarrow \mathbf{R} defined as
     f(x) = \left\{ \begin{array}{ll} x & \mbox{if } x \notin [-1,1] \\ +\infty & \mbox{otherwise} \end{array}\right.
    is not convex, although it is linear (hence, convex) on its domain (-\infty,-1)\cup(1,+\infty).

We say that a function f is concave if -f is convex.
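Before the examples, here is a small numerical sanity check of the definition (a sketch, not a proof: it only samples random points and random \lambda, so it can refute convexity but never certify it; the helper name is ours):

```python
import numpy as np

def violates_convexity(f, dim, trials=10_000, seed=0):
    """Search for a random counterexample to the convexity inequality.
    Returns True if a violation is found (so f is not convex)."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        lam = rng.uniform()
        z = lam * x + (1 - lam) * y
        if f(z) > lam * f(x) + (1 - lam) * f(y) + 1e-9:
            return True
    return False

print(violates_convexity(lambda x: np.dot(x, x), dim=3))   # False: ||x||_2^2 is convex
print(violates_convexity(lambda x: np.sum(x**3), dim=1))   # True: x^3 is not convex on R
```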

Here are some examples:

  1. The support function of a given set \mathcal{S} \subseteq \mathbf{R}^n, which is defined as x \rightarrow \max_{u \in \mathcal{S}} \: x^Tu, is convex for any set \mathcal{S}.

  2. The indicator function of a given set \mathcal{S} \subseteq \mathbf{R}^n, defined as
     I_{\mathcal{S}}(x) = \left\{ \begin{array}{ll} 0 & x \in \mathcal{S}, \\ +\infty & x \notin \mathcal{S}, \end{array} \right.
    is convex if and only if \mathcal{S} is convex.

  3. A norm is a convex function that is positively homogeneous (f(\alpha x) = \alpha f(x) for every x\in \mathbf{R}^n and \alpha\ge0) and positive-definite (it is non-negative, and zero if and only if its argument is zero).

  4. The quadratic function f(x) = x^TPx+2q^Tx + r, with P \in \mathcal{S}^n_{++}, is convex. (For a proof, see later.)

  5. The function f : \mathbf{R} \rightarrow \mathbf{R} defined as f(x) = 1/x for x>0 and f(x) = +\infty otherwise is convex.

  6. The function f : \mathbf{R} \rightarrow \mathbf{R} defined as f(x) = x^3 for every x is not convex; but if we modify f to take the value +\infty on the set of non-positive real numbers, then it is convex.
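Examples 5 and 6 involve extended-value functions, which can be modeled numerically with np.inf; a minimal sketch (the helper names are ours) shows the convexity inequality holding even when the right-hand side is infinite:

```python
import numpy as np

def inv_pos(x):
    # Example 5: f(x) = 1/x for x > 0, +inf otherwise -- convex.
    return 1.0 / x if x > 0 else np.inf

def cube_restricted(x):
    # Example 6: x^3 for x > 0, +inf on the non-positive reals -- convex.
    return x**3 if x > 0 else np.inf

# The convexity inequality holds even when one side is +inf:
x, y, lam = -1.0, 2.0, 0.5
z = lam * x + (1 - lam) * y
print(inv_pos(z) <= lam * inv_pos(x) + (1 - lam) * inv_pos(y))  # True: RHS is +inf
print(cube_restricted(z) <= lam * cube_restricted(x) + (1 - lam) * cube_restricted(y))  # True
```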

Alternate characterizations of convexity

Let f : \mathbf{R}^n \rightarrow \mathbf{R}. The following are equivalent conditions for f to be convex.

  1. Epigraph condition: f is convex if and only if its epigraph
     \mathbf{epi} f := \left\{ (x,t) \in \mathbf{R}^{n+1} ~:~ t \ge f(x) \right\}
    is convex. We can use this result to prove, for example, that the largest eigenvalue function \lambda_{\rm max} : \mathcal{S}^n \rightarrow \mathbf{R}, which to a given n \times n symmetric matrix X associates its largest eigenvalue, is convex, since the condition \lambda_{\rm max}(X) \le t is equivalent to the condition that t I - X \in \mathcal{S}_+^n.

  2. Restriction to a line: The function f is convex if and only if its restriction to any line is convex, meaning that for every x_0 \in \mathbf{R}^n and v \in \mathbf{R}^n, the function g(t) := f(x_0+tv) is convex.

For example, the function f(X) = \log \det X is concave. (Prove this as an exercise.) You can also use this to prove that the quadratic function f(x) = x^TPx+2q^Tx + r is convex if and only if P \succeq 0.

  3. First-order condition: If f is differentiable (that is, \mathbf{dom} f is open and the gradient exists everywhere on the domain), then f is convex if and only if
     \forall \: x,y \in \mathbf{dom} f ~:~ f(y) \ge f(x) + \nabla f(x)^T(y-x).
    The geometric interpretation is that the graph of f is bounded below everywhere by any one of its tangents.

  4. Second-order condition: If f is twice differentiable, then it is convex if and only if its Hessian \nabla^2 f is positive semi-definite everywhere. This is perhaps the most commonly known characterization of convexity, although it is often hard to check.

For example, the function f(x,t) = x^Tx/t with domain \{ (x,t) ~:~ t >0\} is convex. (Check this!) Other examples include the log-sum-exp function, f(x) = \log \sum_{i=1}^n \exp x_i, and the quadratic function alluded to above.
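As an illustration of the second-order condition, here is a numerical sketch for the log-sum-exp function; we rely on the fact (easy to derive) that its Hessian has the closed form diag(p) - pp^T, where p is the softmax of x:

```python
import numpy as np

def logsumexp_hessian(x):
    # Hessian of f(x) = log(sum_i exp(x_i)) is diag(p) - p p^T,
    # where p is the softmax of x.
    e = np.exp(x - x.max())      # shift for numerical stability
    p = e / e.sum()
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
for _ in range(5):
    H = logsumexp_hessian(rng.normal(size=4))
    # All eigenvalues nonnegative => Hessian PSD at this point.
    print(np.linalg.eigvalsh(H).min() >= -1e-12)   # True each time
```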

Operations that preserve convexity

  1. The nonnegative weighted sum of convex functions is convex.

  2. The composition with an affine function preserves convexity: if A \in \mathbf{R}^{m \times n}, b \in \mathbf{R}^m and f : \mathbf{R}^m \rightarrow \mathbf{R} is convex, then the function g : \mathbf{R}^n \rightarrow \mathbf{R} with values g(x) = f(Ax+b) is convex.

  3. The pointwise maximum of a family of convex functions is convex: if (f_\alpha)_{\alpha \in \mathcal{A}} is a family of convex functions indexed by \alpha, then the function
     f(x) := \max_{\alpha \in \mathcal{A}} \: f_\alpha(x)
    is convex. For example, the dual norm
     x \rightarrow \max_{y \::\: \|y\| \le 1} \: y^Tx
    is convex, as the maximum of convex (in fact, linear) functions (indexed by the vector y). Another example is the largest singular value of a matrix A: f(A) = \sigma_{\rm max}(A) = \max_{x \::\: \|x\|_2 = 1} \|Ax\|_2. Here, each function (indexed by x \in \mathbf{R}^n) A \rightarrow \|Ax\|_2 is convex, since it is the composition of the Euclidean norm (a convex function) with an affine function A \rightarrow Ax. This can also be used to prove convexity of the function we introduced in lecture 2,
     \|x\|_{1,k} := \sum_{i=1}^k |x|_{[i]} = \max_{u} \: u^T|x| ~:~ \sum_{i=1}^n u_i = k, \;\; u\in\{0,1\}^n,
    where we use the fact that, for any u feasible for the maximization problem, the function x \rightarrow u^T|x| is convex (since u\ge0).

  4. If f is a convex function in x=(y,z), then the function g(y) := \min_z \: f(y,z) is convex. (Note that joint convexity in (y,z) is essential.)

  5. If f is convex, its perspective g(x,t) := t f(x/t), with domain \mathbf{dom} g = \{ (x,t) ~:~ x/t \in \mathbf{dom} f, \: t>0\}, is convex. You can use this to prove convexity of the function f(x,t) = x^Tx/t, with domain \{ (x,t) ~:~ t >0\}.

  6. The composition with another function does not always preserve convexity. However, if the functions g_i : \mathbf{R}^n \rightarrow \mathbf{R}, i=1,\ldots,k, are convex and h : \mathbf{R}^k \rightarrow \mathbf{R} is convex and non-decreasing in each argument, with \mathbf{dom} g_i = \mathbf{R}^n and \mathbf{dom} h = \mathbf{R}^k, then x \rightarrow (h \circ g)(x) := h(g_1(x),\ldots,g_k(x)) is convex.

For example, if the g_i's are convex, then so is \log \sum_i \exp g_i.
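As a small sketch of rules 1-3 at work (the sampled convexity test below can only look for counterexamples, as before), the pointwise maximum of affine functions is convex:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))          # five affine pieces in R^3
b = rng.normal(size=5)

def f(x):
    # Rule 2 (affine composition) + rule 3 (pointwise max):
    # f(x) = max_i (a_i^T x + b_i) is convex.
    return np.max(A @ x + b)

# Sampled convexity inequality: f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)
ok = True
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    lam = rng.uniform()
    ok &= f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-9
print(ok)  # True
```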

Subgradients

Definition

Let f: \mathbf{R}^n \to \mathbf{R} be a convex function. The vector g\in\mathbf{R}^n is a subgradient of f at x if the subgradient inequality
 f(y) \geq f(x) + g^{T}(y-x)
holds for every y. The subdifferential of f at x, denoted \partial f(x), is the set of all such subgradients of f at x.

  1. \partial f(x) is convex, closed, and never empty on the relative interior of the domain of f. (The relative interior of a set is the interior of the set, relative to the smallest affine subspace that contains it.)

  2. If f is differentiable at x, then the subdifferential is a singleton: \partial f(x) = \{ \nabla f(x)\}.

For example, consider f(x) = |x| for x \in \mathbf{R}. We have
 \partial f(x) = \left\{ \begin{array}{ll} \{-1\} & \mbox{if } x <0, \\ {[-1,1]} & \mbox{if } x = 0, \\ \{+1\} & \mbox{if } x >0. \end{array}\right.

Constructing subgradients

One of the most important rules for constructing a subgradient is the following.

Weak rule for point-wise supremum: if the f_\alpha are differentiable and convex functions that depend on a parameter \alpha\in \mathcal{A}, with \mathcal{A} an arbitrary set, then
 f(x) = \sup_{\alpha \in \mathcal{A}} \: f_\alpha(x)
is possibly non-differentiable but convex. If \beta is such that f(x) = f_\beta(x), then a subgradient of f at x is simply any element of \partial f_\beta(x).

Example: maximum eigenvalue. For X=X^T \in \mathbf{R}^{n \times n}, define f(X) = \lambda_{\rm max}(X) to be the largest eigenvalue of X (f is real-valued since X is symmetric). A subgradient of f at X can be found using the following variational (that is, optimization-based) representation of f(X):
 f(X) = \max_{y \::\: \|y\|_2 = 1} \: y^T X y .
Any unit-norm eigenvector y_{\rm max} of X corresponding to the largest eigenvalue achieves the maximum in the above. Hence, by the weak rule above, a subgradient of f at X is given by a gradient of the function X \rightarrow y_{\rm max}^T X y_{\rm max}, which is y_{\rm max}y_{\rm max}^T.