Linear Optimization

  • Definition and standard forms

  • Examples

Definition and standard forms

Definition

A linear optimization problem (or linear program, LP) is an optimization problem of the standard form:
 \displaystyle\min_x \: f_0(x) ~:~ \begin{array}[t]{l} f_i(x) \le 0, \quad i=1,\ldots, m, \\ a_i^Tx = b_i, \quad i=1,\ldots,p, \end{array}
where every function f_0, f_1, \ldots, f_m is affine. Thus, the feasible set of an LP is a polyhedron.

Standard forms

Linear optimization problems admit several standard forms. One is derived from the general standard form:
 \min_x \: c^Tx + d ~:~ Ax = b, \;\; Cx \le h,
where the inequalities are understood componentwise. Notice that the convention for componentwise inequalities differs from the one adopted in BV; I will reserve the symbols \preceq and \succeq for negative and positive semi-definiteness of symmetric matrices. The constant term d in the objective function is, of course, immaterial.

Another standard form, used in several off-the-shelf algorithms, is
 \min_x \: c^Tx ~:~ Ax = b, \;\; x \ge 0.
We can always transform the above problem into the previous standard form, and vice versa.
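
As a quick sketch of one direction of this equivalence (a standard construction, the details of which are not spelled out in these notes): inequality constraints can be turned into equalities by adding slack variables, and a free variable can be split into a difference of nonnegative parts,
 Cx \le h \;\Longleftrightarrow\; Cx + s = h, \;\; s \ge 0, \qquad x = x^+ - x^-, \;\; x^+ \ge 0, \;\; x^- \ge 0,
so that the problem becomes an LP in the variables (x^+, x^-, s) with equality constraints and sign constraints only.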

Examples

Piece-wise linear minimization

A piece-wise linear function is the point-wise maximum of affine functions, and has the form
 f(x) = \max_{1 \le i \le m} \: (a_i^Tx+b_i),
for appropriate vectors a_i and scalars b_i, i=1,\ldots,m. The (unconstrained) problem of minimizing the piece-wise linear function above is not an LP. However, its epigraph form:
 \min_{x,t} \: t ~:~ t \ge a_i^Tx + b_i, \;\; i=1,\ldots,m
is.
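
As an illustration (not part of the original text), here is a minimal sketch of the epigraph LP above, solved with SciPy's generic LP solver; the data a_i, b_i are made up, and the decision variable is stacked as z = (x, t):

```python
# Hedged sketch: minimize a piece-wise linear function via its epigraph LP.
# The data (rows a_i^T of A and scalars b_i) are illustrative only.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]])   # rows are a_i^T
b = np.array([1.0, 0.0, -0.5])                          # scalars b_i
m, n = A.shape

c = np.zeros(n + 1)          # variables z = (x, t); objective is t
c[-1] = 1.0

A_ub = np.hstack([A, -np.ones((m, 1))])   # a_i^T x - t <= -b_i
b_ub = -b

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
print(res.x[:n], res.x[-1])   # minimizer x and optimal value t = f(x)
```

At the optimum at least one epigraph constraint is active, so the optimal t equals the value of f at the minimizer.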

l_1-norm regression

A related example involves the minimization of the l_1-norm of a vector that depends affinely on the variable. This arises in regression problems, such as image reconstruction. Using a notation similar to the previous example, the problem has the form
 \min_x \: \sum_{i=1}^m |a_i^Tx+b_i|.
The problem is not an LP, but we can introduce slack variables and rewrite the above in the equivalent LP format:
 \min_{x,v} \: \sum_{i=1}^m v_i ~:~ -v_i \le a_i^Tx+b_i \le v_i, \;\; i=1,\ldots,m.
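
The same slack-variable construction can be coded directly. The sketch below (with illustrative data only) builds the LP in the stacked variable (x, v) and solves it with SciPy's linprog:

```python
# Hedged sketch: l_1-norm regression min_x sum_i |a_i^T x + b_i| as an LP,
# using slack variables v with -v_i <= a_i^T x + b_i <= v_i. Data are made up.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # rows a_i^T
b = np.array([0.1, -1.2, 0.3, -2.0])                            # scalars b_i
m, n = A.shape

c = np.concatenate([np.zeros(n), np.ones(m)])           # minimize sum_i v_i
A_ub = np.block([[A, -np.eye(m)],                       #  a_i^T x + b_i <= v_i
                 [-A, -np.eye(m)]])                     # -(a_i^T x + b_i) <= v_i
b_ub = np.concatenate([-b, b])

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n + [(0, None)] * m)
print(res.x[:n], res.fun)   # estimate x and its l_1 residual
```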

Linear binary classification

Figure: a linear classifier with a total of four errors on the training set. The sum of the lengths of the dotted lines (which correspond to classification errors) is the value of the loss function at optimum.

Consider a two-class classification problem as shown above. Given m data points x_i \in \mathbf{R}^n, each of which is associated with a label y_i \in \{-1,1\}, the problem is to find a hyperplane that separates, as much as possible, the two classes.

The two classes are separable by the hyperplane \mathcal{H}(w,b) = \{ x ~:~ w^Tx + b = 0 \}, where w \in \mathbf{R}^n, w \ne 0, and b \in \mathbf{R}, if and only if w^Tx_i + b \ge 0 when y_i = +1, and w^Tx_i + b \le 0 when y_i = -1. Thus, the conditions on (w,b)
 y_i(w^Tx_i + b) \ge 0, \;\; i=1,\ldots,m
ensure that the data set is separable by a linear classifier. In this case, the parameters w, b allow us to predict the label associated with a new point x, via y = \mathbf{sign}(w^Tx+b). The feasibility problem of finding (w,b) that satisfy the above separability constraints is an LP. If the data set is strictly separable (every condition above holds strictly), then we can rescale the constraints and transform them into
 y_i(w^Tx_i + b) \ge 1, \;\; i=1,\ldots,m.
In practice, the two classes may not be linearly separable. In this case, we would like to minimize, by proper choice of the hyperplane parameters, the total number of classification errors. Our objective function has the form
 \sum_{i=1}^m \psi(y_i(w^Tx_i + b)),
where \psi(t) = 1 if t < 0, and \psi(t) = 0 otherwise.

Unfortunately, the above objective function is not convex, and it is hard to minimize. We can replace it by an upper bound, called the hinge function, h(t) = (1-t)_+ = \max(0,1-t). Our problem becomes one of minimizing a piece-wise linear "loss" function:
 \min_{w,b} \: \sum_{i=1}^m (1-y_i(w^Tx_i + b))_+.
In the above form, the problem is not yet an LP. We may introduce slack variables to obtain the LP form:
 \min_{w,b,v} \: \sum_{i=1}^m v_i ~:~ v \ge 0, \;\; y_i(w^Tx_i + b) \ge 1-v_i, \;\; i=1,\ldots,m.
The above can be seen as a variant of the separability conditions seen before, where we allow for infeasibilities and seek to minimize their sum. The value of the loss function at optimum can be read from the figure: it is the sum of the lengths of the dotted lines from the wrongly classified data points to the hyperplane.
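
To make this concrete, here is a minimal numerical sketch of the hinge-loss LP with synthetic two-dimensional data; everything below (the data, the random seed) is illustrative and not taken from the figure:

```python
# Hedged sketch: hinge-loss linear classification as an LP in (w, b, v).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (20, 2)),    # class +1 points (synthetic)
               rng.normal(-1.0, 1.0, (20, 2))])  # class -1 points (synthetic)
y = np.concatenate([np.ones(20), -np.ones(20)])
m, n = X.shape

c = np.concatenate([np.zeros(n + 1), np.ones(m)])   # minimize sum_i v_i

# y_i (w^T x_i + b) >= 1 - v_i   <=>   -y_i x_i^T w - y_i b - v_i <= -1
A_ub = np.hstack([-y[:, None] * X, -y[:, None], -np.eye(m)])
b_ub = -np.ones(m)

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * (n + 1) + [(0, None)] * m)
w, b_off = res.x[:n], res.x[n]
print(w, b_off, res.fun)   # hyperplane parameters and total hinge loss
```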

Network flow

Consider a network (directed graph) having m nodes connected by n directed arcs (ordered pairs (i,j)). We assume there is at most one arc from node i to node j, and no self-loops. We define the arc-node incidence matrix A \in \mathbf{R}^{m \times n} to be the matrix with coefficients A_{ij} = 1 if arc j starts at node i, -1 if it ends there, and 0 otherwise. Note that the column sums of A are zero: \mathbf{1}^TA = 0.

A flow (of traffic, information, charge) is represented by a vector x \in \mathbf{R}^n, and the total flow leaving node i is then (Ax)_i = \sum_{j=1}^n A_{ij} x_j.

The minimum cost network flow problem has the LP form
 \min_x \: c^Tx ~:~ Ax = b, \;\; l \le x \le u,
where c_i is the cost of flow through arc i, the vectors l, u provide lower and upper bounds on x, and b \in \mathbf{R}^m is an external supply vector. This vector may have positive or negative components, as it represents both supply and demand. We assume that \mathbf{1}^Tb = 0, so that the total supply equals the total demand. The constraint Ax = b represents the balance equations of the network.
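
As a concrete sketch (the small network, costs, supplies, and bounds below are all made up), one can assemble the incidence matrix explicitly and solve the minimum-cost flow LP with SciPy's linprog:

```python
# Hedged sketch: minimum-cost network flow on a tiny illustrative graph.
import numpy as np
from scipy.optimize import linprog

# Arcs as (start, end) pairs on 4 nodes: 0->1, 0->2, 1->2, 1->3, 2->3.
arcs = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
m, n = 4, len(arcs)

A = np.zeros((m, n))                # arc-node incidence matrix
for j, (i, k) in enumerate(arcs):
    A[i, j] = 1.0                   # arc j starts at node i
    A[k, j] = -1.0                  # arc j ends at node k

c = np.array([2.0, 2.0, 1.0, 3.0, 1.0])   # cost per unit of flow on each arc
b = np.array([1.0, 0.0, 0.0, -1.0])       # supply 1 at node 0, demand 1 at node 3

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0.0, 1.0)] * n)   # 0 <= x <= 1
print(res.x, res.fun)   # optimal flow on each arc and its total cost
```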

A more specific example is the max-flow problem, where we seek to maximize the flow between node 1 (the source) and node m (the sink). It bears the form
 \max_{x,t} \: t ~:~ Ax = te, \;\; l \le x \le u,
with e = (1,0,\ldots,0,-1).
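
Since generic LP solvers minimize, maximizing t amounts to minimizing -t. The sketch below reuses the small illustrative network from the previous sketch, now with arc capacities, and solves the max-flow LP:

```python
# Hedged sketch: max-flow between node 0 (source) and the last node (sink).
import numpy as np
from scipy.optimize import linprog

arcs = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
m, n = 4, len(arcs)
A = np.zeros((m, n))
for j, (i, k) in enumerate(arcs):
    A[i, j], A[k, j] = 1.0, -1.0

e = np.zeros(m)
e[0], e[-1] = 1.0, -1.0                    # t leaves the source, enters the sink
u = np.array([3.0, 2.0, 1.0, 1.0, 2.0])    # arc capacities (illustrative)

# Variables z = (x, t); constraints A x - t e = 0, 0 <= x <= u, t free.
c = np.concatenate([np.zeros(n), [-1.0]])  # minimize -t, i.e. maximize t
A_eq = np.hstack([A, -e[:, None]])
b_eq = np.zeros(m)

res = linprog(c, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, float(ui)) for ui in u] + [(None, None)])
print(res.x[:n], res.x[-1])   # optimal flow on each arc and max-flow value t
```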

LP relaxation of boolean problems

A boolean optimization problem is one in which the variables are constrained to be boolean. An example of a boolean problem is the so-called boolean LP
 p^\ast = \min_x \: c^Tx ~:~ Ax \le b, \;\; x \in \{0,1\}^n.
Such problems are non-convex, and usually hard to solve. The LP relaxation takes the form
 p^\ast_{\rm LP} := \min_x \: c^Tx ~:~ Ax \le b, \;\; 0 \le x \le \mathbf{1}.
Since the relaxation enlarges the feasible set, it provides a lower bound on the original problem: p^\ast_{\rm LP} \le p^\ast. Its optimal points, however, may not be feasible for the original problem (that is, not boolean). Even though a solution of the LP relaxation may not be boolean, we can often interpret it as a fractional solution to the original problem. For example, in a graph coloring problem, the LP relaxation colors the nodes of the graph not with a single color, but with many.
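
The bound p^\ast_{\rm LP} \le p^\ast is easy to check numerically on a toy instance. In the sketch below (data made up, and the problem small enough that the boolean problem can be solved by brute-force enumeration), the bound is strict and the relaxed solution is fractional:

```python
# Hedged sketch: a tiny boolean LP versus its LP relaxation.
import itertools
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -1.0, -1.0])
A = np.array([[2.0, 2.0, 2.0]])
b = np.array([3.0])
n = len(c)

# LP relaxation: 0 <= x <= 1.
res = linprog(c, A_ub=A, b_ub=b, bounds=[(0.0, 1.0)] * n)
p_lp = res.fun                          # here -1.5, attained at a fractional x

# Boolean problem, by enumeration of {0,1}^n (only viable for tiny n).
p_bool = min(c @ np.array(x) for x in itertools.product([0, 1], repeat=n)
             if np.all(A @ np.array(x) <= b))   # here -1.0

print(p_lp, p_bool)   # p_lp <= p_bool, as expected
```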

Boolean problems are not always hard to solve. Indeed, in some cases, one can show that the LP relaxation provides an exact solution to the boolean problem, as its optimal points turn out to be boolean. A few examples in this category involve network flow problems with boolean variables:

  1. The weighted bipartite matching problem is to match N people to N tasks, in a one-to-one fashion. The cost of matching person i to task j is A_{ij}. The problem reads (a numerical sketch of its LP relaxation is given after this list)
      \min_x \: \sum_{i,j=1}^N A_{ij} x_{ij} ~:~ \begin{array}[t]{ll} x_{ij} \in \{0,1\}, & \\ \forall \: j, \;\; \sum_{i=1}^N x_{ij} = 1 & \mbox{(one person for each task)}, \\ \forall \: i, \;\; \sum_{j=1}^N x_{ij} = 1 & \mbox{(one task for each person)}. \end{array}

  2. The shortest path problem has the form
      \min_x \: \mathbf{1}^Tx ~:~ Ax = (1,0,\ldots,0,-1), \;\; x \in \{0,1\}^n,
    where A stands for the incidence matrix of the network, and the arcs with x_i = 1 form a shortest forward path between nodes 1 and m. As before, the LP relaxation is exact in this case, in the sense that its solution is boolean. (The LP relaxation problem can be solved very efficiently with specialized algorithms.)
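
As announced in item 1, here is a small numerical sketch of the bipartite matching LP relaxation. For generic costs the relaxation has a unique solution, which is then a permutation matrix, and its optimal value always matches the exact combinatorial solution (computed here with SciPy's assignment solver); the cost matrix below is random:

```python
# Hedged sketch: LP relaxation of weighted bipartite matching (assignment).
import numpy as np
from scipy.optimize import linprog, linear_sum_assignment

rng = np.random.default_rng(1)
N = 4
C = rng.random((N, N))               # C[i, j]: cost of matching person i to task j

c = C.ravel()                        # variables x_{ij}, flattened row-major
A_eq = np.zeros((2 * N, N * N))
for i in range(N):
    A_eq[i, i * N:(i + 1) * N] = 1.0     # sum_j x_{ij} = 1 (one task per person)
for j in range(N):
    A_eq[N + j, j::N] = 1.0              # sum_i x_{ij} = 1 (one person per task)
b_eq = np.ones(2 * N)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, 1.0)] * (N * N))
X = res.x.reshape(N, N)

rows, cols = linear_sum_assignment(C)    # exact combinatorial solution
print(np.round(X, 3))                    # typically a 0/1 permutation matrix
print(res.fun, C[rows, cols].sum())      # the two optimal values coincide
```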