Conic Duality

  • Conic problem and its dual

  • Strong duality and KKT conditions

  • SDP duality

  • SOCP duality

  • Examples

Conic problem and its dual

The conic optimization problem in standard equality form is:
 p^ast := min_x : c^Tx ~:~ Ax = b, ;; x in {cal K}.
where {cal K} is a proper cone, for example a direct product of cones that are each one of three types: the positive orthant, a second-order cone, or a semidefinite cone. Let {cal K}^ast be the cone dual to {cal K}, which we define as
 {\cal K}^\ast := \left\{ \lambda ~:~ \forall \: x \in {\cal K}, \;\; \lambda^T x \ge 0 \right\}.
All the cones mentioned above (positive orthant, second-order cone, semidefinite cone, and products thereof) are self-dual, in the sense that {cal K}^ast = {cal K}.

The Lagrangian of the problem is given by
{cal L}(x, lambda, y) = c^T x + y^T(b - Ax) - lambda^Tx
The last term is added to take into account the constraint x in {cal K}. From the very definition of the dual cone:
 \max_{\lambda \in {\cal K}^\ast} \: -\lambda^T x = \left\{ \begin{array}{ll} 0 & \mbox{if } x \in {\cal K}, \\ +\infty & \mbox{otherwise.} \end{array} \right.
Thus, we have
 \begin{array}{rcl} p^\ast &=& \min_{x} \: \max_{y, \: \lambda \in {\cal K}^\ast} \: {\cal L}(x, \lambda, y) \\ &=& \min_{x} \: \max_{y, \: \lambda \in {\cal K}^\ast} \: c^T x + y^T(b - Ax) - \lambda^T x \\ &\ge& d^\ast := \max_{y, \: \lambda \in {\cal K}^\ast} \: g(\lambda, y), \end{array}
where
 g(\lambda, y) = \min_x \: c^T x + y^T(b - Ax) - \lambda^T x = \left\{ \begin{array}{ll} y^T b & \mbox{if } c - A^T y - \lambda = 0, \\ -\infty & \mbox{otherwise.} \end{array} \right.
The dual for the problem is:
 d^* = max : y^{T}b ~:~  c - A^Ty -lambda = 0, ;; lambda in {cal K}^ast .
Eliminating lambda, we can simplify the dual as:
 d^* = max : y^{T}b ~:~  c - A^Ty  in {cal K}^ast .
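To make this concrete, here is a minimal numerical sketch, assuming Python with numpy and cvxpy available, and taking {cal K} to be the positive orthant (so the conic problem is a linear program). The random instance is illustrative only; it is built so that both the primal and the dual are strictly feasible.

```python
# Minimal sketch: a conic problem with K the positive orthant, and its dual.
# Assumes numpy and cvxpy; the random instance is for illustration only.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
b = A @ (rng.random(n) + 0.1)                           # Ax0 = b with x0 > 0: primal strictly feasible
c = A.T @ rng.standard_normal(m) + rng.random(n) + 0.1  # c = A^T y0 + s0 with s0 > 0: dual strictly feasible

x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(c @ x), [A @ x == b, x >= 0])
primal.solve()

y = cp.Variable(m)
dual = cp.Problem(cp.Maximize(b @ y), [c - A.T @ y >= 0])
dual.solve()

print(primal.value, dual.value)   # the two values agree up to solver tolerance
```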

Strong duality and KKT conditions

Conditions for strong duality

We now summarize the results stated before. Strong duality holds when either:

  1. The primal is strictly feasible, i.e. there exists x with Ax=b, x in mbox{bf int}({cal K}). This also implies that the dual problem is attained.

  2. The dual is strictly feasible, i.e. there exists y with c - A^Ty in mbox{bf int}({cal K}^ast). This also implies that the primal problem is attained.

If both the primal and dual are strictly feasible, then both are attained (and p^*=d^*).

KKT conditions

Assume p^*=d^* and both the primal and dual are attained by some primal-dual triplet (x^ast,lambda^ast,y^ast). Then,
 \begin{array}{rcl} p^\ast = c^T x^\ast = d^\ast &=& g(\lambda^\ast, y^\ast) \\ &=& \min_x \: {\cal L}(x, \lambda^\ast, y^\ast) \\ &\le& {\cal L}(x^\ast, \lambda^\ast, y^\ast) \\ &=& c^T x^\ast - \lambda^{\ast T} x^\ast + y^{\ast T}(b - A x^\ast) \\ &\le& c^T x^\ast = p^\ast. \end{array}
Since Ax^* = b, the term y^{*T}(b-Ax^*) in the fourth line vanishes, so equality must hold throughout the chain; this implies lambda^{*T}x^*=0.

Thus the KKT conditions are:

  1. x in {cal K}, Ax=b,

  2. lambda in {cal K}^ast,

  3. lambda^Tx=0,

  4. c - A^Ty -lambda=0, that is, nabla_x {cal L}(x,lambda,y)=0.

Eliminating lambda from the above allows us to get rid of the Lagrangian stationarity condition, and gives us the following theorem.

Theorem: KKT conditions for conic problems

The conic problem
 p^ast := min_x : c^Tx ~:~ Ax = b, ;; x in {cal K}.
admits the dual bound p^ast ge d^ast, where
 d^ast = max : y^{T}b ~:~  c - A^Ty  in {cal K}^ast .
If both problems are strictly feasible, then the duality gap is zero: p^ast=d^ast, and both values are attained. Then, a pair (x,y) is primal-dual optimal if and only if the KKT conditions

  1. Primal feasibility: x in {cal K}, Ax=b,

  2. Dual feasibility: c - A^Ty in {cal K}^ast,

  3. Complementary slackness: (c - A^Ty)^Tx=0, hold.
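The KKT conditions can be checked numerically on a strictly feasible instance. A minimal sketch, under the same assumptions as the earlier snippet (numpy and cvxpy, {cal K} the positive orthant, illustrative random data):

```python
# Minimal sketch: numerical check of the conic KKT conditions (K = positive orthant).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 6
A = rng.standard_normal((m, n))
b = A @ (rng.random(n) + 0.1)                           # primal strictly feasible
c = A.T @ rng.standard_normal(m) + rng.random(n) + 0.1  # dual strictly feasible

x = cp.Variable(n)
cp.Problem(cp.Minimize(c @ x), [A @ x == b, x >= 0]).solve()
y = cp.Variable(m)
cp.Problem(cp.Maximize(b @ y), [c - A.T @ y >= 0]).solve()

s = c - A.T @ y.value                       # dual slack, should lie in K* = K
print(np.linalg.norm(A @ x.value - b))      # primal feasibility: ~ 0
print(s.min())                              # dual feasibility: >= 0 up to tolerance
print(abs(s @ x.value))                     # complementary slackness: ~ 0
```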

SDP duality

In this section, we consider the SDP in standard form:
 p^ast := max_X : langle C,X rangle ~:~ langle A_i, X rangle = b_i , ;; i= 1,ldots, m, ;; X succeq 0,
where C,A_i are given symmetric matrices, langle A, B rangle = mbox{bf Tr} AB denotes the scalar product between two symmetric matrices, and b in mathbf{R}^m is given.

Conic Lagrangian

Consider the ‘‘conic’’ Lagrangian
{cal L}(X,nu,Y) := langle C,X rangle + sum_{i=1}^m nu_i (b_i- langle A_i, X rangle) + langle Y, X rangle,
where now we associate a matrix dual variable Y to the constraint X succeq 0.

Let us check that the Lagrangian above ‘‘works’’, in the sense that we can represent the constrained maximization problem above as an unconstrained, maximin problem:
 p^\ast = \max_X \: \min_{\nu, \: Y \succeq 0} \: {\cal L}(X, \nu, Y).
We need to check that, for an arbitrary symmetric matrix X, we have
 \min_{Y \succeq 0} \: \langle Y, X \rangle = \left\{ \begin{array}{ll} 0 & \mbox{if } X \succeq 0, \\ -\infty & \mbox{otherwise.} \end{array} \right.
This is an immediate consequence of the following:
 \min_{Y \succeq 0} \: \langle Y, X \rangle = \min_{t \ge 0} \: \min_{Y \succeq 0, \: {\bf Tr}\, Y = t} \: \langle Y, X \rangle = \min_{t \ge 0} \: t \, \lambda_{\rm min}(X),
where we have exploited the variational representation of the minimum eigenvalue, lambda_{rm min}(X) = min { langle Y, X rangle : Y succeq 0, mbox{bf Tr} Y = 1 }; the right-hand side is 0 if lambda_{rm min}(X) ge 0, and -infty otherwise. The geometric interpretation is that the cone of positive-semidefinite matrices has a 90^o angle at the origin.

Dual problem

The minimax inequality then implies
 p^ast le d^ast := min_{nu, :Y succeq 0} : max_X : {cal L}(X,nu,Y).
The corresponding dual function is
 g(Y, \nu) = \max_X \: {\cal L}(X, \nu, Y) = \left\{ \begin{array}{ll} \nu^T b & \mbox{if } C - \sum_{i=1}^m \nu_i A_i + Y = 0, \\ +\infty & \mbox{otherwise.} \end{array} \right.
The dual problem then writes
 d^ast = min_{nu,: Y succeq 0} : g(Y,nu) = min_{nu, : Y succeq 0} : nu^Tb ~:~ C-sum_{i=1}^m nu_i A_i = - Y preceq 0.
After eliminating the variable Y, we find the dual
 d^\ast = \min_{\nu} \: \nu^T b ~:~ C - \sum_{i=1}^m \nu_i A_i \preceq 0,
which is an SDP in standard inequality form.
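Here is a minimal numerical sketch of this primal-dual pair, assuming numpy and cvxpy; the random data is constructed (for illustration only) so that both problems are strictly feasible.

```python
# Minimal sketch: standard-form SDP (a maximization) and its dual.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3
sym = lambda M: (M + M.T) / 2
A_list = [sym(rng.standard_normal((n, n))) for _ in range(m)]
X0 = rng.standard_normal((n, n)); X0 = X0 @ X0.T + np.eye(n)  # X0 > 0
b = np.array([np.trace(Ai @ X0) for Ai in A_list])            # primal strictly feasible
nu0 = rng.standard_normal(m)
S0 = rng.standard_normal((n, n)); S0 = S0 @ S0.T + np.eye(n)  # S0 > 0
C = sum(nu0[i] * A_list[i] for i in range(m)) - S0            # dual strictly feasible at nu0

X = cp.Variable((n, n), symmetric=True)
primal = cp.Problem(cp.Maximize(cp.trace(C @ X)),
                    [cp.trace(A_list[i] @ X) == b[i] for i in range(m)] + [X >> 0])
primal.solve()

nu = cp.Variable(m)
dual = cp.Problem(cp.Minimize(b @ nu),
                  [sum(nu[i] * A_list[i] for i in range(m)) >> C])
dual.solve()
print(primal.value, dual.value)   # equal up to solver tolerance
```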

Weak and strong duality

For the maximization problem above, weak duality states that p^ast le d^ast. The weak duality inequality
 \nu^T b \ge \langle C, X \rangle,
which holds for any primal-dual feasible pair (X,nu), is a direct consequence of the expression for min_{Y succeq 0} langle Y, X rangle given earlier.

From Slater’s theorem, strong duality will hold if the primal problem is strictly feasible, that is, if there exists X succ 0 such that langle A_i, X rangle = b_i, i= 1,ldots, m. It turns out that if both problems are strictly feasible, then both problems are attained (their optimal sets are not empty).

The following strong duality theorem summarizes our results.

Theorem: Strong duality in SDP

Consider the SDP
 p^ast := max_X : langle C,X rangle ~:~ langle A_i, X rangle = b_i , ;; i= 1,ldots, m, ;; X succeq 0
and its dual
 d^ast = min_{nu} : nu^Tb ~:~ sum_{i=1}^m nu_i A_i succeq C .
The following holds:

  1. Duality is symmetric, in the sense that the dual of the dual is the primal.

  2. Weak duality always holds: p^ast le d^ast, so that, for any primal-dual feasible pair (X,nu), we have nu^Tb ge langle C,X rangle.

  3. If the primal (resp. dual) problem is bounded above (resp. below), and strictly feasible, then p^ast = d^ast and the dual (resp. primal) is attained.

  4. If both problems are strictly feasible, then p^ast = d^ast and both problems are attained.

KKT conditions

Consider the SDP
 p^\ast = \min_x \: c^T x ~:~ F(x) := F_0 + \sum_{i=1}^n x_i F_i \succeq 0,
with c in mathbf{R}^n, F_i in {cal S}^n, i=0,ldots,n. The Lagrangian is
 {\cal L}(x, Y) = c^T x - {\bf Tr}(F(x) Y),
and the dual problem reads
 d^\ast = \max_{Y \succeq 0} \: \min_x \: {\cal L}(x, Y) = \max_Y \: -{\bf Tr}(F_0 Y) ~:~ {\bf Tr}(F_i Y) = c_i, \;\; i=1,\ldots,n, \;\; Y \succeq 0.

The KKT conditions for optimality are as follows:

  1. F(x) succeq 0,

  2. Y succeq 0, mbox{bf Tr}(F_iY)= c_i, i=1,ldots,n,

  3. mbox{bf Tr}(F(x)Y)=0.

The last condition can be expressed as F(x) Y = 0, according to the following result: Let F,Y in {cal S}^n. If F succeq 0 and Ysucceq 0 then mbox{bf Tr}(FY)=0 is equivalent to FY=0.

Proof: Let Y^{1/2} be the square root of Y (the unique positive semi-definite solution to Z^2 =Y). We have mbox{bf Tr} FY = mbox{bf Tr}  tilde{F} = 0, where tilde{F} := Y^{1/2}FY^{1/2}. Since F succeq 0, we have tilde{F} succeq 0. The trace of tilde{F} being zero then implies that tilde{F} = 0.

Using the eigenvalue decomposition, we can reduce the problem to the case when Y is diagonal. Let us assume that
 Y = \left( \begin{array}{cc} \Lambda & 0 \\ 0 & 0 \end{array} \right), \;\; F = \left( \begin{array}{cc} F_{11} & F_{12} \\ F_{12}^T & F_{22} \end{array} \right),
where Lambda succ 0 is diagonal and contains the positive eigenvalues of Y, and the matrix F_{11} is of the same size as Lambda (which is equal to the rank of Y). The condition tilde{F} = Y^{1/2}FY^{1/2} = 0 expresses as
 0 = \left( \begin{array}{cc} \Lambda^{1/2} & 0 \\ 0 & 0 \end{array} \right) \left( \begin{array}{cc} F_{11} & F_{12} \\ F_{12}^T & F_{22} \end{array} \right) \left( \begin{array}{cc} \Lambda^{1/2} & 0 \\ 0 & 0 \end{array} \right) = \left( \begin{array}{cc} \Lambda^{1/2} F_{11} \Lambda^{1/2} & 0 \\ 0 & 0 \end{array} \right).
Since Lambda succ 0, we obtain F_{11} = 0. But F succeq 0 then implies F_{12} = 0 (use Schur complements), thus
 FY = \left( \begin{array}{cc} 0 & 0 \\ 0 & F_{22} \end{array} \right) \left( \begin{array}{cc} \Lambda & 0 \\ 0 & 0 \end{array} \right) = 0,
as claimed. Thus the last KKT condition can be written as F(x)Y=0.
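A quick numerical illustration of this fact, assuming numpy; the construction of F and Y with mutually orthogonal ranges is for illustration only.

```python
# Minimal sketch: for PSD matrices, Tr(FY) = 0 forces FY = 0.
import numpy as np

rng = np.random.default_rng(3)
n = 5
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
F = Q[:, :2] @ Q[:, :2].T    # PSD, range = span of first two columns of Q
Y = Q[:, 2:] @ Q[:, 2:].T    # PSD, range orthogonal to that of F
print(np.trace(F @ Y))       # ~ 0 by construction
print(np.abs(F @ Y).max())   # ~ 0: the whole product vanishes, not just its trace
```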

Theorem: KKT conditions for SDP

The SDP
 p^\ast := \min_x \: c^T x ~:~ F(x) := F_0 + \sum_{i=1}^n x_i F_i \succeq 0
admits the dual bound p^ast ge d^ast, where
 d^ast =  max_Y : -mbox{bf Tr} F_0 Y ~:~ mbox{bf Tr} F_i Y = c_i, ;; i=1,ldots,n, ;; Y succeq 0.
If both problems are strictly feasible, then the duality gap is zero: p^ast=d^ast, and both values are attained. Then, a pair (x,Y) is primal-dual optimal if and only if the KKT conditions

  1. Primal feasibility: F(x) succeq 0,

  2. Dual feasibility: mbox{bf Tr} F_i Y = c_i, ;; i=1,ldots,n, ;; Y succeq 0,

  3. Complementary slackness: F(x) Y=0, hold.

Recall that LP duality for a problem of the form min_x { c^Tx: Ax leq b} with dual variable y includes the complementary slackness conditions y_i(b-Ax)_i=0 for every i. These can be compactly written as mbox{bf diag}(y)mbox{bf diag}(b-Ax)=0, which mirrors the KKT conditions for SDP (with Y = mbox{bf diag}(y)). This should come as no surprise, since SDP problems include LP problems as a special case. The sketch below illustrates this correspondence numerically.
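A minimal sketch, assuming numpy and cvxpy, with an illustrative random LP built to be strictly feasible and bounded:

```python
# Minimal sketch: LP complementary slackness in "diagonal SDP" form.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + rng.random(m) + 0.1   # strictly feasible
c = -A.T @ (rng.random(m) + 0.1)                       # c = -A^T y0 with y0 > 0: bounded LP

x = cp.Variable(n)
cons = [A @ x <= b]
cp.Problem(cp.Minimize(c @ x), cons).solve()
y = cons[0].dual_value                                 # nonnegative multipliers
slack = b - A @ x.value
print(np.abs(np.diag(y) @ np.diag(slack)).max())       # ~ 0: diag(y) diag(b - Ax) = 0
```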

SOCP duality

We start from the second-order cone problem in inequality form:
 p^ast :=min_x : c^Tx ~:~ |A_ix+b_i|_2 le c_i^Tx+d_i, ;; i=1,ldots,m,
where c in mathbf{R}^n, A_i in mathbf{R}^{n_i times n}, b_i in mathbf{R}^{n_i}, c_i in mathbf{R}^n, d_i in mathbf{R}, i=1,ldots,m.

Conic Lagrangian

To build a Lagrangian for this problem, we use the fact that, for any pair (t,y):
 \max_{(u, \lambda) \::\: \|u\|_2 \le \lambda} \: -u^T y - t \lambda = \max_{\lambda \ge 0} \: \lambda (\|y\|_2 - t) = \left\{ \begin{array}{ll} 0 & \mbox{if } \|y\|_2 \le t, \\ +\infty & \mbox{otherwise.} \end{array} \right.
The above means that the second-order cone has a 90^o angle at the origin. To see this, observe that
 \max_{(u, \lambda) \::\: \|u\|_2 \le \lambda} \: -u^T y - t \lambda = -\min_{(u, \lambda) \::\: \|u\|_2 \le \lambda} \: \left( \begin{array}{c} u \\ \lambda \end{array} \right)^T \left( \begin{array}{c} y \\ t \end{array} \right).
The objective in the right-hand side is proportional to the cosine of the angle between the vectors involved. The largest angle achievable between any two vectors in the second-order cone is 90^o. If |y|_2 > t, then the cosine reaches negative values, and the maximum scalar product becomes infinite.

Figure: geometric interpretation of the 90^o angle at the origin. The two orthogonal vectors in black form the maximum angle attainable by vectors in the second-order cone; the vector in red forms a greater angle with the vector on the left, and the corresponding scalar product is unbounded.

Consider the following Lagrangian, with variables x, lambdain mathbf{R}^m, u_i in mathbf{R}^{n_i}, i=1,ldots,m:
{cal L}(x,lambda,u_1,ldots,u_m) = c^Tx - sum_{i=1}^m left[ u_i^T(A_ix+b_i) + lambda_i (c_i^Tx+d_i)  right].
Using the fact above leads to the following minimax representation of the primal problem:
 p^ast = min_x : max_{|u_i|_2 le lambda_i, : i=1,ldots,m} : {cal L}(x,lambda,u_1,ldots,u_m).

Conic dual

Weak duality expresses as p^ast ge d^ast, where
 d^ast := max_{|u_i|_2 le lambda_i, : i=1,ldots,m} : min_x : {cal L}(x,lambda,u_1,ldots,u_m).
The inner problem, which corresponds to the dual function, is very easy to solve, as it is unconstrained and its objective is affine in x. Setting the derivative with respect to x to zero leads to the dual constraint
 c=sum_{i=1}^m [A_i^Tu_i + lambda_i c_i ].
We obtain
 d^ast = max_{lambda,u_i, : i=1,ldots,m} : -lambda^Td - sum_{i=1}^m u_i^Tb_i ~:~  c=sum_{i=1}^m [A_i^Tu_i + lambda_i c_i ] , ;; |u_i|_2 le lambda_i, ;; i=1,ldots,m.
The above is an SOCP, just like the original one.
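A minimal numerical sketch of this primal-dual pair, assuming numpy and cvxpy; the illustrative random data is built so that both problems are strictly feasible.

```python
# Minimal sketch: SOCP in inequality form and its conic dual.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
n, m, k = 3, 2, 4
A_ = [rng.standard_normal((k, n)) for _ in range(m)]
b_ = [rng.standard_normal(k) for _ in range(m)]
c_ = [rng.standard_normal(n) for _ in range(m)]
x0 = rng.standard_normal(n)
d_ = [np.linalg.norm(A_[i] @ x0 + b_[i]) - c_[i] @ x0 + 1.0 for i in range(m)]  # primal strictly feasible at x0
u0 = [rng.standard_normal(k) for _ in range(m)]
lam0 = [np.linalg.norm(u0[i]) + 1.0 for i in range(m)]                          # ||u0_i|| < lam0_i
c = sum(A_[i].T @ u0[i] + lam0[i] * c_[i] for i in range(m))                    # dual strictly feasible

x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(c @ x),
                    [cp.norm(A_[i] @ x + b_[i], 2) <= c_[i] @ x + d_[i] for i in range(m)])
primal.solve()

u = [cp.Variable(k) for _ in range(m)]
lam = cp.Variable(m)
dual = cp.Problem(
    cp.Maximize(-lam @ np.array(d_) - sum(u[i] @ b_[i] for i in range(m))),
    [c == sum(A_[i].T @ u[i] + lam[i] * c_[i] for i in range(m))] +
    [cp.norm(u[i], 2) <= lam[i] for i in range(m)])
dual.solve()
print(primal.value, dual.value)   # equal up to solver tolerance
```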

Direct approach

As for the SDP case, it turns out that the above “conic” approach is the same as if we had used the Lagrangian
{cal L}_{rm direct}(x,lambda) =   c^Tx + sum_{i=1}^m lambda_i left[ |A_ix+b_i|_2- (c_i^Tx+d_i)  right].
Indeed, we observe that
{cal L}_{rm direct}(x,lambda) = max_{u_i, : i=1,ldots,m} : {cal L}(x,lambda,u_1,ldots,u_m) ~:~ |u_i|_2 le lambda_i, ;; i=1,ldots,m.

Strong duality for SOCPs

Strong duality results are similar to those for SDP: a sufficient condition for strong duality to hold is that one of the primal or dual problems is strictly feasible. If both are, then the optimal value of both problems is attained.

Theorem: Strong duality in SOCP

Consider the SOCP
 p^ast := min_x : c^Tx ~:~ |A_ix+b_i|_2 le c_i^Tx+d_i, ;; i=1,ldots,m,
and its dual
 d^ast = max_{lambda,u_i, : i=1,ldots,m} : -lambda^Td - sum_{i=1}^m u_i^Tb_i ~:~  c=sum_{i=1}^m [A_i^Tu_i + lambda_i c_i ] , ;; |u_i|_2 le lambda_i, ;; i=1,ldots,m.
The following holds:

  1. Duality is symmetric, in the sense that the dual of the dual is the primal.

  2. Weak duality always holds: p^ast ge d^ast, so that, for any primal-dual feasible pair (x,(u_i,lambda_i)_{i=1}^m), we have -lambda^Td - sum_{i=1}^m u_i^Tb_i le c^Tx.

  3. If the primal (resp. dual) problem is bounded below (resp. above), and strictly feasible, then p^ast = d^ast and the dual (resp. primal) is attained.

  4. If both problems are strictly feasible, then p^ast = d^ast and both problems are attained.

Examples

Largest eigenvalue problem

Let us use the KKT conditions to prove that, for any given matrix Ain {cal S}^n:
 \max_{x} \: \{ x^T A x ~:~ x^T x = 1 \} = \max_{X \succeq 0, \: {\bf Tr}\, X = 1} \: {\bf Tr}(AX) = \lambda_{\rm max}(A),
where lambda_{rm max} denotes the largest eigenvalue.

Duality theory for SDP immediately tells us that the second equality holds. Indeed, the SDP
 p^ast = max_{X} : mbox{bf Tr} AX ~:~ X succeq 0, ;; mbox{bf Tr} X = 1
admits the following dual (see the eigenvalue problem example below):
 p^ast le d^ast := min_{t} : t ~:~ t I succeq A.
Using the eigenvalue decomposition of A, it is easy to show that d^ast = lambda_{rm max}(A).

It remains to prove the first equality. We observe that
 max_{x} : { x^TAx ~:~ x^Tx = 1 } = max_{X} : mbox{bf Tr} AX ~:~ X  succeq 0, : mbox{bf Tr} X = 1, ;; mbox{bf rank}(X) = 1.
(To see this, set X = xx^T.) Thus we need to show that at optimum, the rank of the primal variable X in the problem above is one.

The pair of primal and dual problems are both strictly feasible, hence the KKT theorem applies: both problems are attained by some primal-dual pair (X,t), which satisfies the KKT conditions X succeq 0, mbox{bf Tr} X = 1, t I succeq A, and (t I - A)X = 0. The last condition shows that any non-zero column x of X satisfies (tI-A)x = 0; in other words, x is an eigenvector of A associated with the largest eigenvalue t = lambda_{rm max}(A). Normalizing x so that |x|_2 = 1 gives mbox{bf Tr} (xx^T) = 1, and (tI-A)xx^T = 0. Hence X^ast = xx^T is feasible for the primal problem above and satisfies the KKT conditions, so it is optimal. Since X^ast has rank one, our first equality is proved.
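The following sketch, assuming numpy and cvxpy with illustrative random data, solves the SDP and checks both the optimal value and the (generically) rank-one structure of the optimal X:

```python
# Minimal sketch: the largest-eigenvalue SDP and its rank-one solution.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(6)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Maximize(cp.trace(A @ X)), [X >> 0, cp.trace(X) == 1])
prob.solve()

print(prob.value, np.linalg.eigvalsh(A)[-1])   # p* vs lambda_max(A): agree up to tolerance
print(np.linalg.eigvalsh(X.value))             # one dominant eigenvalue: X is (numerically)
                                               # rank one when lambda_max(A) is simple
```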

An SDP where strong duality fails

Contrary to linear optimization problems, SDPs can fail to have a zero duality gap, even when they are feasible. Consider the example:
 p^\ast = \min_{x} \: x_2 ~:~ \left( \begin{array}{ccc} x_2 + 1 & 0 & 0 \\ 0 & x_1 & x_2 \\ 0 & x_2 & 0 \end{array} \right) \succeq 0.
Any primal feasible x satisfies x_2 = 0. Indeed, positive semidefiniteness of the lower-right 2 times 2 block of the LMI requires x_1 ge 0 and x_1 cdot 0 - x_2^2 ge 0, hence x_2 = 0. Therefore p^ast = 0. The dual is
 d^ast = max_{Y in {cal S}^3} : -Y_{11} ~:~ Y succeq 0, ;; Y_{22} = 0, ;; 1-Y_{11}-2Y_{23} = 0.
Any dual feasible Y satisfies Y_{23} = 0 (since Y_{22} = 0 and Y succeq 0 force the second row and column of Y to vanish), thus Y_{11} = 1 and d^ast = -1 < p^ast = 0: the duality gap equals 1.
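The example can be explored numerically, but with a caveat: since neither problem is strictly feasible, solvers are not guaranteed to converge accurately here. A minimal sketch, assuming cvxpy with the SCS solver, that typically returns values near 0 and -1:

```python
# Minimal sketch: an SDP with a nonzero duality gap (Slater's condition fails).
import cvxpy as cp
import numpy as np

# LMI coefficients: F(x) = F0 + x1*F1 + x2*F2 matches the example above.
F0 = np.diag([1.0, 0.0, 0.0])
F1 = np.zeros((3, 3)); F1[1, 1] = 1.0
F2 = np.zeros((3, 3)); F2[0, 0] = 1.0; F2[1, 2] = F2[2, 1] = 1.0

x = cp.Variable(2)
primal = cp.Problem(cp.Minimize(x[1]), [F0 + x[0] * F1 + x[1] * F2 >> 0])
primal.solve(solver=cp.SCS)

Y = cp.Variable((3, 3), symmetric=True)
dual = cp.Problem(cp.Maximize(-Y[0, 0]),
                  [Y >> 0, Y[1, 1] == 0, 1 - Y[0, 0] - 2 * Y[1, 2] == 0])
dual.solve(solver=cp.SCS)

print(primal.value, dual.value)   # ~ 0 and ~ -1; accuracy is not guaranteed here,
                                  # since neither problem is strictly feasible
```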

An eigenvalue problem

For a matrix A in {cal S}_+^n, we consider the SDP
 p^ast = max_X : langle A,X rangle ~:~ mbox{bf Tr} X = 1, ;; X succeq 0.
The associated Lagrangian, using the conic approach, is
{cal L}(X,Y,nu) = langle A,X rangle + nu(1-mbox{bf Tr} X) + langle Y, X rangle,
with the matrix dual variable Y succeq 0, while nuinmathbf{R} is free.

The dual function is
 g(Y, \nu) = \max_X \: {\cal L}(X, Y, \nu) = \left\{ \begin{array}{ll} \nu & \mbox{if } \nu I = Y + A, \\ +\infty & \mbox{otherwise.} \end{array} \right.
We obtain the dual problem
 p^\ast \le d^\ast = \min_{\nu, Y} \: \nu ~:~ Y + A = \nu I, \;\; Y \succeq 0.
Eliminating Y leads to
 d^ast = min_nu : { nu ~:~ nu I succeq A } = lambda_{rm max}(A).
Both the primal and dual problems are strictly feasible, so p^ast = d^ast, and both values are attained. This proves the representation given above for the largest eigenvalue of A.

SDP relaxations for a non-convex quadratic problem

In XXX, we have seen two kinds of relaxations for the non-convex problem
 p^ast := max_x : x^TWx ~:~ x_i^2 le 1, ;; i=1,ldots,n,
where the symmetric matrix W in {cal S}^n is given.

One relaxation is based on a standard relaxation of the constraints, and leads to (see XXX)
 p^ast le d^{rm lag} := min_D : mbox{bf Tr} D ~:~ D succeq W, ;; D mbox{ diagonal.}
Another relaxation (XXX) involved expressing the problem as an SDP with a rank constraint on the matrix X = xx^T; dropping the rank constraint leads to
 d^{\rm rank} := \max_X \: \langle W, X \rangle ~:~ X \succeq 0, \;\; X_{ii} = 1, \;\; i = 1, \ldots, n.

Let us examine the dual of the first relaxation above. We note that it is strictly feasible (take D = tI with t large enough), so strong duality holds. Using the conic approach, we have
 \begin{array}{rcl} d^{\rm lag} &=& \min_{D \: {\rm diagonal}} \: \max_{Y \succeq 0} \: {\bf Tr}\, D + \langle Y, W - D \rangle \\ &=& \max_{Y \succeq 0} \: \min_{D \: {\rm diagonal}} \: {\bf Tr}\, D + \langle Y, W - D \rangle \\ &=& \max_{Y} \: \langle Y, W \rangle ~:~ Y \succeq 0, \;\; Y_{ii} = 1, \;\; i = 1, \ldots, n \\ &=& d^{\rm rank}. \end{array}
This shows that both Lagrange and rank relaxations give the same value, and are dual of each other.

In general, for arbitrary non-convex quadratic problems, the bound from the rank relaxation is never worse than the bound from the Lagrange relaxation, since the former is the (conic) dual of the latter and weak duality applies. If either is strictly feasible, then they have the same optimal value. The sketch below checks this numerically.
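A minimal sketch, assuming numpy and cvxpy, comparing the two relaxation bounds on illustrative random data:

```python
# Minimal sketch: Lagrange relaxation vs. rank relaxation of max x^T W x, x_i^2 <= 1.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(7)
n = 6
W = rng.standard_normal((n, n)); W = (W + W.T) / 2

D = cp.Variable(n)                         # diagonal entries of D
lag = cp.Problem(cp.Minimize(cp.sum(D)), [cp.diag(D) >> W])
lag.solve()

X = cp.Variable((n, n), symmetric=True)
rank = cp.Problem(cp.Maximize(cp.trace(W @ X)), [X >> 0, cp.diag(X) == 1])
rank.solve()

print(lag.value, rank.value)               # equal: the two relaxations are dual pairs
```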

Minimum distance to an affine subspace

Return to the problem seen in XXX:
 p^ast = min: |x|_2 ~:~ Ax = b,
where A in mathbf{R}^{p times n}, b in mathbf{R}^p, with b in the range of A. We have seen how to develop a dual when the objective is squared. Here we will work directly with the Euclidean norm.

The above problem is an SOCP. To see this, simply put the problem in epigraph form. Hence the above theory applies. A more direct (equivalent) way, which covers cases when norms appear in the objective, is to use the representation of the objective as a maximum:
 p^ast = min_{x} : max_{nu, : |u|_2 le 1} : x^Tu +nu^T(b-Ax) ge d^ast = max_{nu, : |u|_2 le 1} : min_{x} : x^Tu +nu^T(b-Ax).

The dual function is
 g(u, \nu) = \min_{x} \: x^T u + \nu^T(b - Ax) = \left\{ \begin{array}{ll} \nu^T b & \mbox{if } A^T \nu = u, \\ -\infty & \mbox{otherwise.} \end{array} \right.
We obtain the dual
 d^ast = max_{nu, : u} : b^Tnu ~:~ A^Tnu = u, ;; |u|_2 le 1 .
Eliminating u:
 d^ast = max_{nu} : b^Tnu ~:~ |A^Tnu|_2 le 1.
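A minimal numerical check, assuming numpy and cvxpy, with illustrative random data; the least-norm solution can be computed independently via the pseudo-inverse:

```python
# Minimal sketch: minimum Euclidean norm over an affine subspace, and its dual.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(8)
p, n = 3, 6
A = rng.standard_normal((p, n))
b = A @ rng.standard_normal(n)             # b in the range of A

x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(cp.norm(x, 2)), [A @ x == b])
primal.solve()

nu = cp.Variable(p)
dual = cp.Problem(cp.Maximize(b @ nu), [cp.norm(A.T @ nu, 2) <= 1])
dual.solve()

print(primal.value, dual.value,                 # equal up to solver tolerance
      np.linalg.norm(np.linalg.pinv(A) @ b))    # matches the pseudo-inverse solution norm
```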

Robust least-squares

Consider a least-squares problem
 min_x : |Ax-b|_2,
where A in mathbf{R}^{m times n}, b in mathbf{R}^m. In practice, A may be noisy. To handle this, we assume that A is additively perturbed by a matrix bounded in largest singular value norm (denoted |cdot| in the sequel) by a given number rho ge 0. The robust counterpart to the least-squares problem then reads
 min_x : max_{|Delta| le rho} : |(A+Delta)x-b|_2.
Using convexity of the norm, we have
 forall :  Delta, ;; |Delta| le rho ~:~  |(A+Delta)x-b|_2 le |Ax-b|_2 + |Delta x|_2 le |Ax-b|_2 + rho |x|_2.
The upper bound is attained with the choice (we assume x ne 0 and Ax ne b; these cases are easily analyzed and do not modify the result)
 \Delta = \frac{\rho}{\|x\|_2 \cdot \|Ax - b\|_2} \, (Ax - b) x^T.
Hence, the robust counterpart is equivalent to the SOCP
 min_x : |Ax-b|_2 + rho |x|_2 .

Again, we can use epigraph representations for each norm in the objective:
 \min_{x, t, \tau} \: t + \rho \tau ~:~ t \ge \|Ax - b\|_2, \;\; \tau \ge \|x\|_2,
and apply the standard SOCP duality theory developed above. Strong duality holds, since the problem is strictly feasible.

An equivalent, more direct approach is to represent each norm as a maximum:
 p^ast = min_x : max_{|u|_2 le 1, : |v|_2 le rho} : u^T(b-Ax)+v^Tx .
Exchanging the min and the max leads to the dual
 p^ast ge d^ast = max_{|u|_2 le1, : |v|_2 le rho} : min_x : u^T(b-Ax)+v^Tx .
The dual function is
 g(u, v) = \min_x \: u^T(b - Ax) + v^T x = \left\{ \begin{array}{ll} u^T b & \mbox{if } A^T u = v, \\ -\infty & \mbox{otherwise.} \end{array} \right.
Eliminating v, we obtain the dual
 d^\ast = \max_{u} \: b^T u ~:~ \|u\|_2 \le 1, \;\; \|A^T u\|_2 \le \rho.
As rho tends to zero, the solution tends to a least-squares solution, and (when b is in the range of A) to the least-norm solution of the system Ax=b. It turns out that the above approach leads to a dual that is equivalent to the SOCP dual, and that strong duality holds; the sketch below checks the primal-dual match numerically.
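A minimal sketch, assuming numpy and cvxpy, with illustrative random data and rho = 0.5:

```python
# Minimal sketch: robust least-squares as an SOCP, and the dual derived above.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(9)
m, n, rho = 8, 4, 0.5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(cp.norm(A @ x - b, 2) + rho * cp.norm(x, 2)))
primal.solve()

u = cp.Variable(m)
dual = cp.Problem(cp.Maximize(b @ u),
                  [cp.norm(u, 2) <= 1, cp.norm(A.T @ u, 2) <= rho])
dual.solve()

print(primal.value, dual.value)   # equal up to solver tolerance (strong duality)
```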

Probabilistic classification

Consider a binary classification problem, in which the data points for each class are random variables x_+, x_-, each assumed to follow a Gaussian distribution {cal N}(hat{x}_pm,Sigma_pm), where hat{x}_pm in mathbf{R}^n, Sigma_pm in {cal S}^n_{++} are the given class-dependent means and covariance matrices, respectively. We seek a hyperplane {cal H}(w,b) := { x ::: w^Tx+b = 0} that probabilistically separates the two classes, in the sense that
 mbox{bf Prob} { x_+ ::: (w^Tx_++b) ge 0 } ge 1-epsilon, ;; mbox{bf Prob} { x_- ::: (w^Tx_-+b) le 0 } ge 1-epsilon,
where epsilon is a given small number. The interpretation is that, with high probability, samples drawn from each distribution should fall on the correct side of the hyperplane. The number epsilon < 1/2 allows us to set the level of reliability of the classification, with small epsilon corresponding to a low probability of misclassification. We assume that hat{x}_+ ne hat{x}_-.

When x follows the distribution {cal N}(hat{x},Sigma), the random variable xi := w^Tx+b follows the distribution {cal N}(hat{xi},sigma^2), with hat{xi} := w^That{x}+b and sigma^2 = w^TSigma w. We can write xi = hat{xi} + sigma u, with u a standard normal (zero-mean, unit-variance) random variable. We thus have
 mbox{bf Prob}{ xi ::: xi ge 0} = mbox{bf Prob}{ u ::: u ge -hat{xi}/sigma } = 1 - Phi(-hat{xi}/sigma),
where Phi is the cumulative distribution function of the standard normal distribution, namely Phi(alpha) := mbox{bf Prob}{u ::: u le alpha}. Since Phi is monotone increasing, the condition mbox{bf Prob}{ xi ::: xi ge 0} ge 1-epsilon is equivalent to -hat{xi}/sigma le Phi^{-1}(epsilon), or hat{xi} ge kappa_epsilon sigma, where kappa_epsilon := -Phi^{-1}(epsilon). Note that kappa_epsilon > 0 whenever 0 < epsilon < 1/2.

Thus the probability constraints above can be written as
 w^That{x}_+ + b ge kappa_epsilon sqrt{w^TSigma_+ w}, ;; w^That{x}_- + b le -kappa_epsilonsqrt{w^TSigma_- w} .
Note that when Sigma_pm = 0, the above simply requires the correct classification of the means hat{x}_pm. As Sigma_pm grows in size, the conditions become more and more stringent. We can eliminate b from the constraints above: these constraints hold for some b if and only if
 w^T(hat{x}_+ - hat{x}_-) ge kappa_epsilon left( sqrt{w^TSigma_+ w} + sqrt{w^TSigma_- w} right).

Let us minimize the error probability level epsilon. Since kappa_epsilon is decreasing in epsilon, we would like to maximize kappa_epsilon subject to the constraints above. Exploiting homogeneity, we can always require w^T(hat{x}_+ - hat{x}_-) =1. We are led to the problem
 p^ast = 1/kappa^ast := min_w : sqrt{w^TSigma_+ w} + sqrt{w^TSigma_- w} ~:~ w^T(hat{x}_+ - hat{x}_-) =1 .
This is an SOCP, which is strictly feasible (in the sense of weak Slater’s condition), since hat{x}_+ ne hat{x}_-. Hence strong duality holds.
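A minimal sketch of this SOCP, assuming numpy, cvxpy and scipy, with illustrative random means and covariances; it recovers kappa^ast = 1/p^ast and the optimal error level epsilon^ast = Phi(-kappa^ast):

```python
# Minimal sketch: minimax-probability classification as an SOCP.
import cvxpy as cp
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(10)
n = 3
xp, xm = rng.standard_normal(n), rng.standard_normal(n)   # class means (assumed distinct)
Mp, Mm = rng.standard_normal((n, n)), rng.standard_normal((n, n))
Sp, Sm = Mp @ Mp.T + np.eye(n), Mm @ Mm.T + np.eye(n)     # positive definite covariances
Lp, Lm = np.linalg.cholesky(Sp), np.linalg.cholesky(Sm)   # Sp = Lp Lp^T, so ||Lp^T w|| = sqrt(w^T Sp w)

w = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm(Lp.T @ w, 2) + cp.norm(Lm.T @ w, 2)),
                  [w @ (xp - xm) == 1])
prob.solve()

kappa = 1 / prob.value
print(kappa, norm.cdf(-kappa))   # kappa* and the optimal error level eps* = Phi(-kappa*)
```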

The dual can be obtained from the minimax expression
 p^ast = min_w : max_{nu, : |u_pm|_2 le 1} : u_+^TSigma_+^{1/2}w + u_-^TSigma_-^{1/2}w + nu(1-w^T(hat{x}_+ - hat{x}_-)).
Exchanging the min and max yields
 p^\ast = d^\ast = \max_{\nu, \: \|u_\pm\|_2 \le 1} \: \nu ~:~ \Sigma_+^{1/2} u_+ + \Sigma_-^{1/2} u_- = \nu (\hat{x}_+ - \hat{x}_-).

We observe that nu ne 0 at the optimum, since otherwise we would have p^ast = 0, which is clearly impossible when hat{x}_+ ne hat{x}_-. We then set kappa = 1/nu, scale the variables u_pm by kappa, and change u_+ into its opposite. This leads to the dual
 \kappa^\ast := \min_{\kappa, \: u_\pm} \: \kappa ~:~ \|u_\pm\|_2 \le \kappa, \;\; \hat{x}_+ + \Sigma_+^{1/2} u_+ = \hat{x}_- + \Sigma_-^{1/2} u_-.
The geometric interpretation is as follows. Consider the ellipsoids {cal E}_pm(kappa) := { hat{x}_pm + Sigma_pm^{1/2} u ::: |u|_2 le kappa}. The constraints in the dual problem state that these two ellipsoids intersect; the dual problem amounts to finding the smallest kappa for which they do. The optimal value is kappa^ast, and the corresponding optimal error probability level is epsilon^ast = Phi(-kappa^ast). It can be shown that the optimal separating hyperplane corresponds to the common tangent at the intersection point.
The geometric interpretation is as follows. Consider the ellipsoids {cal E}_pm(kappa) := { hat{x}_pm + Sigma^{1/2}_pm u ::: |u|_2 le kappa}. The constraints in the dual problem above state that these two ellipsoids intersect. The dual problem amounts to finding the smallest r for which the two ellipsoids {cal E}_pm(kappa) intersect. The optimal value of kappa is then kappa^ast, and the corresponding optimal value of the error probability level is epsilon^ast = Phi(-kappa^ast). It can be shown that the optimal separating hyperplane corresponds to the common tangent at the intersection point.