Optimal set of Least-Squares via SVD

Theorem: optimal set of ordinary least-squares

The optimal set of the OLS problem

 p^ast := min_x : |Ax-y|_2

can be expressed as

 mathbf{X}^{rm opt} = A^dagger y + mathbf{N}(A).

where A^dagger is the pseudo-inverse of A, and A^dagger y is the minimum-norm point in the optimal set. If A is full column rank, the solution is unique, and equal to

 x^ast = A^dagger y = (A^TA)^{-1} A^Ty.

In general, the particular solution A^dagger y is the minimum-norm solution to the least-squares problem.

Proof: The following proof relies on the SVD of A, and the rotational invariance of the Euclidean norm.

Optimal value of the problem

Using the SVD we can find the optimal set to the lest-squares optimization problem

 p^ast := min_x : |Ax-y|_2^2.

Indeed, if (U,tilde{S},V) is an SVD of A, the problem can be written

 p^ast = min_x left| U left( begin{array}{cc} S & 0  0 & 0 end{array} right) V^T x - y right|_2 = min_x left| left( begin{array}{cc} S & 0  0 & 0 end{array} right) V^T x - U^Ty right|_2^2 ,

where we have exploited the fact that the Euclidean norm is invariant under the orthogonal transformation U^T. With tilde{x} := V^Tx, and tilde{y} := U^Ty, and changing the variable x to tilde{x}, we express the above as

 p^ast = min_{tilde{x}} : left| left( begin{array}{cc} S & 0  0 & 0 end{array} right) tilde{x} - tilde{y} right|_2^2.

Expanding the terms, and using the partitioned notations tilde{x} = (tilde{x}_r, tilde{x}_{n -r}), tilde{y} = (tilde{y}_r, tilde{y}_{m -r}), we obtain

 p^ast = min_{tilde{x}_r} : |S tilde{x}_r - tilde{y}_r |_2^2 + |tilde{y}_{m-r}|_2^2 .

Since S is invertible, we can reduce the first term in the objective to zero with the choice tilde{x}_r = S^{-1}tilde{y}_r. Hence the optimal value is

 p^ast = |tilde{y}_{m-r}|_2^2 .

We observe that the optimal value is zero if and only if y in mathbf{R}(A), which is exactly the same as tilde{y}_{m-r} = 0.

Optimal set

Let us detail the optimal set for the problem. The variable tilde{x} is partly determined, via its first r components: tilde{x}_r^ast = S^{-1}tilde{y}_r. The remaining n-r variables contained in x_{n-r} are free, as tilde{x}_{n-r} does not appear in the objective function of the above problem.

Thus, optimal points are of the form x = Vtilde{x}, with tilde{x} = (tilde{x}_r^ast, tilde{x}_{n -r}), tilde{x}_r^ast = S^{-1}tilde{y}_r, and tilde{x}_{n-r} free.

To express this in terms of the original SVD of A, we observe that x = Vtilde{x} = V(tilde{x}_r, tilde{x}_{n -r}) means that

 x = V_r tilde{x}_r + V_{n-r} tilde{x}_{n-r} = V_r S^{-1}tilde{y}_r + V_{n-r} tilde{x}_{n-r}

where V is partitioned as V = (V_r,V_{n-r}), with V_r in mathbf{R}^{n times r} and V_{n-r} in mathbf{R}^{n times (n-r)}. Similarly, the vector tilde{y}_r can be expressed as tilde{y}_r = U_r^Ty, with U_r formed with the first r columns of U. Thus, any element x^ast in the optimal set is of the form

 x^ast = x_{rm MN} + V_{n-r} z ~:~ z in mathbf{R}^{n-r},

where x_{rm MN} := V_r S^{-1}U_r^T y. (We will soon explain the acronym appearing in the subscript.) The free components tilde{x}_{n-r} correspond to the degrees of freedom allowed to by the nullspace of A.

Minimum-norm optimal point

The particular solution to the problem, x_{rm MN}, is the minimum-norm solution, in the sense that it is the element of mathbf{X}^{rm opt} that has the smallest Euclidean norm. This is best understood in the space of tilde{x}-variables.

Indeed,the particular choice tilde{x} = (tilde{x}_r^ast,0) corresponds to the element in the optimal set that has the smallest Euclidean norm. Indeed, the norm of x is the same as that of its rotated version, tilde{x}. the first r elements in tilde{x} tilde{x}_r^ast are fixed, and since |tilde{x}|_2^2 = |tilde{x}_{n-r}|_2^2 + |tilde{x}_{n-r}|_2^2, we see that the minimal norm is obtained with tilde{x}_{n-r} = 0.

Optimal set via the pseudo-inverse

The matrix V_r S^{-1}U_r^T, which appears in the expression of the particular solution x_{rm MN} mentioned above, is nothing else than the pseudo-inverse of A, which is denoted A^dagger. Indeed, we can express the pseudo-inverse in terms of the SVD as

 A^dagger = V left( begin{array}{cc} S^{-1} & 0  0 & 0 end{array} right) U^T.

With this convention, the minimum-norm optimal point is A^dagger y. Recall that the last n-r columns of V form a basis for the nullspace of A. Hence the optimal set of the problem is

 mathbf{X}^{rm opt} = A^dagger y + mathbf{N}(A).

When A is full column rank (r=n le m, and A^TAsucc 0), the optimal set reduces to a singleton, as the nullspace is {0}. The unique optimal point expresses as

 x_{rm LS} = A^dagger y = (A^TA)^{-1}A^T y.