Optimal set of Least-Squares via SVD

Theorem: optimal set of ordinary least-squares

The optimal set of the OLS problem

$p^\ast = \min_x \: \|Ax - y\|_2^2$

can be expressed as

$\mathbf{X}^{\rm opt} = A^\dagger y + \mathbf{N}(A),$

where $A^\dagger$ is the pseudo-inverse of $A$, and $A^\dagger y$ is the minimum-norm point in the optimal set. If $A$ is full column rank, the solution is unique, and equal to

$x^\ast = A^\dagger y = (A^TA)^{-1} A^Ty.$

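Proof: The following proof relies on the SVD of $A$, written $A = U \left( \begin{array}{cc} S & 0 \\ 0 & 0 \end{array} \right) V^T$ with $U$, $V$ orthogonal, $r$ the rank of $A$ and $S$ the $r \times r$ diagonal matrix of positive singular values, and on the rotational invariance of the Euclidean norm.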

Optimal value of the problem

Using the SVD, we can rewrite the optimal value of the least-squares optimization problem as

$p^\ast = \min_x \left\| U \left( \begin{array}{cc} S & 0 \\ 0 & 0 \end{array} \right) V^T x - y \right\|_2^2 = \min_x \left\| \left( \begin{array}{cc} S & 0 \\ 0 & 0 \end{array} \right) V^T x - U^Ty \right\|_2^2 ,$

where we have exploited the fact that the Euclidean norm is invariant under the orthogonal transformation $U^T$. With $\tilde{x} := V^Tx$ and $\tilde{y} := U^Ty$, and changing the variable $x$ to $\tilde{x}$, we express the above as

$p^\ast = \min_{\tilde{x}} \: \left\| \left( \begin{array}{cc} S & 0 \\ 0 & 0 \end{array} \right) \tilde{x} - \tilde{y} \right\|_2^2.$

Expanding the terms, and using the partitioned notations $\tilde{x} = (\tilde{x}_r, \tilde{x}_{n-r})$, $\tilde{y} = (\tilde{y}_r, \tilde{y}_{m-r})$, we obtain

$p^\ast = \min_{\tilde{x}} \: \|S\tilde{x}_r - \tilde{y}_r\|_2^2 + \|\tilde{y}_{m-r}\|_2^2.$

Since $S$ is invertible, we can reduce the first term in the objective to zero with the choice $\tilde{x}_r = S^{-1}\tilde{y}_r$. Hence the optimal value is

$p^\ast = \|\tilde{y}_{m-r}\|_2^2.$

We observe that the optimal value is zero if and only if $y \in \mathbf{R}(A)$, which is exactly the same as $\tilde{y}_{m-r} = 0$.
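The formula for the optimal value can be checked numerically. The following is a minimal NumPy sketch under stated assumptions: the rank-deficient test matrix, the random seed, and the relative tolerance `1e-10` used to decide the numerical rank are all illustrative choices, not part of the derivation.

```python
import numpy as np

# Hypothetical test data: a 5x4 matrix of rank 2 and a generic right-hand side.
rng = np.random.default_rng(0)
m, n, k = 5, 4, 2
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
y = rng.standard_normal(m)

# Full SVD: U is m x m, Vt is n x n, s holds the singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = int(np.sum(s > 1e-10 * s[0]))  # numerical rank (tolerance is an assumption)

# Optimal value predicted by the derivation: squared norm of the last m-r
# components of y_tilde = U^T y.
y_tilde = U.T @ y
p_svd = float(np.sum(y_tilde[r:] ** 2))

# Reference value: squared residual at a least-squares solution from NumPy.
x_ls = np.linalg.lstsq(A, y, rcond=1e-10)[0]
p_ref = float(np.sum((A @ x_ls - y) ** 2))

print(r, p_svd, p_ref)
```

The two values `p_svd` and `p_ref` should agree up to rounding error, since both measure the part of $y$ that lies outside the range of $A$.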

Optimal set

Let us detail the optimal set for the problem. The variable $\tilde{x}$ is partly determined, via its first $r$ components: $\tilde{x}_r^\ast = S^{-1}\tilde{y}_r$. The remaining $n-r$ variables contained in $\tilde{x}_{n-r}$ are free, as $\tilde{x}_{n-r}$ does not appear in the objective function of the above problem.

Thus, optimal points are of the form $x = V\tilde{x}$, with $\tilde{x} = (\tilde{x}_r^\ast, \tilde{x}_{n-r})$, $\tilde{x}_r^\ast = S^{-1}\tilde{y}_r$, and $\tilde{x}_{n-r}$ free.

To express this in terms of the original SVD of $A$, we observe that $x = V\tilde{x} = V(\tilde{x}_r, \tilde{x}_{n-r})$ means that

$x = V_r \tilde{x}_r + V_{n-r}\tilde{x}_{n-r} = V_r S^{-1}\tilde{y}_r + V_{n-r}\tilde{x}_{n-r},$

where $V$ is partitioned as $V = (V_r,V_{n-r})$, with $V_r \in \mathbf{R}^{n \times r}$ and $V_{n-r} \in \mathbf{R}^{n \times (n-r)}$. Similarly, the vector $\tilde{y}_r$ can be expressed as $\tilde{y}_r = U_r^Ty$, with $U_r$ formed with the first $r$ columns of $U$. Thus, any element $x^\ast$ in the optimal set is of the form

$x^\ast = x_{\rm MN} + V_{n-r}\tilde{x}_{n-r},$

where $x_{\rm MN} := V_r S^{-1}U_r^T y$. (We will soon explain the acronym appearing in the subscript.) The free components $\tilde{x}_{n-r}$ correspond to the degrees of freedom allowed by the nullspace of $A$.
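To illustrate this parametrization of the optimal set, here is a small NumPy sketch. The test matrix, the seed, the rank tolerance, and the names `V_null` (for $V_{n-r}$) and `z` (for the free components $\tilde{x}_{n-r}$) are assumptions made for the example only.

```python
import numpy as np

# Hypothetical test data: a 6x4 matrix of rank 2.
rng = np.random.default_rng(1)
m, n, k = 6, 4, 2
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
y = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = int(np.sum(s > 1e-10 * s[0]))  # numerical rank (tolerance is an assumption)

U_r = U[:, :r]        # first r columns of U
V_r = Vt[:r].T        # first r columns of V
V_null = Vt[r:].T     # last n-r columns of V, spanning the nullspace of A

# Particular solution x_MN = V_r S^{-1} U_r^T y.
x_mn = V_r @ ((U_r.T @ y) / s[:r])

# Another point of the claimed optimal set, with arbitrary free components z.
z = rng.standard_normal(n - r)
x_other = x_mn + V_null @ z

res_mn = np.linalg.norm(A @ x_mn - y)
res_other = np.linalg.norm(A @ x_other - y)
print(res_mn, res_other)
```

Both residuals should coincide up to rounding, because $A V_{n-r} = 0$: moving along the nullspace does not change $Ax$.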

Minimum-norm optimal point

The particular solution to the problem, $x_{\rm MN}$, is the minimum-norm solution, in the sense that it is the element of $\mathbf{X}^{\rm opt}$ that has the smallest Euclidean norm. This is best understood in the space of $\tilde{x}$-variables.

Indeed, the particular choice $\tilde{x} = (\tilde{x}_r^\ast,0)$ corresponds to the element in the optimal set that has the smallest Euclidean norm. To see this, note that the norm of $x$ is the same as that of its rotated version, $\tilde{x}$. The first $r$ elements of $\tilde{x}$, namely $\tilde{x}_r^\ast$, are fixed, and since $\|\tilde{x}\|_2^2 = \|\tilde{x}_r^\ast\|_2^2 + \|\tilde{x}_{n-r}\|_2^2$, we see that the minimal norm is obtained with $\tilde{x}_{n-r} = 0$.
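The norm decomposition used in this argument can be observed numerically. The sketch below samples points of the optimal set and compares their squared norms with $\|x_{\rm MN}\|_2^2 + \|z\|_2^2$, where `z` is a hypothetical name for the free components; the test data, seed, and rank tolerance are assumptions.

```python
import numpy as np

# Hypothetical test data: a 6x5 matrix of rank 3.
rng = np.random.default_rng(2)
m, n, k = 6, 5, 3
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
y = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = int(np.sum(s > 1e-10 * s[0]))  # numerical rank (tolerance is an assumption)

x_mn = Vt[:r].T @ ((U[:, :r].T @ y) / s[:r])  # V_r S^{-1} U_r^T y
V_null = Vt[r:].T                             # basis of the nullspace of A

# Sample 100 optimal points x_mn + V_null z and record their squared norms.
zs = rng.standard_normal((100, n - r))
sq_norms = np.array([np.sum((x_mn + V_null @ z) ** 2) for z in zs])

# Squared norms predicted by the orthogonal decomposition.
predicted = np.sum(x_mn ** 2) + np.sum(zs ** 2, axis=1)

print(np.max(np.abs(sq_norms - predicted)), np.min(sq_norms) - np.sum(x_mn ** 2))
```

Every sampled point has a squared norm no smaller than that of `x_mn`, with equality only when the free components vanish.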

Optimal set via the pseudo-inverse

The matrix $V_r S^{-1}U_r^T$, which appears in the expression of the particular solution $x_{\rm MN}$ mentioned above, is nothing else than the pseudo-inverse of $A$, which is denoted $A^\dagger$. Indeed, we can express the pseudo-inverse in terms of the SVD as

$A^\dagger = V \left( \begin{array}{cc} S^{-1} & 0 \\ 0 & 0 \end{array} \right) U^T = V_r S^{-1} U_r^T.$

With this convention, the minimum-norm optimal point is $A^\dagger y$. Recall that the last $n-r$ columns of $V$ form a basis for the nullspace of $A$. Hence the optimal set of the problem is

$\mathbf{X}^{\rm opt} = A^\dagger y + \mathbf{N}(A).$

When $A$ is full column rank ($r = n \le m$, and $A^TA$ is invertible), the optimal set reduces to a singleton, as the nullspace is $\{0\}$. The unique optimal point can be expressed as

$x^\ast = A^\dagger y = (A^TA)^{-1}A^Ty.$
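As a sanity check of the SVD expression of the pseudo-inverse, the sketch below builds $V_r S^{-1} U_r^T$ by hand and compares it with NumPy's `np.linalg.pinv`. The helper name `pinv_via_svd`, its tolerance `tol`, and the test matrices are hypothetical choices made for this example; in practice one would normally call `np.linalg.pinv` or `np.linalg.lstsq` directly.

```python
import numpy as np

def pinv_via_svd(A, tol=1e-10):
    """Pseudo-inverse V_r S^{-1} U_r^T; `tol` is an assumed relative rank cutoff."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))  # numerical rank
    return Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

rng = np.random.default_rng(3)
A_def = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # rank 2
A_full = rng.standard_normal((5, 3))  # full column rank for this random draw
y = rng.standard_normal(5)

# Rank-deficient case: compare with NumPy's pseudo-inverse (same cutoff).
gap = np.max(np.abs(pinv_via_svd(A_def) - np.linalg.pinv(A_def, rcond=1e-10)))

# Full-column-rank case: the pseudo-inverse solution matches the
# normal-equations formula (A^T A)^{-1} A^T y.
x_pinv = pinv_via_svd(A_full) @ y
x_normal = np.linalg.solve(A_full.T @ A_full, A_full.T @ y)

print(gap, np.max(np.abs(x_pinv - x_normal)))
```

Solving the normal equations is shown here only to mirror the closed-form expression; for ill-conditioned $A$ the SVD-based route is generally the more reliable of the two.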