• The SVD theorem

  • Geometry

  • Link with the spectral theorem

The SVD theorem

Basic idea

Recall that any matrix A \in \mathbf{R}^{m \times n} with rank one can be written as

 A = \sigma u v^T,

where u \in \mathbf{R}^m, v \in \mathbf{R}^n, and \sigma > 0.
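This is easy to illustrate numerically; here is a minimal Matlab sketch (the unit vectors and the value of \sigma are randomly generated, just for illustration):

>> u = randn(4,1); u = u/norm(u);   % a unit vector in R^4
>> v = randn(3,1); v = v/norm(v);   % a unit vector in R^3
>> sigma = 2.5;                     % any positive scalar
>> A = sigma*u*v';                  % a 4 x 3 matrix
>> rank(A)                          % returns 1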

It turns out that a similar result holds for matrices of arbitrary rank r. That is, we can express any matrix A \in \mathbf{R}^{m \times n} as a sum of rank-one matrices

 A = \sum_{i=1}^r \sigma_i u_i v_i^T,

where u_1, \ldots, u_r are mutually orthogonal, v_1, \ldots, v_r are also mutually orthogonal, and the \sigma_i's are positive numbers called the singular values of A. In the above, r turns out to be the rank of A.
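Anticipating the theorem below, we can check this expansion numerically; the following Matlab sketch (on a random matrix, for illustration) rebuilds A from the rank-one terms of its SVD:

>> A = randn(4,5);
>> [U,S,V] = svd(A);
>> r = rank(A);                               % here r = 4 generically
>> B = zeros(size(A));
>> for i=1:r, B = B + S(i,i)*U(:,i)*V(:,i)'; end
>> norm(A - B)                                % ~ machine precision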

Theorem statement

The following important result applies to any matrix A, and allows us to understand the structure of the mapping x \rightarrow Ax.

Theorem: Singular Value Decomposition (SVD)

An arbitrary matrix A \in \mathbf{R}^{m \times n} admits a decomposition of the form

 A = \sum_{i=1}^r \sigma_i u_i v_i^T = U \tilde{S} V^T, \;\; \tilde{S} := \left( \begin{array}{cc} S & 0 \\ 0 & 0 \end{array} \right),

where U \in \mathbf{R}^{m \times m}, V \in \mathbf{R}^{n \times n} are both orthogonal matrices, and the matrix S is diagonal:

 S = \mbox{\bf diag}(\sigma_1, \ldots, \sigma_r),

where the positive numbers \sigma_1 \ge \ldots \ge \sigma_r > 0 are unique, and are called the singular values of A. The number r \le \min(m,n) is equal to the rank of A, and the triplet (U, \tilde{S}, V) is called a singular value decomposition (SVD) of A. The first r columns of U: u_i, i=1,\ldots,r (resp. of V: v_i, i=1,\ldots,r) are called the left (resp. right) singular vectors of A, and satisfy

 A v_i = \sigma_i u_i, \;\; u_i^T A = \sigma_i v_i^T, \;\; i=1,\ldots,r.

The proof of the theorem hinges on the spectral theorem for symmetric matrices. Note that in the theorem, the zeros appearing alongside S are really blocks of zeros. They may be empty; for example, if r = n, then there are no zeros to the right of S.
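As a quick sanity check, we can verify these relations numerically in Matlab (a sketch on a random matrix, just for illustration; the second relation is checked in its equivalent column form A^T u_i = \sigma_i v_i):

>> A = randn(5,3);                          % a random 5 x 3 test matrix
>> [U,Stilde,V] = svd(A);
>> i = 1;                                   % check the first singular pair
>> norm(A*V(:,i) - Stilde(i,i)*U(:,i))      % ~ machine precision
>> norm(A'*U(:,i) - Stilde(i,i)*V(:,i))     % idem, since u_i^T A = sigma_i v_i^T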

Computing the SVD

The SVD of an m \times n matrix A can be computed via a sequence of orthogonal transformations. The complexity of the algorithm, expressed roughly as the number of floating-point operations it requires, grows as O(nm \min(n,m)). This can be substantial for large, dense matrices. For sparse matrices, we can speed up the computation if we are interested only in the largest few singular values and associated singular vectors.

Matlab syntax
>> [U,Stilde,V] = svd(A); % this produces the SVD of A, with Stilde of same size as A
>> [Uk,Sk,Vk] = svds(A,k); % the k largest singular values (and vectors) of A; useful when A is large and sparse

Example: A 4 \times 5 example.

Geometry

The theorem allows us to decompose the action of A on a given input vector as a three-step process. To get Ax, where x \in \mathbf{R}^n, we first form \tilde{x} := V^T x \in \mathbf{R}^n. Since V is an orthogonal matrix, V^T is also orthogonal, and \tilde{x} is just a rotated version of x, which still lies in the input space. Then we act on the rotated vector \tilde{x} by scaling its elements. Precisely, the first r elements of \tilde{x} are scaled by the singular values \sigma_1, \ldots, \sigma_r; the remaining n-r elements are set to zero. This step results in a new vector \tilde{y} which now belongs to the output space \mathbf{R}^m. The final step consists of rotating the vector \tilde{y} by the orthogonal matrix U, which results in y = U\tilde{y} = Ax.
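The three-step decomposition is easy to reproduce in Matlab; the sketch below (on a random matrix and input vector, for illustration) carries out the rotation, scaling, and rotation explicitly:

>> A = randn(3,2); x = randn(2,1);
>> [U,Stilde,V] = svd(A);
>> xt = V'*x;             % step 1: rotate x in the input space R^2
>> yt = Stilde*xt;        % step 2: scale and map into the output space R^3
>> y = U*yt;              % step 3: rotate in the output space
>> norm(y - A*x)          % ~ machine precision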

Example: Assume A has the simple form

 A = \left( \begin{array}{cc} 1.3 & 0 \\ 0 & 2.1 \\ 0 & 0 \end{array} \right),

then for an input vector x \in \mathbf{R}^2, Ax is a vector in \mathbf{R}^3 with first component 1.3 x_1, second component 2.1 x_2, and last component zero.

To summarize, the SVD theorem states that any matrix-vector multiplication can be decomposed as a sequence of three elementary transformations: a rotation in the input space, a scaling that goes from the input space to the output space, and a rotation in the output space. In contrast with the case of symmetric matrices, the input and output directions are in general different.

This interpretation allows us to make a few statements about the matrix.

Example: A 4 \times 4 example.

Link with the spectral theorem

If A admits an SVD as above, then the matrices AA^T and A^TA have the following symmetric eigenvalue decompositions (SEDs):

 AA^T = U \Lambda_m U^T, \;\; A^TA = V \Lambda_n V^T,

where \Lambda_m := \tilde{S}\tilde{S}^T = \mbox{\bf diag}(\sigma_1^2, \ldots, \sigma_r^2, 0, \ldots, 0) is m \times m (so it has m-r trailing zeros), and \Lambda_n := \tilde{S}^T\tilde{S} = \mbox{\bf diag}(\sigma_1^2, \ldots, \sigma_r^2, 0, \ldots, 0) is n \times n (so it has n-r trailing zeros). The nonzero eigenvalues of AA^T and A^TA are the same, and equal to the squared singular values of A. The corresponding eigenvectors are the left and right singular vectors of A, respectively.
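A quick numerical check of this link (a Matlab sketch on a random matrix, for illustration):

>> A = randn(4,5);
>> s = svd(A);                           % singular values, in decreasing order
>> lm = sort(eig(A*A'), 'descend');      % eigenvalues of A A^T (4 of them)
>> ln = sort(eig(A'*A), 'descend');      % eigenvalues of A^T A (5 of them; last ~0)
>> [s.^2, lm, ln(1:4)]                   % the three columns agree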

This provides a method (not the most computationally efficient one) to find the SVD of a matrix, based on the SED.
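Here is a minimal sketch of that route, assuming for simplicity that A has full column rank (so that all singular values are positive):

>> A = randn(4,3);                     % generically of full column rank
>> [V,L] = eig(A'*A);                  % SED of A^T A
>> [l,idx] = sort(diag(L),'descend');  % order eigenvalues decreasingly
>> V = V(:,idx);
>> s = sqrt(l);                        % singular values
>> U = A*V*diag(1./s);                 % left singular vectors: u_i = A v_i / sigma_i
>> norm(A - U*diag(s)*V')              % ~ machine precision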