Singular value decomposition (SVD) theorem

Theorem: Singular Value Decomposition (SVD)

An arbitrary matrix A in mathbf{R}^{m times n} admits a decomposition of the form

 A = sum_{i=1}^r sigma_i u_i v_i^T = U tilde{ {S}} V^T, ;; tilde{ {S}} := left( begin{array}{cc}  {S} & 0  0 & 0 end{array} right) ,

where U in mathbf{R}^{m times m}, V in mathbf{R}^{n times n} are both orthogonal matrices, and the matrix {S} is diagonal:

 S = mbox{bf diag}(sigma_1 , ldots, sigma_r),

where the positive numbers sigma_1 ge ldots ge sigma_r >0 are unique, and are called the singular values of A. The number r le min(m,n) is equal to the rank of A, and the triplet (U,tilde{ {S}},V) is called a singular value decomposition (SVD) of A. The first r columns of U: u_i, i=1,ldots,r (resp. V: v_i, i=1,ldots,r) are called left (resp. right) singular vectors of A, and satisfy

 A v_i = sigma_i u_i, ;; u_i^T A  = sigma_i v_i, ;; i=1,ldots,r.

Proof: The matrix n times n A^TA is real and symmetric. According to the spectral theorem, it admits an eigenvalue decomposition in the form A^TA = V Lambda V^T, with V a n times n matrix whose columns form an orthonormal basis (that is, V^TV = VV^T = I_n), and Lambda = mbox{bf diag}(lambda_1,ldots,lambda_r,0,ldots,0). Here, r is the rank of A^TA (if r = n then there are no trailing zeros in Lambda). Since A^TA is positive semi-definite, the lambda_j's are non-negative, and we can define the non-zero quantities sigma_j := sqrt{lambda_j}, j=1,ldots,r.

Note that when j >r, Av_j = 0, since then |Av_j|_2^2 = v_j^TA^TAv_j = lambda_j v_j^Tv_j = 0.

Let us construct an m times m orthogonal matrix U as follows. We set

 u_i = frac{1}{sigma_i} Av_i, ;; i=1,ldots,r.

These m-vectors are unit-norm, and mutually orthogonal, since v_j's are eigenvectors of A^TA. Using (say) the Gram-Schmidt orthogonalization procedure, we can complete (if necessary, that is in the case r <m) this set of vectors by u_{r+1},ldots,u_m in order to form an orthogonal matrix U:=(u_1,ldots,u_m) in mathbf{R}^m.

Let us check that U,V satisfy the conditions of the theorem, by showing that U^TAV^T=tilde{S}:=mbox{bf diag}(sigma_1,ldots,sigma_r,0,ldots,0). We have

 (U^TAV)_{ij} = u_i^TAv_j = left{begin{array}{ll}sigma_j u_i^Tu_j &mbox{if } j le r  0 & mbox{otherwise,} end{array}right.

where the second line stems from the fact that Av_j = 0 when j >r. Thus, U^TAV = tilde{S}, as claimed.