• The SVD theorem

  • Geometry

  • Link with the spectral theorem

The SVD theorem

Basic idea

Recall that any matrix A \in \mathbf{R}^{m \times n} with rank one can be written as

 A = \sigma u v^T,

where u \in \mathbf{R}^m, v \in \mathbf{R}^n, and \sigma > 0.
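This is easy to illustrate numerically; here is a minimal Matlab sketch (the unit vectors and the value of \sigma are randomly generated, just for illustration):

>> u = randn(4,1); u = u/norm(u);   % a unit vector in R^4
>> v = randn(3,1); v = v/norm(v);   % a unit vector in R^3
>> sigma = 2.5;                     % any positive scalar
>> A = sigma*u*v';                  % a 4 x 3 matrix
>> rank(A)                          % returns 1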

It turns out that a similar result holds for matrices of arbitrary rank r. That is, we can express any matrix A \in \mathbf{R}^{m \times n} as a sum of rank-one matrices

 A = \sum_{i=1}^r \sigma_i u_i v_i^T,

where u_1, \ldots, u_r are mutually orthogonal, v_1, \ldots, v_r are also mutually orthogonal, and the \sigma_i's are positive numbers called the singular values of A. In the above, r turns out to be the rank of A.
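Anticipating the theorem below, we can check this expansion numerically; the following Matlab sketch (on a random matrix, for illustration) rebuilds A from the rank-one terms of its SVD:

>> A = randn(4,5);
>> [U,S,V] = svd(A);
>> r = rank(A);                               % here r = 4 generically
>> B = zeros(size(A));
>> for i=1:r, B = B + S(i,i)*U(:,i)*V(:,i)'; end
>> norm(A - B)                                % ~ machine precision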

Theorem statement

The following important result applies to any matrix A, and allows us to understand the structure of the mapping x \rightarrow Ax.

Theorem: Singular Value Decomposition (SVD)

An arbitrary matrix A \in \mathbf{R}^{m \times n} admits a decomposition of the form

 A = \sum_{i=1}^r \sigma_i u_i v_i^T = U \tilde{S} V^T, \;\; \tilde{S} := \left( \begin{array}{cc} S & 0 \\ 0 & 0 \end{array} \right),

where U \in \mathbf{R}^{m \times m}, V \in \mathbf{R}^{n \times n} are both orthogonal matrices, and the matrix S is diagonal:

 S = \mbox{\bf diag}(\sigma_1, \ldots, \sigma_r),

where the positive numbers \sigma_1 \ge \ldots \ge \sigma_r > 0 are unique, and are called the singular values of A. The number r \le \min(m,n) is equal to the rank of A, and the triplet (U, \tilde{S}, V) is called a singular value decomposition (SVD) of A. The first r columns of U: u_i, i=1,\ldots,r (resp. of V: v_i, i=1,\ldots,r) are called the left (resp. right) singular vectors of A, and satisfy

 A v_i = \sigma_i u_i, \;\; u_i^T A = \sigma_i v_i^T, \;\; i=1,\ldots,r.

The proof of the theorem hinges on the spectral theorem for symmetric matrices. Note that in the theorem, the zeros appearing alongside S are really blocks of zeros. They may be empty; for example, if r = n, then there are no zeros to the right of S.
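As a quick sanity check, we can verify these relations numerically in Matlab (a sketch on a random matrix, just for illustration; the second relation is checked in its equivalent column form A^T u_i = \sigma_i v_i):

>> A = randn(5,3);                          % a random 5 x 3 test matrix
>> [U,Stilde,V] = svd(A);
>> i = 1;                                   % check the first singular pair
>> norm(A*V(:,i) - Stilde(i,i)*U(:,i))      % ~ machine precision
>> norm(A'*U(:,i) - Stilde(i,i)*V(:,i))     % idem, since u_i^T A = sigma_i v_i^T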

Computing the SVD

The SVD of an m \times n matrix A can be computed via a sequence of orthogonal transformations. The complexity of the algorithm, expressed roughly as the number of floating-point operations it requires, grows as O(nm \min(n,m)). This can be substantial for large, dense matrices. For sparse matrices, we can speed up the computation if we are interested only in the largest few singular values and associated singular vectors.

Matlab syntax
>> [U,Stilde,V] = svd(A); % this produces the SVD of A, with Stilde of same size as A
>> [Uk,Sk,Vk] = svds(A,k); % the k largest singular values (and vectors) of A; useful when A is large and sparse

Example: A 4 \times 5 example.

Geometry

The theorem allows us to decompose the action of A on a given input vector as a three-step process. To get Ax, where x \in \mathbf{R}^n, we first form \tilde{x} := V^T x \in \mathbf{R}^n. Since V is an orthogonal matrix, V^T is also orthogonal, and \tilde{x} is just a rotated version of x, which still lies in the input space. Then we act on the rotated vector \tilde{x} by scaling its elements. Precisely, the first r elements of \tilde{x} are scaled by the singular values \sigma_1, \ldots, \sigma_r; the remaining n-r elements are set to zero. This step results in a new vector \tilde{y} which now belongs to the output space \mathbf{R}^m. The final step consists of rotating the vector \tilde{y} by the orthogonal matrix U, which results in y = U\tilde{y} = Ax.
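The three-step decomposition is easy to reproduce in Matlab; the sketch below (on a random matrix and input vector, for illustration) carries out the rotation, scaling, and rotation explicitly:

>> A = randn(3,2); x = randn(2,1);
>> [U,Stilde,V] = svd(A);
>> xt = V'*x;             % step 1: rotate x in the input space R^2
>> yt = Stilde*xt;        % step 2: scale and map into the output space R^3
>> y = U*yt;              % step 3: rotate in the output space
>> norm(y - A*x)          % ~ machine precision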

Example: Assume A has the simple form

 A = \left( \begin{array}{cc} 1.3 & 0 \\ 0 & 2.1 \\ 0 & 0 \end{array} \right),

then for an input vector x \in \mathbf{R}^2, Ax is a vector in \mathbf{R}^3 with first component 1.3 x_1, second component 2.1 x_2, and last component zero.

To summarize, the SVD theorem states that any matrix-vector multiplication can be decomposed as a sequence of three elementary transformations: a rotation in the input space, a scaling that goes from the input space to the output space, and a rotation in the output space. In contrast with the case of symmetric matrices, the input and output directions are in general different.

This interpretation allows us to make a few statements about the matrix.

Example: A 4 \times 4 example.

Link with the spectral theorem

If A admits an SVD as above, then the matrices AA^T and A^TA have the following symmetric eigenvalue decompositions (SEDs):

 AA^T = U \Lambda_m U^T, \;\; A^TA = V \Lambda_n V^T,

where \Lambda_m := \tilde{S}\tilde{S}^T = \mbox{\bf diag}(\sigma_1^2, \ldots, \sigma_r^2, 0, \ldots, 0) is m \times m (so it has m-r trailing zeros), and \Lambda_n := \tilde{S}^T\tilde{S} = \mbox{\bf diag}(\sigma_1^2, \ldots, \sigma_r^2, 0, \ldots, 0) is n \times n (so it has n-r trailing zeros). The nonzero eigenvalues of AA^T and A^TA are the same, and equal to the squared singular values of A. The corresponding eigenvectors are the left and right singular vectors of A, respectively.
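A quick numerical check of this link (a Matlab sketch on a random matrix, for illustration):

>> A = randn(4,5);
>> s = svd(A);                           % singular values, in decreasing order
>> lm = sort(eig(A*A'), 'descend');      % eigenvalues of A A^T (4 of them)
>> ln = sort(eig(A'*A), 'descend');      % eigenvalues of A^T A (5 of them; last ~0)
>> [s.^2, lm, ln(1:4)]                   % the three columns agree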

This provides a method (not the most computationally efficient one) to find the SVD of a matrix, based on the SED.
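Here is a minimal sketch of that route, assuming for simplicity that A has full column rank (so that all singular values are positive):

>> A = randn(4,3);                     % generically of full column rank
>> [V,L] = eig(A'*A);                  % SED of A^T A
>> [l,idx] = sort(diag(L),'descend');  % order eigenvalues decreasingly
>> V = V(:,idx);
>> s = sqrt(l);                        % singular values
>> U = A*V*diag(1./s);                 % left singular vectors: u_i = A v_i / sigma_i
>> norm(A - U*diag(s)*V')              % ~ machine precision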