• Motivating example

  • RMS gain: the Frobenius norm

  • Peak gain: the largest singular value norm

  • Other norms

  • Applications

Motivating example: effect of noise in a linear system

We saw how a matrix (say, A in mathbf{R}^{m times n}) induces, via the matrix-vector product, a linear map x rightarrow Ax. Here, x is an input vector and y = Ax is the output. The mapping (that is, A) could represent a linear amplifier, with input an audio signal x and output another audio signal y.

Now, assume that there is some noise in the vector x: the actual input is x+v, where v in mathbf{R}^n is an error vector. This implies that there will be noise in the output as well: the noisy output is A(x+v), so that the error on the output due to noise is Av. How could we quantify the effect of input noise on the output noise?

One approach is to try to measure the norm of the error vector, |Av|. Obviously, this norm depends on the noise v, which we do not know. So we will assume that v can take values in a set. We need to come up with a single number that captures in some way the different values of |Av| when v spans that set. Since scaling v simply scales the norm |Av| accordingly, we will restrict the vectors v to have a certain norm, say |v| = 1.

Clearly, depending on the choice of the set, the norms we use to measure vector lengths, how we choose to summarize the many numbers |Av| with a single one, and so on, we will obtain different numbers.

RMS gain: the Frobenius norm

Let us first assume that the noise vector v can take values in a finite set of directions, specifically the directions of the standard basis, e_1, ldots, e_n. Then let us look at the average of the squared error norm:

 frac{1}{n} sum_{i=1}^n |Ae_i|_2^2 = frac{1}{n} sum_{i=1}^n |a_i|_2^2 ,

where a_i stands for the i-th column of A. The quantity above can be written as frac{1}{n} |A|_F^2, where

 |A|_F := sqrt{sum_{i=1}^m sum_{j=1}^n A_{ij}^2} = sqrt{mbox{bf Tr}(A^TA)}

is the Frobenius norm of A.

The function A rightarrow |A|_F turns out to satisfy the basic conditions of a norm in the matrix space mathbf{R}^{m times n}. In fact, it is the Euclidean norm of the vector of length nm formed with all the coefficients of A. Further, the quantity would remain the same if we had chosen any orthonormal basis other than the standard one.

The Frobenius norm is useful to measure the RMS (root-mean-square) gain of the matrix, that is, its average response along given mutually orthogonal directions in space. Clearly, this approach does not capture the variance of the error well; it only measures the average effect of the noise.
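
As a quick numerical check (on a random matrix generated here purely for illustration), the average of the squared responses over the standard basis equals |A|_F^2/n, and the value is unchanged if we replace the standard basis by any other orthonormal basis:

Matlab syntax
>> A = randn(4,3); [m,n] = size(A);    % random matrix, for illustration only
>> sum(sum((A*eye(n)).^2))/n           % (1/n) * sum_i |A e_i|_2^2
>> norm(A,'fro')^2/n                   % same value: |A|_F^2 / n
>> [Q,~] = qr(randn(n));               % a random orthonormal basis
>> sum(sum((A*Q).^2))/n                % unchanged in any orthonormal basis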

The computation of the Frobenius norm is very easy: it requires about nm flops.

Matlab syntax
>> frob_norm = norm(A,'fro');

Peak gain: the largest singular value norm

To try to capture the variance of the output noise, we may take a worst-case approach.

Let us assume that the noise vector is bounded but otherwise unknown. Specifically, all we know about v is that |v|_2 le alpha, where alpha is the maximum amount of noise (measured in Euclidean norm). What is then the worst-case (peak) value of the norm of the output noise? This is answered by the optimization problem

 max_{v} : |Av|_2 ~:~ |v|_2 le alpha.

The quantity

 |A|_{rm LSV} := max_{v} : |Av|_2 ~:~ |v|_2 le 1

measures the peak gain of the mapping A, in the sense that if the noise vector is bounded in norm by alpha, then the output noise is bounded in norm by alpha |A|_{rm LSV}. Any vector v that achieves the maximum above corresponds to a direction in input space that is maximally amplified by the mapping A.

The quantity |A|_{rm LSV} is indeed a matrix norm; it is called the largest singular value (LSV) norm, as it coincides with the largest singular value of A. It is perhaps the most popular matrix norm.

The computation of the largest singular value norm of a matrix is not as easy as that of the Frobenius norm. However, it can be computed with standard linear algebra methods, in about min(n,m)nm flops. While it is more expensive to compute than the Frobenius norm, it is also more useful, because it goes beyond capturing the average response to noise.

Matlab syntax
>> lsv_norm = norm(A);
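
As a sanity check (on a random matrix chosen here only for illustration), norm(A) indeed bounds the gain |Av|_2 over all unit-norm directions v, and the bound is attained along the right singular vector of A associated with its largest singular value:

Matlab syntax
>> A = randn(4,3);                     % random matrix, for illustration only
>> norm(A)                             % the peak gain (largest singular value)
>> gains = zeros(1000,1);
>> for k = 1:1000, v = randn(3,1); v = v/norm(v); gains(k) = norm(A*v); end
>> max(gains)                          % never exceeds norm(A), and approaches it
>> [U,S,V] = svd(A); norm(A*V(:,1))    % the top right singular vector attains the peak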

Other norms

Many other matrix norms are possible, and sometimes useful. In particular, we can generalize the notion of peak norm by using different norms to measure vector size in the input and output spaces. For example, the quantity

 |A|_{infty,1} := max_v : |Av|_1 ~:~ |v|_infty le 1

measures the peak gain with inputs bounded in maximum norm, and outputs measured with the l_1-norm.
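
For small n, this mixed norm can be evaluated by brute force: |Av|_1 is a convex function of v, so its maximum over the box |v|_infty le 1 is attained at one of the 2^n sign vectors v in {-1,+1}^n. A minimal sketch (on a random matrix, for illustration only; the enumeration is practical only for small n):

Matlab syntax
>> A = randn(4,3); [m,n] = size(A);    % random matrix, for illustration only
>> best = 0;
>> for k = 0:2^n-1, v = 2*(dec2bin(k,n)-'0')' - 1; best = max(best, norm(A*v,1)); end
>> best                                % |A|_{infty,1}, by exhaustive search over sign vectors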

The norms we have just introduced, the Frobenius and largest singular value norms, are the most popular ones, and are easy to compute. Many other norms are hard to compute.

Applications

Distance between matrices

Matrix norms are ways to measure the size of a matrix. This allows us to quantify the difference between matrices.

Assume for example that we are trying to estimate a matrix A, and have come up with an estimate hat{A}. How can we measure the quality of our estimate? One way is to evaluate by how much the two matrices differ when they act on the standard basis vectors; this leads to the Frobenius norm of the difference, |A-hat{A}|_F.

Another way is to look at the difference in the output:

 |Av - hat{A}v|_2

when v runs over the whole space. Clearly, we need to scale, or limit the size of, v; otherwise the difference above may be arbitrarily large. Let us look at the worst-case difference when v satisfies |v|_2 le 1. We obtain

 max_{v} : |Av - hat{A}v|_2 ~:~ |v|_2 le 1,

which is the largest singular value norm of the difference A-hat{A}.
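
As an illustration (with a random matrix A and a perturbed estimate, both generated here only for the sake of the example), the two error measures are computed as follows:

Matlab syntax
>> A = randn(4,3); Ahat = A + 0.1*randn(4,3);   % a matrix and a noisy estimate (illustration)
>> err_fro = norm(A - Ahat,'fro')               % Frobenius norm of the estimation error
>> err_lsv = norm(A - Ahat)                     % worst-case output discrepancy over |v|_2 le 1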

Direction of maximal variance

Consider a data set described as a collection of vectors a_1,ldots,a_n, with a_i in mathbf{R}^m. We can gather this data set in a single matrix A = [a_1,ldots,a_n]in mathbf{R}^{m times n}. For simplicity, let us assume that the average vector is zero:

 hat{a} := frac{1}{n} sum_{i=1}^n a_i = 0.

Let us try to visualize the data set by projecting it on a single line passing through the origin. The line is thus defined by a vector x in mathbf{R}^m, which we can without loss of generality assume to be of Euclidean norm 1. The data points, when projected on the line, are turned into real numbers x^Ta_i, i=1,ldots,n.

It can be argued that a good line to project data on is one which spreads the numbers x^Ta_i as much as possible. (If all the data points are projected to numbers that are very close, we will not see anything, as all data points will collapse to close locations.)

We can find a direction in space which accomplishes this, as follows. The average of the numbers is

 frac{1}{n} sum_{i=1}^n a_i^Tx = left( frac{1}{n} sum_{i=1}^n a_iright)^T x = hat{a}^Tx = 0,

while their variance is

 frac{1}{n} sum_{i=1}^n (a_i^Tx - hat{a}^Tx)^2 = frac{1}{n} sum_{i=1}^n (a_i^Tx)^2 = frac{1}{n} x^TAA^Tx = frac{1}{n} |A^Tx|_2^2

The direction of maximal variance is thus found by solving the maximization problem that defines the LSV norm of A^T:

 max_{v} : left{ |A^Tv|_2 ~:~ |v|_2 le 1 right} = |A^T|_{rm LSV} .

(It turns out that this quantity is the same as the LSV norm of A itself.)
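
The following sketch (on randomly generated data, centered so that the average vector is zero, and used purely for illustration) relies on the fact that a maximizer of the problem above is the left singular vector of A associated with its largest singular value. It checks that the variance of the projected points equals frac{1}{n} |A^Tx|_2^2, and that its maximal value is frac{1}{n} |A|_{rm LSV}^2:

Matlab syntax
>> m = 2; n = 100; A = randn(m,n);     % n random data points in R^m, for illustration only
>> A = A - mean(A,2)*ones(1,n);        % center the data so that the average vector is zero
>> [U,S,V] = svd(A); x = U(:,1);       % direction of maximal variance (unit-norm vector)
>> sum((A'*x).^2)/n                    % empirical variance of the projections x'*a_i
>> norm(A'*x)^2/n                      % same value: (1/n)*|A'*x|_2^2
>> norm(A)^2/n                         % the maximal variance, (1/n)*|A|_LSV^2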