Application: data visualization by projection on a line
Senate voting data
Visualization of high-dimensional data via projection

As seen in the picture above, simply plotting the raw data is often not very informative. We can try to visualize the data set by projecting each data point (each row or column of the matrix) onto, say, a one-, two- or three-dimensional space. Each ‘‘view’’ corresponds to a particular projection, that is, a particular one-, two- or three-dimensional subspace onto which we choose to project the data.

Let us detail what it means to project on a one-dimensional set, that is, on a line. Projecting on a line allows us to assign a single number, or ‘‘score’’, to each data point, via a scalar product. We choose a (normalized) direction $u \in \mathbf{R}^m$ and a scalar $v \in \mathbf{R}$. This corresponds to the affine ‘‘scoring’’ function $f : \mathbf{R}^m \rightarrow \mathbf{R}$, which assigns to a generic data point $x \in \mathbf{R}^m$ the value

$$ f(x) = u^T x + v. $$

Applying $f$ to the data points $x_1, \ldots, x_n \in \mathbf{R}^m$, we obtain a vector of values $f \in \mathbf{R}^n$, with components $f_j = u^T x_j + v$, $j = 1, \ldots, n$. It is often useful to center these scores around zero. This can be done by choosing $v$ such that

$$ 0 = \sum_{j=1}^n (u^T x_j + v) = u^T \left( \sum_{j=1}^n x_j \right) + n v. $$

The zero-mean condition implies $v = -u^T \hat{x}$, where

$$ \hat{x} := \frac{1}{n} \sum_{j=1}^n x_j \in \mathbf{R}^m $$

is the sample average of the data points. The vector $\hat{x}$ can be interpreted as the ‘‘average response’’ across data points (the average vote across Senators in our running example). The values of our scoring function can now be expressed as

$$ f(x) = u^T (x - \hat{x}). $$

To compare the relative merits of different directions, we can assume, without loss of generality, that the direction vector $u$ is normalized (so that $\|u\|_2 = 1$).

Note that our definition of $f$ above is consistent with the idea of projecting the centered data points $x_j - \hat{x}$ on the line passing through the origin with normalized direction $u$. Indeed, the component of $x_j - \hat{x}$ along that line is $u^T (x_j - \hat{x})$.

In the Senate voting example above, a particular projection (that is, a direction in $\mathbf{R}^m$) corresponds to assigning a ‘‘score’’ to each Senator, and thus to representing each Senator as a single value on a line. We will project the data along a vector in the ‘‘bill’’ space, which is $\mathbf{R}^m$.
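The centered scoring function can be sketched in a few lines of NumPy. The sketch makes three assumptions. First, the data points are stored as the columns of an $m \times n$ array `X`, with one row per bill and one column per Senator. Second, votes are coded as $+1$/$-1$. Third, the helper name `centered_scores` and the small vote matrix are hypothetical, made up for illustration; they are not the actual Senate data.

```python
import numpy as np

def centered_scores(X, u):
    """Return the scores f_j = u^T (x_j - x_hat) for each column x_j of X.

    X : (m, n) array, one data point per column (hypothetical layout).
    u : (m,) direction; it is normalized here so that ||u||_2 = 1.
    """
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)   # enforce ||u||_2 = 1
    x_hat = X.mean(axis=1)      # sample average of the data points, a vector in R^m
    v = -u @ x_hat              # offset v = -u^T x_hat, which makes the scores zero-mean
    return u @ X + v            # f_j = u^T x_j + v = u^T (x_j - x_hat)

# Made-up data: 4 "bills" (rows) and 5 "Senators" (columns),
# with +1 for a yea vote and -1 for a nay vote (assumed coding).
X = np.array([
    [ 1,  1, -1, -1,  1],
    [ 1,  1, -1, -1, -1],
    [-1,  1,  1, -1,  1],
    [ 1, -1, -1,  1,  1],
], dtype=float)

rng = np.random.default_rng(0)
u = rng.standard_normal(X.shape[0])  # a random direction in bill space
f = centered_scores(X, u)
print(f)                             # one score per Senator
print(np.isclose(f.mean(), 0.0))     # True: the scores are centered
```

Since the function normalizes `u` first, rescaling the direction leaves the scores unchanged. The offset `v` is exactly the value $-u^T \hat{x}$ derived above.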
Projecting along such a vector amounts to forming linear combinations of the bills, so that the votes of each Senator are reduced to a single number, or ‘‘score’’. Since we centered our data, the average score (across Senators) is zero.

Examples

Figure: Projection on a random direction.

Figure: Projection on the ‘‘all-ones’’ vector.

Clearly, not all directions are ‘‘good’’, in the sense of producing informative plots. A general principle for choosing an ‘‘informative’’ direction will be discussed later. For this data set, however, a good guess is the direction that corresponds to the ‘‘average bill’’. That is, we choose the direction to be parallel to the vector of ones in $\mathbf{R}^m$, scaled so that its Euclidean norm is one: $u = \frac{1}{\sqrt{m}} \mathbf{1}$.
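The ‘‘average bill’’ direction is easy to form explicitly. This sketch reuses the same made-up $\pm 1$ vote matrix (hypothetical data, with bills as rows and Senators as columns). It builds $u = \mathbf{1}/\sqrt{m}$ and checks a simple fact: with this direction, each Senator's score is the Senator's total vote, centered and divided by $\sqrt{m}$. Plotting is omitted; only the scores are computed.

```python
import numpy as np

# Made-up vote matrix: m = 4 bills (rows), n = 5 Senators (columns),
# +1 for yea and -1 for nay (assumed coding, not the real Senate data).
X = np.array([
    [ 1,  1, -1, -1,  1],
    [ 1,  1, -1, -1, -1],
    [-1,  1,  1, -1,  1],
    [ 1, -1, -1,  1,  1],
], dtype=float)
m, n = X.shape

u_ones = np.ones(m) / np.sqrt(m)          # "average bill" direction, ||u||_2 = 1
x_hat = X.mean(axis=1)                    # average vote on each bill across Senators
scores = u_ones @ (X - x_hat[:, None])    # f_j = u^T (x_j - x_hat)

# With u = 1/sqrt(m), each score is the Senator's total vote, centered and rescaled.
totals = X.sum(axis=0)
print(scores)
print(np.allclose(scores, (totals - totals.mean()) / np.sqrt(m)))  # True
```

Because this direction weights every bill equally, Senators who vote yea more often than average receive positive scores, and the others receive negative ones.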