Senate Voting: Projecting Data
Projecting on a line

To simplify, let us first consider the problem of representing the high-dimensional data set on a single line, using the method described here.

Scoring Senators

Specifically, we would like to assign a single number, or "score", to each column of the matrix. We choose a direction $u \in \mathbf{R}^m$ and a scalar $v \in \mathbf{R}$. This corresponds to the affine "scoring" function $f : \mathbf{R}^m \to \mathbf{R}$, which, to a generic column $x$ of the data matrix, assigns the value

$$f(x) = u^T x + v.$$

We thus obtain a vector of values $f \in \mathbf{R}^n$, with $f_j = u^T x_j + v$, $j = 1, \ldots, n$. It is often useful to center these values around zero. This can be done by choosing $v$ such that

$$0 = \sum_{j=1}^n (u^T x_j + v) = u^T \Big( \sum_{j=1}^n x_j \Big) + n v,$$

that is: $v = -u^T \hat{x}$, where

$$\hat{x} = \frac{1}{n} \sum_{j=1}^n x_j \in \mathbf{R}^m$$

is the vector of sample averages across the columns of the matrix (that is, data points). The vector $\hat{x}$ can be interpreted as the "average response" across experiments. The values of our scoring function can now be expressed as

$$f(x) = u^T (x - \hat{x}).$$

In order to compare the relative merits of different directions, we can assume, without loss of generality, that the vector $u$ is normalized (so that $\|u\|_2 = 1$).

Centering data

It is convenient to work with the "centered" data matrix, which is

$$X_{\text{cent}} = (x_1 - \hat{x}, \ldots, x_n - \hat{x}) = X - \hat{x} \mathbf{1}_n^T,$$

where $\mathbf{1}_n$ is the vector of ones in $\mathbf{R}^n$. In Matlab, we can compute the centered data matrix as follows.

Matlab syntax
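>> xhat = mean(X,2);
>> [m,n] = size(X);
>> Xcent = X-xhat*ones(1,n);

We can compute the (row) vector of scores using the simple matrix-vector product

$$f^T = u^T X_{\text{cent}}.$$

We can check that the average of the above row vector is zero. With $\mathbf{1}_n$ the vector of ones in $\mathbf{R}^n$, we have $X_{\text{cent}} \mathbf{1}_n = X \mathbf{1}_n - n \hat{x} = 0$, hence

$$\frac{1}{n} u^T X_{\text{cent}} \mathbf{1}_n = 0.$$

Example: visualizing along random direction

Projection on a plane

We can also try to project the data on a plane, which involves assigning two scores to each data point.

Scoring map

This corresponds to the affine "scoring" map $f : \mathbf{R}^m \to \mathbf{R}^2$, which, to a generic column $x$ of the data matrix, assigns the two-dimensional value

$$f(x) = \begin{pmatrix} u_1^T x + v_1 \\ u_2^T x + v_2 \end{pmatrix} = U^T x + v,$$

where $u_1, u_2 \in \mathbf{R}^m$ are two vectors, and $v_1, v_2$ two scalars, while $U = [u_1, u_2] \in \mathbf{R}^{m \times 2}$, $v = (v_1, v_2) \in \mathbf{R}^2$.

The affine map $f$ lets us generate $n$ two-dimensional data points (instead of $m$-dimensional ones) $f_j = U^T x_j + v$, $j = 1, \ldots, n$. As before, we can require that the $f_j$'s be centered:

$$0 = \sum_{j=1}^n f_j = \sum_{j=1}^n (U^T x_j + v),$$

by choosing the vector $v$ to be such that $v = -U^T \hat{x}$, where $\hat{x}$ is the "average response" defined above. Our (centered) scoring map takes the form

$$f(x) = U^T (x - \hat{x}).$$

We can encapsulate the scores in the $2 \times n$ matrix $F = [f_1, \ldots, f_n]$. The latter can be expressed as the matrix-matrix product

$$F = U^T X_{\text{cent}},$$

with $X_{\text{cent}}$ the centered data matrix defined above.

Clearly, depending on which plane we choose to project on, we get very different pictures. Some planes seem to be more "informative" than others. We return to this issue here.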
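
The centering and scoring steps above can be sketched end to end in Matlab. This is only a minimal sketch under stated assumptions: the matrix X below is synthetic random +1/-1 data standing in for the real votes matrix, the directions are drawn at random, and the names u, U, f and F are illustrative choices rather than names taken from the original code. Normalizing the two plane directions is likewise our choice, made by analogy with the line case.

```matlab
% Synthetic stand-in for the data matrix (assumption, not the real votes):
% m rows (features) by n columns (data points), entries +1 or -1.
m = 6; n = 10;
X = sign(randn(m, n));

% Center the data: subtract the average column from every column.
xhat  = mean(X, 2);               % m-by-1 vector of row averages
Xcent = X - xhat*ones(1, n);      % m-by-n centered data matrix

% Projection on a line: one normalized random direction, one score per column.
u = randn(m, 1);
u = u / norm(u);                  % enforce unit Euclidean norm
f = u' * Xcent;                   % 1-by-n row vector of scores

% Projection on a plane: two random directions, two scores per column.
U = randn(m, 2);
U = U * diag(1 ./ sqrt(sum(U.^2, 1)));   % normalize each column (our choice)
F = U' * Xcent;                   % 2-by-n matrix of scores

% Both sets of scores should average to zero, up to rounding error.
fprintf('average line score:   %g\n', mean(f));
fprintf('average plane scores: %g %g\n', mean(F, 2));
```

Because the directions are random, each run gives a different picture of the same data, which is one way to see why the choice of direction or plane matters.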