Full text  Click to download. 
Citation  Yale University Technical Report YALEU/DCS/TR1270, 2004

Authors  Petros Drineas
Ravi Kannan Michael W. Mahoney 
In many applications, the data consists of (or may be naturally formulated as) an m x
n matrix A. It is often of interest to find a lowrank approximation to
A, i.e., an approximation D to the matrix A of rank not greater than
a specified rank k, where k is much smaller than m and n.
Methods such as the Singular Value Decomposition (SVD) may be used to find an
approximation to A which is the best in a well defined sense. These methods require
memory and time which are superlinear in m and n; for many applications in
which the data sets are very large this is prohibitive. Two simple and intuitive
algorithms are presented which, when given an m x n matrix A, compute a
description of a lowrank approximation D* to A, and which are qualitatively
faster than the SVD. Both algorithms have provable bounds for the error matrix A  D*.
For any matrix X, let X_F and X_2 denote its
Frobenius norm and its spectral norm, respectively. In the first algorithm, c = O(1)
columns of A are randomly chosen. If the m x c matrix C consists
of those c columns of A (after appropriate rescaling) then it is shown that
from C^TC approximations to the top singular values and corresponding singular vectors
may be computed. From the computed singular vectors a description of D* of the
matrix A may be computed such that rank (insert equation here) and such that
AD*_\xi^2 \le min_{D:rank(D)\le k}AD_\xi^2+poly(k,1/c)A_F^2
holds with high probability for \xi = 2,f. This algorithm may
be implemented without storing the matrix A in Random Access Memory (RAM), provided
it can make two passes over the matrix stored in external memory and use O(m +
n) additional RAM memory. The second algorithm is similar except that it further
approximates the matrix C by randomly sampling r = O(1) rows of C to
form a r x c matrix W. Thus, it has additional error, but it can be
implemented in three passes over the matrix using only constant additional RAM memory. To
achieve an additional error (beyond the best rank k approximation) that is at most
\epsilonA_F^2, both algorithms take time which is polynomial in k, 1/\epsilon,
and log (1/\sigma), where \sigma > 0 is a failure probability; the first takes
time linear in max(m,n) and the second takes time independent of m and n.
Our bounds improve previously published results with respect to the rank parameter k
for both the Frobenius and spectral norms. In addition, the proofs for the error bounds use
a novel method that makes important use of matrix perturbation theory. The probability
distribution over columns of A and the rescaling are crucial features of the
algorithms which must be chosen judiciously.