Citation: Yale University Technical Report YALEU/DCS/TR-1271, 2004
Authors: Petros Drineas, Ravi Kannan, Michael W. Mahoney
In many applications, the data consist of (or may be naturally formulated as) an m x n
matrix A which may be stored on disk but which is too large to read into RAM or to
practically perform superlinear polynomial-time computations on. Two algorithms are
presented which, when given an m x n matrix A, compute approximations to
A which are the product of three smaller matrices, C, U, and R,
each of which may be computed rapidly. Let A' = CUR be the computed approximate
decomposition; both algorithms have provable bounds for the error matrix A - A'. In
the first algorithm, c = O(1) columns of A and r = O(1) rows of A
are randomly chosen. If the m x c matrix C consists of those c columns
of A (after appropriate rescaling) and the r x n matrix R consists
of those r rows of A (also after appropriate rescaling), then the c x
r matrix U may be calculated from C and R.
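
As a concrete illustration, the following numpy sketch builds C and R by sampling columns and rows with probability proportional to their squared Euclidean norms and rescaling them accordingly; these are natural choices consistent with the judiciously chosen distributions mentioned at the end of this abstract, but the exact probabilities, and in particular the construction of U from C and R alone, are specified in the paper itself. The function name sample_cur is illustrative, and U is replaced here by the least-squares optimal choice pinv(C) A pinv(R), which requires access to A and serves only to show the shape of the decomposition.

    import numpy as np

    def sample_cur(A, c, r, seed=0):
        # Sketch of a CUR-type approximation via norm-squared sampling.
        # The sampling probabilities and the construction of U are
        # illustrative stand-ins, not the paper's exact algorithm.
        rng = np.random.default_rng(seed)
        m, n = A.shape
        fro2 = (A ** 2).sum()

        # Sample c columns, p_j proportional to the squared norm of
        # column j, rescaling by 1/sqrt(c * p_j) to keep the estimator
        # unbiased in expectation.
        p = (A ** 2).sum(axis=0) / fro2
        cols = rng.choice(n, size=c, replace=True, p=p)
        C = A[:, cols] / np.sqrt(c * p[cols])            # m x c

        # Sample r rows analogously, rescaling by 1/sqrt(r * q_i).
        q = (A ** 2).sum(axis=1) / fro2
        rows = rng.choice(m, size=r, replace=True, p=q)
        R = A[rows, :] / np.sqrt(r * q[rows])[:, None]   # r x n

        # Stand-in for the paper's U (computed there from C and R
        # alone): the U minimizing ||A - CUR||_F for these C and R.
        U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)    # c x r
        return C, U, R
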
For any matrix X, let ||X||_F and ||X||_2 denote its Frobenius norm
and its spectral norm, respectively. It is proven that

||A - A'||_\xi \le min_{D : rank(D) \le k} ||A - D||_\xi + \epsilon||A||_F

holds in expectation and with high probability, both for \xi = 2, F and for all k =
1, ..., rank(A); thus, since the best rank-k spectral-norm error is
\sigma_{k+1}(A) \le ||A||_F / \sqrt{k+1}, by appropriate choice of k

||A - A'||_2 \le \epsilon||A||_F

also holds in expectation and with high probability. This algorithm may be implemented without
storing the matrix A in Random Access Memory (RAM), provided it can make two passes
over the matrix stored in external memory and use O(m + n) additional
RAM. The second algorithm is similar, except that it approximates the matrix C
by randomly sampling O(1) rows of C. Thus it incurs additional error, but it can
be implemented in three passes over the matrix using only constant additional RAM.
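
To illustrate the pass-efficient implementation, here is a hedged sketch of the two-pass pattern of the first algorithm, assuming A is stored row-by-row on disk (simulated with np.memmap, an assumption made for the example): pass one accumulates the m squared row norms and n squared column norms, using O(m + n) additional RAM; pass two extracts and rescales the sampled columns and rows. The second algorithm would add a third pass over the sampled columns to subsample their rows.

    import numpy as np

    def two_pass_sample(path, m, n, c, r, seed=0):
        # Illustrative two-pass access pattern; not the paper's
        # pseudocode.
        rng = np.random.default_rng(seed)
        A = np.memmap(path, dtype=np.float64, mode="r", shape=(m, n))

        # Pass 1: one sequential sweep to accumulate the sampling
        # weights, keeping only O(m + n) numbers in RAM.
        row_sq, col_sq = np.zeros(m), np.zeros(n)
        for i in range(m):
            a = np.asarray(A[i])          # one row in RAM at a time
            row_sq[i] = (a ** 2).sum()
            col_sq += a ** 2
        fro2 = row_sq.sum()
        cols = rng.choice(n, size=c, replace=True, p=col_sq / fro2)
        rows = rng.choice(m, size=r, replace=True, p=row_sq / fro2)

        # Pass 2: re-read A, keeping only the sampled columns and rows.
        C, R = np.empty((m, c)), np.empty((r, n))
        for i in range(m):
            a = np.asarray(A[i])
            C[i] = a[cols]
            for t in np.flatnonzero(rows == i):
                R[t] = a
        C /= np.sqrt(c * col_sq[cols] / fro2)
        R /= np.sqrt(r * row_sq[rows] / fro2)[:, None]
        return C, R
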
To achieve an additional error (beyond the best rank-k approximation) that is at
most \epsilon||A||_F, both algorithms take time which is a low-degree polynomial in
k, 1/\epsilon, and 1/\sigma, where \sigma > 0 is a failure probability; the first takes time linear in
max(m,n) and the second takes time independent of m and n. The proofs
for the error bounds make important use of matrix perturbation theory and previous work on
approximating matrix multiplication and computing low-rank approximations to a matrix. The
probability distribution over columns and rows and the rescaling are crucial features of the
algorithms and must be chosen judiciously.
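
As a usage note, the additive error guarantee can be sanity-checked numerically by comparing ||A - A'||_F against the best rank-k error obtained from the singular values; the matrix below is a hypothetical low-rank-plus-noise example, and sample_cur refers to the sketch above.

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical test matrix: rank-20 signal plus small noise.
    A = rng.standard_normal((600, 20)) @ rng.standard_normal((20, 400))
    A += 0.01 * rng.standard_normal(A.shape)

    C, U, R = sample_cur(A, c=80, r=80)
    cur_err = np.linalg.norm(A - C @ U @ R)     # ||A - A'||_F

    k = 20
    s = np.linalg.svd(A, compute_uv=False)
    best_k = np.sqrt((s[k:] ** 2).sum())        # best rank-k error
    # The theorem bounds cur_err by best_k plus an additive term
    # scaling with ||A||_F.
    print(cur_err, best_k, np.linalg.norm(A))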