Fast Monte Carlo Algorithms for Matrices III: Computing an Efficient Approximate Decomposition of a Matrix

Full textClick to download.
CitationYale University Technical Report YALEU/DCS/TR-1271, 2004
AuthorsPetros Drineas
Ravi Kannan
Michael W. Mahoney


In many applications, the data consist of (or may be naturally formulated as) an m x n matrix A which may be stored on disk but which is too large to be read into RAM or to practically perform superlinear polynomial time computations on it. Two algorithms are presented which, when given an m x n matrix A, compute approximations to A which are the product of three smaller matrices, C, U,and R, each of which may be computed rapidly. Let A'= CUR be the computed approximate decomposition; both algorithms have provable bounds for the error matrix A - A'. In the first algorithm, c = O(1) columns of A and r = O(1) rows of A are randomly chosen. If the m x c matrix C consists of those c columns of A (after appropriate rescaling) and the r x n matrix R consists of those r rows of A (also after appropriate rescaling) then the c x r matrix U may be calculated from C and R. For any matrix X, let ||X||_F and ||X||_2 denote its Frobenius norm and its spectral norm, respectively. It is proven that

||A-A'||_\xi \le min_{D:rank(D)\le k}||A-D||_\xi+poly(k,1/c)||A||_F

holds in expectation and with high probability for both \xi = 2,F and for all k = 1,...,rank(A); thus be appropriate choice of k

||A - A'||_2 \le \epsilon||A||_F

also hold in expectation and with high probability. This algorithm may be implemented without storing the matrix A in Random Access Memory (RAM), provided it can make two passes over the matrix stored in external memory and use O(m + n) additional RAM memory. The second algorithm is similar except that it approximates the matrix C by randomly sampling O(1) rows of C. Thus, it has additional error but it can be implemented in three passes over the matrix using only constant additional RAM memory. To achieve an additional error (beyond the best rank k approximation) that is at most \epsilon||A||_F, both algorithms take time which is a low-dgree polynomial in k, 1/\epsilon, and 1/\sigma, where \sigma>0 is a failure probability; the first takes time linear in max(m,n) and the second takes time independent of m and n. The proofs for the error bounds make important use of matrix perturbation theory and previous work on approximating matrix multiplication and computing low-rank approximations to a matrix. The probability distribution over columns and rows and the rescaling are crucial features of the algorithms and must be chosen judiciously.

Back to publications
Back to previous page