 #### Fast Monte Carlo Algorithms for Matrices II: Computing Low-Rank Approximations to a Matrix

 Full text Click to download. Citation Yale University Technical Report YALEU/DCS/TR-1270, 2004 Authors Petros Drineas Ravi Kannan Michael W. Mahoney

#### Abstract

In many applications, the data consists of (or may be naturally formulated as) an m x n matrix A. It is often of interest to find a low-rank approximation to A, i.e., an approximation D to the matrix A of rank not greater than a specified rank k, where k is much smaller than m and n. Methods such as the Singular Value Decomposition (SVD) may be used to find an approximation to A which is the best in a well defined sense. These methods require memory and time which are superlinear in m and n; for many applications in which the data sets are very large this is prohibitive. Two simple and intuitive algorithms are presented which, when given an m x n matrix A, compute a description of a low-rank approximation D* to A, and which are qualitatively faster than the SVD. Both algorithms have provable bounds for the error matrix A - D*. For any matrix X, let ||X||_F and ||X||_2 denote its Frobenius norm and its spectral norm, respectively. In the first algorithm, c = O(1) columns of A are randomly chosen. If the m x c matrix C consists of those c columns of A (after appropriate rescaling) then it is shown that from C^TC approximations to the top singular values and corresponding singular vectors may be computed. From the computed singular vectors a description of D* of the matrix A may be computed such that rank (insert equation here) and such that

||A-D*||_\xi^2 \le min_{D:rank(D)\le k}||A-D||_\xi^2+poly(k,1/c)||A||_F^2

holds with high probability for \xi = 2,f. This algorithm may be implemented without storing the matrix A in Random Access Memory (RAM), provided it can make two passes over the matrix stored in external memory and use O(m + n) additional RAM memory. The second algorithm is similar except that it further approximates the matrix C by randomly sampling r = O(1) rows of C to form a r x c matrix W. Thus, it has additional error, but it can be implemented in three passes over the matrix using only constant additional RAM memory. To achieve an additional error (beyond the best rank k approximation) that is at most \epsilon||A||_F^2, both algorithms take time which is polynomial in k, 1/\epsilon, and log (1/\sigma), where \sigma > 0 is a failure probability; the first takes time linear in max(m,n) and the second takes time independent of m and n. Our bounds improve previously published results with respect to the rank parameter k for both the Frobenius and spectral norms. In addition, the proofs for the error bounds use a novel method that makes important use of matrix perturbation theory. The probability distribution over columns of A and the rescaling are crucial features of the algorithms which must be chosen judiciously.