banner.jpg

The Pipelined Set Cover Problem

Full textClick to download.
CitationProceedings of the Tenth International Conference on Database Theory (ICDT), 2005
AuthorsK. Munagala
Shivnath Babu
Rajeev Motwani
Jennifer Widom

Abstract

The pipelined filters problem in databases is the problem of finding the optimal ordering of a set of dependent selection or join operators in a query processor. We provide an abstraction of this problem as a generalization of set cover called pipelined set cover, where the sets are applied sequentially to the universe and the covered elements are discarded. The cost of including a set in pipelined set cover is directly proportional to the number of uncovered elements at the point of applying the set. The cost therefore depends on the ordering of the sets, in addition to the sets themselves. We show that several natural heuristics for this NP-hard problem (such as greedy set cover and local search) can be analyzed using a common linear-programming framework, which bounds not only the approximation ratio, but also the running time of the corresponding algorithms. In particular, we show that the greedy and local search algorithms are 4-approximations for both uniform and nonuniform processing costs, which is in contrast to the logarithmic performance guarantees achievable for classical set cover. We extend our analysis to minimize the Lp-norm of the costs paid by the sets, where p >= 2 is an integer, to examine the improvement in performance when the total cost has increasing contribution from the initial sets in the pipeline. Using a novel Lagrangian-relaxation analysis, we show that the greedy algorithm is a 9^(1/p)-approximation for the uniform processing cost model, while the local search algorithm is a 4^(1/p)-approximation when the costs are nonuniform. Our analysis framework may be applicable to other problems. The algorithms we consider have effective and efficient implementations in a data-stream processing system.

Back to publications
Back to previous page