Full text  Click to download. 
Citation  Proceedings of the Tenth International Conference on
Database Theory (ICDT), 2005

Authors  K. Munagala
Shivnath Babu Rajeev Motwani Jennifer Widom 
The pipelined filters problem in databases is the problem of finding the optimal ordering of a set of dependent selection or join operators in a query processor. We provide an abstraction of this problem as a generalization of set cover called pipelined set cover, where the sets are applied sequentially to the universe and the covered elements are discarded. The cost of including a set in pipelined set cover is directly proportional to the number of uncovered elements at the point of applying the set. The cost therefore depends on the ordering of the sets, in addition to the sets themselves. We show that several natural heuristics for this NPhard problem (such as greedy set cover and local search) can be analyzed using a common linearprogramming framework, which bounds not only the approximation ratio, but also the running time of the corresponding algorithms. In particular, we show that the greedy and local search algorithms are 4approximations for both uniform and nonuniform processing costs, which is in contrast to the logarithmic performance guarantees achievable for classical set cover. We extend our analysis to minimize the Lpnorm of the costs paid by the sets, where p >= 2 is an integer, to examine the improvement in performance when the total cost has increasing contribution from the initial sets in the pipeline. Using a novel Lagrangianrelaxation analysis, we show that the greedy algorithm is a 9^(1/p)approximation for the uniform processing cost model, while the local search algorithm is a 4^(1/p)approximation when the costs are nonuniform. Our analysis framework may be applicable to other problems. The algorithms we consider have effective and efficient implementations in a datastream processing system.