## Efficient Top-K Query Calculation in Distributed Networks

Pei Cao and Zhe Wang

#### Abstract

This paper presents a new algorithm to answer *top-k* queries (e.g. ``find the *k* objects with the highest aggregate values'') in a distributed
network.
Existing algorithms such as the Threshold Algorithm
consume an excessive amount of bandwidth when the number of nodes,
*m*, is high.
We propose a new algorithm called ``Three-Phase Uniform Threshold'' (TPUT).
TPUT reduces network bandwidth consumption by pruning away ineligible objects,
and terminates in three round-trips regardless of data input.
The paper presents two sets of results about TPUT. First, trace-driven
simulations show that, depending on the size of the network, TPUT reduces
network traffic by one to two orders of magnitude compared to existing
algorithms. Second, TPUT is proven to be instance-optimal on data
series that satisfy a lower bound on the slope of decreases in values.
In particular, analysis shows that by using a pruning parameter less than 1,
TPUT achieves a qualitative reduction in network traffic, for example,
lowering the optimality ratio from $O(m*m)$ to $O(m*\sqrt{m})$ for data series
following Zipf distribution.

The paper in PDF is here.