Dipsea: A Modular Distributed Hash Table

Full textClick to download.
CitationPhD Thesis, Stanford University, 2004
AuthorG. S. Manku


A Distributed Hash Table (DHT) is a giant hash table that is cooperatively maintained by a large number of machines worldwide. The machines join and leave the system autonomously. The unprecedented scale and dynamism of the system calls for novel design techniques which emphasize decentralization and automatic re-configuration. Briefly, each machine in a DHT is assigned an ID in I = [0; 1). The set of IDs divides I into disjoint partitions, managed by one machine each. As a function of their IDs, the machines set up connections among themselves. These connections are used for routing messages between the machines. The challenge lies in devising efficient decentralized algorithms for ID management and connection maintenance.

We propose Dipsea, a modular architecture for building DHTs, consisting of three layers: ID Management, Overlay Routing and Data Management. This thesis focuses on the first two layers. The modularity of Dipsea imbues the overall system with several good properties. A large complex problem is broken down into smaller sub-problems, each of which can be attacked more or less independently. This contributes to reduction in complexity: It is possible to explore the design space of a sub-problem without being encumbered by its interactions with other sub-problems. Furthermore, the best solutions for individual sub-problems can be identified and put together to arrive at an overall design that is far more powerful than a design arrived at by a holistic approach.

Dipsea places existing DHT designs and improvements suggested for various DHTs into a common algorithmic framework. A significant accomplishment is the identification of layers and modules which are cleanly separated on the basis of functionality. Then for each module, we devise and analyze efficient algorithms. Almost all of the algorithms we propose are the currently best-known algorithms for the corresponding modules.

Highlights of our contributions in ID Management and Overlay Routing layers:

1. A simple, decentralized ID Management algorithm which is independent of the Overlay Routing layer. The algorithm requires O(R + log n) messages and only one reassignment of existing IDs in response to arrival or departure of machines. R denotes the average number of messages required by the Overlay Routing layer, and n denotes the current number of machines in the system.

2. A generalization of the above scheme whose analysis requires the solution to a novel Structured Coupon Collection Problem over cliques with multiple-choices per trial.

3. The design of an Emulation Engine which can emulate arbitrary families of deterministic and randomized routing networks. The Emulation Engine makes the design of Dipsea a significant improvement over existing DHT implementations, all of which are tied to specific families of routing networks.

4. Characterization of shortest paths in Chord, a family of deterministic routing networks that has been designed for DHTs.

5. The design of Papillon, a butterfly-based family of graphs defined over nodes placed in a circle. Papillon supports efficient greedy routing, in which each node forwards a message along that out-going edge which reduces the clockwise distance to the destination by the largest amount. In an n-node graph, Papillon routes in O(log n / log d) hops with d links per node, which is asymptotically optimal.

6. The design of Symphony, one of the first randomized routing networks proposed for DHT routing. With k links per node, clockwise greedy routing takes O( 1/k log^2(n)) hops on average.

7. Tight analysis of clockwise greedy routing with/without lookahead in several randomized routing networks including Symphony, randomized Chord, and randomized hypercube. The idea underlying ``greedy with lookahead'' is to allow a node to use knowledge of its neighbor's neighbors for better routing decisions. We show that greedy routing without lookahead requires Theta(log n) hops on average whereas greedy with lookahead entails only Theta(log n / log log n) hops on average.

8. The design of Mariposa, an interesting combination of butter y networks and Kleinberg's small-world construction. Mariposa also offers routes of O(log n / log d) hops in the worst case, with d out-going links per node.

Back to publications
Back to previous page