Massive Data Streams in Graph Theory and Computational Geometry

Full textClick to download.
CitationPhD Thesis, Yale University, 2005
AuthorJian Zhang


Streaming is an important paradigm for handling data sets that are too large to fit in main memory. In the streaming computational model, algorithms are restricted to using much less space than they would need to store the input. The massive data set is accessed in a sequential fashion and, therefore, can be viewed as a stream of data elements. The order of the data elements in the stream is not controlled by the algorithm. There are three important resources considered in the streaming model: the size of the workspace, the number of passes that the algorithm makes over the stream, and the time to process each data element in the stream. In this thesis, we study computational-geometry problems and graph problems in the streaming model. We design algorithms for computing diameter in the streaming and the sliding-window models and prove some corresponding lower bounds. The sliding-window model is a variation of the streaming model in which only more recent data elements in the stream are considered. Previously, work in the streaming model has focused on algorithms that use polylog space. For graph problems, this restriction is too strong. We investigate the computational power of the model with different space restrictions. For a graph of n vertices, our lower bounds show that not many interesting computations can be done when the space restriction is o(n). We also show that, by using n*polylog(n) space, one can design algorithms that provide approximations to some basic problems, e.g., the all-pairs, shortest-path distance problem. We present two main algorithms that approximate these distances. We also exhibit tradeoffs between the accuracy of the results and the resources used by the algorithms.

Back to publications
Back to previous page