|Full text||Click to download.|
|Citation||PhD Thesis, Yale University Computer Science Department, 2008.
Partitioning of a set objects into a number of clusters according to a suitable distance metric is one of the major questions in data mining, information retrieval and other fields of computer science. This thesis describes and analyzes algorithms for clustering and partitioning data generated from random and pseudo-random models. In random models, one assumes that the data matrix to be partitioned is generated from a simple distribution. A common feature of all the algorithms analyzed is that they are all spectral algorithms, as they employ information about the spectrum of the data matrix to perform clustering. We provide new results in a number of directions. A method based on the second singular vector is analyzed for a mixture model for graphs. The use of the notion of pseudo-randomness is another important aspect of our work. Pseudo-randomness, the idea to use deterministic definitions to capture properties of randomness, is used to extend the notion of probabilistic models, thus making it possible to model clustering problems for sparse (constant-degree) graphs. This work also proposes the first geometric, projection based algorithms known for discrete distributions, which allows a number of generalizations. In addition, entrywise bounds for eigenvectors of adjacency matrices of random graphs are studied, and their implications for spectral algorithms are considered.
Back to publications
Back to previous page