Computer Sciences Dept.

Badger: An Entropy-Based Web Search Clustering System with Randomization and Voting

Lidan Wang, Chloe Whyte Schulze

We have implemented and improved an entropy-based clustering algorithm. In addition to using entropy as a clustering mechanism, our algorithm, Badger, uses randomization and a voting scheme to improve the quality of the resulting clusters. We tested Badger on parsed web search result snippets and compared it against EigenCluster, a clustering meta-search engine developed by a research group at MIT. Badger performs comparably to EigenCluster, with slightly more overhead due to the extra work of the randomization step. We find that entropy is a valid and interesting measure of document similarity and that it produces cohesive clusters.
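To illustrate how entropy can serve as a cohesion measure for a group of snippets, the following is a minimal sketch, not the report's actual formulation. It assumes whitespace tokenization and a pooled term-frequency distribution; the function name `term_entropy` and the example snippets are hypothetical. Under these assumptions, a group whose snippets share vocabulary has lower Shannon entropy than a group whose snippets do not.

```python
import math
from collections import Counter


def term_entropy(snippets):
    """Shannon entropy (in bits) of the pooled term distribution of a snippet group.

    Assumption: terms are lowercase whitespace-separated tokens. The report
    may define its entropy measure differently.
    """
    counts = Counter(word for s in snippets for word in s.lower().split())
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty group has no uncertainty
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical snippets: the first group shares terms, the second does not.
cohesive = ["jaguar car dealer", "jaguar car price"]
mixed = ["jaguar car dealer", "rainforest cat habitat"]

print(term_entropy(cohesive) < term_entropy(mixed))
```

A clustering procedure built on this idea would prefer assignments that lower the entropy within each group; how Badger combines this with randomization and voting is described in the full report.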
