Badger: An Entropy-Based Web Search Clustering System with Randomization and Voting
Lidan Wang, Chloe Whyte Schulze
We have implemented and improved an entropy-based clustering algorithm. Our algorithm, Badger, uses entropy as its clustering criterion and adds randomization and a voting scheme to improve the quality of the resulting clusters. We tested Badger on parsed web search result snippets and compared it against EigenCluster, a clustering meta-search engine developed by a research group at MIT. Badger performs comparably to EigenCluster, with slightly more overhead due to the extra work of the randomization step. We find that entropy is a valid and interesting measure of document similarity and that it produces cohesive clusters.
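To make the three ingredients named above concrete (entropy, randomization, voting), here is a minimal Python sketch. It is not the Badger implementation, whose details are not given in this abstract. Everything in it is an assumption chosen for illustration: the function names (cluster_entropy, one_run, vote, badger_sketch), the greedy rule of placing each snippet where cluster entropy grows least, randomization by shuffling the visiting order, and majority voting over pairwise co-membership.

```python
import math
import random
from collections import Counter


def cluster_entropy(docs):
    """Shannon entropy (in bits) of the pooled term distribution of a cluster."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def one_run(docs, k, rng):
    """One randomized pass (assumed scheme): shuffle the documents, seed k
    clusters with the first k, then add each remaining document to the
    cluster whose entropy increases the least."""
    order = list(range(len(docs)))
    rng.shuffle(order)
    clusters = [[i] for i in order[:k]]
    for i in order[k:]:
        def increase(members):
            current = [docs[j] for j in members]
            return cluster_entropy(current + [docs[i]]) - cluster_entropy(current)
        min(clusters, key=increase).append(i)
    labels = [0] * len(docs)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def vote(labelings):
    """Assumed voting scheme: link two documents if they share a cluster in
    a strict majority of runs, then return the connected components."""
    n, runs = len(labelings[0]), len(labelings)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            together = sum(1 for lab in labelings if lab[i] == lab[j])
            if 2 * together > runs:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def badger_sketch(docs, k, runs=15, seed=0):
    """Run the randomized pass several times and combine the runs by voting."""
    rng = random.Random(seed)
    return vote([one_run(docs, k, rng) for _ in range(runs)])


# Hypothetical tokenized search snippets for an ambiguous query.
snippets = [
    ["jaguar", "car", "engine"],
    ["jaguar", "car", "dealer"],
    ["jaguar", "cat", "jungle"],
    ["jaguar", "cat", "habitat"],
]
print(badger_sketch(snippets, k=2))
```

In this sketch a single greedy pass depends on the order in which snippets are visited, so the repeated randomized runs and the vote are what stabilize the final grouping. The repeated runs are also where the extra overhead would come from in a scheme of this kind.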