End-Biased Samples for Join Cardinality Estimation
Cristian Estan, Jeffrey F. Naughton
2005
We present a new technique for using samples to estimate join cardinalities. This technique, which we term �end-biased samples,� is inspired by recent work in network traffic measurement. It improves on random samples by using coordinated pseudo-random samples and retaining the sampled values in proportion to their frequency. We show that end-biased samples always provide more accurate estimates than random samples with the same sample size. The comparison with histograms is more interesting � while end-biased histograms are somewhat better than end-biased samples for uncorrelated data sets, end-biased samples dominate by a large margin when the data is correlated. Finally, we compare end-biased samples to the recently proposed �skimmed sketches� and show that neither dominates the other, that each has different and compelling strengths and weaknesses. These results suggest that endbiased samples may be a useful addition to the repertoire of techniques used for data summarization.
Download this report (PDF)
Return to tech report index
|