Understanding and Exploiting Network Traffic Redundancy
Archit Gupta, Aditya Akella, Srinivasan Seshan, Scott Shenker and Jia Wang
The Internet carries a vast amount and a wide range of content. Some of this content is more popular, and accessed more frequently, than the rest. The popularity of content can be quite ephemeral (e.g., a Web flash crowd) or much more permanent (e.g., google.com's banner). A direct consequence of this skew in popularity is that, at any time, a fraction of the information carried over the Internet is redundant.
We make two contributions in this paper. First, we study the fundamental properties of the redundancy in the information carried over the Internet, with a focus on network edges. We collect traffic traces at two network edge locations -- a large university's access link serving roughly 50,000 users, and a tier-1 ISP network link connected to a large data center. We conduct several analyses over this data: What fraction of bytes are redundant? What is the frequency at which strings of bytes repeat across different packets? What is the overlap in the information accessed by distinct groups of end-users?
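To make the first question concrete, the following is a minimal sketch of one way a redundant-byte fraction could be estimated over a sequence of packet payloads. It is not the method used in the report, which this abstract does not describe. The function name `redundant_fraction`, the `chunk_size` parameter, and the choice of fixed-offset chunks hashed with SHA-1 are all assumptions made for illustration.

```python
import hashlib


def redundant_fraction(payloads, chunk_size=64):
    """Estimate the fraction of payload bytes that repeat earlier content.

    Illustrative assumption: each payload is cut into non-overlapping,
    fixed-offset chunks of `chunk_size` bytes, and a chunk counts as
    redundant if an identical chunk was seen earlier in the sequence.
    Trailing bytes shorter than one chunk are counted in the total but
    are never marked redundant.
    """
    seen = set()      # digests of every chunk observed so far
    total = 0         # all payload bytes
    redundant = 0     # bytes in chunks that repeat an earlier chunk
    for payload in payloads:
        total += len(payload)
        for start in range(0, len(payload) - chunk_size + 1, chunk_size):
            chunk = payload[start:start + chunk_size]
            digest = hashlib.sha1(chunk).digest()
            if digest in seen:
                redundant += chunk_size
            else:
                seen.add(digest)
    return redundant / total if total else 0.0


# Two 128-byte payloads that share their first 64 bytes.
packets = [b"a" * 64 + b"b" * 64, b"a" * 64 + b"c" * 64]
print(redundant_fraction(packets))  # 0.25
```

Fixed-offset chunking is the simplest choice but misses repeats that are shifted by even one byte, so an estimate of this kind is a lower bound on what content-defined chunking or a sliding-window fingerprint would find.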
Second, we leverage our measurement observations in the design of a family of mechanisms for eliminating redundancy in network traffic and improving overall network performance. The mechanisms we propose can improve the available capacity of single network links as well as balance load across multiple network links.