Approximating Streaming Window Joins Under CPU Limitations
Ahmed Ayad, Jeffrey Naughton, Stephen Wright, Utkarsh Srivastava
Data streaming systems face the possibility of having to shed load in the case of CPU or memory resource limitations. In this paper, we study the CPU limited scenario in detail. First, we propose a new model for the CPU cost. Then we formally state the problem of shedding load for the goal of obtaining the maximum possible subset of the complete answer, assuming an offline scenario. We then present an online strategy for semantic load shedding, assuming a frequency model. Moving on to random load shedding, we prove that having an open memory budget does not help the CPU constrained case. We then present a random load shedding strategy – Probe-No-Insert – based on decoupling the window maintenance and tuple production operations of the symmetric hash join, and prove that it always dominates the previously proposed coin flipping strategy. Finally, we investigate the problem in case the goal is to obtain a random sample of the join.
Download this report (PDF)
Return to tech report index