Content-Based Routing for Continuous Query-Optimization
Pedro Bizarro, David DeWitt, Shivnath Babu, Jennifer Widom
Current Data Stream Management Systems do not fully exploit their adaptive nature to handle complex queries. To date, such systems route stream tuples to operators or operator paths based only on operator-level statistics. Their optimizers ignore non-independent distributions, attribute correlations, and tuple content. In this paper, we propose a content-based tuple routing approach that, together with histogram-like statistics, allows a stream query processing system to exploit non-independent distributions and correlations instead of being hurt by them. We present a framework for content-based routing in a stream query processing system and an algorithm for learning content-based routes automatically and efficiently. We present an extensive experimental evaluation of content-based routing based on a prototype implementation in TelegraphCQ. Our results clearly indicate that good content-based routes can be learned quickly and efficiently to improve query performance significantly. We believe that any system that processes complex queries over possibly non-uniform data, even in a non-stream environment, can profit by being simultaneously adaptive and content-aware.