Title: Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window Speaker: Lap-Kei Lee HKU Date: Friday May 29, 2009 Time 11:00-11:50 Venue: Room 3530 (Lifts 25/26), HKUST Abstract: The past decade has witnessed many interesting algorithms for maintaining statistics over a data stream. While the algorithm community has gradually extended their work from the whole data stream to a sliding window, the database community is getting interested in the communication cost of monitoring multiple, distributed data streams. We initiate a mathematical study of algorithms for monitoring distributed data streams over a time-based sliding window (which contains a variable number of items and possibly out-of-order items). The concern is how to minimize the communication between individual streams and the root, while allowing the root, at any time, to be able to report the global statistics of all streams within a given error bound. We present communication-efficient algorithms for four classical statistics, namely, basic counting, approximate counting, frequent items and quantiles. The worst-case communication cost over a window is O(log(e B)/e) bits for basic counting and O((log B)/e) words for the remaining, where B is the maximum number of items that can arrive in a time window, and e < 1 is the desired error bound. Matching and almost matching lower bounds are also presented. The upper bound results directly imply those for the whole data stream. Joint work with Ho-Leung Chan, Tak-Wah Lam and Hing-Fung Ting.