There will be two short talks on June 26, 2009 ************************************************************************* Title: Indexing Uncertain Data Speaker: Ke Yi HKUST Date: Friday June 26, 2009 Time 11:00-11:50 Venue: Room 3464, HKUST Abstract: Querying uncertain data has emerged as an important problem in data management due to the imprecise nature of many measurement data. In this paper we study answering range queries over uncertain data. Specifically, we are given a collection $P$ of $n$ points in $\reals$, each represented by its one-dimensional probability density function (pdf). The goal is to build an index on $P$ such that given a query interval $I$ and a probability threshold $\tau$, we can quickly report all points of $P$ that lie in $I$ with probability at least $\tau$. We present various indexing schemes with linear or near-linear space and logarithmic query time. Our schemes support pdf's that are either histograms or more complex ones such as Gaussian or piecewise algebraic. They also extend to the external memory model in which the goal is to minimize the number of disk accesses when querying the index. This is joint work with Pankaj K. Agarwal, Siu-Wing Cheng, and Yufei Tao ************************************************************************* Title: Optimal Tracking of Distributed Heavy Hitters and Quantiles Speaker: Zhang Qin HKUST Date: Friday June 26, 2009 Time 11:00-11:50 Venue: Room 3464, HKUST Abstract: We consider the the problem of tracking heavy hitters and quantiles in the distributed streaming model. The heavy hitters and quantiles are two important statistics for characterizing a data distribution. Let $A$ be a multiset of elements, drawn from the universe $U=\{1,\dots,u\}$. For a given $0 \le \phi \le 1$, the $\phi$-heavy hitters are those elements of $A$ whose frequency in $A$ is at least $\phi |A|$; the $\phi$-quantile of $A$ is an element $x$ of $U$ such that at most $\phi|A|$ elements of $A$ are smaller than $A$ and at most $(1-\phi)|A|$ elements of $A$ are greater than $x$. Suppose the elements of $A$ are received at $k$ remote {\em sites} over time, and each of the sites has a two-way communication channel to a designated {\em coordinator}, whose goal is to track the set of $\phi$-heavy hitters and the $\phi$-quantile of $A$ approximately at all times with minimum communication. We give tracking algorithms with worst-case communication cost $O(k/\eps \cdot \log n)$ for both problems, where $n$ is the total number of items in $A$, and $\eps$ is the approximation error. This substantially improves upon the previous known algorithms. We also give matching lower bounds on the communication costs for both problems, showing that our algorithms are optimal. We also consider a more general version of the problem where we simultaneously track the $\phi$-quantiles for all $0 \le \phi \le 1$.