Title: Independent Range Sampling Speaker: Yufei Tao, Chinese University of Hong Kong Time/Date: Friday, Mar 14, 11-12 Location: Room 3464 (Math/CS Conference Room) Abstract: In this talk, we will discuss the "independent range sampling" problem. The input is a set P of n points in the real domain. Given an interval q = [x, y] and an integer t > = 1, a query returns t elements randomly sampled with replacement from the intersection of P and q. This sample bag (a.k.a., a multi-set) must be independent from those returned by all previous queries. The objective is to store P in a structure for answering all queries efficiently. If P fits in memory, the problem is interesting when P is dynamic (i.e., allowing insertions and deletions). The state of the art is a structure of O(n) space that answers a query in O(t \log n) time, and supports an update in O(\log n) time. We describe a new structure of O(n) space that answers a query in O(\log n + t) expected time, and supports an update in O(\log n) time. If P does not fit in memory, the problem is challenging even when P is static. The best known structure incurs O(\log_B n + t) I/Os per query, where B is the block size. We develop a new structure of O(n/B) space that answers a query in O(\logstar (n/B) + \log_B n + (t/B) \log_{M/B} (n/B)) amortized expected I/Os, where M is the memory size, and \logstar (n/B) is the number of iterative \log_2(.) operations we need to perform on n/B before going below a constant. We also give a lower bound argument showing that this is nearly optimal---in particular, the multiplicative term \log_{M/B} (n/B) is necessary.