The Limits of Buffering: A Tight Lower Bound for Dynamic Membership in the External Memory Model by Qin Zhang HKUST Time: 2PM on Wednesday, Feb 24, 2010 Venue: 2578 Abstract: We study the dynamic membership (or dynamic dictionary) problem, which is one of the most fundamental problems in data structures. We study the problem in the external memory model with cell size b bits and cache size m bits. We prove that if the amortized cost of updates is at most 0.999 (or any other constant < 1), then the query cost must be ­(log_{b log n}(n/m)), where n is the number of elements in the dictionary. This is a threshold phenomenon for data structures: when the update time is allowed to be 1 + o(1), then a bit vector or hash table give query time O(1). This lower bound answers a folklore conjecture in the external memory community. Since almost any data structure task can solve membership, our lower bound implies a dichotomy result: that when designing external-memory data structures, one needs to choose between two alternatives: (i) make the update time at least 1 (so the data structure does not buffer, and the cache is almost useless), or (ii) make the query time at least roughly logarithmic in n. Our result holds even when the updates and queries are chosen uniformly at random; it holds for randomized data structures, holds when the universe size is O(n), and does not make any restrictive assumptions such as indivisibility. The lower bound has some striking implications for external memory data structures. It shows that the query complexities of many problems such as 1D-range counting, predecessor, rank-select, and many others, are all the same in the regime where the update time is less than 1. The proof of our lower bound is based on a new combinatorial lemma called the Lemma Of Surprising Intersections (LOSI) which allows us to use a proof methodology where we first analyze the intersection structure of the positive queries by using encoding arguments, and then use statistical arguments to deduce properties of the intersection structure of all queries, even the negative ones. All other data structure arguments that we know cannot argue anything about the negative queries, since their structure cannot be understood by conventional arguments. Therefore we believe that the LOSI and this proof methodology will find future uses for other problems. Joint work with Elad Verbin