The Limits of Buffering: A Tight Lower Bound for Dynamic Membership in the External Memory Model
by Qin Zhang
HKUST

Time: 2PM on Wednesday, Feb 24, 2010
Venue: 2578

Abstract:
We study the dynamic membership (or dynamic dictionary) problem, which is one of the most fundamental
problems in data structures. We study the problem in the external memory model with cell size
b bits and cache size m bits. We prove that if the amortized cost of updates is at most 0.999 (or any other
constant < 1), then the query cost must be ­(log_{b log n}(n/m)), where n is the number of elements in the
dictionary. This is a threshold phenomenon for data structures: when the update time is allowed to be
1 + o(1), then a bit vector or hash table give query time O(1).

This lower bound answers a folklore conjecture in the external memory community. Since almost
any data structure task can solve membership, our lower bound implies a dichotomy result: that when
designing external-memory data structures, one needs to choose between two alternatives: (i) make the
update time at least 1 (so the data structure does not buffer, and the cache is almost useless), or (ii) make
the query time at least roughly logarithmic in n. Our result holds even when the updates and queries
are chosen uniformly at random; it holds for randomized data structures, holds when the universe size is
O(n), and does not make any restrictive assumptions such as indivisibility.

The lower bound has some striking implications for external memory data structures. It shows that
the query complexities of many problems such as 1D-range counting, predecessor, rank-select, and many
others, are all the same in the regime where the update time is less than 1.

The proof of our lower bound is based on a new combinatorial lemma called the Lemma Of Surprising
Intersections (LOSI) which allows us to use a proof methodology where we  first analyze the
intersection structure of the positive queries by using encoding arguments, and then use statistical arguments
to deduce properties of the intersection structure of all queries, even the negative ones. All other
data structure arguments that we know cannot argue anything about the negative queries, since their
structure cannot be understood by conventional arguments. Therefore we believe that the LOSI and this
proof methodology will find future uses for other problems. 

Joint work with Elad Verbin