Cache-Oblivious Hashing

by 

Zhewei Wei
HKUST

Time: 11am on Friday, June 4, 2010
Venue: 3530

Abstract:
The hash table, especially its external memory version, is
one of the most important index structures in large databases.
Assuming a truly random hash function, it is known that in a
standard external hash table with block size b, searching for
a particular key only takes expected average t_q = 1+1/2^Omega(b)
disk accesses for any load factor alpha bounded away from 1.
However, such near-perfect performance is achieved only
when b is known and the hash table is particularly tuned
for working with such a blocking. In this paper we study if
it is possible to build a cache-oblivious hash table that works
well with any blocking. Such a hash table will automatically
perform well across all levels of the memory hierarchy and
does not need any hardware-specific tuning, an important
feature in autonomous databases.

We first show that linear probing, a classical collision res-
olution strategy for hash tables, can be easily made cache-
oblivious but it only achieves t_q = 1 + O(alpha/b). Then we
demonstrate that it is possible to obtain t_q = 1 + 1/2^Omega(b),
thus matching the cache-aware bound, if the following two
conditions hold: (a) b is a power of 2; and (b) every block
starts at a memory address divisible by b. Both conditions
hold on a real machine, although they are not stated in the
cache-oblivious model. Interestingly, we also show that nei-
ther condition is dispensable: if either of them is removed,
the best obtainable bound is t_q = 1 + O(alpha/b), which is ex-
actly what linear probing achieves.

This is joint work with Rasmus Pagh, Ke Yi, and Qin Zhang.