Title: I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis
Speaker: Ke Yi
Duke

Date: Friday,  March 3, 2006
Time 11-12 Venue: Room 3416, HKUST   
[Note that venue is different than usual!!]


Abstract 

Despite extensive study over the last four decades and numerous
applications, no I/O-efficient algorithm is known for the union-find
problem.  In this talk I will present an I/O-efficient algorithm for the
batched (off-line) version of the union-find problem.  Given any sequence
of N mixed union and find operations, where each union operation joins two
distinct sets, our algorithm uses O(sort(N)) = O(N/B log_{M/B}(N/B)) I/Os,
where M is the memory size and B is the disk block size.  This bound is
asymptotically optimal in the worst case.  If there are union operations
that join a set with itself, our algorithm uses O(sort(N) + mst(N)) I/Os,
where mst(N) is the number of I/Os needed to compute the minimum spanning
tree of a graph with N edges.

The main motivation for our study of the union-find problem arises from
problems in terrain analysis.  A terrain can be abstracted as a height
function defined over R^2, and many problems that deal with such functions
require a union-find data structure.  I will talk about two terrain
analysis problems that benefit from a union-find data structure: (i)
computing topological persistence and (ii) constructing the contour
tree.  We give the first O(sort(N))-I/O algorithms for these two problems,
assuming that the input terrain is represented as a triangular mesh with N
vertices.


Bio:

Ke Yi got his B.E. in Computer Science from Tsinghua University, China, in
2001.  He is currently a Ph.D. candidate in the Department of Computer
Science at Duke University, and is expected to complete all degree
requirements by August, 2006.  During his Ph.D. study he also spent two
summers as an intern at IBM T.J. Watson Research Center and AT&T Labs -
Research, respectively.  His primary research interest is algorithms, with
a focus on I/O-efficient algorithms, but he is also interested in many
problems that arise from database systems.