Title: I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis Speaker: Ke Yi Duke Date: Friday, March 3, 2006 Time 11-12 Venue: Room 3416, HKUST [Note that venue is different than usual!!] Abstract Despite extensive study over the last four decades and numerous applications, no I/O-efficient algorithm is known for the union-find problem. In this talk I will present an I/O-efficient algorithm for the batched (off-line) version of the union-find problem. Given any sequence of N mixed union and find operations, where each union operation joins two distinct sets, our algorithm uses O(sort(N)) = O(N/B log_{M/B}(N/B)) I/Os, where M is the memory size and B is the disk block size. This bound is asymptotically optimal in the worst case. If there are union operations that join a set with itself, our algorithm uses O(sort(N) + mst(N)) I/Os, where mst(N) is the number of I/Os needed to compute the minimum spanning tree of a graph with N edges. The main motivation for our study of the union-find problem arises from problems in terrain analysis. A terrain can be abstracted as a height function defined over R^2, and many problems that deal with such functions require a union-find data structure. I will talk about two terrain analysis problems that benefit from a union-find data structure: (i) computing topological persistence and (ii) constructing the contour tree. We give the first O(sort(N))-I/O algorithms for these two problems, assuming that the input terrain is represented as a triangular mesh with N vertices. Bio: Ke Yi got his B.E. in Computer Science from Tsinghua University, China, in 2001. He is currently a Ph.D. candidate in the Department of Computer Science at Duke University, and is expected to complete all degree requirements by August, 2006. During his Ph.D. study he also spent two summers as an intern at IBM T.J. Watson Research Center and AT&T Labs - Research, respectively. His primary research interest is algorithms, with a focus on I/O-efficient algorithms, but he is also interested in many problems that arise from database systems.