Abstract: Standard SVM training has O(m^3) time and O(m^2) space complexities, where m is the training set size. In this paper, we scale up kernel methods by exploiting the "approximateness" in practical SVM implementations. We formulate many kernel methods as equivalent minimum enclosing ball problems in computational geometry, and then obtain provably approximately optimal solutions efficiently with the use of core-sets. Our proposed Core Vector Machine (CVM) algorithm has a time complexity that is linear in m and a space complexity that is independent of m. Experiments on large toy and real-world data sets demonstrate that the CVM is much faster and can handle much larger data sets than existing scale-up methods. In particular, on our PC with only 512MB of RAM, the CVM with the Gaussian kernel can process the checkerboard data set with 1 million points in less than 13 seconds.
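The core-set idea underlying the abstract can be illustrated with the classical Bădoiu–Clarkson (1+ε)-approximation for the minimum enclosing ball: repeatedly pull the center toward the point currently farthest from it, for on the order of 1/ε² iterations. The sketch below is a hypothetical illustration of that generic MEB algorithm in Euclidean space, not the CVM implementation itself (the paper works in the kernel-induced feature space); the function name and parameters are assumptions for the example.

```python
import numpy as np

def meb_coreset(X, eps=0.1):
    """(1+eps)-approximate minimum enclosing ball of the rows of X
    via the Badoiu-Clarkson core-set iteration (illustrative sketch;
    the CVM applies this idea in kernel feature space)."""
    c = X[0].astype(float)                # start from an arbitrary point
    core = {0}                            # indices of core-set points
    for i in range(1, int(np.ceil(1.0 / eps**2)) + 1):
        dists = np.linalg.norm(X - c, axis=1)
        far = int(np.argmax(dists))       # farthest point from current center
        core.add(far)                     # it joins the core-set
        c = c + (X[far] - c) / (i + 1)    # shift center toward it
    radius = np.linalg.norm(X - c, axis=1).max()
    return c, radius, core

# Example: points on the unit circle; the optimal MEB has radius 1.
angles = np.linspace(0, 2 * np.pi, 100, endpoint=False)
pts = np.stack([np.cos(angles), np.sin(angles)], axis=1)
center, r, core = meb_coreset(pts, eps=0.1)
```

The core-set stays small (its size depends only on ε, not on m), which is exactly the property that makes the CVM's space usage independent of the training set size.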
Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS), Barbados, January 2005.