# Core Vector Machines: Fast SVM Training on Very Large Data Sets

### Ivor W. Tsang, James T. Kwok and Pak-Ming Cheung

**Abstract:**
Standard SVM training has *O(m^3)* time and *O(m^2)* space complexities,
where *m* is the training set size. It is thus computationally infeasible
on very large data sets.
By observing that
practical SVM implementations only *approximate* the optimal solution by
an iterative strategy,
we scale up kernel methods by exploiting such "approximateness" in this
paper.
We first show that many kernel methods can be equivalently formulated as
minimum enclosing ball (MEB)
problems in computational geometry. Then, by adopting an efficient approximate
MEB
algorithm, we
obtain provably approximately optimal solutions with the idea of core-sets.
Our proposed Core Vector Machine
(CVM) algorithm can be used with nonlinear kernels and has a time complexity
that is *linear* in *m* and a space
complexity that is *independent* of *m*.
Experiments on large toy and real-world data sets demonstrate that the CVM is
as accurate as existing SVM implementations, but is
much faster and can handle much larger data sets than
existing scale-up methods.
For example,
CVM with the Gaussian kernel produces superior results on the KDDCUP-99
intrusion detection data,
which has about five million training patterns,
in only 1.4 seconds
on a 3.2GHz Pentium-4 PC.
*Journal of Machine Learning Research*, 6(Apr): 363--392, 2005.

Pdf:
http://www.cs.ust.hk/~jamesk/papers/jmlr05.pdf

Back to James Kwok's home page.