Title: On Recent Developments in Learning with Agnostic Noise
Speaker: Elad Verbin, ITCS, Tsinghua University
Date: Wednesday, Feb 20, 2008
Time: 11:00-12:00
Venue: Room 3464, HKUST

Abstract:

In this talk I will give an overview of some exciting recent developments in learning Boolean functions with noise. In ordinary, noise-free learning, the goal is to learn a function, given samples of the function at randomly selected points. In noisy learning, the problem is made harder by allowing a fraction of the examples to be wrong. If the wrong examples are chosen at random, the noise is called "random"; if the examples are corrupted by a malicious adversary, the noise is called "agnostic".

The subject of computation with noisy input is ubiquitous in computer science, in topics such as coding theory and approximation algorithms, as well as in practical areas such as computational biology and vision. Learning with noise has been considered extremely difficult (especially in the agnostic setting), but in recent years we have seen a large influx of positive results (algorithms), as well as some negative results (hardness results). I will describe some of these. I will also outline connections to coding theory and computational complexity.

Here is an example of one of the problems I will discuss in the talk: learning parity functions with noise. In the problem of learning parity (without noise), one gets samples of the form (x, f(x)), where x is an n-bit string chosen uniformly at random and f is the XOR function on an unknown subset of the bits. The algorithm can request as many samples as it wants (but cannot specify the x's; they are picked at random). The goal is to return f with high probability. Learning parity without noise is easy to do in time $O(n^3)$, using $O(n)$ samples, via Gaussian elimination (a small illustrative sketch of this noiseless case appears after the abstract). Now suppose that 1% noise is introduced, so that each sample returns f(x) with probability 0.99 and 1-f(x) with probability 0.01. In this case the problem becomes extremely difficult. The fastest known algorithm is due to Blum, Kalai and Wasserman, and runs in time $2^{O(n/\log n)}$. The problem is believed to be hard (though even defining what hardness means here is non-trivial). In discussing this problem, I will describe some possible consequences of parity with noise being easy or hard. Either answer would be good: easiness would give us a holy hammer for use in learning theory, while hardness would imply simple and efficient cryptographic primitives, among other consequences.

Some parts of this talk are based on joint work with Adam Kalai and Yishay Mansour. The talk will not assume any prior knowledge of computational learning theory.
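
The abstract notes that noiseless parity learning reduces to solving a linear system over GF(2). The sketch below is a minimal illustration of that reduction, not material from the talk: the secret parity, sample count, and function names are illustrative assumptions, and it assumes enough random samples for the system to have full column rank.

```python
# Minimal sketch: learning a parity function from noiseless samples
# by Gaussian elimination over GF(2). Names and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def sample(secret, m):
    """Draw m noiseless samples (x, f(x)) with f(x) = <secret, x> mod 2."""
    X = rng.integers(0, 2, size=(m, secret.size))
    return X, (X @ secret) % 2

def learn_parity(X, y):
    """Recover the hidden parity vector s with X @ s = y (mod 2),
    assuming the sample matrix has full column rank."""
    X, y = X.copy() % 2, y.copy() % 2
    m, n = X.shape
    pivots, row = [], 0
    for col in range(n):
        # Find a row at or below `row` with a 1 in this column.
        p = next((r for r in range(row, m) if X[r, col] == 1), None)
        if p is None:
            continue  # free column; unlikely with enough random samples
        X[[row, p]], y[[row, p]] = X[[p, row]], y[[p, row]]
        # Eliminate this column from every other row (full reduction).
        for r in range(m):
            if r != row and X[r, col] == 1:
                X[r] = (X[r] + X[row]) % 2
                y[r] = (y[r] + y[row]) % 2
        pivots.append((row, col))
        row += 1
    s = np.zeros(n, dtype=int)
    for r, c in pivots:
        s[c] = y[r]
    return s

if __name__ == "__main__":
    n = 16
    secret = rng.integers(0, 2, size=n)   # the unknown parity f
    X, y = sample(secret, 4 * n)          # O(n) random samples suffice whp
    assert np.array_equal(learn_parity(X, y), secret)
```

With a 1% noise rate, the same linear-algebra approach breaks down, which is exactly the gap the Blum-Kalai-Wasserman $2^{O(n/\log n)}$ algorithm addresses.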