[ml] paper on unsupervised pre training of deep nets

Mike Schachter mike at mindmech.com
Sun Jul 17 00:11:09 UTC 2011


A deep network is a neural network with many
hidden layers. Until recently it has been difficult to
train nets with multiple hidden layers using
a standard technique like backpropagation. The
reason is that when the weights of the deep net
aren't initialized anywhere close to a good
solution, training tends to get stuck in a poor local
minimum.
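
To make that concrete, here's a rough NumPy sketch (layer sizes
made up) of a deep net as a stack of weight matrices. The random
initialization at the top is exactly the starting point that gives
backpropagation trouble:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# hypothetical sizes: 784 inputs, 3 hidden layers, 10 outputs
layer_sizes = [784, 500, 500, 500, 10]

# random initialization -- the starting point backprop has to work from
weights = [np.random.randn(m, n) * 0.01
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    # push the input through each layer in turn
    for W, b in zip(weights, biases):
        x = sigmoid(x @ W + b)
    return x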

Unsupervised pre-training is a technique where first,
each hidden layer of the deep net is trained individually
to reproduce its input. Another way of putting this
is that each layer of the deep net is pre-trained
to act as an auto-encoder:

http://en.wikipedia.org/wiki/Auto-encoder
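
Here's a rough NumPy sketch of that idea for a single layer (not
code from any particular library): the layer is trained to
reconstruct its own input, and the encoder weights are what you
keep for the deep net. It assumes the data is scaled to [0, 1] so
a sigmoid output makes sense.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_autoencoder(X, n_hidden, lr=0.1, epochs=50):
    """X: (n_samples, n_visible) data the layer should reconstruct."""
    n_visible = X.shape[1]
    W_enc = np.random.randn(n_visible, n_hidden) * 0.01
    W_dec = np.random.randn(n_hidden, n_visible) * 0.01
    b_enc = np.zeros(n_hidden)
    b_dec = np.zeros(n_visible)

    for _ in range(epochs):
        # forward: encode, then try to reconstruct the input
        H = sigmoid(X @ W_enc + b_enc)
        X_hat = sigmoid(H @ W_dec + b_dec)

        # backprop the squared reconstruction error through both halves
        d_out = (X_hat - X) * X_hat * (1.0 - X_hat)
        d_hid = (d_out @ W_dec.T) * H * (1.0 - H)

        W_dec -= lr * (H.T @ d_out) / len(X)
        b_dec -= lr * d_out.mean(axis=0)
        W_enc -= lr * (X.T @ d_hid) / len(X)
        b_enc -= lr * d_hid.mean(axis=0)

    # the encoder half initializes the corresponding layer of the deep net
    return W_enc, b_enc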

Anyway, if you pre-train the hidden layers of a deep
net this way, you can then fine-tune the whole net using
backpropagation and get good results.
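
Putting the two pieces together (and reusing the hypothetical
pretrain_autoencoder sketch above), the recipe looks roughly like
this: greedily pre-train each layer on the activations of the layer
below it, then run ordinary supervised backprop on the whole stack.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def greedy_pretrain(X, hidden_sizes):
    layers = []
    inputs = X
    for n_hidden in hidden_sizes:
        W, b = pretrain_autoencoder(inputs, n_hidden)  # unsupervised step
        layers.append((W, b))
        inputs = sigmoid(inputs @ W + b)  # activations feed the next layer
    return layers

# layers = greedy_pretrain(X_train, [500, 500, 500])
# ...then add an output layer on top and run supervised backprop on
# (X_train, y_train), starting from these pre-trained weights instead
# of random ones.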

Here's a paper that shows experimentally why pre-training
works:

http://jmlr.csail.mit.edu/papers/volume11/erhan10a/erhan10a.pdf

The idea is that pre-training starts the weights of the deep
net in a region of weight space close to a good solution, i.e.
a set of weights from which backpropagation can reach a net
that predicts well.

  mike
