Deep learning and unsupervised feature learning offer the potential to transform many domains such as vision, speech, and NLP. However, these methods have been fundamentally limited by our computational abilities, and typically applied to small-sized problems. In this talk, I describe the key ideas that enabled scaling deep learning algorithms to train a very large model on a cluster of 16,000 CPU cores (2000 machines). This network has 1.15 billion parameters, which is more than 100x larger than the next largest network reported in the literature.
Such network, when applied at the huge scale, is able to learn abstract concepts in a much more general manner than previously demonstrated. Specifically, we find that by training on 10 million unlabeled images, the network produces features that are very selective for high-level concepts such as human faces and cats. Using these features, we also obtain significant leaps in recognition performance on several large-scale computer vision tasks.