Uncertainty in the Era of Deep Learning
Stephan Mandt, University of California, Irvine
In my talk, I will discuss the relevance of uncertainty in deep learning and the broader field of artificial intelligence. I will provide an introduction to the core concepts of artificial neural networks and deep learning techniques before discussing drawbacks of these approaches and solutions based on modeling uncertainty.
Artificial neural networks and deep learning techniques have become the state-of-the-art approach to analyzing and modeling diverse types of data, such as video, images, audio, and text. However, the traditional deep learning approach has several significant drawbacks in the following areas:
(1) Data Inefficiency. Deep learning relies on large-scale data sets. In recent years, technology companies have successfully trained models on billions of data points. However, not all areas of interest provide such large data sets, and deep learning algorithms often fail when data is scarce.
(2) Overconfidence. In many situations, neural networks are too confident: the probabilities they assign to their predictions are poorly calibrated and therefore unreliable.
(3) Lack of interpretability. While neural networks are powerful predictors, they largely remain black boxes that do not allow a user to understand the criteria on which a prediction was based.
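The overconfidence problem in (2) can be illustrated with a minimal sketch (my own illustration, not an example from the talk): a softmax output layer turns even moderately large logits into near-certain class probabilities, regardless of whether the input resembles the training data.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits from a network on an unfamiliar input: the softmax
# still assigns near-certain probability to the top class.
probs = softmax([12.0, 1.0, 0.5])
print(probs[0])  # > 0.99 despite the input possibly being out-of-distribution
```

The point is that a high softmax probability reflects the relative size of the logits, not a trustworthy statement about how likely the prediction is to be correct.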
Therefore, it is desirable to create new generations of machine learning methods that give reliable confidence estimates, are more data efficient, and model data in human-interpretable ways. Overconfidence and lack of interpretability may have dramatic consequences in automated decision-making scenarios such as autonomous driving, predictive analytics in healthcare, or law enforcement.
One solution to these issues comes in the form of a complementary approach to deep learning based on another machine learning paradigm, namely probabilistic models. Probabilistic models describe data in terms of a complex probability distribution, where the data points are modeled as draws from this distribution. Statistical reasoning about the unobserved aspects of this random process allows practitioners to find interpretable patterns in the data and to make predictions with reliable confidence. Probabilistic models have complementary strengths and weaknesses relative to deep learning models, suggesting that the two approaches should be combined.
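As a toy illustration of this paradigm (a sketch of my own, not an example from the talk), one can model scalar data as draws from a Gaussian, estimate its unobserved parameters by maximum likelihood, and then use the fitted density to quantify how plausible a new observation is under the model:

```python
import math
import random

random.seed(0)
# Treat the observed data as draws from an unknown Gaussian distribution.
data = [random.gauss(2.0, 0.5) for _ in range(1000)]

# Maximum-likelihood estimates of the unobserved parameters (mean, variance).
mu = sum(data) / len(data)
var = sum((x - mu) ** 2 for x in data) / len(data)

def log_likelihood(x):
    # Log density of x under the fitted Gaussian; low values flag
    # observations the model considers implausible.
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

# An in-distribution point is far more plausible than an outlier.
print(log_likelihood(2.0), log_likelihood(10.0))
```

Unlike a raw softmax score, the likelihood here is grounded in an explicit model of how the data were generated, which is what makes the resulting confidence statements interpretable.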
The talk will present advances in synthesizing these complementary approaches and focus on two different ideas: framing neural networks as probabilistic models, and integrating deep neural networks into a probabilistic model. In particular, I will present two examples of such hybrid models developed by my group. First, I will propose a model that learns to identify semantic changes of individual words over centuries of historical text data. This so-called dynamic word embedding model was trained on millions of books from Google Books and allows us to understand how language evolves over time. The model is an example of an unsupervised learning approach, which means that it learns to find interpretable patterns in the data without being taught explicitly what to look for. Second, I will present a model that predicts the future evolution of a video from the video's beginning while learning to tell apart the static and dynamic aspects of the video. I will show that such models can yield a new generation of video compression algorithms that have the potential to outperform established video codecs.