PROBABILISTIC MODELS FOR DENSITY ESTIMATION, STRUCTURAL DISCOVERY AND SEMI-SUPERVISED LEARNING FROM VIDEO
Kevin Murphy (University of British Columbia)
Monday, January 10, 2011
In this talk, I give an overview of three different projects which my group is working on, all of which involve probabilistic modeling of one form or another.
The first project is concerned with creating efficient density models for data of mixed type (categorical, real, ordinal, etc.), in the presence of missing data. Such data is particularly common in social science surveys, but arises in many other areas of data analysis, too. Our approach is based on mixtures of factor analysers, extended to handle exponential family response variables. Although this model is not new, we propose a new variational bound for the multinomial likelihood function, which allows us to efficiently fit the model to real and categorical data using variational EM. We show that this outperforms previous approaches (in terms of accuracy per unit of CPU time) based on MAP estimation, and a Bayesian approach using Hamiltonian MCMC. (For details, see "Variational bounds for mixed-data factor analysis", M. Khan, B. Marlin, G. Bouchard, K. Murphy, NIPS 2010.)
The second project is also concerned with density modeling, but uses sparse graphical models rather than low-rank decompositions. The advantage of such models is that they are often more interpretable. We have made several contributions in this area, but in this talk I will focus on one in which we extended the graphical lasso algorithm for learning the structure of Gaussian graphical models to handle unknown grouping of the variables. The technique is based on a new variational bound to the group Laplace distribution. This can be used in a variational EM algorithm, where the E step estimates the group assignments, and the M step uses a standard convex solver using the known groups. (For details, see "Group sparse priors for covariance estimation", B. Marlin, M. Schmidt, K. Murphy, UAI 2009.) I will also discuss some work we are currently doing to extend this approach to learn sparse _latent_ GGMs from categorical data, using the variational bound on the multinomial likelihood mentioned above.
Finally, I will briefly present some work in progress in which we use a graphical model to align play-by-play text data from sports videos (basketball, ice hockey, etc) to detections of players, which are tracked over time. This allows us to learn the identity of the players without any explicit supervision.
Kevin Murphy is an associate professor at the University of British Columbia in Vancouver, Canada, in the departments of computer science and statistics, which he joined in 2004. He holds a Canada research chair in machine learning/ computational statistics. He is currently (2010) on sabbatical in the Bay Area. Prior to coming to UBC, Kevin did a postdoc at MIT, his PhD at UC Berkeley, his MSc at U. Pennsylvania, and his BSc at U. Cambridge. Kevin is best known for his work in the area of Bayesian networks/ graphical models. He is interested in model selection in the N