Abstract: Perception problems are hard. Whether it is object detection, pose estimation, or scene understanding, vision systems must deal with tremendous amounts of noise and ambiguity. Unfortunately, idealized probabilistic models for dealing with this uncertainty are typically computationally intractable. This leads to a major formal divide -- we either a) make performance-limiting assumptions and end up with restricted probabilistic models (e.g. "attractive" pairwise MRFs) that don't work too well; or b) abandon the probabilistic framework in favor of rich feed-forward "pipelines" (pixels --> regions --> labels) that mismanage uncertainty.
In this talk, I will give a high-level sampling of some projects in my lab. As a specific example, we have developed a two-stage image segmentation model where the first stage is a tractable probabilistic model that outputs not just a single best solution but a /diverse/ set of plausible solutions or guesses. The second stage is a discriminative re-ranker that is free to exploit arbitrarily complex features, and attempts to pick out the best solution from this set. This hybrid model has recently achieved state-of-the-art performance on the PASCAL VOC 2012 segmentation dataset.
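The two-stage idea can be illustrated with a toy sketch. This is not the lab's implementation: the brute-force enumeration, the Hamming-agreement diversity penalty, the weight `lam`, and the names `diverse_m_best`, `model_score`, and `reranker_score` are all assumptions made for illustration. A real first stage would use tractable inference rather than enumeration, and a real re-ranker would be learned from data.

```python
from itertools import product

def agreement(a, b):
    """Number of positions where two labelings assign the same label."""
    return sum(x == y for x, y in zip(a, b))

def diverse_m_best(candidates, score, m, lam):
    """Stage 1 (toy): greedily pick m distinct labelings. Each pick maximizes
    the model score minus lam times its agreement with earlier picks."""
    chosen = []
    for _ in range(m):
        def penalized(y):
            return score(y) - lam * sum(agreement(y, c) for c in chosen)
        remaining = [y for y in candidates if y not in chosen]
        chosen.append(max(remaining, key=penalized))
    return chosen

def rerank(solutions, reranker_score):
    """Stage 2 (toy): return the candidate the re-ranker scores highest."""
    return max(solutions, key=reranker_score)

# Hypothetical per-pixel evidence for 4 binary pixels (positive favors label 1).
unary = [0.9, 0.6, -0.2, -0.7]

def model_score(y):
    # Independent per-pixel score; a stand-in for a real probabilistic model.
    return sum(u if label == 1 else -u for u, label in zip(unary, y))

def reranker_score(y):
    # Hypothetical "complex feature": prefers exactly three foreground pixels.
    return -abs(sum(y) - 3)

candidates = list(product([0, 1], repeat=4))  # enumeration is toy-scale only
solutions = diverse_m_best(candidates, model_score, m=3, lam=0.5)
best = rerank(solutions, reranker_score)
print(solutions, best)
```

In this toy setup the re-ranker can select a labeling other than the first stage's top-scoring one, which is the point of handing it a diverse set rather than a single guess.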
Joint work with Students: Abner Guzman-Rivera (UIUC), Ankit Laddha (VT), Adarsh Prasad (VT/UT-Austin), Qing Sun (VT), Payman Yadollahpour (TTIC); Collaborators: Chris Dyer (CMU), Kevin Gimpel (TTIC), Stefanie Jegelka (UC Berkeley), Pushmeet Kohli (MSRC), Greg Shakhnarovich (TTIC), Danny Tarlow (MSRC).
Speaker: (cited from: filebox.ece.vt.edu/~dbatra/files/bio.txt) Dhruv Batra is an Assistant Professor at the Bradley Department of Electrical and Computer Engineering at Virginia Tech, where he leads the VT Machine Learning & Perception group. He is a member of the Virginia Center for Autonomous Systems (VaCAS) and the VT Discovery Analytics Center (DAC).
Prior to joining VT, he was a Research Assistant Professor at Toyota Technological Institute at Chicago (TTIC), a philanthropically endowed academic computer science institute located on the campus of the University of Chicago. He received his M.S. and Ph.D. degrees from Carnegie Mellon University in 2007 and 2010 respectively, advised by Tsuhan Chen. In the past, he has held visiting positions at the Machine Learning Department at CMU, CSAIL MIT, Microsoft Research Cambridge and Cornell University.
His research interests lie at the intersection of machine learning, computer vision and AI, with a focus on developing scalable algorithms for learning and inference in probabilistic models for holistic scene understanding. He has also worked on other topics such as interactive co-segmentation of large image collections, human body pose estimation, action recognition, depth estimation and distributed optimization for inference and learning in probabilistic graphical models.
He was a recipient of the Carnegie Mellon Dean's Fellowship in 2007, the Google Faculty Research Award in 2013, and the Virginia Tech Teacher of the Week in 2013. His research is supported by NSF, Google, Amazon, and Microsoft.