Active learning can be framed as a problem of planning in information space: the goal is to learn about the world by taking actions that improve expected performance. In some domains, planning far into the future is prohibitively expensive, and the agent cannot discover effective information-gathering plans. However, using macro-actions, which are fixed-length open-loop policies, explicitly restricts the policy class considered during planning; in return, it yields computational gains that allow much deeper forward search. In certain domains, the distribution over posterior beliefs that results from a single macro-action can be computed analytically. This distribution accounts for every observation sequence that could occur during the macro-action, and it allows significant additional computational savings. I will present results from two simulation experiments: a standard exploration domain and a UAV search domain.
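The abstract does not name the domains where the posterior-belief distribution is analytic. One standard case, assumed here purely for illustration, is a scalar linear-Gaussian system tracked by a Kalman filter. There, the posterior variance after a fixed (open-loop) control sequence does not depend on the observed values. The posterior mean, viewed before the observations arrive, is itself Gaussian. Its mean equals the open-loop predicted mean. Its variance equals the open-loop predicted variance minus the posterior variance (law of total variance). The function names and the model parameters `a`, `b`, `q`, `r` below are hypothetical. This is a minimal sketch of the idea, not the author's method.

```python
import math
import random


def macro_action_belief_distribution(mu0, p0, controls, a, b, q, r):
    """Analytic distribution over the Kalman posterior after an open-loop macro-action.

    Assumed (hypothetical) model: x_{t+1} = a*x_t + b*u_t + w,  w ~ N(0, q)
                                  z_{t+1} = x_{t+1} + v,        v ~ N(0, r)
    Returns (mean of posterior mean, variance of posterior mean, posterior variance).
    """
    mean_pred = mu0  # open-loop predicted mean of the state
    p_open = p0      # state variance if no observations were incorporated
    p_post = p0      # Kalman posterior variance (independent of observed values)
    for u in controls:
        mean_pred = a * mean_pred + b * u
        p_open = a * a * p_open + q
        p_pred = a * a * p_post + q
        gain = p_pred / (p_pred + r)
        p_post = (1.0 - gain) * p_pred
    # Law of total variance: Var(x_T) = E[Var(x_T | z)] + Var(E[x_T | z]).
    return mean_pred, p_open - p_post, p_post


def sample_posterior_mean(mu0, p0, controls, a, b, q, r, rng):
    """Monte Carlo check: simulate one trajectory and run the Kalman filter on it."""
    x = rng.gauss(mu0, math.sqrt(p0))
    mu, p = mu0, p0
    for u in controls:
        x = a * x + b * u + rng.gauss(0.0, math.sqrt(q))
        z = x + rng.gauss(0.0, math.sqrt(r))
        mu, p = a * mu + b * u, a * a * p + q  # predict
        gain = p / (p + r)
        mu, p = mu + gain * (z - mu), (1.0 - gain) * p  # update
    return mu


if __name__ == "__main__":
    # Static state, unit prior variance, two unit-noise observations.
    print(macro_action_belief_distribution(0.0, 1.0, [1.0, 1.0], a=1.0, b=1.0, q=0.0, r=1.0))
```

With a static state, unit prior variance, and two unit-noise observations, the sketch gives a predicted mean of 2, a posterior-mean variance of 2/3, and a posterior variance of 1/3. The planner could then sample or integrate over this single Gaussian instead of enumerating observation sequences inside the macro-action; sampling many trajectories with `sample_posterior_mean` should reproduce the same distribution.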
