Marcus Hutter, Canberra, Australia. General purpose intelligent learning agents cycle through (complex,non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well- developed for small finite state Markov Decision Processes (MDPs). So far it is an art performed by human designers to extract the right state representation out of the bare observa- tions, i.e. to reduce the agent setup to the MDP framework. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main con- tribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm.

Loading more stuff…

Hmm…it looks like things are taking a while to load. Try again?

Loading videos…