Metrics for Markov decision processes

Bibliographic Details
Main Author: Ferns, Norman Francis
Other Authors: Panangaden, Prakash (advisor)
Format: Others
Language: en
Published: McGill University 2003
Subjects:
Online Access: http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=80263
Description
Summary:We present a class of metrics, defined on the state space of a finite Markov decision process (MDP), each of which is sound with respect to stochastic bisimulation, a notion of MDP state equivalence derived from the theory of concurrent processes. Such metrics are based on similar metrics developed in the context of labelled Markov processes, and like those, are suitable for state space aggregation. Furthermore, we restrict our attention to a subset of this class that is appropriate for certain reinforcement learning (RL) tasks, specifically, infinite horizon tasks with an expected total discounted reward optimality criterion. Given such an RL metric, we provide bounds relating it to the optimal value function of the original MDP as well as to the value function of the aggregate MDP. Finally, we present an algorithm for calculating such a metric up to a prescribed degree of accuracy and some empirical results.
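The metrics described in the summary are computed by iterating a contraction map until it converges to within a prescribed tolerance. The sketch below illustrates the general shape of such a fixed-point iteration on a small hypothetical MDP; to keep it self-contained it assumes deterministic transitions (so the Kantorovich term over successor distributions collapses to the metric on successor states) and an illustrative discount factor, whereas the thesis treats general stochastic transitions. All states, actions, rewards, and constants here are invented for illustration.

```python
# Illustrative sketch (not the thesis's algorithm): fixed-point iteration
# for a bisimulation-style metric on a tiny hypothetical MDP.
# Assumption: deterministic transitions, so the transition-distance term
# reduces to d_n(succ(s, a), succ(t, a)); gamma is an arbitrary discount.

GAMMA = 0.9  # hypothetical discount factor

# Hypothetical 3-state, 2-action MDP: R[s][a] is the reward,
# SUCC[s][a] the (deterministic) successor state.
R = {0: {"a": 1.0, "b": 0.0},
     1: {"a": 1.0, "b": 0.0},
     2: {"a": 0.0, "b": 0.0}}
SUCC = {0: {"a": 0, "b": 2},
        1: {"a": 1, "b": 2},
        2: {"a": 2, "b": 2}}

def metric(tol=1e-6):
    """Iterate d_{n+1}(s,t) = max_a |R(s,a) - R(t,a)| + GAMMA * d_n(s', t')
    until the sup-norm change between iterates falls below tol."""
    states = list(R)
    d = {(s, t): 0.0 for s in states for t in states}
    while True:
        delta = 0.0
        new = {}
        for s in states:
            for t in states:
                new[(s, t)] = max(
                    abs(R[s][a] - R[t][a]) + GAMMA * d[(SUCC[s][a], SUCC[t][a])]
                    for a in R[s]
                )
                delta = max(delta, abs(new[(s, t)] - d[(s, t)]))
        d = new
        if delta < tol:
            return d

d = metric()
# States 0 and 1 behave identically under every action, so their distance
# shrinks to ~0; state 2 differs from state 0 in reward under action "a".
```

Because the update is a contraction with factor GAMMA, stopping when successive iterates differ by less than `tol` bounds the error against the true fixed point, which is the sense in which the abstract's "prescribed degree of accuracy" can be achieved.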