Metrics for Markov decision processes

We present a class of metrics, defined on the state space of a finite Markov decision process (MDP), each of which is sound with respect to stochastic bisimulation, a notion of MDP state equivalence derived from the theory of concurrent processes. Such metrics are based on similar metrics developed in the context of labelled Markov processes, and like those, are suitable for state space aggregation. Furthermore, we restrict our attention to a subset of this class that is appropriate for certain reinforcement learning (RL) tasks, specifically, infinite horizon tasks with an expected total discounted reward optimality criterion. Given such an RL metric, we provide bounds relating it to the optimal value function of the original MDP as well as to the value function of the aggregate MDP. Finally, we present an algorithm for calculating such a metric up to a prescribed degree of accuracy and some empirical results.
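
The description mentions an algorithm for computing such a metric up to a prescribed accuracy. As a rough sketch of that kind of computation (not the thesis's own algorithm), the Python fragment below iterates an operator of the form d(s,t) = max over actions a of c_r·|R(s,a) − R(t,a)| + c_p·K_d(P(s,a,·), P(t,a,·)), where K_d is the Kantorovich (1-Wasserstein) distance under the current metric d. The function names, the weights c_r and c_p, and the stopping rule are illustrative assumptions; the exact operator, constants, and bounds used in the thesis may differ.

```python
import numpy as np
from scipy.optimize import linprog


def kantorovich(p, q, d):
    """Kantorovich (1-Wasserstein) distance between finite distributions
    p and q under ground metric d (n x n), posed as a transportation LP."""
    n = len(p)
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0  # row i of the coupling must sum to p[i]
        A_eq[n + i, i::n] = 1.0           # column i of the coupling must sum to q[i]
    res = linprog(d.flatten(), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    return res.fun


def bisimulation_style_metric(R, P, c_r=0.1, c_p=0.9, tol=1e-3):
    """Fixed-point iteration for a metric of the form
        d(s, t) = max_a [ c_r * |R[s, a] - R[t, a]| + c_p * K_d(P[s, a], P[t, a]) ].
    R: (S, A) rewards; P: (S, A, S) transition probabilities; c_p < 1 makes the
    operator a contraction.  Constants and stopping rule are illustrative."""
    S, A = R.shape
    d = np.zeros((S, S))
    while True:
        d_new = np.zeros_like(d)
        for s in range(S):
            for t in range(s + 1, S):
                d_new[s, t] = d_new[t, s] = max(
                    c_r * abs(R[s, a] - R[t, a])
                    + c_p * kantorovich(P[s, a], P[t, a], d)
                    for a in range(A))
        # A change below tol * (1 - c_p) places the iterate within tol of the
        # true fixed point, by the usual contraction-mapping argument.
        if np.max(np.abs(d_new - d)) <= tol * (1 - c_p):
            return d_new
        d = d_new
```

States at zero distance under such a metric are candidates for aggregation, which is the use the description highlights.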

Bibliographic Details
Main Author: Ferns, Norman Francis
Other Authors: Panangaden, Prakash (advisor), Precup, Doina (advisor)
Format: Electronic Thesis or Dissertation (application/pdf)
Language: English
Published: McGill University, 2003
Degree: Master of Science (School of Computer Science)
Subjects: Computer Science
Online Access: http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=80263