Metrics for Markov decision processes

We present a class of metrics, defined on the state space of a finite Markov decision process (MDP), each of which is sound with respect to stochastic bisimulation, a notion of MDP state equivalence derived from the theory of concurrent processes. Such metrics are based on similar metrics developed in the context of labelled Markov processes, and like those, are suitable for state space aggregation. Furthermore, we restrict our attention to a subset of this class that is appropriate for certain reinforcement learning (RL) tasks, specifically, infinite horizon tasks with an expected total discounted reward optimality criterion. Given such an RL metric, we provide bounds relating it to the optimal value function of the original MDP as well as to the value function of the aggregate MDP. Finally, we present an algorithm for calculating such a metric up to a prescribed degree of accuracy and some empirical results.
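
The description mentions an algorithm for computing such a metric up to a prescribed accuracy. As a rough sketch of that kind of computation (not the thesis's own algorithm), the Python fragment below iterates an operator of the form d(s,t) = max over actions a of c_r·|R(s,a) − R(t,a)| + c_p·K_d(P(s,a,·), P(t,a,·)), where K_d is the Kantorovich (1-Wasserstein) distance under the current metric d. The function names, the weights c_r and c_p, and the stopping rule are illustrative assumptions; the exact operator, constants, and bounds used in the thesis may differ.

```python
import numpy as np
from scipy.optimize import linprog


def kantorovich(p, q, d):
    """Kantorovich (1-Wasserstein) distance between finite distributions
    p and q under ground metric d (n x n), posed as a transportation LP."""
    n = len(p)
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0  # row i of the coupling must sum to p[i]
        A_eq[n + i, i::n] = 1.0           # column i of the coupling must sum to q[i]
    res = linprog(d.flatten(), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    return res.fun


def bisimulation_style_metric(R, P, c_r=0.1, c_p=0.9, tol=1e-3):
    """Fixed-point iteration for a metric of the form
        d(s, t) = max_a [ c_r * |R[s, a] - R[t, a]| + c_p * K_d(P[s, a], P[t, a]) ].
    R: (S, A) rewards; P: (S, A, S) transition probabilities; c_p < 1 makes the
    operator a contraction.  Constants and stopping rule are illustrative."""
    S, A = R.shape
    d = np.zeros((S, S))
    while True:
        d_new = np.zeros_like(d)
        for s in range(S):
            for t in range(s + 1, S):
                d_new[s, t] = d_new[t, s] = max(
                    c_r * abs(R[s, a] - R[t, a])
                    + c_p * kantorovich(P[s, a], P[t, a], d)
                    for a in range(A))
        # A change below tol * (1 - c_p) places the iterate within tol of the
        # true fixed point, by the usual contraction-mapping argument.
        if np.max(np.abs(d_new - d)) <= tol * (1 - c_p):
            return d_new
        d = d_new
```

States at zero distance under such a metric are candidates for aggregation, which is the use the description highlights.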

Bibliographic Details
Main Author: Ferns, Norman Francis
Other Authors: Panangaden, Prakash (advisor), Precup, Doina (advisor)
Format: Electronic Thesis or Dissertation (application/pdf)
Language: English
Published: McGill University, 2003
Degree: Master of Science (School of Computer Science)
Subjects: Computer Science
Online Access: http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=80263