Metrics for Markov decision processes
We present a class of metrics, defined on the state space of a finite Markov decision process (MDP), each of which is sound with respect to stochastic bisimulation, a notion of MDP state equivalence derived from the theory of concurrent processes. Such metrics are based on similar metrics developed in the context of labelled Markov processes, and like those, are suitable for state space aggregation. Furthermore, we restrict our attention to a subset of this class that is appropriate for certain reinforcement learning (RL) tasks, specifically, infinite horizon tasks with an expected total discounted reward optimality criterion. Given such an RL metric, we provide bounds relating it to the optimal value function of the original MDP as well as to the value function of the aggregate MDP. Finally, we present an algorithm for calculating such a metric up to a prescribed degree of accuracy and some empirical results.
Main Author: | Ferns, Norman Francis |
---|---|
Other Authors: | Panangaden, Prakash (advisor); Precup, Doina (advisor) |
Format: | Electronic Thesis or Dissertation (application/pdf) |
Language: | en |
Published: | McGill University, 2003 |
Thesis: | Master of Science (School of Computer Science) |
Subjects: | Computer Science |
Online Access: | http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=80263 |
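
The abstract refers to bounds on value functions and to an algorithm that computes the metric up to a prescribed accuracy. The record itself gives no formulas, but the standard formulation in this line of work (stated here as an assumption, not quoted from the thesis) defines the metric as the unique fixed point of the operator

```latex
F(d)(s,t) \;=\; \max_{a \in A} \Bigl( c_R \,\bigl| R(s,a) - R(t,a) \bigr|
          \;+\; c_T \, T_K(d)\bigl( P(s,a,\cdot),\, P(t,a,\cdot) \bigr) \Bigr),
\qquad c_R, c_T \ge 0,\; c_T < 1,
```

where $T_K(d)$ is the Kantorovich (1-Wasserstein) distance between transition distributions with ground metric $d$. Because $F$ is a $c_T$-contraction in the supremum norm, iterating from $d \equiv 0$ converges geometrically, which is what makes approximation to any prescribed accuracy possible.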
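Below is a minimal sketch of that fixed-point iteration, assuming the operator above; the constants `c_R` and `c_T`, the helper names, and the toy MDP are illustrative choices, and the Kantorovich term is solved as a small transport LP with SciPy rather than by whatever method the thesis evaluates empirically.

```python
"""Sketch: approximating a bisimulation-style metric for a finite MDP by
fixed-point iteration.  The update rule (reward-difference term plus a
discounted Kantorovich term) is the standard formulation assumed above;
the constants and the toy MDP are illustrative, not taken from the thesis."""

import numpy as np
from scipy.optimize import linprog


def kantorovich(p, q, d):
    """1-Wasserstein distance between discrete distributions p and q with
    ground costs d (an |S| x |S| matrix), solved as a transport LP."""
    n = len(p)
    # Decision variables: a coupling pi with row sums p and column sums q.
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0   # row-sum constraints
        A_eq[n + i, i::n] = 1.0            # column-sum constraints
    b_eq = np.concatenate([p, q])
    res = linprog(d.flatten(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun


def bisimulation_metric(R, P, c_R=0.1, c_T=0.9, eps=1e-3):
    """Iterate d <- max_a [ c_R*|R(s,a)-R(t,a)| + c_T*T_K(d)(P(s,a), P(t,a)) ]
    until the fixed point is provably within eps in the sup norm."""
    n_states, n_actions = R.shape
    d = np.zeros((n_states, n_states))
    while True:
        d_new = np.zeros_like(d)
        for s in range(n_states):
            for t in range(s + 1, n_states):
                gaps = [c_R * abs(R[s, a] - R[t, a])
                        + c_T * kantorovich(P[s, a], P[t, a], d)
                        for a in range(n_actions)]
                d_new[s, t] = d_new[t, s] = max(gaps)
        # A-posteriori error bound for a c_T-contraction.
        if np.max(np.abs(d_new - d)) * c_T / (1.0 - c_T) < eps:
            return d_new
        d = d_new


if __name__ == "__main__":
    # Tiny 3-state, 2-action MDP chosen arbitrarily for illustration.
    R = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    P = np.array([[[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
                  [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
                  [[0.0, 0.2, 0.8], [0.3, 0.0, 0.7]]])
    print(bisimulation_metric(R, P))
```

The stopping rule uses the usual a-posteriori bound for a contraction with factor `c_T`, so the returned matrix is within `eps` of the fixed point in the supremum norm.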