Topics in Online Markov Decision Processes

This dissertation studies sequential decision-making problems in non-stationary environments. Online learning algorithms are designed for non-stationary environments, but they generally lack a notion of a dynamic state that models the future impact of past actions. State-based models are standard in stochastic control, but well-known frameworks such as Markov decision processes (MDPs) assume a known, stationary environment. In recent years there has been growing interest in fusing these two learning frameworks by considering an MDP setting in which the cost function is allowed to change arbitrarily over time. A number of online MDP algorithms have been designed to work under various assumptions about the state-transition dynamics and to provide performance guarantees, i.e., bounds on the regret, defined as the gap between the total cost incurred by the learner and the total cost of the best stationary policy that could have been chosen in hindsight.

However, most work in this area has been algorithmic: given a problem, one develops an algorithm almost from scratch and proves its performance guarantees on a case-by-case basis. Moreover, the presence of the state and the assumption of an arbitrarily varying environment complicate both the theoretical analysis and the development of computationally efficient methods. Another potential issue is that, by removing distributional assumptions about the mechanism generating the cost sequences, existing methods must plan against the worst case, which may render their solutions too conservative when the environment exhibits some degree of predictability.

This dissertation contributes several novel techniques that address these challenges of the online MDP framework and opens up new research directions for online MDPs.

The proposed general framework for deriving algorithms in the online MDP setting leads to a unifying view of existing methods and provides a general procedure for constructing new ones. Several new algorithms are developed and analyzed within this framework. We develop convex-analytic algorithms that exploit possible regularity in the observed cost sequences while maintaining worst-case performance guarantees. To study these convex-analytic methods further, we step back to the traditional MDP problem and extend the linear-programming (LP) approach to MDPs by adding a relative entropy regularization term. A computationally efficient algorithm for this class of MDPs is constructed under mild assumptions on the state-transition models. Two-player zero-sum stochastic games are also investigated as an important extension of the online MDP setting. In short, this dissertation provides an in-depth analysis of the online MDP problem and answers several important questions in this field.
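For reference, the regret criterion described in the abstract is commonly formalized as follows; the notation (horizon T, time-varying costs c_t, and stationary policy class \Pi) is illustrative and not taken from the record:

\[
R_T \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T} c_t(x_t, a_t)\right] \;-\; \min_{\pi \in \Pi}\, \mathbb{E}\!\left[\sum_{t=1}^{T} c_t(x_t^{\pi}, a_t^{\pi})\right],
\]

where (x_t, a_t) is the learner's state-action trajectory, (x_t^{\pi}, a_t^{\pi}) is the trajectory induced by a fixed stationary policy \pi, and the costs c_1, ..., c_T may change arbitrarily over time. Sublinear growth of R_T in T means the learner's average cost approaches that of the best stationary policy chosen in hindsight.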
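The abstract also mentions extending the LP approach to MDPs with a relative entropy regularization term. As a hedged sketch only (the occupation measure \mu, reference measure \mu_0, and regularization weight 1/\beta below are illustrative assumptions, not details taken from the record), the classical average-cost LP over state-action occupation measures,

\[
\min_{\mu \ge 0}\; \sum_{x,a} \mu(x,a)\, c(x,a)
\quad \text{s.t.} \quad
\sum_{a} \mu(x',a) \;=\; \sum_{x,a} \mu(x,a)\, P(x' \mid x,a)\ \ \forall x',
\qquad \sum_{x,a} \mu(x,a) = 1,
\]

might be regularized by adding a term of the form \tfrac{1}{\beta}\, D(\mu \,\|\, \mu_0), where D(\cdot \,\|\, \cdot) denotes relative entropy (Kullback-Leibler divergence). Because relative entropy is strictly convex in \mu, the regularized objective is strictly convex, which is one route to the computationally efficient solution methods the abstract refers to.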

Bibliographic Details
Main Author: Guan, Peng
Other Authors: Willett, Rebecca; Raginsky, Maxim
Format: Dissertation
Published: 2015
Subjects: Electrical engineering; Mathematics; Operations research; Game theory; Markov Decision Process; Online learning; Optimization; Reinforcement learning; Stochastic control
Online Access: http://hdl.handle.net/10161/11312