Optimal learning: Computational procedures for Bayes-adaptive Markov decision processes

This dissertation considers a particular aspect of sequential decision making under uncertainty in which, at each stage, a decision-making agent operating in an uncertain world takes an action that elicits a reinforcement signal and causes the state of the world to change. Optimal learning is a pattern of behavior that yields the highest expected total reward over the entire duration of an agent's interaction with its uncertain world. The problem of determining an optimal learning strategy is a sort of meta-problem, with optimality defined with respect to a distribution of environments that the agent is likely to encounter. Given this prior uncertainty over possible environments, the optimal-learning agent must collect and use information in an intelligent way, balancing greedy exploitation of certainty-equivalent world models with exploratory actions aimed at discerning the true state of nature. My approach to approximating optimal learning strategies retains the full model of the sequential decision process that, in incorporating a Bayesian model for evolving uncertainty about unknown process parameters, takes the form of a Markov decision process defined over a set of "hyperstates" whose cardinality grows exponentially with the planning horizon. I develop computational procedures that retain the full Bayesian formulation, but sidestep intractability by utilizing techniques from reinforcement learning theory (specifically, Monte-Carlo simulation and the adoption of parameterized function approximators). By pursuing an approach that is grounded in a complete Bayesian world model, I develop algorithms that produce policies that exhibit performance gains over simple heuristics. Moreover, in contrast to many heuristics, the justification or legitimacy of the policies follows directly from the fact that they are clearly motivated by a complete characterization of the underlying decision problem to be solved. This dissertation's contributions include a reinforcement learning algorithm for estimating Gittins indices for multi-armed bandit problems, a Monte-Carlo gradient-based algorithm for approximating solutions to general problems of optimal learning, a gradient-based scheme for improving optimal learning policies instantiated as finite-state stochastic automata, and an investigation of diffusion processes as analytical models for evolving uncertainty.
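The abstract's central object, the "hyperstate", can be made concrete with a small example. The sketch below is my illustration, not code from the dissertation: it treats a two-armed Bernoulli bandit with independent Beta(1,1) priors, where a hyperstate is the tuple of posterior counts for each arm, each pull branches into a success and a failure successor, and the Bayes-optimal value follows by backward recursion over hyperstates. The naive lookahead tree branches exponentially in the horizon, which is exactly the intractability the abstract describes; in this simple bandit many branches coalesce into shared count vectors, which the memoization exploits.

```python
# Minimal sketch (illustration only) of a Bayes-adaptive Bernoulli bandit.
# A hyperstate is ((a0, b0), (a1, b1)): Beta posterior counts per arm.
# Bayesian updating after each pull moves the process to one of two
# successor hyperstates, so the naive lookahead tree branches exponentially
# with the planning horizon.
from functools import lru_cache

@lru_cache(maxsize=None)
def value(hyperstate, horizon):
    """Bayes-optimal expected total reward over `horizon` remaining pulls."""
    if horizon == 0:
        return 0.0
    best = float("-inf")
    for arm, (a, b) in enumerate(hyperstate):
        p = a / (a + b)  # posterior mean of this arm's success probability
        counts = list(hyperstate)
        counts[arm] = (a + 1, b)              # posterior after a success
        win = value(tuple(counts), horizon - 1)
        counts[arm] = (a, b + 1)              # posterior after a failure
        lose = value(tuple(counts), horizon - 1)
        # Expected immediate reward plus Bayes-expected continuation value.
        best = max(best, p * (1.0 + win) + (1.0 - p) * lose)
    return best

if __name__ == "__main__":
    prior = ((1, 1), (1, 1))  # uniform Beta(1,1) priors on both arms
    for h in (1, 5, 10):
        print(h, value(prior, h), value.cache_info().currsize)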

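The Monte-Carlo gradient-based approach the abstract mentions can likewise be sketched. The following REINFORCE-style example is again my illustration under assumed choices (posterior-mean features, a softmax policy over two arms, a uniform prior over arm probabilities, and an arbitrary learning rate): it estimates the gradient of expected total reward from simulated histories, with each history generated by first sampling an environment from the prior, so that optimality is defined with respect to the distribution of environments, as in the dissertation's formulation.

```python
# REINFORCE-style sketch (illustration only): a parameterized stochastic
# policy acts on hyperstate features; the gradient of expected total reward
# is estimated from simulated interaction histories.
import math
import random

def features(hyperstate):
    # Hand-chosen features: each arm's posterior mean (an assumption here;
    # any parameterized function approximator could take this place).
    return [a / (a + b) for (a, b) in hyperstate]

def policy_probs(theta, phi):
    # Softmax over per-arm scores theta[arm] * phi[arm].
    scores = [t * f for t, f in zip(theta, phi)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def episode(theta, true_p, horizon, rng):
    """Simulate one history; return (total reward, summed score function)."""
    hyper = [(1, 1), (1, 1)]
    total, grad = 0.0, [0.0, 0.0]
    for _ in range(horizon):
        phi = features(hyper)
        probs = policy_probs(theta, phi)
        arm = 0 if rng.random() < probs[0] else 1
        r = 1.0 if rng.random() < true_p[arm] else 0.0
        total += r
        # Accumulate grad log pi(arm): (indicator - prob) * feature.
        for k in range(2):
            ind = 1.0 if k == arm else 0.0
            grad[k] += (ind - probs[k]) * phi[k]
        a, b = hyper[arm]
        hyper[arm] = (a + 1, b) if r else (a, b + 1)  # Bayesian update
    return total, grad

def train(horizon=10, iters=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(iters):
        # Sample an environment from the prior, then a trajectory under it:
        # the objective is expected total reward over this prior.
        true_p = [rng.random(), rng.random()]
        total, grad = episode(theta, true_p, horizon, rng)
        theta = [t + lr * total * g for t, g in zip(theta, grad)]
    return theta

if __name__ == "__main__":
    print(train())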

Bibliographic Details
Main Author: Duff, Michael O'Gordon
Language: English
Published: ScholarWorks@UMass Amherst, 2002
Subjects: Computer science; Operations research; Statistics; Artificial intelligence
Online Access: https://scholarworks.umass.edu/dissertations/AAI3039353