Minimizing Regret in Combinatorial Bandits and Reinforcement Learning
This thesis investigates sequential decision making tasks that fall in the framework of reinforcement learning (RL). These tasks involve a decision maker repeatedly interacting with an environment modeled by an unknown finite Markov decision process (MDP), who wishes to maximize a notion of reward a...
Main Author: | |
---|---|
Format: | Doctoral Thesis |
Language: | English |
Published: |
KTH, Reglerteknik
2017
|
Subjects: | |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-219970 http://nbn-resolving.de/urn:isbn:978-91-7729-618-8 |