Minimizing Regret in Combinatorial Bandits and Reinforcement Learning

This thesis investigates sequential decision-making tasks that fall within the framework of reinforcement learning (RL). These tasks involve a decision maker who repeatedly interacts with an environment, modeled by an unknown finite Markov decision process (MDP), and who wishes to maximize a notion of reward a...

Bibliographic Details
Main Author:	Talebi Mazraeh Shahi, Mohammad Sadegh
Format:	Doctoral Thesis
Language:	English
Published:	KTH, Reglerteknik 2017
Subjects:	Multi-armed Bandits; Reinforcement Learning; Regret Minimization; Statistics; Engineering and Technology; Teknik och teknologier
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-219970 http://nbn-resolving.de/urn:isbn:978-91-7729-618-8

Similar Items