Fast Two-Stage Computation of an Index Policy for Multi-Armed Bandits with Setup Delays

We consider the multi-armed bandit problem with switching penalties that include setup delays and costs, extending the author's earlier results for the special case with no switching delays. A priority index for projects with setup delays that characterizes, in part, optimal policies was int...

Bibliographic Details
Published in: Mathematics
Main Author: José Niño-Mora
Format: Article
Language: English
Published: MDPI AG 2020-12-01
Online Access: https://www.mdpi.com/2227-7390/9/1/52