An Efficient Simulation-Based Policy Improvement with Optimal Computing Budget Allocation Based on Accumulated Samples
Markov decision processes (MDPs) are widely used to model stochastic systems to deduce optimal decision-making policies. As the transition probabilities are usually unknown in MDPs, simulation-based policy improvement (SBPI) using a base policy to derive optimal policies when the state transition pr...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI, 2022 |