Exploiting Distributional Temporal Difference Learning to Deal with Tail Risk

In traditional Reinforcement Learning (RL), agents learn to optimize actions in a dynamic context based on recursive estimation of expected values. We show that this form of machine learning fails when rewards (returns) are affected by tail risk, i.e., leptokurtosis. Here, we adapt a recent extensio...
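The abstract contrasts recursive estimation of expected values with a distributional extension of temporal difference (TD) learning. The Python sketch here compares the two mechanically. It makes three assumptions. First, the setting has a single state, so the TD target is just the observed reward and there is no bootstrapping. Second, Student-t rewards with 3 degrees of freedom stand in for leptokurtic returns. Third, the distributional learner uses a quantile-regression update, which is one common form of distributional TD chosen here for illustration. The truncated abstract does not say which form the authors use. All function names and parameter values are hypothetical.

```python
import math
import random


def student_t(rng: random.Random, df: float) -> float:
    """Draw one Student-t sample (heavy-tailed for small df) as Z / sqrt(chi2 / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def td_mean_estimate(rewards, alpha=0.01):
    """Expected-value learner (hypothetical helper): V <- V + alpha * (r - V).
    In this single-state sketch the TD error reduces to r - V."""
    v = 0.0
    for r in rewards:
        v += alpha * (r - v)
    return v


def quantile_td_estimates(rewards, taus, alpha=0.01):
    """Distributional learner (hypothetical helper): one estimate per quantile level tau.
    Each follows q <- q + alpha * (tau - 1[r < q]), so it moves up by alpha * tau
    or down by alpha * (1 - tau) per sample, never more."""
    qs = [0.0] * len(taus)
    for r in rewards:
        for i, tau in enumerate(taus):
            indicator = 1.0 if r < qs[i] else 0.0
            qs[i] += alpha * (tau - indicator)
    return qs


if __name__ == "__main__":
    rng = random.Random(0)
    rewards = [student_t(rng, df=3.0) for _ in range(20000)]
    print("mean estimate:", td_mean_estimate(rewards))
    print("quantile estimates:", quantile_td_estimates(rewards, [0.05, 0.5, 0.95]))
```

The two learners react differently to a single extreme reward of 1000. It shifts the mean estimate by alpha × 1000, whereas each quantile estimator moves by at most alpha per sample. This bounded step lets the set of quantile estimates track the shape of the reward distribution, including its tails, without one outlier dominating it.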


Bibliographic Details
Main Authors: Peter Bossaerts, Shijie Huang, Nitin Yadav
Format: Article
Language: English
Published: MDPI AG 2020-10-01
Series: Risks
Online Access: https://www.mdpi.com/2227-9091/8/4/113