Exploiting Distributional Temporal Difference Learning to Deal with Tail Risk
In traditional Reinforcement Learning (RL), agents learn to optimize actions in a dynamic context based on recursive estimation of expected values. We show that this form of machine learning fails when rewards (returns) are affected by tail risk, i.e., leptokurtosis. Here, we adapt a recent extensio...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2020-10-01
|
Series: | Risks |
Subjects: | |
Online Access: | https://www.mdpi.com/2227-9091/8/4/113 |