Variational Reward Estimator Bottleneck: Towards Robust Reward Estimator for Multidomain Task-Oriented Dialogue

Despite its significant effectiveness in adversarial training approaches to multidomain task-oriented dialogue systems, adversarial inverse reinforcement learning of the dialogue policy frequently fails to balance the performance of the reward estimator and policy generator. During the optimization...

Full description

Bibliographic Details
Main Authors: Jeiyoon Park, Chanhee Lee, Chanjun Park, Kuekyeng Kim, Heuiseok Lim
Format: Article
Language:English
Published: MDPI AG 2021-07-01
Series:Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/11/14/6624