WaveNet With Cross-Attention for Audiovisual Speech Recognition

This paper proposes WaveNet with cross-attention for Audio-Visual Automatic Speech Recognition (AV-ASR) to address the problems of multimodal feature fusion and frame alignment between the two data streams. WaveNet is usually used for speech generation and speech recognition; however, in this paper,...

Bibliographic Details
Main Authors: Hui Wang, Fei Gao, Yue Zhao, Licheng Wu
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9197622/