Audio–Visual Speech Recognition Based on Dual Cross-Modality Attentions with the Transformer Model
Since the attention mechanism was introduced in neural machine translation, it has been combined with the long short-term memory (LSTM) network or has replaced the LSTM entirely, as in the Transformer model, to overcome the sequence-to-sequence (seq2seq) limitations of the LSTM. In contrast to neural machine translation,...
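The abstract refers to the attention mechanism that the Transformer uses in place of LSTM recurrence. A minimal sketch of the standard scaled dot-product attention is shown below, applied in a cross-modal setting (queries from one modality, keys/values from the other) in the spirit of the paper's title; the array shapes and modality names are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard Transformer attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                             # attention-weighted sum of values

# Hypothetical cross-modal use: audio frames attend to visual frames.
rng = np.random.default_rng(0)
audio = rng.standard_normal((5, 8))    # 5 audio frames, feature dim 8
visual = rng.standard_normal((7, 8))   # 7 video frames, feature dim 8
out = scaled_dot_product_attention(audio, visual, visual)
print(out.shape)  # (5, 8): one fused vector per audio frame
```

Swapping which modality supplies the queries versus the keys/values yields the two attention directions suggested by "dual cross-modality attentions."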
| Published in: | Applied Sciences |
|---|---|
| Main Authors: | , , , , |
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2020-10-01 |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/10/20/7263 |
