Audio–Visual Speech Recognition Based on Dual Cross-Modality Attentions with the Transformer Model
Since the attention mechanism was introduced in neural machine translation, it has been combined with the long short-term memory (LSTM) network or has replaced the LSTM entirely, as in the Transformer model, to overcome the sequence-to-sequence (seq2seq) limitations of the LSTM. In contrast to neural machine translation,...
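The abstract refers to the attention mechanism that the Transformer uses in place of LSTM recurrence. A minimal sketch of the standard scaled dot-product attention is shown below, applied in a cross-modal setting (queries from one modality, keys/values from the other) in the spirit of the paper's title; the array shapes and modality names are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard Transformer attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                             # attention-weighted sum of values

# Hypothetical cross-modal use: audio frames attend to visual frames.
rng = np.random.default_rng(0)
audio = rng.standard_normal((5, 8))    # 5 audio frames, feature dim 8
visual = rng.standard_normal((7, 8))   # 7 video frames, feature dim 8
out = scaled_dot_product_attention(audio, visual, visual)
print(out.shape)  # (5, 8): one fused vector per audio frame
```

Swapping which modality supplies the queries versus the keys/values yields the two attention directions suggested by "dual cross-modality attentions."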
| Published in: | Applied Sciences |
|---|---|
| Main Authors: | , , , , |
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2020-10-01 |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/10/20/7263 |
