Auditory Attention Detection via Cross-Modal Attention

Humans show a remarkable perceptual ability to select the speech stream of interest among multiple competing speakers. Previous studies demonstrated that auditory attention detection (AAD) can infer which speaker is attended by analyzing a listener's electroencephalography (EEG) activity. However, previous AAD approaches perform poorly on short signal segments, and more advanced decoding strategies are needed to realize robust real-time AAD. In this study, we propose a novel approach, cross-modal attention-based AAD (CMAA), to exploit the discriminative features and the correlation between audio and EEG signals. With this mechanism, we aim to dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features, thereby detecting the auditory attention activities manifested in brain signals. We also validate the CMAA model through data visualization and comprehensive experiments on a publicly available database. Experiments show that the CMAA achieves accuracies of 82.8%, 86.4%, and 87.6% for 1-, 2-, and 5-s decision windows under anechoic conditions, respectively; for a 2-s decision window, it achieves an average of 84.1% under real-world reverberant conditions. The proposed CMAA network not only achieves better performance than the conventional linear model but also outperforms state-of-the-art non-linear approaches. These results and data visualizations suggest that the CMAA model can dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features, thereby improving AAD performance.
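The core idea described in the abstract is a cross-modal attention mechanism in which audio and EEG feature sequences attend to each other before a decision is made about the attended stream. The following is a minimal sketch of that general mechanism, assuming PyTorch; the dimensions, bidirectional fusion, pooling, and classifier head are illustrative assumptions, not the published CMAA architecture.

# Minimal cross-modal attention sketch (assumed PyTorch); layer sizes and the
# classifier are illustrative placeholders, not the authors' exact CMAA model.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    """One modality (query) attends to another modality (key/value)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # query:   (batch, T_q, dim), e.g. EEG features
        # context: (batch, T_c, dim), e.g. audio (speech) features
        attended, _ = self.attn(query, context, context)
        return self.norm(query + attended)  # residual connection


class AudioEEGFusion(nn.Module):
    """Fuse audio and EEG features with attention in both directions, then
    score how well the candidate audio stream matches the EEG response."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.eeg_to_audio = CrossModalAttention(dim)
        self.audio_to_eeg = CrossModalAttention(dim)
        self.classifier = nn.Linear(2 * dim, 1)

    def forward(self, eeg: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        eeg_fused = self.eeg_to_audio(eeg, audio)    # EEG attends to audio
        audio_fused = self.audio_to_eeg(audio, eeg)  # audio attends to EEG
        pooled = torch.cat(
            [eeg_fused.mean(dim=1), audio_fused.mean(dim=1)], dim=-1
        )
        return self.classifier(pooled)  # higher score = stream likely attended


if __name__ == "__main__":
    eeg = torch.randn(8, 128, 64)    # (batch, EEG time steps, feature dim)
    audio = torch.randn(8, 128, 64)  # (batch, audio time steps, feature dim)
    print(AudioEEGFusion()(eeg, audio).shape)  # torch.Size([8, 1])

In an AAD setting, one such score would typically be computed per competing speaker within a decision window, with the highest-scoring stream taken as the attended one.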

Bibliographic Details
Main Authors: Siqi Cai, Peiwen Li, Enze Su, Longhan Xie
Format: Article
Language: English
Published: Frontiers Media S.A., 2021-07-01
Series: Frontiers in Neuroscience
ISSN: 1662-453X
DOI: 10.3389/fnins.2021.652058
Subjects:
auditory attention
attention mechanism
cocktail party
cross-modal
EEG
Online Access: https://www.frontiersin.org/articles/10.3389/fnins.2021.652058/full