Auditory Attention Detection via Cross-Modal Attention

Humans show a remarkable perceptual ability to select the speech stream of interest among multiple competing speakers. Previous studies demonstrated that auditory attention detection (AAD) can infer which speaker is attended by analyzing a listener's electroencephalography (EEG) activity. However, previous AAD approaches perform poorly on short signal segments, and more advanced decoding strategies are needed to realize robust real-time AAD. In this study, we propose a novel approach, cross-modal attention-based AAD (CMAA), to exploit the discriminative features and the correlation between audio and EEG signals. With this mechanism, we aim to dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features, thereby detecting the auditory attention activities manifested in brain signals. We also validate the CMAA model through data visualization and comprehensive experiments on a publicly available database. Experiments show that the CMAA achieves accuracies of 82.8%, 86.4%, and 87.6% for 1-, 2-, and 5-s decision windows under anechoic conditions, respectively; for a 2-s decision window, it achieves an average of 84.1% under real-world reverberant conditions. The proposed CMAA network not only achieves better performance than the conventional linear model but also outperforms state-of-the-art non-linear approaches. These results and data visualizations suggest that the CMAA model can dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features, thereby improving AAD performance.
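The core idea described in the abstract is a cross-modal attention mechanism in which audio and EEG feature sequences attend to each other before a decision is made about the attended stream. The following is a minimal sketch of that general mechanism, assuming PyTorch; the dimensions, bidirectional fusion, pooling, and classifier head are illustrative assumptions, not the published CMAA architecture.

# Minimal cross-modal attention sketch (assumed PyTorch); layer sizes and the
# classifier are illustrative placeholders, not the authors' exact CMAA model.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    """One modality (query) attends to another modality (key/value)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # query:   (batch, T_q, dim), e.g. EEG features
        # context: (batch, T_c, dim), e.g. audio (speech) features
        attended, _ = self.attn(query, context, context)
        return self.norm(query + attended)  # residual connection


class AudioEEGFusion(nn.Module):
    """Fuse audio and EEG features with attention in both directions, then
    score how well the candidate audio stream matches the EEG response."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.eeg_to_audio = CrossModalAttention(dim)
        self.audio_to_eeg = CrossModalAttention(dim)
        self.classifier = nn.Linear(2 * dim, 1)

    def forward(self, eeg: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        eeg_fused = self.eeg_to_audio(eeg, audio)    # EEG attends to audio
        audio_fused = self.audio_to_eeg(audio, eeg)  # audio attends to EEG
        pooled = torch.cat(
            [eeg_fused.mean(dim=1), audio_fused.mean(dim=1)], dim=-1
        )
        return self.classifier(pooled)  # higher score = stream likely attended


if __name__ == "__main__":
    eeg = torch.randn(8, 128, 64)    # (batch, EEG time steps, feature dim)
    audio = torch.randn(8, 128, 64)  # (batch, audio time steps, feature dim)
    print(AudioEEGFusion()(eeg, audio).shape)  # torch.Size([8, 1])

In an AAD setting, one such score would typically be computed per competing speaker within a decision window, with the highest-scoring stream taken as the attended one.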

Bibliographic Details
Main Authors: Siqi Cai, Peiwen Li, Enze Su, Longhan Xie
Format: Article
Language: English
Published: Frontiers Media S.A., 2021-07-01
Series: Frontiers in Neuroscience
ISSN: 1662-453X
DOI: 10.3389/fnins.2021.652058
Subjects:
auditory attention
attention mechanism
cocktail party
cross-modal
EEG
Online Access: https://www.frontiersin.org/articles/10.3389/fnins.2021.652058/full