Learning long-term filter banks for audio source separation and audio scene classification

Abstract ■■■ Filter banks on short-time Fourier transform (STFT) spectrogram have long been studied to analyze and process audios. The frameshift in STFT procedure determines the temporal resolution. However, in many discriminative audio applications, long-term time and frequency correlations are ne...

Full description

Bibliographic Details
Main Authors: Teng Zhang, Ji Wu
Format: Article
Language:English
Published: SpringerOpen 2018-05-01
Series:EURASIP Journal on Audio, Speech, and Music Processing
Subjects:
Online Access:http://link.springer.com/article/10.1186/s13636-018-0127-7
Description
Summary:Abstract ■■■ Filter banks on short-time Fourier transform (STFT) spectrogram have long been studied to analyze and process audios. The frameshift in STFT procedure determines the temporal resolution. However, in many discriminative audio applications, long-term time and frequency correlations are needed. The authors in this work use Toeplitz matrix motivated filter banks to extract long-term time and frequency information. This paper investigates the mechanism of long-term filter banks and the corresponding spectrogram reconstruction method. The time duration and shape of the filter banks are well designed and learned using neural networks. We test our approach on different tasks. The spectrogram reconstruction error in audio source separation task is reduced by relatively 6.7% and the classification error in audio scene classification task is reduced by relatively 6.5%, when compared with the traditional frequency filter banks. The experiments also show that the time duration of long-term filter banks in classification task is much larger than in reconstruction task.
ISSN:1687-4722