Animal acoustic identification, denoising and source separation using generative adversarial networks


Full description

Bibliographic details
Container / Database: Methods in Ecology and Evolution
Main authors: Mei Wang, Kevin F. A. Darras, Renjie Xue, Fanglin Liu
Format: Article
Language: English
Published: Wiley, 2025-10-01
Subjects:
Online access: https://doi.org/10.1111/2041-210X.70148
_version_ 1848771357560537088
author Mei Wang
Kevin F. A. Darras
Renjie Xue
Fanglin Liu
collection DOAJ
container_title Methods in Ecology and Evolution
description Abstract Soundscapes contain rich ecological information, offering insights into both biodiversity and ecosystem dynamics. However, the sheer volume of data produced by passive acoustic monitoring presents significant challenges for scalable analysis and ecological interpretation. While convolutional neural networks (CNNs) have advanced species classification in bioacoustics, they often struggle with identifying acoustic targets in acoustic space and quantifying soundscapes' characteristics. In this study, we propose a novel spectrogram‐to‐spectrogram translation framework based on generative adversarial networks (GANs) to isolate and quantify acoustic sources within soundscape recordings. Our method is trained on paired spectrogram images: original full‐spectrogram representations and target spectrogram representations containing only the vocalizations of specific sound labels. This design enables the model to learn source‐specific mappings and perform both the species and community‐level separation of acoustic components in soundscape recordings. We developed and evaluated two GAN‐based models: a species‐level GAN targeting eight avian species, and a community‐level GAN distinguishing among avian, insect and anthropogenic sound sources. The models were trained and tested using soundscape recordings collected from the Yaoluoping National Nature Reserve, eastern China. The species‐level model achieved a mean F1 score of 0.76 for pixel‐wise detection, while the community‐level model reached 0.79 across categories. In addition to precise temporal‐spectral localization, our approach captures sources' acoustic occupancy and frequency distribution patterns, offering deeper ecological insight. Compared to baseline CNN classifiers, our model achieved a mean F1 score of 0.97, demonstrating comparable classification performance to ResNet50 (0.95) and VGG16 (0.98) across multiple species. 
Our GAN approach for extracting sound sources also significantly outperformed conventional methods in denoising and source separation, as indicated by lower image‐level mean squared error. These results demonstrate the utility of GANs in advancing ecoacoustic analyses and biodiversity monitoring. By enabling robust source separation and fine‐resolution signal mapping, the proposed approach contributes a scalable and transferable tool for soundscape quantification.
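The description above reports pixel-wise F1 scores for detection and image-level mean squared error for denoising and source separation. The following is a minimal, illustrative sketch of how such metrics could be computed on generated versus target spectrograms; it is not the authors' code, and the binarization threshold is an assumption for illustration only.

```python
import numpy as np

def pixel_f1(pred, target, threshold=0.5):
    """Pixel-wise F1 between a generated and a target spectrogram.

    Both inputs are 2-D arrays with values in [0, 1]; pixels at or
    above `threshold` are treated as signal. The threshold is an
    illustrative assumption, not a value taken from the paper.
    """
    p = pred >= threshold
    t = target >= threshold
    tp = np.logical_and(p, t).sum()
    fp = np.logical_and(p, ~t).sum()
    fn = np.logical_and(~p, t).sum()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def image_mse(pred, target):
    """Image-level mean squared error between two spectrograms."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))
```

A perfect reconstruction yields F1 = 1.0 and MSE = 0.0; lower MSE indicates closer agreement between the separated source and its ground-truth spectrogram, which is how the comparison against conventional denoising methods is framed.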
format Article
id doaj-art-87fac8bb52f84d1498efbb9ea40ef5fc
institution Directory of Open Access Journals
issn 2041-210X
language English
publishDate 2025-10-01
publisher Wiley
record_format Article
spelling doaj-art-87fac8bb52f84d1498efbb9ea40ef5fc (2025-10-01T07:23:00Z)
Animal acoustic identification, denoising and source separation using generative adversarial networks
Methods in Ecology and Evolution, vol. 16, no. 10 (2025-10-01), pp. 2472-2486. Wiley. ISSN 2041-210X. English.
doi: 10.1111/2041-210X.70148
Mei Wang (Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China)
Kevin F. A. Darras (EFNO, ECODIV, INRAE, Domaine des Barres, Nogent-sur-Vernisson, Centre-Val de Loire, France)
Renjie Xue (Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China)
Fanglin Liu (Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China)
title Animal acoustic identification, denoising and source separation using generative adversarial networks
topic acoustic space
classification
denoise
generative adversarial network (GAN)
source separation
url https://doi.org/10.1111/2041-210X.70148