Animal acoustic identification, denoising and source separation using generative adversarial networks
| Container / Source: | Methods in Ecology and Evolution |
|---|---|
| Main authors: | Mei Wang, Kevin F. A. Darras, Renjie Xue, Fanglin Liu |
| Format: | Article |
| Language: | English |
| Published in: | Wiley, 2025-10-01 |
| Subjects: | acoustic space; classification; denoise; generative adversarial network (GAN); source separation |
| Online access: | https://doi.org/10.1111/2041-210X.70148 |
| Field | Value |
|---|---|
| author | Mei Wang; Kevin F. A. Darras; Renjie Xue; Fanglin Liu |
| collection | DOAJ |
| container_title | Methods in Ecology and Evolution |
| description | Abstract Soundscapes contain rich ecological information, offering insights into both biodiversity and ecosystem dynamics. However, the sheer volume of data produced by passive acoustic monitoring presents significant challenges for scalable analysis and ecological interpretation. While convolutional neural networks (CNNs) have advanced species classification in bioacoustics, they often struggle to identify acoustic targets within acoustic space and to quantify soundscape characteristics. In this study, we propose a novel spectrogram‐to‐spectrogram translation framework based on generative adversarial networks (GANs) to isolate and quantify acoustic sources within soundscape recordings. Our method is trained on paired spectrogram images: original full‐spectrogram representations and target spectrogram representations containing only the vocalizations of specific sound labels. This design enables the model to learn source‐specific mappings and perform both species‐ and community‐level separation of acoustic components in soundscape recordings. We developed and evaluated two GAN‐based models: a species‐level GAN targeting eight avian species, and a community‐level GAN distinguishing among avian, insect and anthropogenic sound sources. The models were trained and tested on soundscape recordings collected from the Yaoluoping National Nature Reserve, eastern China. The species‐level model achieved a mean F1 score of 0.76 for pixel‐wise detection, while the community‐level model reached 0.79 across categories. In addition to precise temporal‐spectral localization, our approach captures sources' acoustic occupancy and frequency distribution patterns, offering deeper ecological insight. Compared to baseline CNN classifiers, our model achieved a mean F1 score of 0.97, demonstrating classification performance comparable to ResNet50 (0.95) and VGG16 (0.98) across multiple species. Our GAN approach for extracting sound sources also significantly outperformed conventional methods in denoising and source separation, as indicated by lower image‐level mean squared error. These results demonstrate the utility of GANs in advancing ecoacoustic analyses and biodiversity monitoring. By enabling robust source separation and fine‐resolution signal mapping, the proposed approach contributes a scalable and transferable tool for soundscape quantification. (A minimal illustrative sketch of this paired translation setup appears after the record.) |
| format | Article |
| id | doaj-art-87fac8bb52f84d1498efbb9ea40ef5fc |
| institution | Directory of Open Access Journals |
| issn | 2041-210X |
| language | English |
| publishDate | 2025-10-01 |
| publisher | Wiley |
| record_format | Article |
| spelling | Methods in Ecology and Evolution 16(10): 2472-2486, 2025-10-01. DOI: 10.1111/2041-210X.70148. Indexed 2025-10-01T07:23:00Z (record doaj-art-87fac8bb52f84d1498efbb9ea40ef5fc). Author affiliations: Mei Wang (Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China); Kevin F. A. Darras (EFNO, ECODIV, INRAE, Domaine des Barres, Nogent‐sur‐Vernisson, Centre‐Val de Loire, France); Renjie Xue (Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China); Fanglin Liu (Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China). |
| title | Animal acoustic identification, denoising and source separation using generative adversarial networks |
| topic | acoustic space; classification; denoise; generative adversarial network (GAN); source separation |
| url | https://doi.org/10.1111/2041-210X.70148 |
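The abstract describes a paired spectrogram-to-spectrogram translation framework: a GAN learns to map a full soundscape spectrogram to one containing only a labelled source. Below is a minimal sketch of that idea, assuming a pix2pix-style conditional GAN in PyTorch; the record does not give the authors' actual architecture or hyperparameters, so the `Generator` and `Discriminator` modules, the loss weighting, and the random tensors standing in for annotated spectrogram pairs are all illustrative assumptions. The `pixel_f1` helper mirrors the pixel-wise detection metric the abstract reports, but is not the authors' evaluation code.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Toy encoder-decoder mapping a full soundscape spectrogram to a
    spectrogram that keeps only one labelled source."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),            # 128 -> 64
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),           # 64 -> 32
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 64
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid() # 64 -> 128
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """PatchGAN-style critic scoring (input, output) spectrogram pairs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 1, 4, padding=1),  # per-patch real/fake logits
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=1))

def pixel_f1(pred, truth, thr=0.5):
    """Pixel-wise F1 after thresholding both spectrograms to binary masks
    (an illustrative stand-in for the abstract's pixel-wise detection metric)."""
    p, t = pred > thr, truth > thr
    tp = (p & t).sum().item()
    fp = (p & ~t).sum().item()
    fn = (~p & t).sum().item()
    return 2 * tp / max(2 * tp + fp + fn, 1)

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
adv, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

# Random stand-ins for one annotated batch: `full` is the mixed soundscape
# spectrogram, `target` keeps only the labelled source's vocalizations.
full, target = torch.rand(4, 1, 128, 128), torch.rand(4, 1, 128, 128)

for step in range(100):
    # Discriminator: real pairs vs. pairs completed by the generator.
    fake = G(full).detach()
    d_real, d_fake = D(full, target), D(full, fake)
    loss_d = adv(d_real, torch.ones_like(d_real)) + adv(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: fool the critic while staying close to the paired target.
    fake = G(full)
    d_fake = D(full, fake)
    loss_g = adv(d_fake, torch.ones_like(d_fake)) + 100.0 * l1(fake, target)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print("pixel-wise F1 on stand-in batch:", pixel_f1(G(full), target))
```

In this pix2pix-style setup the L1 term anchors the generated spectrogram to the paired target while the adversarial term sharpens fine temporal-spectral structure; image-level mean squared error between generated and target spectrograms, computed the same way as the L1 term, would correspond to the separation-quality measure the abstract reports.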
