Passive Acoustic Monitoring (PAM) enables continuous and non-invasive biodiversity monitoring, but analysing large acoustic datasets remains difficult because sound event detectors usually require temporally precise annotations. Creating such instance-level labels is expensive and requires expert knowledge. At the same time, museum and community-run sound libraries provide large numbers of animal recordings with weak file-level labels, indicating which species is present but not when it occurs. This project investigates how such weakly labelled sound collections can be turned into useful training data for PAM analysis.
We use Multiple Instance Learning (MIL) to learn from file-level species labels while still producing temporally localised predictions. Each recording is treated as a collection of short temporal segments, allowing the model to identify which parts of the recording likely contain the target vocalisation. Building on this approach, we developed an interactive workflow that supports MIL-based localisation, transfer to independent PAM datasets, and lightweight user refinement. Users can inspect candidate sound events, provide presence–absence feedback, and iteratively improve the detector.
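The step from segment scores to a file-level prediction can be illustrated with a small sketch. It assumes non-overlapping segments of equal length and uses two pooling functions from the MIL literature (max pooling and linear softmax pooling). The text above does not state which pooling function the project uses, so the function names, the segment length, and the threshold here are illustrative assumptions only.

```python
from typing import List, Sequence, Tuple


def max_pool(segment_probs: Sequence[float]) -> float:
    """Recording-level score taken from the single most confident segment."""
    return max(segment_probs)


def linear_softmax_pool(segment_probs: Sequence[float]) -> float:
    """Recording-level score sum(p^2) / sum(p).

    Each segment is weighted by its own probability, so confident
    segments dominate the recording-level score.
    """
    total = sum(segment_probs)
    if total == 0.0:
        return 0.0
    return sum(p * p for p in segment_probs) / total


def localise(
    segment_probs: Sequence[float],
    segment_seconds: float,
    threshold: float = 0.5,
) -> List[Tuple[float, float]]:
    """Return (start, end) times of segments scoring at or above threshold.

    Assumes non-overlapping, equally long segments (hypothetical setup).
    """
    return [
        (i * segment_seconds, (i + 1) * segment_seconds)
        for i, p in enumerate(segment_probs)
        if p >= threshold
    ]


# Hypothetical per-segment probabilities for one species in one recording.
probs = [0.1, 0.9, 0.2]
recording_score = linear_softmax_pool(probs)   # trained against the file label
candidate_events = localise(probs, segment_seconds=3.0)
```

During training only the pooled score is compared with the file-level label. The per-segment probabilities are what provide the temporal localisation afterwards.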
Workflow for transferring weakly labelled sound-library recordings to Passive Acoustic Monitoring data using Multiple Instance Learning.
A central challenge is the domain shift between focal sound-library recordings and real PAM soundscapes. Sound-library recordings often contain one dominant species under relatively clean conditions, while PAM recordings include overlapping vocalisations, environmental noise, and multiple species. To address this, we also investigate synthetic multi-species augmentation, where recordings from different species are mixed during training to better simulate PAM-like acoustic conditions.
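One way to picture this kind of mixing is the sketch below. It overlays two single-species clips at a chosen level difference and takes the union of their file-level labels. The function name `mix_recordings`, the `snr_db` parameter, and the peak normalisation are assumptions made for illustration; the actual mixing procedure used in the project may differ.

```python
import numpy as np


def mix_recordings(clip_a, labels_a, clip_b, labels_b, snr_db=0.0):
    """Overlay clip_b on clip_a and merge their multi-hot weak labels.

    snr_db is the power ratio of clip_a to the scaled clip_b in decibels
    (0 dB means both clips contribute equal power). Hypothetical helper.
    """
    clip_a = np.asarray(clip_a, dtype=np.float64)
    clip_b = np.asarray(clip_b, dtype=np.float64)

    # Truncate both clips to the shorter length so they can be summed.
    n = min(len(clip_a), len(clip_b))
    clip_a, clip_b = clip_a[:n], clip_b[:n]

    power_a = np.mean(clip_a ** 2)
    power_b = np.mean(clip_b ** 2)
    if power_a > 0 and power_b > 0:
        # Scale clip_b so that power_a / power_b_scaled == 10^(snr_db / 10).
        gain = np.sqrt(power_a / (power_b * 10 ** (snr_db / 10)))
    else:
        gain = 1.0

    mixture = clip_a + gain * clip_b

    # Rescale only if the sum would clip outside [-1, 1].
    peak = np.max(np.abs(mixture))
    if peak > 1.0:
        mixture = mixture / peak

    # The mixture is weakly labelled with every species present in either clip.
    labels = np.maximum(np.asarray(labels_a), np.asarray(labels_b))
    return mixture, labels


# Toy example: two short "clips" and three possible species.
mixture, labels = mix_recordings(
    [0.5, -0.5, 0.5, -0.5], [1, 0, 0],
    [0.1, 0.1, -0.1, -0.1], [0, 1, 0],
    snr_db=0.0,
)
```

Because the mixed clip carries several positive file-level labels, it resembles a multi-species soundscape more closely than either focal recording does on its own.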
We evaluated the workflow using weakly labelled recordings from the FNJV sound collection and strongly annotated PAM data from AnuraSet. Preliminary results show that weakly labelled sound libraries can provide a meaningful training signal for downstream PAM detection. Synthetic multi-species augmentation further improved transfer performance, increasing recall and F1-score at both recording and segment level.
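To make the two evaluation levels concrete, the sketch below computes precision, recall, and F1-score for a single species from binary segment decisions. It computes them once per segment and once per recording, where a recording counts as positive if any of its segments is positive. The data are invented toy values, and the aggregation rule is an assumption rather than the project's documented evaluation protocol.

```python
from typing import List, Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Binary precision, recall, and F1 from paired 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def to_recording_level(segments: Sequence[Sequence[int]]) -> List[int]:
    """A recording is positive if at least one of its segments is positive."""
    return [int(any(rec)) for rec in segments]


def flatten(segments: Sequence[Sequence[int]]) -> List[int]:
    """Concatenate the segment labels of all recordings into one list."""
    return [s for rec in segments for s in rec]


# Toy data: three recordings with four segments each (hypothetical values).
true_segments = [[0, 1, 1, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
pred_segments = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]

segment_scores = precision_recall_f1(
    flatten(true_segments), flatten(pred_segments)
)
recording_scores = precision_recall_f1(
    to_recording_level(true_segments), to_recording_level(pred_segments)
)
```

Segment-level scores reward correct timing, while recording-level scores only ask whether the species was detected anywhere in the file, so the two can diverge for the same detector.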

Screenshot of the interactive tool for inspecting candidate sound events, refining weakly supervised localisation results, and supporting transfer to PAM datasets.
Overall, this project provides a practical basis for making weak supervision interactive in biodiversity monitoring. By combining MIL-based localisation, pretrained acoustic embeddings, synthetic multi-species augmentation, and active learning-based refinement, the workflow helps bridge the gap between archival sound collections and real-world acoustic monitoring while reducing the need for costly strong annotations.
References
Mammadli, N. et al. Making weak supervision interactive: Exploring transfer from sound libraries to passive acoustic monitoring data. In Proceedings of the 34th International Joint Conference on Artificial Intelligence (IJCAI 2026) Demonstrations Track (2026).
Mammadli, N., Gouvêa, T. S. & Sonntag, D. Multi-species mixing for weakly supervised SED under domain shift. In Proceedings of the 49th German Conference on Artificial Intelligence (KI 2026) (Springer, 2026).
Kahl, S., Wood, C. M., Eibl, M. & Klinck, H. BirdNET: A deep learning solution for avian diversity monitoring. Ecological Informatics (2021).
Cañas, J. S. et al. AnuraSet: A dataset for weakly supervised sound event detection in bioacoustics (2023).
Wang, Y., Li, J. & Metze, F. A comparison of five multiple instance learning pooling functions for sound event detection with weak labeling. In ICASSP (2019).
Contact
Novruz Mammadli (Novruz.Mammadli@dfki.de)