Building reliable AI models depends not only on how much data is annotated, but also on the quality and meaning of the labels used during annotation. In many workflows, labels are flat, task-specific class names. They are easy to apply but lack explicit semantic structure, provenance, and links to shared domain knowledge. This makes them hard to reuse, compare, refine, or combine across datasets and learning pipelines, especially in expert-driven domains where labels encode real domain knowledge.

We investigate Grounded Label Space Engineering (GLSE), an approach for constructing structured, ontology-grounded label spaces before annotation begins. Rather than defining labels only as plain text, experts describe the relevant concepts in natural language; the system retrieves candidate matches from authoritative reference resources, and the expert stays in control, inspecting, accepting, rejecting, or refining each grounding before it enters the label space. The result is a semantic contract for subsequent annotation: a label space that records concept relations, grounding status, locally defined concepts, and provenance about how each concept was selected and grounded. This makes labels more transparent, reusable, and machine-actionable.
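As a rough illustration of what such a semantic contract might record, the following Python sketch models a label space whose entries carry a grounding status, an optional parent relation, and a provenance trail. All class, field, and method names here are hypothetical and chosen for illustration; the identifier string is a placeholder rather than a real resource key, and the sketch does not reflect the prototype's actual data model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class GroundingStatus(Enum):
    PENDING = "pending"    # proposed, expert decision still open
    GROUNDED = "grounded"  # expert accepted a candidate from a reference resource
    LOCAL = "local"        # no resource covers it; kept as a local concept


@dataclass
class Concept:
    label: str                         # the expert's natural-language name
    status: GroundingStatus = GroundingStatus.PENDING
    resource: Optional[str] = None     # e.g. "GBIF" or "ENVO"
    resource_id: Optional[str] = None  # placeholder identifier (assumption)
    parent: Optional[str] = None       # simple concept relation
    provenance: List[str] = field(default_factory=list)


class LabelSpace:
    """Hypothetical container for concepts and their grounding history."""

    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}

    def propose(self, label: str, parent: Optional[str] = None) -> Concept:
        concept = Concept(label=label, parent=parent)
        concept.provenance.append("proposed by expert")
        self.concepts[label] = concept
        return concept

    def accept(self, label: str, resource: str, resource_id: str) -> None:
        # The expert confirms one retrieved candidate as the grounding.
        concept = self.concepts[label]
        concept.status = GroundingStatus.GROUNDED
        concept.resource = resource
        concept.resource_id = resource_id
        concept.provenance.append(f"expert accepted {resource}:{resource_id}")

    def keep_local(self, label: str, note: str) -> None:
        # No suitable candidate: preserve the concept for later alignment.
        concept = self.concepts[label]
        concept.status = GroundingStatus.LOCAL
        concept.provenance.append(f"kept as local concept: {note}")


space = LabelSpace()
space.propose("Turdus merula", parent="Turdus")
space.accept("Turdus merula", resource="GBIF", resource_id="placeholder-id")
space.propose("unidentified frog call, site A")
space.keep_local("unidentified frog call, site A", "no matching candidate")

for concept in space.concepts.values():
    print(concept.label, concept.status.value, concept.provenance)
```

Keeping provenance as an append-only list is one simple way to make each grounding decision traceable; a real system would likely record timestamps, the retrieved candidate set, and the deciding expert as well.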

We develop and evaluate GLSE in passive acoustic monitoring (PAM) and bioacoustics, fields increasingly used to study biodiversity, species activity, and ecosystem health. Here, annotation concepts include species, genus and family relations, call types, habitats, environmental sounds, and recording context. Species concepts are grounded to biodiversity taxonomies such as GBIF, and environmental concepts to ontologies such as ENVO. Concepts that no resource covers, such as novel or locally observed species, are preserved as structured local representations for later review or alignment, so standard knowledge resources and local expert knowledge can coexist in one label space. Because the labels carry semantic structure, they support more interpretable model evaluation, for example distinguishing whether errors occur at the species, genus, or family level, or analysing the influence of background conditions and recording context.
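To make the idea of level-aware evaluation concrete, the sketch below assigns each prediction the taxonomic level at which it first goes wrong. The small taxonomy table and the function name `error_level` are assumptions made for illustration; in a grounded label space these relations would come from the grounded concepts rather than a hand-written dictionary.

```python
from collections import Counter

# Illustrative taxonomy (assumption): species -> (genus, family).
TAXONOMY = {
    "Turdus merula": ("Turdus", "Turdidae"),
    "Turdus philomelos": ("Turdus", "Turdidae"),
    "Catharus ustulatus": ("Catharus", "Turdidae"),
    "Parus major": ("Parus", "Paridae"),
}


def error_level(true_species, predicted_species, taxonomy=TAXONOMY):
    """Return the finest level at which the prediction is wrong."""
    if true_species == predicted_species:
        return "correct"
    true_genus, true_family = taxonomy[true_species]
    pred_genus, pred_family = taxonomy[predicted_species]
    if true_genus == pred_genus:
        return "species"  # right genus, wrong species
    if true_family == pred_family:
        return "genus"    # right family, wrong genus
    return "family"       # wrong family


# Hypothetical (true, predicted) pairs from a classifier.
pairs = [
    ("Turdus merula", "Turdus merula"),
    ("Turdus merula", "Turdus philomelos"),
    ("Turdus merula", "Catharus ustulatus"),
    ("Turdus merula", "Parus major"),
]

counts = Counter(error_level(t, p) for t, p in pairs)
print(dict(counts))
```

A flat label set would count the last three predictions as equally wrong; with the hierarchy available, confusing two species of the same genus can be reported separately from confusing two families.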

Our current prototype demonstrates this grounding-first workflow end to end: experts provide concepts in natural language, the system searches relevant semantic resources, and confirmed concepts are stored with grounding status and provenance. In ongoing work, we are evaluating how this workflow supports realistic annotation settings, model development, and downstream model evaluation. The long-term goal is to connect annotation, knowledge engineering, and machine learning more tightly, enabling knowledge-centric AI systems that can better handle evolving concepts, local expert knowledge, and changing domain conditions.

Grounding-first label-space construction for bioacoustic annotation. Experts describe concepts in natural language, the system retrieves candidates from standard knowledge bases, and confirmed candidates are added to the label space for subsequent annotation.

References

Sitapara, P., Doddanawar, P., Gouvêa, T. S., & Sonntag, D. (2026). Structuring annotation label spaces by natural language concept elicitation and ontology grounding [Submitted].

Sitapara, P., Gouvêa, T. S., & Sonntag, D. (2026). Grounded label space engineering for knowledge-centric annotation workflows [Submitted].

Phan, H., Hertel, L., Maass, M., Koch, P., Mazur, R., & Mertins, A. (2017). Improved audio scene classification based on label-tree embeddings and convolutional neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(6), 1278-1290.

Colonna, J. G., Gama, J., & Nakamura, E. F. (2018). A comparison of hierarchical multi-output recognition approaches for anuran classification. Machine Learning, 107(11), 1651-1671.

Breit, A., Waltersdorfer, L., Ekaputra, F. J., Sabou, M., Ekelhart, A., Iana, A., … & Van Harmelen, F. (2023). Combining machine learning and semantic web: A systematic mapping study. ACM Computing Surveys, 55(14s), 1-41.