Clinical Named Entity Recognition (NER) is essential for extracting medical insights from clinical narratives. Expert-annotated training data for real-world clinical applications is hard to obtain, both because of data protection regulations and because entity types are not standardised. This work is a collaborative initiative to build a German clinical NER system that addresses these obstacles. To counter the scarcity of training data, we propose a Conditional Relevance Learning (CRL) approach for low-resource transfer learning scenarios [1]. CRL leverages a pre-trained language model and domain-specific open resources to obtain a robust base model for clinical NER tasks, in particular one that copes with changing label sets. This flexibility allows our NER system to implement a Multilayered Semantic Annotation (MSA) schema that organises a diverse range of entity types, which improves the system's adaptability and utility across clinical domains.
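To make the idea of a multilayered annotation concrete, the following is a minimal sketch and not the schema used in our system. It assumes that "multilayered" means one text span can carry a label in several independent layers. The layer names (`coarse`, `fine`), the labels, the example sentence, and the class and method names are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """One labelled span in one annotation layer (character offsets, end exclusive)."""
    start: int
    end: int
    layer: str   # hypothetical layer name, e.g. "coarse" or "fine"
    label: str   # hypothetical entity type within that layer


@dataclass
class LayeredAnnotation:
    """A text plus spans that may overlap across layers."""
    text: str
    spans: list[Span] = field(default_factory=list)

    def add_mention(self, surface: str, layer: str, label: str) -> Span:
        # Locate the first occurrence of the surface form; raises ValueError if absent.
        start = self.text.index(surface)
        span = Span(start, start + len(surface), layer, label)
        self.spans.append(span)
        return span

    def labels_at(self, start: int, end: int) -> dict[str, str]:
        # Collect the label from every layer that annotates exactly this span.
        return {s.layer: s.label for s in self.spans
                if s.start == start and s.end == end}

    def by_layer(self, layer: str) -> list[tuple[str, str]]:
        # Return (surface form, label) pairs for a single layer, in insertion order.
        return [(self.text[s.start:s.end], s.label)
                for s in self.spans if s.layer == layer]


# Illustrative German sentence; the labels below are assumptions, not system output.
doc = LayeredAnnotation("Patient mit Vorhofflimmern erhielt Bisoprolol.")
doc.add_mention("Vorhofflimmern", "coarse", "Disorder")
doc.add_mention("Vorhofflimmern", "fine", "Arrhythmia")
doc.add_mention("Bisoprolol", "coarse", "Medication")

span_start = doc.text.index("Vorhofflimmern")
span_end = span_start + len("Vorhofflimmern")
print(doc.labels_at(span_start, span_end))
print(doc.by_layer("coarse"))
```

Keeping each layer as a separate tag on the span, rather than fusing layers into one combined label, means a layer can be added or replaced without touching the others, which is one plausible way to accommodate changing label sets.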
In the case study conducted in collaboration with the medical informatics institute, we demonstrate how our NER system can be applied under resource constraints while complying with data privacy regulations. Because the system has had no prior training on in-domain data, feedback from expert users in the respective domains is essential for identifying areas for refinement. Future work will focus on integrating this expert feedback to improve system performance in specific clinical contexts [2].
The interface of our NER system, as applied in the case study on clinical reports from cardiology.
References
[1] Liang, S., Hartmann, M. and Sonntag, D., 2023, July. Cross-domain German medical named entity recognition using a pre-trained language model and unified medical semantic types. In Proceedings of the 5th Clinical Natural Language Processing Workshop (pp. 259-271).
[2] Liang, S. and Sonntag, D., 2024, June. Building A German Clinical Named Entity Recognition System without In-domain Training Data. In Proceedings of the 6th Clinical Natural Language Processing Workshop (pp. 70-81).