Figure 1. An overview of the transfer learning framework with BERT-SNER.

Information extraction from clinical text has great potential for clinical research and personalized care, but annotating large amounts of data for customized requirements is prohibitively costly. We present a German medical Named Entity Recognition (NER) system (Liang et al., 2023) that transfers knowledge across domains. Our approach, BERT-SNER (Figure 1), builds on three lines of recent research: leveraging Pre-trained Language Models (PLMs) for downstream tasks, prompting PLMs for low-resource NER, and using the UMLS Metathesaurus for medical term mining. We first train the model on a generic medical corpus annotated with UMLS labels and then apply it to clinical NER tasks with limited data. Evaluated on two German datasets from different clinics in zero- and few-shot settings, our approach outperforms task-specific Conditional Random Fields (CRF) classifiers in accuracy. BERT-SNER can serve as an initial model for domain-specific applications, requiring much less fine-tuning data than training from scratch. Figure 2 demonstrates BERT-SNER’s use as an automatic annotation tool on clinical documents. Future work includes exploring active learning to reduce the manual annotation workload and further fine-tuning to improve performance.
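The paragraph above describes the approach only at a high level and does not give BERT-SNER's exact input format. The Figure 2 caption does mention semantic types that precede the input sentences, so one plausible reading is that each example pairs a UMLS semantic type with a sentence and the model tags only mentions of that type. The Python sketch below illustrates that data preparation step under these assumptions. The `[SEP]` separator, the BIO tag scheme, the `None` labels for prompt positions, and the helper names `Entity`, `build_type_conditioned_example`, and `expand_sentence` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Entity:
    start: int          # index of the first entity token (inclusive)
    end: int            # index after the last entity token (exclusive)
    semantic_type: str  # UMLS semantic type name, e.g. "Sign or Symptom"


def build_type_conditioned_example(
    tokens: List[str],
    entities: List[Entity],
    semantic_type: str,
    sep_token: str = "[SEP]",  # assumed separator between type prompt and sentence
) -> Tuple[List[str], List[Optional[str]]]:
    """Prepend a semantic type to the sentence and BIO-tag only entities of that type.

    Prompt positions get the label None so a training loop can ignore them
    (a hypothetical convention for this sketch).
    """
    labels: List[Optional[str]] = ["O"] * len(tokens)
    for ent in entities:
        if ent.semantic_type != semantic_type:
            continue  # mentions of other types stay "O" for this query
        labels[ent.start] = "B"
        for i in range(ent.start + 1, ent.end):
            labels[i] = "I"

    prompt = semantic_type.split() + [sep_token]
    input_tokens = prompt + tokens
    input_labels = [None] * len(prompt) + labels
    return input_tokens, input_labels


def expand_sentence(
    tokens: List[str], entities: List[Entity], semantic_types: List[str]
) -> Dict[str, Tuple[List[str], List[Optional[str]]]]:
    """Create one type-conditioned example per queried semantic type."""
    return {st: build_type_conditioned_example(tokens, entities, st) for st in semantic_types}


if __name__ == "__main__":
    tokens = ["Patient", "erhielt", "Aspirin", "gegen", "Kopfschmerzen", "."]
    entities = [
        Entity(2, 3, "Pharmacologic Substance"),
        Entity(4, 5, "Sign or Symptom"),
    ]
    examples = expand_sentence(tokens, entities, ["Pharmacologic Substance", "Sign or Symptom"])
    for st, (toks, labs) in examples.items():
        print(st)
        for tok, lab in zip(toks, labs):
            print(f"  {tok:15} {lab}")
```

Querying each semantic type separately lets a new type be handled by writing a new prompt rather than adding an output class, which is one reason this style of input suits zero- and few-shot transfer. Whether BERT-SNER follows exactly this scheme should be checked against Liang et al. (2023).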

Figure 2. The user interface showing automatic annotation results produced by the model trained in this work and applied to the CARDIO:DE dataset (Richter-Pechanski et al., 2023). Snippet A is a sidebar for selecting the clinical field, document, and specific sections. Snippet B contains the default labels for the semantic types to be annotated. Snippet C displays multi-layer automatic annotation results, grouped by semantic group, for the selected document and sections. Snippet D shows how the model’s confidence in its predictions changes when the semantic types preceding the input sentences are altered.


Liang, Siting, Mareike Hartmann, and Daniel Sonntag. “Cross-domain German Medical Named Entity Recognition using a Pre-Trained Language Model and Unified Medical Semantic Types.” Proceedings of the 5th Clinical Natural Language Processing Workshop. 2023.

Richter-Pechanski, Phillip, et al. “A distributable German clinical corpus containing cardiovascular clinical routine doctor’s letters.” Scientific Data 10.1 (2023): 207.