2022

Journal Articles
Albert, Iannis; Burkard, Nicole; Queck, Dirk; Herrlich, Marc: The Effect of Auditory-Motor Synchronization in Exergames on the Example of the VR Rhythm Game BeatSaber. Journal Article. Proceedings of the ACM on Human-Computer Interaction, 6, pp. 1-26, Association for Computing Machinery, 2022.

Physical inactivity and an increasingly sedentary lifestyle constitute a significant public health concern. Exergames try to tackle this problem by combining exercising with motivational gameplay. Another approach in sports science is the use of auditory-motor synchronization, the entrainment of movements to the rhythm of music. There are already commercially successful games that make use of the combination of both, such as the popular VR rhythm game BeatSaber. However, unlike traditional exercise settings, which often rely on periodic movements that can easily be entrained to a rhythmic pulse, exergames typically add a cognitive challenge through their gameplay and may rely more on reaction or memorization. This raises the question to what extent the effects of auditory-motor synchronization can be transferred to exergames, and whether the synchronization of music and gameplay facilitates the playing experience. We conducted a user study (N = 54) to investigate the effects of different degrees of synchronization between music and gameplay using the VR rhythm game BeatSaber. Results show significant effects on performance, perceived workload, and player experience between the synchronized and non-synchronized conditions, but these effects appear to be strongly mediated by the participants' ability to consciously perceive the synchronization differences.
Inproceedings
Liang, Siting; Kades, Klaus; Fink, Matthias A.; Full, Peter M.; Weber, Tim F.; Kleesiek, Jens; Strube, Michael; Maier-Hein, Klaus: Fine-tuning BERT Models for Summarizing German Radiology Findings. Inproceedings. In: Naumann, Tristan; Bethard, Steven; Roberts, Kirk; Rumshisky, Anna (Ed.): Proceedings of the 4th Clinical Natural Language Processing Workshop, Association for Computational Linguistics, 2022.
https://www.dfki.de/fileadmin/user_upload/import/12809_2022.clinicalnlp-1.4.pdf

Writing the conclusion section of radiology reports is essential for communicating the radiology findings and their assessment to physicians in a condensed form. In this work, we employ a transformer-based Seq2Seq model for generating the conclusion section of German radiology reports. The model is initialized with the pre-trained parameters of a German BERT model and fine-tuned on our domain data for the downstream task. We propose two strategies to improve the factual correctness of the model. In the first method, next to the abstractive learning objective, we introduce an extractive learning objective that trains the decoder to both generate one summary sequence and extract the key findings from the source input. The second approach integrates a pointer mechanism into the transformer-based Seq2Seq model. The pointer network helps the Seq2Seq model choose between generating tokens from the vocabulary and copying parts of the source input during generation. The results of the automatic and human evaluations show that the enhanced Seq2Seq model is capable of generating human-like radiology conclusions and that the improved models effectively reduce the factual errors in the generations despite the small amount of training data.
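To make the generate-vs-copy decision described in this abstract more tangible, here is a minimal Python/PyTorch sketch of a pointer-generator mixing step. It is a generic illustration of the idea, not the authors' implementation; the module name, tensor shapes, and the toy usage at the end are assumptions.

```python
# Minimal sketch of a pointer-generator mixing step (illustrative, not the authors' code):
# the decoder mixes a "generate from vocabulary" distribution with a "copy from source"
# distribution, weighted by a learned gate p_gen.
import torch
import torch.nn as nn

class PointerMixer(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.p_gen_layer = nn.Linear(2 * hidden_size, 1)   # decoder state + context -> p_gen
        self.vocab_proj = nn.Linear(hidden_size, vocab_size)

    def forward(self, decoder_state, context, attention, src_token_ids):
        # decoder_state, context: (batch, hidden); attention: (batch, src_len)
        # src_token_ids: (batch, src_len) vocabulary ids of the source tokens
        p_gen = torch.sigmoid(self.p_gen_layer(torch.cat([decoder_state, context], dim=-1)))
        vocab_dist = torch.softmax(self.vocab_proj(decoder_state), dim=-1)                 # generate
        copy_dist = torch.zeros_like(vocab_dist).scatter_add(1, src_token_ids, attention)  # copy
        return p_gen * vocab_dist + (1.0 - p_gen) * copy_dist  # final next-token distribution

# Toy usage with random tensors
mixer = PointerMixer(hidden_size=8, vocab_size=100)
out = mixer(torch.randn(2, 8), torch.randn(2, 8),
            torch.softmax(torch.randn(2, 5), dim=-1),
            torch.randint(0, 100, (2, 5)))
print(out.shape)  # torch.Size([2, 100])
```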
Liang, Siting; Hartmann, Mareike; Sonntag, Daniel: Cross-lingual German Biomedical Information Extraction: from Zero-shot to Human-in-the-Loop. Inproceedings. 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 2022.
https://www.dfki.de/fileadmin/user_upload/import/12839_2022_HCI+NLP.3.1.pdf

This paper presents our project proposal for extracting biomedical information from German clinical narratives with limited amounts of annotations. We first describe the applied strategies in transfer learning and active learning for solving our problem. After that, we discuss the design of the user interface for both supplying model inspection and obtaining user annotations in the interactive environment.
Kath, Hannes; Stone, Simon; Rapp, Stefan; Birkholz, Peter: Carina – A Corpus of Aligned German Read Speech Including Annotations. Inproceedings. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6157-6161, Institute of Electrical and Electronics Engineers (IEEE), 2022.
https://www.dfki.de/fileadmin/user_upload/import/14712_Carina__A_Corpus_of_Aligned_German_Read_Speech_Including_Annotations.pdf
https://doi.org/10.1109/ICASSP43922.2022.9746160

This paper presents the semi-automatically created Corpus of Aligned Read Speech Including Annotations (CARInA), a speech corpus based on the German Spoken Wikipedia Corpus (GSWC). CARInA tokenizes, consolidates and organizes the vast, but rather unstructured material contained in GSWC. The contents are grouped by annotation completeness, and extended by canonic, morphosyntactic and prosodic annotations. The annotations are provided in BPF and TextGrid format. The corpus contains 194 hours of speech material from 327 speakers, of which 124 hours are fully phonetically aligned and 30 hours are fully aligned at all annotation levels. CARInA is freely available, designed to grow and improve over time, and suitable for large-scale speech analyses or machine learning tasks, as illustrated by two examples shown in this paper.
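As a small illustration of consuming the TextGrid annotations mentioned above, the following sketch extracts (xmin, xmax, text) intervals from a Praat long-format TextGrid file. It is not part of CARInA; the regular expression assumes the standard long TextGrid layout, and the file name is a placeholder.

```python
# Rough sketch for reading aligned intervals from a Praat "long" TextGrid file,
# the annotation format mentioned in the abstract. Illustrative only.
import re

def read_textgrid_intervals(path: str):
    """Return a list of (xmin, xmax, text) tuples for all interval entries."""
    with open(path, encoding="utf-8") as f:
        content = f.read()
    pattern = re.compile(
        r"intervals \[\d+\]:\s*"
        r"xmin = ([\d.]+)\s*"
        r"xmax = ([\d.]+)\s*"
        r'text = "(.*?)"', re.S)
    return [(float(a), float(b), t) for a, b, t in pattern.findall(content)]

if __name__ == "__main__":
    # "example.TextGrid" is a placeholder file name
    for xmin, xmax, text in read_textgrid_intervals("example.TextGrid")[:10]:
        print(f"{xmin:8.3f} {xmax:8.3f} {text}")
```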
Queck, Dirk; Albert, Iannis; Burkard, Nicole; Zimmer, Philipp; Volkmar, Georg; Dänekas, Bastian; Malaka, Rainer; Herrlich, Marc: SpiderClip: Towards an Open Source System for Wearable Device Simulation in Virtual Reality. Inproceedings. CHI EA '22: Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, 2022.
https://dl.acm.org/doi/abs/10.1145/3491101.3519758#sec-supp

Smartwatches and fitness trackers integrate different sensors, from inertial measurement units to heart rate sensors, in a very compact and affordable form factor. This makes them interesting and relevant research tools. One potential application domain is virtual reality, e.g., for health-related applications such as exergames or simulation approaches. However, commercial devices complicate and limit the collection of raw and real-time data, suffer from privacy issues, and are not tailored to use with VR tracking systems. We address these issues with an open-source design to facilitate the construction of VR-enabled wearables for conducting scientific experiments. Our work is motivated by research in mixed realities in pervasive computing environments. We introduce our system and present a proof-of-concept study with 17 participants. Our results show that the wearable reliably measures high-quality data comparable to commercially available fitness trackers and that it does not impede movements or interfere with VR tracking.
Valdunciel, Pablo; Bhatti, Omair Shahzad; Barz, Michael; Sonntag, Daniel: Interactive Assessment Tool for Gaze-based Machine Learning Models in Information Retrieval. Inproceedings. ACM SIGIR Conference on Human Information Interaction and Retrieval, Association for Computing Machinery, 2022.
https://www.dfki.de/fileadmin/user_upload/import/12287_3498366.3505834.pdf

Eye movements were shown to be an effective source of implicit relevance feedback in information retrieval tasks. They can be used to, e.g., estimate the relevance of read documents and expand search queries using machine learning. In this paper, we present the Reading Model Assessment tool (ReMA), an interactive tool for assessing gaze-based relevance estimation models. Our tool allows experimenters to easily browse recorded trials, compare the model output to a ground truth, and visualize gaze-based features at the token- and paragraph-level that serve as model input. Our goal is to facilitate the understanding of the relation between eye movements and the human relevance estimation process, to understand the strengths and weaknesses of a model at hand, and, eventually, to enable researchers to build more effective models.
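The following sketch illustrates the kind of token-level gaze features the abstract refers to: fixations are assigned to token bounding boxes and aggregated into fixation counts and total durations. The data structures and field names are assumptions, not the ReMA implementation.

```python
# Illustrative sketch of token-level gaze feature aggregation (not the ReMA code).
from dataclasses import dataclass

@dataclass
class Fixation:
    x: float
    y: float
    duration_ms: float

@dataclass
class TokenBox:
    token: str
    left: float
    top: float
    right: float
    bottom: float

def token_gaze_features(fixations, token_boxes):
    """Aggregate fixation count and total fixation duration per token box."""
    features = [{"token": b.token, "fixation_count": 0, "total_duration_ms": 0.0}
                for b in token_boxes]
    for fix in fixations:
        for feat, box in zip(features, token_boxes):
            if box.left <= fix.x <= box.right and box.top <= fix.y <= box.bottom:
                feat["fixation_count"] += 1
                feat["total_duration_ms"] += fix.duration_ms
                break  # assign each fixation to at most one token
    return features

# Toy usage
boxes = [TokenBox("relevance", 0, 0, 50, 20), TokenBox("feedback", 60, 0, 120, 20)]
fixes = [Fixation(10, 10, 180), Fixation(70, 5, 220), Fixation(65, 10, 140)]
print(token_gaze_features(fixes, boxes))
```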
Lauer, Luisa; Javaheri, Hamraz; Altmeyer, Kristin; Malone, Sarah; Grünerbl, Agnes; Barz, Michael; Peschel, Markus; Brünken, Roland; Lukowicz, Paul: Encountering Students' Learning Difficulties in Electrics - Didactical Concept and Prototype of Augmented Reality-Toolkit. Inproceedings. Fostering scientific citizenship in an uncertain world - ESERA 2021 e-Proceedings, University of Minho, 2022.
https://www.dfki.de/fileadmin/user_upload/import/12121_2022_Encountering_Students'_Learning_Difficulties_in_Electrics_-_Didactical_Concept_and_Prototype_of_Augmented_Reality-Toolkit.pdf

The toolkit provides real-time visualization of electrical circuit schematics in accordance with the components' semantic connections. Use of the toolkit may facilitate the acquisition of representational competencies (concerning the matching of components and symbols and the matching of circuits and circuit schematics). It is usable with either handheld AR devices or head-mounted AR devices.
Nguyen, Ho Minh Duy; Henschel, Roberto; Rosenhahn, Bodo; Sonntag, Daniel; Swoboda, Paul: LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking. Inproceedings. Conference on Computer Vision and Pattern Recognition (CVPR) 2022, IEEE/CVF, 2022.
https://arxiv.org/pdf/2111.11892.pdf

Multi-camera multi-object tracking is currently drawing attention in the computer vision field due to its superior performance in real-world applications such as video surveillance of crowded scenes or vast spaces. In this work, we propose a mathematically elegant multi-camera multiple object tracking approach based on a spatial-temporal lifted multicut formulation. Our model utilizes state-of-the-art tracklets produced by single-camera trackers as proposals. As these tracklets may contain ID-switch errors, we refine them through a novel pre-clustering obtained from 3D geometry projections. As a result, we derive a better tracking graph without ID switches and more precise affinity costs for the data association phase. Tracklets are then matched to multi-camera trajectories by solving a global lifted multicut formulation that incorporates short- and long-range temporal interactions on tracklets located in the same camera as well as inter-camera ones. Experimental results on the WildTrack dataset yield near-perfect results, outperforming state-of-the-art trackers on Campus while being on par on the PETS-09 dataset. We will make our implementations available upon acceptance of the paper.
Gouvea, Thiago; Troshani, Ilira; Herrlich, Marc; Sonntag, Daniel: Annotating sound events through interactive design of interpretable features. Inproceedings. Proceedings of the First International Conference on Hybrid Human-Machine Intelligence, IOS Press, 2022.
https://www.hhai-conference.org/wp-content/uploads/2022/06/hhai2022-pd_paper_7726.pdf

Professionals of all domains of expertise expect to take part in the benefits of the machine learning (ML) revolution, but realisation is often slowed down by lack of training in ML concepts and tools, as well as low availability of annotated data for supervised methods. Inspired by the problem of assessing the impact of human-generated activity on marine ecosystems through passive acoustic monitoring (PAM), we are developing Seadash, an interactive tool for event detection and classification in multivariate time series.
Gouvêa, Thiago S.; Troshani, Ilira; Herrlich, Marc; Sonntag, Daniel: Interactive design of interpretable features for marine soundscape data annotation. Inproceedings. Workshop on Human-centered Design of Symbiotic Hybrid Intelligence, HHAI, 2022.

Machine learning (ML) is increasingly used in different application domains. However, to reach its full potential it is important that experts without extensive ML training be able to create and effectively apply models in their domain. This requires forms of co-learning that need to be facilitated by effective interfaces and interaction paradigms. Inspired by the problem of detecting and classifying sound events in marine soundscapes, we are developing Seadash. Through a rapid, iterative data exploration workflow, the user designs and curates features that capture meaningful structure in the data, and uses these to efficiently annotate the dataset. While the tool is still in early stages, we present the concept and discuss future directions.
Hartmann, Mareike; Sonntag, Daniel: A survey on improving NLP models with human explanations. Inproceedings. Proceedings of the First Workshop on Learning with Natural Language Supervision, Association for Computational Linguistics, 2022.
https://aclanthology.org/2022.lnls-1.5.pdf

Training a model with access to human explanations can improve data efficiency and model performance on in- and out-of-domain data. Adding to these empirical findings, similarity with the process of human learning makes learning from explanations a promising way to establish a fruitful human-machine interaction. Several methods have been proposed for improving natural language processing (NLP) models with human explanations that rely on different explanation types and mechanisms for integrating these explanations into the learning process. These methods are rarely compared with each other, making it hard for practitioners to choose the best combination of explanation type and integration mechanism for a specific use case. In this paper, we give an overview of different methods for learning from human explanations, and discuss different factors that can inform the decision of which method to choose for a specific use case.
Graf, Linda; Altmeyer, Maximilian; Emmerich, Katharina; Herrlich, Marc; Krekhov, Andrey; Spiel, Katta: Development and Validation of a German Version of the Player Experience Inventory (PXI). Inproceedings. Proceedings of the Mensch und Computer Conference, ACM, 2022.
https://www.dfki.de/fileadmin/user_upload/import/12535_MuC22__German_PXI_Version.pdf

The Player Experience Inventory (PXI), initially developed by Abeele et al. (2020), measures player experiences among English-speaking players. However, empirically validated translations of the PXI are sparse, limiting the use of the scale among non-English speaking players. In this paper, we address this issue by providing a translated version of the scale in German, the most widely spoken first language in the European Union. After translating the original items, we conducted a confirmatory factor analysis (N=506) to validate the German version of the PXI. Our results confirmed a 10-factor model - which the original authors of the instrument suggested - and show that the German PXI has valid psychometric properties. While model fit, internal consistency and convergent validity were acceptable, there was room for improvement regarding discriminant validity. Based on our results, we advocate for the German PXI as a valid and reliable instrument for assessing player experiences in German-speaking samples.
Rekrut, Maurice; Selim, Abdulrahman Mohamed; Krüger, Antonio: Improving Silent Speech BCI Training Procedures through Transfer from Overt to Silent Speech. Inproceedings. Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, IEEE, 2022.

Silent speech Brain-Computer Interfaces (BCIs) try to decode imagined speech from brain activity. Those BCIs require a tremendous amount of training data, usually collected during mentally and physically exhausting sessions in which participants silently repeat words presented on a screen for several hours. Within this work we present an approach to overcome those exhausting sessions by training a silent speech classifier on data recorded while speaking certain words and transferring this classifier to EEG data recorded during silent repetition of the same words. This approach not only allows for a less mentally and physically exhausting training procedure but also for a more productive one, as the overt speech output can be used for interaction while the classifier for silent speech is trained simultaneously. We evaluated our approach in a study in which 15 participants navigated a virtual robot on a screen through a maze in a game-like scenario, once with 5 overtly spoken and once with the same 5 silently spoken command words. In an offline analysis we trained a classifier on overt speech data and let it predict silent speech data. Our classification results not only show successful transfer results (61.78%) significantly above chance level but also results comparable to a standard silent speech classifier (71.48%) trained and tested on the same data. These results illustrate the potential of the method to replace the currently tedious training procedures for silent speech BCIs with a more comfortable, engaging and productive approach based on a transfer from overt to silent speech.
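The transfer setup described above can be illustrated with a small, synthetic sketch: a word classifier is fitted on EEG epochs from the overt-speech condition and evaluated on epochs from the silent-speech condition. The data are random, and the feature extraction and classifier choice are assumptions rather than the authors' pipeline.

```python
# Synthetic sketch of the overt -> silent transfer idea (not the authors' pipeline).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_trials, n_channels, n_samples, n_words = 200, 8, 64, 5

def synthetic_epochs(offset):
    """Generate flattened EEG-like epochs with a class-dependent mean shift."""
    labels = rng.integers(0, n_words, n_trials)
    epochs = (rng.normal(size=(n_trials, n_channels, n_samples))
              + labels[:, None, None] * 0.05 + offset)
    return epochs.reshape(n_trials, -1), labels

X_overt, y_overt = synthetic_epochs(offset=0.0)     # training domain: overt speech
X_silent, y_silent = synthetic_epochs(offset=0.02)  # test domain: silent speech

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_overt, y_overt)
print("overt -> silent accuracy:", accuracy_score(y_silent, clf.predict(X_silent)))
```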
Kuznetsov, Konstantin; Barz, Michael; Sonntag, Daniel: SpellInk: Interactive correction of spelling mistakes in handwritten text. Inproceedings. Proceedings of the First International Conference on Hybrid Human-Machine Intelligence, 354, pp. 278-280, IOS Press, De Boelelaan 1105, 1081 HV Amsterdam, Netherlands, 2022.
https://www.dfki.de/fileadmin/user_upload/import/12621_hhai22_demo_spellink.pdf
https://www.hhai-conference.org/demos/pd_paper_5349/

Despite the current dominance of typed text, writing by hand remains the most natural means of written communication and information keeping. Still, digital pen input provides a limited user experience and lacks flexibility, as most of the manipulations are performed on a digitalized version of the text. In this paper, we present our prototype that enables spellchecking for handwritten text: it allows users to interactively correct misspellings directly in a handwritten script. We plan to study the usability of the proposed user interface and its acceptance by users. We also aim to investigate how user feedback can be used to incrementally improve the underlying recognition models.
Céard-Falkenberg, Felix; Kuznetsov, Konstantin; Prange, Alexander; Barz, Michael; Sonntag, Daniel: pEncode: A Tool for Visualizing Pen Signal Encodings in Real-time. Inproceedings. Proceedings of the First International Conference on Hybrid Human-Machine Intelligence, 354, pp. 281-284, IOS Press, De Boelelaan 1105, 1081 HV Amsterdam, Netherlands, 2022.
https://www.dfki.de/fileadmin/user_upload/import/12622_hhai22_demo_pencode.pdf
https://www.youtube.com/watch?v=t80aa2E5jKo

Many features have been proposed for encoding the input signal from digital pens and touch-based interaction. They are widely used for analyzing and classifying handwritten texts, sketches, or gestures. Although they are well defined mathematically, many features are non-trivial and therefore difficult to understand for a human. In this paper, we present an application that visualizes a subset of 114 digital pen features in real time while drawing. It provides an easy-to-use interface that allows application developers and machine learning practitioners to learn how digital pen features encode their inputs, helps in the feature selection process, and enables rapid prototyping of sketch and gesture classifiers.
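As a rough illustration of what individual pen features can look like, the sketch below computes a few generic stroke descriptors (path length, duration, average speed, straightness) from (x, y, t) samples. It is a hypothetical example and not drawn from the 114-feature set referenced in the abstract.

```python
# Generic stroke-feature sketch (illustrative, not the pEncode feature set).
import math

def stroke_features(points):
    """points: list of (x, y, t) samples of one pen stroke, t in seconds."""
    xs, ys, ts = zip(*points)
    path_length = sum(math.dist(points[i][:2], points[i + 1][:2])
                      for i in range(len(points) - 1))
    duration = ts[-1] - ts[0]
    direct_distance = math.dist((xs[0], ys[0]), (xs[-1], ys[-1]))
    return {
        "path_length": path_length,
        "duration": duration,
        "avg_speed": path_length / duration if duration > 0 else 0.0,
        "straightness": direct_distance / path_length if path_length > 0 else 1.0,
    }

print(stroke_features([(0, 0, 0.0), (3, 4, 0.1), (6, 8, 0.2)]))
```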
Bhatti, Omair Shahzad; Barz, Michael; Sonntag, Daniel: Leveraging Implicit Gaze-Based User Feedback for Interactive Machine Learning. Inproceedings. In: Rodermund, Stephanie C.; Timm, Ingo J.; Malburg, Lukas; Bergmann, Ralph (Ed.): KI 2022: Advances in Artificial Intelligence, pp. 9-16, Springer International Publishing, 2022.
https://www.dfki.de/fileadmin/user_upload/import/12633_Leveraging_implicit_gaze_based_user_feedback_for_interactive_machine_learning__KI_22__Accepted__(6).pdf
https://doi.org/10.1007/978-3-031-15791-2

Interactive Machine Learning (IML) systems incorporate humans into the learning process to enable iterative and continuous model improvements. The interactive process can be designed to leverage the expertise of domain experts with no background in machine learning, for instance, through repeated user feedback requests. However, excessive requests can be perceived as annoying and cumbersome and could reduce user trust. Hence, it is mandatory to establish an efficient dialog between a user and a machine learning system. We aim to detect when a domain expert disagrees with the output of a machine learning system by observing their eye movements and facial expressions. In this paper, we describe our approach for modelling user disagreement and discuss how such a model could be used for triggering user feedback requests in the context of interactive machine learning.
Miscellaneous
Szeier, Szilvia; Baffy, Benjámin; Baranyi, Gábor; Skaf, Joul; Kopácsi, László; Sonntag, Daniel; Sörös, Gábor; Lőrincz, András: 3D Semantic Label Transfer and Matching in Human-Robot Collaboration. Miscellaneous. Learning to Generate 3D Shapes and Scenes, ECCV 2022 Workshop, 2022.
https://www.dfki.de/fileadmin/user_upload/import/12900_0003_paper.pdf
https://learn3dg.github.io/

Semantic 3D maps are highly useful for human-robot collaboration and joint task planning. We build upon an existing real-time 3D semantic reconstruction pipeline and extend it with semantic matching across human and robot viewpoints, which is required if class labels differ or are missing due to different perspectives during collaborative reconstruction. We use deep recognition networks, which usually perform well from higher (human) viewpoints but are inferior from ground robot viewpoints. Therefore, we propose several approaches for acquiring semantic labels for unusual perspectives. We group the pixels from the lower viewpoint, project voxel class labels of the upper perspective to the lower perspective and apply majority voting to obtain labels for the robot. The quality of the reconstruction is evaluated in the Habitat simulator and in a real environment using a robot car equipped with an RGBD camera. The proposed approach can provide high-quality semantic segmentation from the robot perspective, with accuracy similar to the human perspective. Furthermore, as computations are close to real time, the approach enables interactive applications.
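The majority-voting step described in the abstract can be sketched as follows: labels projected from the upper (human) viewpoint are accumulated per pixel group of the lower (robot) viewpoint, and the most frequent label wins. The grouping and projection are assumed to be given; this is an illustration, not the authors' pipeline.

```python
# Majority voting over projected semantic labels (illustrative sketch).
from collections import Counter

def vote_labels(group_to_projected_labels):
    """group_to_projected_labels: dict mapping a pixel-group id to the list of
    class labels projected into that group from the other viewpoint."""
    return {
        group: Counter(labels).most_common(1)[0][0] if labels else None
        for group, labels in group_to_projected_labels.items()
    }

print(vote_labels({0: ["chair", "chair", "table"], 1: ["floor"], 2: []}))
# {0: 'chair', 1: 'floor', 2: None}
```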
Anagnostopoulou, Aliki; Hartmann, Mareike; Sonntag, Daniel: Putting Humans in the Image Captioning Loop. Miscellaneous. Bridging Human-Computer Interaction and Natural Language Processing (NAACL 2022), 2022.
https://www.dfki.de/fileadmin/user_upload/import/12516_5.pdf
https://drive.google.com/file/d/1WT1Emfc76Myv_PujMXaWI4ucqF9eegqC/view

Image Captioning (IC) models can highly benefit from human feedback in the training process, especially in cases where data is limited. We present work-in-progress on adapting an IC system to integrate human feedback, with the goal to make it easily adaptable to user-specific data. Our approach builds on a base IC model pre-trained on the MS COCO dataset, which generates captions for unseen images. The user will then be able to offer feedback on the image and the generated/predicted caption, which will be augmented to create additional training instances for the adaptation of the model. The additional instances are integrated into the model using step-wise updates, and a sparse memory replay component is used to avoid catastrophic forgetting. We hope that this approach, while leading to improved results, will also result in customizable IC models.
Hartmann, Mareike; Anagnostopoulou, Aliki; Sonntag, Daniel: Interactive Machine Learning for Image Captioning. Miscellaneous. The AAAI-22 Workshop on Interactive Machine Learning, 2022.
https://www.dfki.de/fileadmin/user_upload/import/12167_interactive_learning_for_image_captioning.pdf

We propose an approach for interactive learning for an image captioning model. As human feedback is expensive and modern neural network based approaches often require large amounts of supervised data to be trained, we envision a system that exploits human feedback as well as possible by multiplying the feedback using data augmentation methods, and integrating the resulting training examples into the model in a smart way. This approach has three key components, for which we need to find suitable practical implementations: feedback collection, data augmentation, and model update. We outline our idea and review different possibilities to address these tasks.
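The three components named in the abstract above (feedback collection, data augmentation, model update) can be outlined in a schematic loop like the one below. The word-dropout augmentation and the replay policy are placeholder assumptions chosen for illustration, not the method proposed by the authors.

```python
# Schematic feedback -> augmentation -> update-batch loop (illustrative only).
import random

random.seed(0)
memory = []  # (image_id, caption) pairs already used for training

def augment(image_id, caption, n_variants=3, drop_prob=0.15):
    """Multiply one piece of human feedback into several training pairs."""
    words = caption.split()
    variants = {caption}
    for _ in range(20):  # bounded number of attempts
        if len(variants) > n_variants:
            break
        kept = [w for w in words if random.random() > drop_prob] or words
        variants.add(" ".join(kept))
    return [(image_id, v) for v in variants]

def build_update_batch(new_pairs, replay_size=4):
    """Mix new feedback with replayed old examples to counter forgetting."""
    replay = random.sample(memory, min(replay_size, len(memory)))
    return new_pairs + replay

# Feedback collection: the user corrects a generated caption for one image.
feedback = augment("img_0042", "a dog catching a frisbee in the park")
batch = build_update_batch(feedback)
memory.extend(feedback)
print(batch)  # pairs that a captioning model would be fine-tuned on next
```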
2021

Journal Articles
Sonntag, Daniel: Künstliche Intelligenz in der Medizin und Gynäkologie – Holzweg oder Heilversprechen? Journal Article. Der Gynäkologe, 1, pp. 1-7, Springer, 2021.
https://www.dfki.de/fileadmin/user_upload/import/11612_sonntag-gyn.pdf

Artificial intelligence (AI) has attained a new level of maturity in recent years and is becoming the driver of digitalization in all areas of life. AI is a cross-sectional technology with great importance for all areas of medicine employing image data, text data and bio-data. There is no medical field that will remain unaffected by AI, with AI-assisted clinical decision-making assuming a particularly important role. AI methods are becoming established in medical workflow management and for prediction of treatment success or treatment outcome. AI systems are already able to lend support to imaging-based diagnosis and patient management, but cannot suggest critical decisions. The corresponding preventive or therapeutic measures can be more rationally assessed with the help of AI, although the number of diseases covered is currently too low to create robust systems for routine clinical use. A prerequisite for the widespread use of AI systems is appropriate training to enable physicians to decide when computer-assisted decision-making can be relied upon.
Kapp, Sebastian; Barz, Michael; Mukhametov, Sergey; Sonntag, Daniel; Kuhn, Jochen: ARETT: Augmented Reality Eye Tracking Toolkit for Head Mounted Displays. Journal Article. Sensors - Open Access Journal, 21, pp. 18, Multidisciplinary Digital Publishing Institute (MDPI), 2021.
https://www.dfki.de/fileadmin/user_upload/import/11528_2021_ARETT-_Augmented_Reality_Eye_Tracking_Toolkit_for_Head_Mounted_Displays.pdf
https://www.mdpi.com/1424-8220/21/6/2234

Currently, an increasing number of head-mounted displays (HMDs) for virtual and augmented reality (VR/AR) are equipped with integrated eye trackers. Use cases of these integrated eye trackers include rendering optimization and gaze-based user interaction. In addition, visual attention in VR and AR is interesting for applied research based on eye tracking, for example in cognitive or educational sciences. While some research toolkits for VR already exist, only a few target AR scenarios. In this work, we present an open-source eye tracking toolkit for reliable gaze data acquisition in AR based on Unity 3D and the Microsoft HoloLens 2, as well as an R package for seamless data analysis. Furthermore, we evaluate the spatial accuracy and precision of the integrated eye tracker for fixation targets at different distances and angles to the user (n=21). On average, we found that gaze estimates are reported with an angular accuracy of 0.83 degrees and a precision of 0.27 degrees while the user is resting, which is on par with state-of-the-art mobile eye trackers.
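Angular accuracy and precision figures such as the ones reported above can be computed roughly as sketched below: accuracy as the mean angular offset between gaze samples and the target direction, and precision as the RMS angular deviation of the samples from their mean direction. The exact definitions used in ARETT may differ, and the gaze vectors here are synthetic.

```python
# Illustrative computation of angular accuracy and precision from gaze samples.
import numpy as np

def angle_deg(v1, v2):
    """Angle in degrees between two 3D direction vectors."""
    v1, v2 = v1 / np.linalg.norm(v1), v2 / np.linalg.norm(v2)
    return np.degrees(np.arccos(np.clip(np.dot(v1, v2), -1.0, 1.0)))

def accuracy_precision(gaze_dirs, target_dir):
    """gaze_dirs: (n, 3) gaze direction samples; target_dir: (3,) fixation target."""
    errors = np.array([angle_deg(g, target_dir) for g in gaze_dirs])
    accuracy = errors.mean()                     # mean angular offset to the target
    mean_dir = gaze_dirs.mean(axis=0)
    precision = np.sqrt(np.mean(                 # RMS deviation from the mean gaze direction
        [angle_deg(g, mean_dir) ** 2 for g in gaze_dirs]))
    return accuracy, precision

rng = np.random.default_rng(1)
target = np.array([0.0, 0.0, 1.0])
samples = target + rng.normal(scale=0.01, size=(100, 3))  # noisy synthetic gaze directions
print(accuracy_precision(samples, target))
```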
Somfai, Ellák; Baffy, Benjámin; Fenech, Kristian; Guo, Changlu; Hosszú, Rita; Korózs, Dorina; Nunnari, Fabrizio; Pólik, Marcell; Sonntag, Daniel; Ulbert, Attila; Lorincz, András Minimizing false negative rate in melanoma detection and providing insight into the causes of classification Journal Article Computing Research Repository eprint Journal, abs/2102.09199 , pp. 1-14, 2021. @article{11613, title = {Minimizing false negative rate in melanoma detection and providing insight into the causes of classification}, author = {Ellák Somfai and Benjámin Baffy and Kristian Fenech and Changlu Guo and Rita Hosszú and Dorina Korózs and Fabrizio Nunnari and Marcell Pólik and Daniel Sonntag and Attila Ulbert and András Lorincz}, url = {https://www.dfki.de/fileadmin/user_upload/import/11613_2021_Minimizing_false_negative_rate_in_melanoma_detection_and_providing_insight_into_the_causes_of_classification.pdf https://arxiv.org/abs/2102.09199}, year = {2021}, date = {2021-01-01}, journal = {Computing Research Repository eprint Journal}, volume = {abs/2102.09199}, pages = {1-14}, publisher = {arXiv}, abstract = {Our goal is to bridge human and machine intelligence in melanoma detection. We develop a classification system exploiting a combination of visual pre-processing, deep learning, and ensembling for providing explanations to experts and to minimize false negative rate while maintaining high accuracy in melanoma detection. Source images are first automatically segmented using a U-net CNN. The result of the segmentation is then used to extract image sub-areas and specific parameters relevant in human evaluation, namely center, border, and asymmetry measures. These data are then processed by tailored neural networks which include structure searching algorithms. Partial results are then ensembled by a committee machine. Our evaluation on the largest skin lesion dataset which is publicly available today, ISIC-2019, shows improvement in all evaluated metrics over a baseline using the original images only. We also showed that indicative scores computed by the feature classifiers can provide useful insight into the various features on which the decision can be based.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Our goal is to bridge human and machine intelligence in melanoma detection. We develop a classification system exploiting a combination of visual pre-processing, deep learning, and ensembling for providing explanations to experts and to minimize false negative rate while maintaining high accuracy in melanoma detection. Source images are first automatically segmented using a U-net CNN. The result of the segmentation is then used to extract image sub-areas and specific parameters relevant in human evaluation, namely center, border, and asymmetry measures. These data are then processed by tailored neural networks which include structure searching algorithms. Partial results are then ensembled by a committee machine. Our evaluation on the largest skin lesion dataset which is publicly available today, ISIC-2019, shows improvement in all evaluated metrics over a baseline using the original images only. We also showed that indicative scores computed by the feature classifiers can provide useful insight into the various features on which the decision can be based. |
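One generic way to bias an ensembled melanoma classifier towards a low false negative rate is to average the member models' probabilities and pick the decision threshold that still reaches a target sensitivity on validation data. The sketch below illustrates that idea with scikit-learn utilities and toy numbers; it is not the authors' committee-machine architecture.

```python
import numpy as np
from sklearn.metrics import roc_curve

def ensemble_probs(member_probs):
    """Average melanoma probabilities of several member models (simple committee)."""
    return np.mean(np.stack(member_probs, axis=0), axis=0)

def threshold_for_recall(y_true, probs, target_recall=0.95):
    """Largest threshold whose recall (sensitivity) still reaches target_recall."""
    fpr, tpr, thresholds = roc_curve(y_true, probs)
    ok = tpr >= target_recall          # thresholds are sorted from high to low
    return thresholds[ok][0]

# toy usage with two hypothetical member models on a small validation set
y_val = np.array([0, 0, 1, 1, 0, 1, 0, 1])
p1 = np.array([0.10, 0.40, 0.80, 0.60, 0.20, 0.90, 0.30, 0.55])
p2 = np.array([0.20, 0.30, 0.70, 0.50, 0.10, 0.80, 0.40, 0.65])
p = ensemble_probs([p1, p2])
t = threshold_for_recall(y_val, p, target_recall=0.95)
predictions = (p >= t).astype(int)     # melanoma if ensemble probability >= tuned threshold
```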
Barz, Michael; Sonntag, Daniel Automatic Visual Attention Detection for Mobile Eye Tracking Using Pre-Trained Computer Vision Models and Human Gaze Journal Article Sensors - Open Access Journal, 21 , pp. 21, 2021. @article{11668, title = {Automatic Visual Attention Detection for Mobile Eye Tracking Using Pre-Trained Computer Vision Models and Human Gaze}, author = {Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11668_sensors-21-04143-v2.pdf https://www.mdpi.com/1424-8220/21/12/4143}, year = {2021}, date = {2021-01-01}, journal = {Sensors - Open Access Journal}, volume = {21}, pages = {21}, publisher = {MDPI}, abstract = {Processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. These stimuli, which are prevalent subjects of diagnostic eye tracking studies, are commonly encoded as rectangular areas of interest (AOIs) per frame. Because it is a tedious manual annotation task, the automatic detection and annotation of visual attention to AOIs can accelerate and objectify eye tracking research, in particular for mobile eye tracking with egocentric video feeds. In this work, we implement two methods to automatically detect visual attention to AOIs using pre-trained deep learning models for image classification and object detection. Furthermore, we develop an evaluation framework based on the VISUS dataset and well-known performance metrics from the field of activity recognition. We systematically evaluate our methods within this framework, discuss potentials and limitations, and propose ways to improve the performance of future automatic visual attention detection methods.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. These stimuli, which are prevalent subjects of diagnostic eye tracking studies, are commonly encoded as rectangular areas of interest (AOIs) per frame. Because it is a tedious manual annotation task, the automatic detection and annotation of visual attention to AOIs can accelerate and objectify eye tracking research, in particular for mobile eye tracking with egocentric video feeds. In this work, we implement two methods to automatically detect visual attention to AOIs using pre-trained deep learning models for image classification and object detection. Furthermore, we develop an evaluation framework based on the VISUS dataset and well-known performance metrics from the field of activity recognition. We systematically evaluate our methods within this framework, discuss potentials and limitations, and propose ways to improve the performance of future automatic visual attention detection methods. |
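At its core, mapping visual attention to AOIs with an object detector amounts to checking whether a fixation point in the egocentric video frame falls inside a detected bounding box. A minimal sketch of that mapping step follows; the detector output format and the label names are assumptions for illustration, not the paper's pipeline.

```python
from typing import Dict, List, Optional, Tuple

def attended_aoi(fixation: Tuple[float, float],
                 detections: List[Dict]) -> Optional[str]:
    """Return the label of the highest-scoring detection whose box (x0, y0, x1, y1)
    contains the fixation point in pixel coordinates, or None if no box matches."""
    x, y = fixation
    hits = [d for d in detections
            if d["box"][0] <= x <= d["box"][2] and d["box"][1] <= y <= d["box"][3]]
    if not hits:
        return None
    return max(hits, key=lambda d: d["score"])["label"]

# hypothetical detector output for one video frame
frame_detections = [
    {"label": "monitor", "score": 0.91, "box": (100, 50, 400, 300)},
    {"label": "keyboard", "score": 0.84, "box": (120, 320, 420, 420)},
]
print(attended_aoi((250, 180), frame_detections))  # -> "monitor"
```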
Lauer, Luisa; Altmeyer, Kristin; Malone, Sarah; Barz, Michael; Brünken, Roland; Sonntag, Daniel; Peschel, Markus Investigating the Usability of a Head-Mounted Display Augmented Reality Device in Elementary School Children Journal Article Sensors - Open Access Journal, 21 , pp. 20, 2021. @article{11866, title = {Investigating the Usability of a Head-Mounted Display Augmented Reality Device in Elementary School Children}, author = {Luisa Lauer and Kristin Altmeyer and Sarah Malone and Michael Barz and Roland Brünken and Daniel Sonntag and Markus Peschel}, url = {https://www.dfki.de/fileadmin/user_upload/import/11866_sensors-21-06623.pdf https://www.mdpi.com/1424-8220/21/19/6623}, year = {2021}, date = {2021-01-01}, journal = {Sensors - Open Access Journal}, volume = {21}, pages = {20}, publisher = {MDPI}, abstract = {Augmenting reality via head-mounted displays (HMD-AR) is an emerging technology in education. The interactivity provided by HMD-AR devices is particularly promising for learning, but presents a challenge to human activity recognition, especially with children. Recent technological advances regarding speech and gesture recognition concerning Microsoft’s HoloLens 2 may address this prevailing issue. In a within-subjects study with 47 elementary school children (2nd to 6th grade), we examined the usability of the HoloLens 2 using a standardized tutorial on multimodal interaction in AR. The overall system usability was rated “good”. However, several behavioral metrics indicated that specific interaction modes differed in their efficiency. The results are of major importance for the development of learning applications in HMD-AR as they partially deviate from previous findings. In particular, the well-functioning recognition of children’s voice commands that we observed represents a novelty. Furthermore, we found different interaction preferences in HMD-AR among the children. We also found the use of HMD-AR to have a positive effect on children’s activity-related achievement emotions. Overall, our findings can serve as a basis for determining general requirements, possibilities, and limitations of the implementation of educational HMD-AR environments in elementary school classrooms.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Augmenting reality via head-mounted displays (HMD-AR) is an emerging technology in education. The interactivity provided by HMD-AR devices is particularly promising for learning, but presents a challenge to human activity recognition, especially with children. Recent technological advances regarding speech and gesture recognition concerning Microsoft’s HoloLens 2 may address this prevailing issue. In a within-subjects study with 47 elementary school children (2nd to 6th grade), we examined the usability of the HoloLens 2 using a standardized tutorial on multimodal interaction in AR. The overall system usability was rated “good”. However, several behavioral metrics indicated that specific interaction modes differed in their efficiency. The results are of major importance for the development of learning applications in HMD-AR as they partially deviate from previous findings. In particular, the well-functioning recognition of children’s voice commands that we observed represents a novelty. Furthermore, we found different interaction preferences in HMD-AR among the children. We also found the use of HMD-AR to have a positive effect on children’s activity-related achievement emotions. 
Overall, our findings can serve as a basis for determining general requirements, possibilities, and limitations of the implementation of educational HMD-AR environments in elementary school classrooms. |
Incollections |
Barz, Michael; Sonntag, Daniel Incremental Improvement of a Question Answering System by Re-ranking Answer Candidates Using Machine Learning Incollection Marchi, Erik; Siniscalchi, Sabato Marco; Cumani, Sandro; Salerno, Valerio Mario; Li, Haizhou (Ed.): Increasing Naturalness and Flexibility in Spoken Dialogue Interaction: 10th International Workshop on Spoken Dialogue Systems, pp. 367-379, Springer, 2021. @incollection{11522, title = {Incremental Improvement of a Question Answering System by Re-ranking Answer Candidates Using Machine Learning}, author = {Michael Barz and Daniel Sonntag}, editor = {Erik Marchi and Sabato Marco Siniscalchi and Sandro Cumani and Valerio Mario Salerno and Haizhou Li}, url = {https://www.dfki.de/fileadmin/user_upload/import/11522_2019_Incremental_Improvement_of_a_Question_Answering_System_by_Re-ranking_Answer_Candidates_using_Machine_Learning.pdf}, doi = {https://doi.org/10.1007/978-981-15-9323-9_34}, year = {2021}, date = {2021-01-01}, booktitle = {Increasing Naturalness and Flexibility in Spoken Dialogue Interaction: 10th International Workshop on Spoken Dialogue Systems}, pages = {367-379}, publisher = {Springer}, abstract = {We implement a method for re-ranking top-10 results of a state-of-the-art question answering (QA) system. The goal of our re-ranking approach is to improve the answer selection given the user question and the top-10 candidates. We focus on improving deployed QA systems that do not allow re-training or when re-training comes at a high cost. Our re-ranking approach learns a similarity function using n-gram based features using the query, the answer and the initial system confidence as input. Our contributions are: (1) we generate a QA training corpus starting from 877 answers from the customer care domain of T-Mobile Austria, (2) we implement a state-of-the-art QA pipeline using neural sentence embeddings that encode queries in the same space than the answer index, and (3) we evaluate the QA pipeline and our re-ranking approach using a separately provided test set. The test set can be considered to be available after deployment of the system, e.g., based on feedback of users. Our results show that the system performance, in terms of top-n accuracy and the mean reciprocal rank, benefits from re-ranking using gradient boosted regression trees. On average, the mean reciprocal rank improves by 9.15%.}, keywords = {}, pubstate = {published}, tppubtype = {incollection} } We implement a method for re-ranking top-10 results of a state-of-the-art question answering (QA) system. The goal of our re-ranking approach is to improve the answer selection given the user question and the top-10 candidates. We focus on improving deployed QA systems that do not allow re-training or when re-training comes at a high cost. Our re-ranking approach learns a similarity function using n-gram based features using the query, the answer and the initial system confidence as input. Our contributions are: (1) we generate a QA training corpus starting from 877 answers from the customer care domain of T-Mobile Austria, (2) we implement a state-of-the-art QA pipeline using neural sentence embeddings that encode queries in the same space than the answer index, and (3) we evaluate the QA pipeline and our re-ranking approach using a separately provided test set. The test set can be considered to be available after deployment of the system, e.g., based on feedback of users. 
Our results show that the system performance, in terms of top-n accuracy and the mean reciprocal rank, benefits from re-ranking using gradient boosted regression trees. On average, the mean reciprocal rank improves by 9.15%. |
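Re-ranking answer candidates with gradient boosted regression trees can be sketched compactly: score each (query, candidate) pair with simple token-overlap features plus the initial system confidence, and sort the top-10 list by the predicted score. The feature design and toy data below are illustrative only, not the exact feature set of the paper.

```python
from sklearn.ensemble import GradientBoostingRegressor

def features(query, answer, confidence):
    """Simple unigram-overlap features plus the initial system confidence."""
    q, a = set(query.lower().split()), set(answer.lower().split())
    overlap = len(q & a)
    return [overlap, overlap / max(len(q), 1), overlap / max(len(a), 1), confidence]

# toy training data: (query, candidate answer, initial confidence, is_correct)
train = [
    ("how do i reset my password", "Open the settings and choose reset password.", 0.7, True),
    ("how do i reset my password", "Our hotline is available around the clock.", 0.6, False),
    ("what does roaming cost", "Roaming fees depend on your tariff zone.", 0.5, True),
    ("what does roaming cost", "You can top up your prepaid credit online.", 0.4, False),
]
X = [features(q, a, c) for q, a, c, _ in train]
y = [1.0 if ok else 0.0 for _, _, _, ok in train]
reranker = GradientBoostingRegressor(n_estimators=100, max_depth=2).fit(X, y)

def rerank(query, candidates):
    """candidates: list of (answer, initial_confidence); returns them best-first."""
    scores = reranker.predict([features(query, a, c) for a, c in candidates])
    return [cand for _, cand in sorted(zip(scores, candidates), key=lambda p: -p[0])]
```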
Inproceedings |
Biswas, Rajarshi; Barz, Michael; Hartmann, Mareike; Sonntag, Daniel Improving German Image Captions using Machine Translation and Transfer Learning Inproceedings Espinosa-Anke, Luis; Martin-Vide, Carlos; Spasic, Irena (Ed.): Statistical Language and Speech Processing SLSP 2021, Springer, Council Chamber Glamorgan Building King Edward VII Ave Cathays Park Cardiff CF10 3WT, 2021. @inproceedings{11805, title = {Improving German Image Captions using Machine Translation and Transfer Learning}, author = {Rajarshi Biswas and Michael Barz and Mareike Hartmann and Daniel Sonntag}, editor = {Luis Espinosa-Anke and Carlos Martin-Vide and Irena Spasic}, url = {https://www.dfki.de/fileadmin/user_upload/import/11805_SLSP2021Paper.pdf}, year = {2021}, date = {2021-11-01}, booktitle = {Statistical Language and Speech Processing SLSP 2021}, publisher = {Springer}, address = {Council Chamber Glamorgan Building King Edward VII Ave Cathays Park Cardiff CF10 3WT}, abstract = {Image captioning is a complex artificial intelligence task that involves many fundamental questions of data representation, learning, and natural language processing. In addition, most of the work in this domain addresses the English language because of the high availability of annotated training data compared to other languages. Therefore, we investigate methods for image captioning in German that transfer knowledge from English training data. We explore four different methods for generating image captions in German, two baseline methods and two more advanced ones based on transfer learning. The baseline methods are based on a state-of-the-art model which we train using a translated version of the English MS COCO dataset and the smaller German Multi30K dataset, respectively. Both advanced methods are pre-trained using the translated MS COCO dataset and fine-tuned for German on the Multi30K dataset. One of these methods uses an alternative attention mechanism from the literature that showed a good performance in English image captioning. We compare the performance of all methods for the Multi30K test set in German using common automatic evaluation metrics. We show that our advanced method with the alternative attention mechanism presents a new baseline for German BLEU, ROUGE, CIDEr, and SPICE scores, and achieves a relative improvement of 21.2 % in BLEU-4 score compared to the current state-of-the-art in German image captioning.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Image captioning is a complex artificial intelligence task that involves many fundamental questions of data representation, learning, and natural language processing. In addition, most of the work in this domain addresses the English language because of the high availability of annotated training data compared to other languages. Therefore, we investigate methods for image captioning in German that transfer knowledge from English training data. We explore four different methods for generating image captions in German, two baseline methods and two more advanced ones based on transfer learning. The baseline methods are based on a state-of-the-art model which we train using a translated version of the English MS COCO dataset and the smaller German Multi30K dataset, respectively. Both advanced methods are pre-trained using the translated MS COCO dataset and fine-tuned for German on the Multi30K dataset. One of these methods uses an alternative attention mechanism from the literature that showed a good performance in English image captioning. 
We compare the performance of all methods for the Multi30K test set in German using common automatic evaluation metrics. We show that our advanced method with the alternative attention mechanism presents a new baseline for German BLEU, ROUGE, CIDEr, and SPICE scores, and achieves a relative improvement of 21.2 % in BLEU-4 score compared to the current state-of-the-art in German image captioning. |
Hartmann, Mareike; de Lhoneux, Miryam; Hershcovich, Daniel; Kementchedjhieva, Yova; Nielsen, Lukas; Qiu, Chen; Søgaard, Anders A Multilingual Benchmark for Probing Negation-Awareness with Minimal Pairs Inproceedings Proceedings of the 25th Conference on Computational Natural Language Learning (CoNLL), pp. 224-257, Association for Computational Linguistics, 2021. @inproceedings{11846, title = {A Multilingual Benchmark for Probing Negation-Awareness with Minimal Pairs}, author = {Mareike Hartmann and Miryam de Lhoneux and Daniel Hershcovich and Yova Kementchedjhieva and Lukas Nielsen and Chen Qiu and Anders Søgaard}, url = {https://www.dfki.de/fileadmin/user_upload/import/11846_2021.conll-1.19.pdf https://aclanthology.org/2021.conll-1.19/}, year = {2021}, date = {2021-11-01}, booktitle = {Proceedings of the 25th Conference on Computational Natural Language Learning (CoNLL)}, pages = {224-257}, publisher = {Association for Computational Linguistics}, abstract = {Negation is one of the most fundamental concepts in human cognition and language, and several natural language inference (NLI) probes have been designed to investigate pretrained language models' ability to detect and reason with negation. However, the existing probing datasets are limited to English only, and do not enable controlled probing of performance in the absence or presence of negation. In response, we present a multilingual (English, Bulgarian, German, French and Chinese) benchmark collection of NLI examples that are grammatical and correctly labeled, as a result of manual inspection and editing. We use the benchmark to probe the negation-awareness of multilingual language models and find that models that correctly predict examples with negation cues often fail to correctly predict their counter-examples without negation cues, even when the cues are irrelevant for semantic inference.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Negation is one of the most fundamental concepts in human cognition and language, and several natural language inference (NLI) probes have been designed to investigate pretrained language models' ability to detect and reason with negation. However, the existing probing datasets are limited to English only, and do not enable controlled probing of performance in the absence or presence of negation. In response, we present a multilingual (English, Bulgarian, German, French and Chinese) benchmark collection of NLI examples that are grammatical and correctly labeled, as a result of manual inspection and editing. We use the benchmark to probe the negation-awareness of multilingual language models and find that models that correctly predict examples with negation cues often fail to correctly predict their counter-examples without negation cues, even when the cues are irrelevant for semantic inference. |
Jørgensen, Rasmus Kær; Hartmann, Mareike; Dai, Xiang; Elliott, Desmond mDAPT: Multilingual Domain Adaptive Pretraining in a Single Model Inproceedings Findings of the Association for Computational Linguistics - EMNLP 2021, pp. 3404-3018, Association for Computational Linguistics, 2021. @inproceedings{11845, title = {mDAPT: Multilingual Domain Adaptive Pretraining in a Single Model}, author = {Rasmus Kær Jørgensen and Mareike Hartmann and Xiang Dai and Desmond Elliott}, url = {https://www.dfki.de/fileadmin/user_upload/import/11845_2021.findings-emnlp.290.pdf}, year = {2021}, date = {2021-11-01}, booktitle = {Findings of the Association for Computational Linguistics - EMNLP 2021}, volume = {1}, pages = {3404-3018}, publisher = {Association for Computational Linguistics}, abstract = {Domain adaptive pretraining, i.e. the continued unsupervised pretraining of a language model on domain-specific text, improves the modelling of text for downstream tasks within the domain. Numerous real-world applications are based on domain-specific text, e.g. working with financial or biomedical documents, and these applications often need to support multiple languages. However, large-scale domain-specific multilingual pretraining data for such scenarios can be difficult to obtain, due to regulations, legislation, or simply a lack of language- and domain-specific text. One solution is to train a single multilingual model, taking advantage of the data available in as many languages as possible. In this work, we explore the benefits of domain adaptive pretraining with a focus on adapting to multiple languages within a specific domain. We propose different techniques to compose pretraining corpora that enable a language model to both become domain-specific and multilingual. Evaluation on nine domain-specific datasets---for biomedical named entity recognition and financial sentence classification---covering seven different languages show that a single multilingual domain-specific model can outperform the general multilingual model, and performs close to its monolingual counterpart. This finding holds across two different pretraining methods, adapter-based pretraining and full model pretraining.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Domain adaptive pretraining, i.e. the continued unsupervised pretraining of a language model on domain-specific text, improves the modelling of text for downstream tasks within the domain. Numerous real-world applications are based on domain-specific text, e.g. working with financial or biomedical documents, and these applications often need to support multiple languages. However, large-scale domain-specific multilingual pretraining data for such scenarios can be difficult to obtain, due to regulations, legislation, or simply a lack of language- and domain-specific text. One solution is to train a single multilingual model, taking advantage of the data available in as many languages as possible. In this work, we explore the benefits of domain adaptive pretraining with a focus on adapting to multiple languages within a specific domain. We propose different techniques to compose pretraining corpora that enable a language model to both become domain-specific and multilingual. 
Evaluation on nine domain-specific datasets---for biomedical named entity recognition and financial sentence classification---covering seven different languages show that a single multilingual domain-specific model can outperform the general multilingual model, and performs close to its monolingual counterpart. This finding holds across two different pretraining methods, adapter-based pretraining and full model pretraining. |
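Continued masked-language-model pretraining on domain text is commonly implemented with the Hugging Face Trainer; the sketch below illustrates that generic recipe on a multilingual checkpoint. The corpus file paths and hyperparameters are placeholders, and this is not the paper's exact corpus-composition setup.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# domain corpus: plain-text files, one per language of interest (paths are placeholders)
corpus = load_dataset("text", data_files={"train": ["domain_de.txt", "domain_en.txt"]})
tokenized = corpus["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="mdapt-checkpoint", num_train_epochs=1,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=tokenized,
        data_collator=collator).train()
```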
Erlemeyer, Fabian; Rehtanz, Christian; Hermanns, Annegret; Lüers, Bengt; Nebel-Wenner, Marvin; Eilers, Reef Janes Live Testing of Flexibilities on Distribution Grid Level – Simulation Setup and Lessons Learned Inproceedings IEEE Electric Power and Energy Conference, IEEE Xplore, IEEE Operations Center 445 Hoes Lane Piscataway, NJ 08854-4141 USA Phone: +1 732 981 0060, 2021. @inproceedings{11927, title = {Live Testing of Flexibilities on Distribution Grid Level – Simulation Setup and Lessons Learned}, author = {Fabian Erlemeyer and Christian Rehtanz and Annegret Hermanns and Bengt Lüers and Marvin Nebel-Wenner and Reef Janes Eilers}, url = {https://www.dfki.de/fileadmin/user_upload/import/11927_2021199998.pdf}, year = {2021}, date = {2021-10-01}, booktitle = {IEEE Electric Power and Energy Conference}, publisher = {IEEE Xplore}, address = {IEEE Operations Center 445 Hoes Lane Piscataway, NJ 08854-4141 USA Phone: +1 732 981 0060}, abstract = {In the DESIGNETZ project real flexibility units were connected to a distribution grid simulation to investigate the integration of decentralized flexibilities for different use-cases. The simulation determines the demand for unit flexibility and communicates the demand to the flexibilities. In return, the response of the flexibilities is integrated back into the simulation to consider not-simulated effects, too. This paper presents the simulation setup and discusses lessons learnt from deploying the simulation into operation.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In the DESIGNETZ project real flexibility units were connected to a distribution grid simulation to investigate the integration of decentralized flexibilities for different use-cases. The simulation determines the demand for unit flexibility and communicates the demand to the flexibilities. In return, the response of the flexibilities is integrated back into the simulation to consider not-simulated effects, too. This paper presents the simulation setup and discusses lessons learnt from deploying the simulation into operation. |
Barz, Michael; Kapp, Sebastian; Kuhn, Jochen; Sonntag, Daniel Automatic Recognition and Augmentation of Attended Objects in Real-Time Using Eye Tracking and a Head-Mounted Display Inproceedings ACM Symposium on Eye Tracking Research and Applications, pp. 4, Association for Computing Machinery, 2021. @inproceedings{11614, title = {Automatic Recognition and Augmentation of Attended Objects in Real-Time Using Eye Tracking and a Head-Mounted Display}, author = {Michael Barz and Sebastian Kapp and Jochen Kuhn and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11614_etra_ar_video.pdf}, doi = {https://doi.org/10.1145/3450341.3458766}, year = {2021}, date = {2021-05-01}, booktitle = {ACM Symposium on Eye Tracking Research and Applications}, pages = {4}, publisher = {Association for Computing Machinery}, abstract = {Scanning and processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. Adding the ability to observe the scanning behavior and scene processing to intelligent mobile user interfaces can facilitate a new class of cognition-aware user interfaces. As a first step in this direction, we implement an augmented reality (AR) system that classifies objects at the user’s point of regard, detects visual attention to them, and augments the real objects with virtual labels that stick to the objects in real-time. We use a head-mounted AR device (Microsoft HoloLens 2) with integrated eye tracking capabilities and a front-facing camera for implementing our prototype.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Scanning and processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. Adding the ability to observe the scanning behavior and scene processing to intelligent mobile user interfaces can facilitate a new class of cognition-aware user interfaces. As a first step in this direction, we implement an augmented reality (AR) system that classifies objects at the user’s point of regard, detects visual attention to them, and augments the real objects with virtual labels that stick to the objects in real-time. We use a head-mounted AR device (Microsoft HoloLens 2) with integrated eye tracking capabilities and a front-facing camera for implementing our prototype. |
Nguyen, Ho Minh Duy; Nguyen, Duy M; Vu, Huong; Nguyen, Binh T; Nunnari, Fabrizio; Sonntag, Daniel An Attention Mechanism using Multiple Knowledge Sources for COVID-19 Detection from CT Images Inproceedings The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), AAAI, 2021. @inproceedings{11369, title = {An Attention Mechanism using Multiple Knowledge Sources for COVID-19 Detection from CT Images}, author = {Ho Minh Duy Nguyen and Duy M Nguyen and Huong Vu and Binh T Nguyen and Fabrizio Nunnari and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11369_AAAI_Workshop_TrustworthyHealthcare_v3.pdf}, year = {2021}, date = {2021-01-01}, booktitle = {The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)}, publisher = {AAAI}, abstract = {Besides principal polymerase chain reaction (PCR) tests, automatically identifying positive samples based on computed tomography (CT) scans can present a promising option in the early diagnosis of COVID-19. Recently, there have been increasing efforts to utilize deep networks for COVID-19 diagnosis based on CT scans. While these approaches mostly focus on introducing novel architectures, transfer learning techniques or construction of large scale data, we propose a novel strategy to improve several performance baselines by leveraging multiple useful information sources relevant to doctors' judgments. Specifically, infected regions and heat-map features extracted from learned networks are integrated with the global image via an attention mechanism during the learning process. This procedure makes our system more robust to noise and guides the network focusing on local lesion areas. Extensive experiments illustrate the superior performance of our approach compared to recent baselines. Furthermore, our learned network guidance presents an explainable feature to doctors to understand the connection between input and output in a grey-box model.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Besides principal polymerase chain reaction (PCR) tests, automatically identifying positive samples based on computed tomography (CT) scans can present a promising option in the early diagnosis of COVID-19. Recently, there have been increasing efforts to utilize deep networks for COVID-19 diagnosis based on CT scans. While these approaches mostly focus on introducing novel architectures, transfer learning techniques or construction of large scale data, we propose a novel strategy to improve several performance baselines by leveraging multiple useful information sources relevant to doctors' judgments. Specifically, infected regions and heat-map features extracted from learned networks are integrated with the global image via an attention mechanism during the learning process. This procedure makes our system more robust to noise and guides the network focusing on local lesion areas. Extensive experiments illustrate the superior performance of our approach compared to recent baselines. Furthermore, our learned network guidance presents an explainable feature to doctors to understand the connection between input and output in a grey-box model. |
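A generic way to integrate several information sources via an attention mechanism is to compute a learned relevance weight per source and fuse the weighted feature vectors before classification. The PyTorch sketch below shows that generic pattern only; it is not the architecture proposed in the paper, and the feature dimensions are assumed.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse feature vectors from several sources with learned attention weights."""
    def __init__(self, feat_dim: int, num_classes: int = 2):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)          # one relevance score per source
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):                         # feats: (batch, num_sources, feat_dim)
        weights = torch.softmax(self.score(feats), dim=1)  # attention over sources
        fused = (weights * feats).sum(dim=1)               # (batch, feat_dim)
        return self.classifier(fused)

# toy usage: global-image, infected-region, and heat-map features from some backbone CNN
feats = torch.randn(4, 3, 512)
logits = AttentionFusion(feat_dim=512)(feats)         # (4, 2) class logits
```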
Prange, Alexander; Barz, Michael; Heimann-Steinert, Anika; Sonntag, Daniel Explainable Automatic Evaluation of the Trail Making Test for Dementia Screening Inproceedings Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, 2021. @inproceedings{11432, title = {Explainable Automatic Evaluation of the Trail Making Test for Dementia Screening}, author = {Alexander Prange and Michael Barz and Anika Heimann-Steinert and Daniel Sonntag}, doi = {https://doi.org/10.1145/3411764.3445046}, year = {2021}, date = {2021-01-01}, booktitle = {Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems}, publisher = {Association for Computing Machinery}, abstract = {The Trail Making Test (TMT) is a frequently used neuropsychological test for assessing cognitive performance. The subject connects a sequence of numbered nodes by using a pen on normal paper. We present an automatic cognitive assessment tool that analyzes samples of the TMT which we record using a digital pen. This enables us to analyze digital pen features that are difficult or impossible to evaluate manually. Our system automatically measures several pen features, including the completion time which is the main performance indicator used by clinicians to score the TMT in practice. In addition, our system provides a structured report of the analysis of the test, for example indicating missed or erroneously connected nodes, thereby offering more objective, transparent and explainable results to the clinician. We evaluate our system with 40 elderly subjects from a geriatrics daycare clinic of a large hospital.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The Trail Making Test (TMT) is a frequently used neuropsychological test for assessing cognitive performance. The subject connects a sequence of numbered nodes by using a pen on normal paper. We present an automatic cognitive assessment tool that analyzes samples of the TMT which we record using a digital pen. This enables us to analyze digital pen features that are difficult or impossible to evaluate manually. Our system automatically measures several pen features, including the completion time which is the main performance indicator used by clinicians to score the TMT in practice. In addition, our system provides a structured report of the analysis of the test, for example indicating missed or erroneously connected nodes, thereby offering more objective, transparent and explainable results to the clinician. We evaluate our system with 40 elderly subjects from a geriatrics daycare clinic of a large hospital. |
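Deriving basic pen features such as completion time, path length, and average speed from timestamped pen samples can be sketched in a few lines; the real system measures many more features and additionally analyzes the node sequence. The data layout below is an assumption for illustration.

```python
import math

def pen_features(strokes):
    """strokes: list of strokes, each a list of (x, y, t) samples with t in seconds."""
    samples = [p for stroke in strokes for p in stroke]
    completion_time = samples[-1][2] - samples[0][2]
    path_length = sum(
        math.dist(p[:2], q[:2])
        for stroke in strokes for p, q in zip(stroke, stroke[1:]))
    return {
        "completion_time_s": completion_time,
        "path_length": path_length,
        "avg_speed": path_length / completion_time if completion_time > 0 else 0.0,
        "num_strokes": len(strokes),
    }

# toy usage: two short strokes recorded by a digital pen
print(pen_features([[(0, 0, 0.0), (3, 4, 0.5)], [(3, 4, 1.0), (6, 8, 1.6)]]))
```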
Bhatti, Omair Shahzad; Barz, Michael; Sonntag, Daniel EyeLogin - Calibration-Free Authentication Method for Public Displays Using Eye Gaze Inproceedings ACM Symposium on Eye Tracking Research and Applications, Association for Computing Machinery, 2021. @inproceedings{11616, title = {EyeLogin - Calibration-Free Authentication Method for Public Displays Using Eye Gaze}, author = {Omair Shahzad Bhatti and Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11616_EyeLogin.pdf}, doi = {https://doi.org/10.1145/3448018.3458001}, year = {2021}, date = {2021-01-01}, booktitle = {ACM Symposium on Eye Tracking Research and Applications}, publisher = {Association for Computing Machinery}, abstract = {The usage of interactive public displays has increased including the number of sensitive applications and, hence, the demand for user authentication methods. In this context, gaze-based authentication was shown to be effective and more secure, but significantly slower than touch- or gesture-based methods. We implement a calibration-free and fast authentication method for situated displays based on saccadic eye movements. In a user study (n = 10), we compare our new method with CueAuth from Khamis et al. (IMWUT’18), an authentication method based on smooth pursuit eye movements. The results show a significant improvement in accuracy from 82.94% to 95.88%. At the same time, we found that the entry speed can be increased enormously with our method, on average, 18.28s down to 5.12s, which is comparable to touch-based input.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The usage of interactive public displays has increased including the number of sensitive applications and, hence, the demand for user authentication methods. In this context, gaze-based authentication was shown to be effective and more secure, but significantly slower than touch- or gesture-based methods. We implement a calibration-free and fast authentication method for situated displays based on saccadic eye movements. In a user study (n = 10), we compare our new method with CueAuth from Khamis et al. (IMWUT’18), an authentication method based on smooth pursuit eye movements. The results show a significant improvement in accuracy from 82.94% to 95.88%. At the same time, we found that the entry speed can be increased enormously with our method, on average, 18.28s down to 5.12s, which is comparable to touch-based input. |
Nunnari, Fabrizio; Sonntag, Daniel A Software Toolbox for Deploying Deep Learning Decision Support Systems with XAI Capabilities Inproceedings Companion of the 2021 ACM SIGCHI Symposium on Engineering Interactive Computing Systems, Association for Computing Machinery, 2021. @inproceedings{11664, title = {A Software Toolbox for Deploying Deep Learning Decision Support Systems with XAI Capabilities}, author = {Fabrizio Nunnari and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11664_nunnari21EICS-TIML.pdf}, doi = {https://doi.org/10.1145/3459926.3464753}, year = {2021}, date = {2021-01-01}, booktitle = {Companion of the 2021 ACM SIGCHI Symposium on Engineering Interactive Computing Systems}, publisher = {Association for Computing Machinery}, abstract = {We describe the software architecture of a toolbox of reusable components for the configuration of convolutional neural networks (CNNs) for classification and labeling problems. The toolbox architecture has been designed to maximize the reuse of established algorithms and to include domain experts in the development and evaluation process across different projects and challenges. In addition, we implemented easy-to-edit input formats and modules for XAI (eXplainable AI) through visual inspection capabilities. The toolbox is available for the research community to implement applied artificial intelligence projects.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We describe the software architecture of a toolbox of reusable components for the configuration of convolutional neural networks (CNNs) for classification and labeling problems. The toolbox architecture has been designed to maximize the reuse of established algorithms and to include domain experts in the development and evaluation process across different projects and challenges. In addition, we implemented easy-to-edit input formats and modules for XAI (eXplainable AI) through visual inspection capabilities. The toolbox is available for the research community to implement applied artificial intelligence projects. |
Prange, Alexander; Sonntag, Daniel Assessing Cognitive Test Performance Using Automatic Digital Pen Features Analysis Inproceedings Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, Association for Computing Machinery, 2021. @inproceedings{11703, title = {Assessing Cognitive Test Performance Using Automatic Digital Pen Features Analysis}, author = {Alexander Prange and Daniel Sonntag}, year = {2021}, date = {2021-01-01}, booktitle = {Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization}, publisher = {Association for Computing Machinery}, abstract = {Most cognitive assessments, for dementia screening for example, are conducted with a pen on normal paper. We record these tests with a digital pen as part of a new interactive cognitive assessment tool with automatic analysis of pen input. The clinician can, first, observe the sketching process in real-time on a mobile tablet, e.g., in telemedicine settings or to follow Covid-19 distancing regulations. Second, the results of an automatic test analysis are presented to the clinician in real-time, thereby reducing manual scoring effort and producing objective reports. The presented research describes the architecture of our cognitive assessment tool and examines how accurately different machine learning (ML) models can automatically score cognitive tests, without a semantic content analysis. Our system uses a set of more than 170 pen features, calculated directly from the raw digital pen signal. We evaluate our system with 40 subjects from a geriatrics daycare clinic. Using standard ML techniques our feature set outperforms previous approaches on the cognitive tests we consider, i.e., the Clock Drawing, the Rey-Osterrieth Complex Figure, and the Trail Making Test, by automatically scoring tests with up to 82% accuracy in a binary classification task.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Most cognitive assessments, for dementia screening for example, are conducted with a pen on normal paper. We record these tests with a digital pen as part of a new interactive cognitive assessment tool with automatic analysis of pen input. The clinician can, first, observe the sketching process in real-time on a mobile tablet, e.g., in telemedicine settings or to follow Covid-19 distancing regulations. Second, the results of an automatic test analysis are presented to the clinician in real-time, thereby reducing manual scoring effort and producing objective reports. The presented research describes the architecture of our cognitive assessment tool and examines how accurately different machine learning (ML) models can automatically score cognitive tests, without a semantic content analysis. Our system uses a set of more than 170 pen features, calculated directly from the raw digital pen signal. We evaluate our system with 40 subjects from a geriatrics daycare clinic. Using standard ML techniques our feature set outperforms previous approaches on the cognitive tests we consider, i.e., the Clock Drawing, the Rey-Osterrieth Complex Figure, and the Trail Making Test, by automatically scoring tests with up to 82% accuracy in a binary classification task. |
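The classification step on top of such pen features is standard supervised learning. A hedged sketch with scikit-learn follows; the feature matrix and labels are random placeholders standing in for the ~170 pen features per test execution, not the study data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# rows = test executions, columns = pen features (e.g., completion time, speed, pauses)
X = np.random.rand(40, 170)            # placeholder feature matrix
y = np.random.randint(0, 2, size=40)   # placeholder binary labels (conspicuous vs. normal)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"cross-validated accuracy: {scores.mean():.2f}")
```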
Nguyen, Ho Minh Duy; Mai, Truong Thanh-Nhat; Than, Ngoc Trong Tuong; Prange, Alexander; Sonntag, Daniel Self-Supervised Domain Adaptation for Diabetic Retinopathy Grading using Vessel Image Reconstruction Inproceedings Proceedings of the 44th German Conference on Artificial Intelligence, Springer, 2021. @inproceedings{11715, title = {Self-Supervised Domain Adaptation for Diabetic Retinopathy Grading using Vessel Image Reconstruction}, author = {Ho Minh Duy Nguyen and Truong Thanh-Nhat Mai and Ngoc Trong Tuong Than and Alexander Prange and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11715_KI_2021_Self_Supervised_Domain_Adaptation_for_Diabetic_Retinopathy_Grading.pdf}, year = {2021}, date = {2021-01-01}, booktitle = {Proceedings of the 44th German Conference on Artificial Intelligence}, publisher = {Springer}, abstract = {This paper investigates the problem of domain adaptation for diabetic retinopathy (DR) grading. We learn invariant target-domain features by defining a novel self-supervised task based on retinal vessel image reconstructions, inspired by medical domain knowledge. Then, a benchmark of current state-of-the-art unsupervised domain adaptation methods on the DR problem is provided. It can be shown that our approach outperforms existing domain adaption strategies. Furthermore, when utilizing entire training data in the target domain, we are able to compete with several state-of-the-art approaches in final classification accuracy just by applying standard network architectures and using image-level labels.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } This paper investigates the problem of domain adaptation for diabetic retinopathy (DR) grading. We learn invariant target-domain features by defining a novel self-supervised task based on retinal vessel image reconstructions, inspired by medical domain knowledge. Then, a benchmark of current state-of-the-art unsupervised domain adaptation methods on the DR problem is provided. It can be shown that our approach outperforms existing domain adaption strategies. Furthermore, when utilizing entire training data in the target domain, we are able to compete with several state-of-the-art approaches in final classification accuracy just by applying standard network architectures and using image-level labels. |
Nunnari, Fabrizio; Kadir, Md Abdul; Sonntag, Daniel On the Overlap Between Grad-CAM Saliency Maps and Explainable Visual Features in Skin Cancer Images Inproceedings Holzinger, Andreas; Kieseberg, Peter; Tjoa, Min A; Weippl, Edgar (Ed.): Machine Learning and Knowledge Extraction, pp. 241-253, Springer International Publishing, 2021. @inproceedings{11802, title = {On the Overlap Between Grad-CAM Saliency Maps and Explainable Visual Features in Skin Cancer Images}, author = {Fabrizio Nunnari and Md Abdul Kadir and Daniel Sonntag}, editor = {Andreas Holzinger and Peter Kieseberg and Min A Tjoa and Edgar Weippl}, url = {https://www.dfki.de/fileadmin/user_upload/import/11802_2021_CD_MAKE_XAI_and_SkinFeatures.pdf}, doi = {https://doi.org/10.1007/978-3-030-84060-0_16}, year = {2021}, date = {2021-01-01}, booktitle = {Machine Learning and Knowledge Extraction}, volume = {12844}, pages = {241-253}, publisher = {Springer International Publishing}, abstract = {Dermatologists recognize melanomas by inspecting images in which they identify human-comprehensible visual features. In this paper, we investigate to what extent such features correspond to the saliency areas identified on CNNs trained for classification. Our experiments, conducted on two neural architectures characterized by different depth and different resolution of the last convolutional layer, quantify to what extent thresholded Grad-CAM saliency maps can be used to identify visual features of skin cancer. We found that the best threshold value, i.e., the threshold at which we can measure the highest Jaccard index, varies significantly among features; ranging from 0.3 to 0.7. In addition, we measured Jaccard indices as high as 0.143, which is almost 50% of the performance of state-of-the-art architectures specialized in feature mask prediction at pixel-level, such as U-Net. Finally, a breakdown test between malignancy and classification correctness shows that higher resolution saliency maps could help doctors in spotting wrong classifications.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Dermatologists recognize melanomas by inspecting images in which they identify human-comprehensible visual features. In this paper, we investigate to what extent such features correspond to the saliency areas identified on CNNs trained for classification. Our experiments, conducted on two neural architectures characterized by different depth and different resolution of the last convolutional layer, quantify to what extent thresholded Grad-CAM saliency maps can be used to identify visual features of skin cancer. We found that the best threshold value, i.e., the threshold at which we can measure the highest Jaccard index, varies significantly among features; ranging from 0.3 to 0.7. In addition, we measured Jaccard indices as high as 0.143, which is almost 50% of the performance of state-of-the-art architectures specialized in feature mask prediction at pixel-level, such as U-Net. Finally, a breakdown test between malignancy and classification correctness shows that higher resolution saliency maps could help doctors in spotting wrong classifications. |
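The central measurement here is the Jaccard index between a thresholded saliency map and a binary feature mask, scanned over candidate thresholds. A minimal NumPy sketch of that computation (with placeholder arrays instead of real Grad-CAM maps):

```python
import numpy as np

def jaccard(saliency: np.ndarray, feature_mask: np.ndarray, threshold: float) -> float:
    """IoU between a thresholded saliency map and a binary feature mask of the same shape."""
    pred = saliency >= threshold
    gt = feature_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

# scan thresholds to find the one with the highest overlap, as done per visual feature
saliency = np.random.rand(224, 224)               # stands in for an upsampled Grad-CAM map in [0, 1]
mask = np.zeros((224, 224)); mask[60:120, 80:160] = 1
best_t = max(np.arange(0.1, 1.0, 0.1), key=lambda t: jaccard(saliency, mask, t))
```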
Nunnari, Fabrizio; Alam, Hasan Md Tusfiqur; Sonntag, Daniel Anomaly Detection for Skin Lesion Images Using Replicator Neural Networks Inproceedings Holzinger, Andreas; Kieseberg, Peter; Tjoa, Min A; Weippl, Edgar (Ed.): Machine Learning and Knowledge Extraction, pp. 225-240, Springer International Publishing, 2021. @inproceedings{11803, title = {Anomaly Detection for Skin Lesion Images Using Replicator Neural Networks}, author = {Fabrizio Nunnari and Hasan Md Tusfiqur Alam and Daniel Sonntag}, editor = {Andreas Holzinger and Peter Kieseberg and Min A Tjoa and Edgar Weippl}, url = {https://www.dfki.de/fileadmin/user_upload/import/11803_2021_CD_MAKE_AnomalyDetection.pdf}, doi = {https://doi.org/10.1007/978-3-030-84060-0_15}, year = {2021}, date = {2021-01-01}, booktitle = {Machine Learning and Knowledge Extraction}, volume = {12844}, pages = {225-240}, publisher = {Springer International Publishing}, abstract = {This paper presents an investigation on the task of anomaly detection for images of skin lesions. The goal is to provide a decision support system with an extra filtering layer to inform users if a classifier should not be used for a given sample. We tested anomaly detectors based on autoencoders and three discrimination methods: feature vector distance, replicator neural networks, and support vector data description fine-tuning. Results show that neural-based detectors can perfectly discriminate between skin lesions and open world images, but class discrimination cannot easily be accomplished and requires further investigation.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } This paper presents an investigation on the task of anomaly detection for images of skin lesions. The goal is to provide a decision support system with an extra filtering layer to inform users if a classifier should not be used for a given sample. We tested anomaly detectors based on autoencoders and three discrimination methods: feature vector distance, replicator neural networks, and support vector data description fine-tuning. Results show that neural-based detectors can perfectly discriminate between skin lesions and open world images, but class discrimination cannot easily be accomplished and requires further investigation. |
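Reconstruction-error-based anomaly scoring with a replicator network can be sketched as an MLP trained to reproduce its own input: samples with high reconstruction error are flagged as out-of-distribution. The sketch below works on generic feature vectors and is an illustration of the principle, not the paper's image pipeline.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# in-distribution feature vectors (e.g., bottleneck features of skin lesion images)
X_train = np.random.rand(500, 64)

# replicator network: an MLP regressor trained to reconstruct its own input
replicator = MLPRegressor(hidden_layer_sizes=(32, 8, 32), max_iter=500, random_state=0)
replicator.fit(X_train, X_train)

def anomaly_score(x):
    """Mean squared reconstruction error; higher means more likely out-of-distribution."""
    recon = replicator.predict(x.reshape(1, -1))
    return float(np.mean((recon - x) ** 2))

# e.g., flag the top 1% of training scores as the anomaly threshold
threshold = np.quantile([anomaly_score(x) for x in X_train], 0.99)
```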
Nunnari, Fabrizio; Ezema, Abraham; Sonntag, Daniel Crop It, but Not Too Much: The Effects of Masking on the Classification of Melanoma Images Inproceedings Edelkamp, Stefan; Rueckert, Elmar; Möller, Ralf (Ed.): KI 2021: Advances in Artificial Intelligence, pp. 179-193, Springer International Publishing, 2021. @inproceedings{11859, title = {Crop It, but Not Too Much: The Effects of Masking on the Classification of Melanoma Images}, author = {Fabrizio Nunnari and Abraham Ezema and Daniel Sonntag}, editor = {Stefan Edelkamp and Elmar Rueckert and Ralf Möller}, url = {https://www.dfki.de/fileadmin/user_upload/import/11859_2021_KIconference_SkinLesionMasking.pdf https://link.springer.com/chapter/10.1007/978-3-030-87626-5_13}, year = {2021}, date = {2021-01-01}, booktitle = {KI 2021: Advances in Artificial Intelligence}, pages = {179-193}, publisher = {Springer International Publishing}, abstract = {To improve the accuracy of convolutional neural networks in discriminating between nevi and melanomas, we test nine different combinations of masking and cropping on three datasets of skin lesion images (ISIC2016, ISIC2018, and MedNode). Our experiments, confirmed by 10-fold cross-validation, show that cropping increases classification performances, but specificity decreases when cropping is applied together with masking out healthy skin regions. An analysis of Grad-CAM saliency maps shows that in fact our CNN models have the tendency to focus on healthy skin at the border when a nevus is classified.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } To improve the accuracy of convolutional neural networks in discriminating between nevi and melanomas, we test nine different combinations of masking and cropping on three datasets of skin lesion images (ISIC2016, ISIC2018, and MedNode). Our experiments, confirmed by 10-fold cross-validation, show that cropping increases classification performances, but specificity decreases when cropping is applied together with masking out healthy skin regions. An analysis of Grad-CAM saliency maps shows that in fact our CNN models have the tendency to focus on healthy skin at the border when a nevus is classified. |
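Cropping a lesion image to the bounding box of its segmentation mask, while keeping a margin of healthy skin, can be sketched as follows; the margin handling is illustrative and not one of the nine specific combinations tested in the paper.

```python
import numpy as np

def crop_to_mask(image: np.ndarray, mask: np.ndarray, margin: int = 20) -> np.ndarray:
    """Crop an (H, W, C) image to the bounding box of a binary mask plus a pixel margin."""
    ys, xs = np.nonzero(mask)
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin + 1, image.shape[0])
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin + 1, image.shape[1])
    return image[y0:y1, x0:x1]

# toy usage with a synthetic image and lesion mask
img = np.zeros((300, 300, 3), dtype=np.uint8)
lesion_mask = np.zeros((300, 300), dtype=np.uint8); lesion_mask[100:180, 120:200] = 1
patch = crop_to_mask(img, lesion_mask, margin=20)   # 120 x 120 patch around the lesion
```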
Prange, Alexander; Sonntag, Daniel A Demonstrator for Interactive Image Clustering and Fine-Tuning Neural Networks in Virtual Reality Inproceedings Edelkamp, Stefan; Rueckert, Elmar; Möller, Ralf (Ed.): KI 2021: Advances in Artificial Intelligence, pp. 194-203, Springer International Publishing, 2021. @inproceedings{11886, title = {A Demonstrator for Interactive Image Clustering and Fine-Tuning Neural Networks in Virtual Reality}, author = {Alexander Prange and Daniel Sonntag}, editor = {Stefan Edelkamp and Elmar Rueckert and Ralf Möller}, url = {https://link.springer.com/chapter/10.1007/978-3-030-87626-5_14}, year = {2021}, date = {2021-01-01}, booktitle = {KI 2021: Advances in Artificial Intelligence}, pages = {194-203}, publisher = {Springer International Publishing}, abstract = {We present a virtual reality (VR) application that enables us to interactively explore and manipulate image clusters based on layer activations of convolutional neural networks (CNNs). We apply dimensionality reduction techniques to project images into the 3D space, where the user can directly interact with the model. The user can change the position of an image by using natural hand gestures. This manipulation triggers additional training steps of the network, based on the new spatial information and new label of the image. After the training step is finished, the visualization is updated according to the new output of the CNN. The goal is to visualize and improve the cluster output of the model, and at the same time, to improve the understanding of the model. We discuss two different approaches for calculating the VR projection, a combined PCA/t-SNE dimensionality reduction based approach and a variational auto-encoder (VAE) based approach.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present a virtual reality (VR) application that enables us to interactively explore and manipulate image clusters based on layer activations of convolutional neural networks (CNNs). We apply dimensionality reduction techniques to project images into the 3D space, where the user can directly interact with the model. The user can change the position of an image by using natural hand gestures. This manipulation triggers additional training steps of the network, based on the new spatial information and new label of the image. After the training step is finished, the visualization is updated according to the new output of the CNN. The goal is to visualize and improve the cluster output of the model, and at the same time, to improve the understanding of the model. We discuss two different approaches for calculating the VR projection, a combined PCA/t-SNE dimensionality reduction based approach and a variational auto-encoder (VAE) based approach. |
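The PCA/t-SNE pathway for placing images in a 3D space can be sketched with scikit-learn: reduce the CNN layer activations with PCA first, then project to three dimensions with t-SNE. The activations below are random placeholders, and the component counts are assumptions rather than the demonstrator's settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

activations = np.random.rand(200, 2048)   # placeholder CNN layer activations, one row per image

# reduce to ~50 dimensions with PCA first, then to 3D with t-SNE for the VR scene
compact = PCA(n_components=50, random_state=0).fit_transform(activations)
positions_3d = TSNE(n_components=3, perplexity=30, random_state=0).fit_transform(compact)
print(positions_3d.shape)                 # (200, 3) -> one position per image in VR space
```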
Barz, Michael; Bhatti, Omair Shahzad; Lüers, Bengt; Prange, Alexander; Sonntag, Daniel Multisensor-Pipeline: A Lightweight, Flexible, and Extensible Framework for Building Multimodal-Multisensor Interfaces Inproceedings Companion Publication of the 2021 International Conference on Multimodal Interaction, pp. 13-18, Association for Computing Machinery, 2021. @inproceedings{11981, title = {Multisensor-Pipeline: A Lightweight, Flexible, and Extensible Framework for Building Multimodal-Multisensor Interfaces}, author = {Michael Barz and Omair Shahzad Bhatti and Bengt Lüers and Alexander Prange and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11981_icmi_cr.pdf}, year = {2021}, date = {2021-01-01}, booktitle = {Companion Publication of the 2021 International Conference on Multimodal Interaction}, pages = {13-18}, publisher = {Association for Computing Machinery}, abstract = {We present the multisensor-pipeline (MSP), a lightweight, flexible, and extensible framework for prototyping multimodal-multisensor interfaces based on real-time sensor input. Our open-source framework (available on GitHub) enables researchers and developers to easily integrate multiple sensors or other data streams via source modules, to add stream and event processing capabilities via processor modules, and to connect user interfaces or databases via sink modules in a graph-based processing pipeline. Our framework is implemented in Python with a low number of dependencies, which enables a quick setup process, execution across multiple operating systems, and direct access to cutting-edge machine learning libraries and models. We showcase the functionality and capabilities of MSP through a sample application that connects a mobile eye tracker to classify image patches surrounding the user’s fixation points and visualizes the classification results in real-time.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present the multisensor-pipeline (MSP), a lightweight, flexible, and extensible framework for prototyping multimodal-multisensor interfaces based on real-time sensor input. Our open-source framework (available on GitHub) enables researchers and developers to easily integrate multiple sensors or other data streams via source modules, to add stream and event processing capabilities via processor modules, and to connect user interfaces or databases via sink modules in a graph-based processing pipeline. Our framework is implemented in Python with a low number of dependencies, which enables a quick setup process, execution across multiple operating systems, and direct access to cutting-edge machine learning libraries and models. We showcase the functionality and capabilities of MSP through a sample application that connects a mobile eye tracker to classify image patches surrounding the user’s fixation points and visualizes the classification results in real-time. |
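The source -> processor -> sink pattern described above can be illustrated with plain Python queues and threads. This is a generic sketch of the pattern only, not the multisensor-pipeline API; module names and the fake sensor stream are assumptions.

```python
import queue
import random
import threading
import time

def source(out_q):
    """Emit a fake sensor reading ten times per second (stands in for e.g. gaze data)."""
    for _ in range(20):
        out_q.put({"t": time.time(), "value": random.random()})
        time.sleep(0.1)
    out_q.put(None)                      # end-of-stream marker

def processor(in_q, out_q):
    """Turn raw readings into events (here: simple thresholding)."""
    while (sample := in_q.get()) is not None:
        if sample["value"] > 0.8:
            out_q.put({"event": "peak", **sample})
    out_q.put(None)

def sink(in_q):
    """Consume events, e.g. update a user interface or write to a database."""
    while (event := in_q.get()) is not None:
        print("event:", event)

raw_q, event_q = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=f, args=a) for f, a in
           [(source, (raw_q,)), (processor, (raw_q, event_q)), (sink, (event_q,))]]
for t in threads: t.start()
for t in threads: t.join()
```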
Miscellaneous |
Hartmann, Mareike; Kruijff-Korbayová, Ivana; Sonntag, Daniel Interaction with Explanations in the XAINES Project Miscellaneous Trustworthy AI in the Wild Workshop 2021, 2021. @misc{11867, title = {Interaction with Explanations in the XAINES Project}, author = {Mareike Hartmann and Ivana Kruijff-Korbayová and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11867_AI_in_the_wild__Xaines.pdf}, year = {2021}, date = {2021-09-01}, booktitle = {Trustworthy AI in the wild}, publisher = {-}, abstract = {AI systems are increasingly pervasive and their large-scale adoption makes it necessary to explain their behaviour, for example to their users who are impacted by their decisions, or to their developers who need to ensure their functionality. This requires, on the one hand, to obtain an accurate representation of the chain of events that caused the system to behave in a certain way (e.g., to make a specific decision). On the other hand, this causal chain needs to be communicated to the users depending on their needs and expectations. In this phase of explanation delivery, allowing interaction between user and model has the potential to improve both model quality and user experience. In this abstract, we present our planned and on-going work on the interaction with explanations as part of the XAINES project. The project investigates the explanation of AI systems through narratives targeted to the needs of a specific audience, and our work focuses on the question of how and in which way human-model interaction can enable successful explanation.}, howpublished = {Trustworthy AI in the Wild Workshop 2021}, keywords = {}, pubstate = {published}, tppubtype = {misc} } AI systems are increasingly pervasive and their large-scale adoption makes it necessary to explain their behaviour, for example to their users who are impacted by their decisions, or to their developers who need to ensure their functionality. This requires, on the one hand, to obtain an accurate representation of the chain of events that caused the system to behave in a certain way (e.g., to make a specific decision). On the other hand, this causal chain needs to be communicated to the users depending on their needs and expectations. In this phase of explanation delivery, allowing interaction between user and model has the potential to improve both model quality and user experience. In this abstract, we present our planned and on-going work on the interaction with explanations as part of the XAINES project. The project investigates the explanation of AI systems through narratives targeted to the needs of a specific audience, and our work focuses on the question of how and in which way human-model interaction can enable successful explanation. |
Malone, Sarah; Altmeyer, Kristin; Barz, Michael; Lauer, Luisa; Sonntag, Daniel; Peschel, Markus; Brünken, Roland Measuring Intrinsic and Extraneous Cognitive Load in Elementary School Students Using Subjective Ratings and Smart Pen Data Miscellaneous 13th International Cognitive Load Theory Conference, 2021. @misc{11868, title = {Measuring Intrinsic and Extraneous Cognitive Load in Elementary School Students Using Subjective Ratings and Smart Pen Data}, author = {Sarah Malone and Kristin Altmeyer and Michael Barz and Luisa Lauer and Daniel Sonntag and Markus Peschel and Roland Brünken}, url = {https://www.dfki.de/fileadmin/user_upload/import/11868_Cl_measurement_in_children.pdf}, year = {2021}, date = {2021-01-01}, abstract = {New methods are constantly being developed to optimize and adapt cognitive load measurement to different contexts (Korbach et al., 2018). It is noteworthy, however, that research on cognitive load measurement in elementary school students is rare. Although there is some evidence that they might be able to report their total cognitive load (Ayres, 2006), there are also reasons to doubt the quality of children’s self-reports (e.g., Chambers & Johnson, 2002). To avoid these issues, behavioral and objective online-measures are promising. A novel approach – the use of smartpen data generated by natural use of a pen during task completion – seems particularly encouraging as these measures proved to be predictive of cognitive load in adults (e.g., Yu, Epps, & Chen, 2011). Moreover, Barz et al. (2020) demonstrated the predictive power of smartpen data for performance in children. The present research addressed two prevailing gaps in research on cognitive load assessment in elementary school students. We developed a subjective rating scale and investigated whether this instrument can provide valid measurements of ICL and ECL (Research Question 1). Moreover, we researched whether smartpen data can be used as a valid process measurement of cognitive load (Research Question 2).}, howpublished = {13th International Cognitive Load Theory Conference}, keywords = {}, pubstate = {published}, tppubtype = {misc} } New methods are constantly being developed to optimize and adapt cognitive load measurement to different contexts (Korbach et al., 2018). It is noteworthy, however, that research on cognitive load measurement in elementary school students is rare. Although there is some evidence that they might be able to report their total cognitive load (Ayres, 2006), there are also reasons to doubt the quality of children’s self-reports (e.g., Chambers & Johnson, 2002). To avoid these issues, behavioral and objective online-measures are promising. A novel approach – the use of smartpen data generated by natural use of a pen during task completion – seems particularly encouraging as these measures proved to be predictive of cognitive load in adults (e.g., Yu, Epps, & Chen, 2011). Moreover, Barz et al. (2020) demonstrated the predictive power of smartpen data for performance in children. The present research addressed two prevailing gaps in research on cognitive load assessment in elementary school students. We developed a subjective rating scale and investigated whether this instrument can provide valid measurements of ICL and ECL (Research Question 1). Moreover, we researched whether smartpen data can be used as a valid process measurement of cognitive load (Research Question 2). |
Altmeyer, Kristin; Malone, Sarah; Kapp, Sebastian; Barz, Michael; Lauer, Luisa; Thees, Michael; Kuhn, Jochen; Peschel, Markus; Sonntag, Daniel; Brünken, Roland The effect of augmented reality on global coherence formation processes during STEM laboratory work in elementary school children Miscellaneous 13th International Cognitive Load Theory Conference, 2021. @misc{11870, title = {The effect of augmented reality on global coherence formation processes during STEM laboratory work in elementary school children}, author = {Kristin Altmeyer and Sarah Malone and Sebastian Kapp and Michael Barz and Luisa Lauer and Michael Thees and Jochen Kuhn and Markus Peschel and Daniel Sonntag and Roland Brünken}, url = {https://www.dfki.de/fileadmin/user_upload/import/11870_ICLTC_2021_Altmeyer_final.pdf}, year = {2021}, date = {2021-01-01}, abstract = {In science education, hands-on student experiments are used to explore cause and effect relationships. Conventional lab work requires students to interact with physical experimentation objects and observe additional information like measurement values to deduce scientific laws and interrelations. The observable information, however, are usually presented in physical distance to the setting, e.g., on a separate display of a measuring device. The resulting spatial split (Chandler & Sweller, 1991) between representations hampers global coherence formation (Seufert & Brünken, 2004): Mapping processes between the spatially distant sources of information are assumed to lead to an increase in extraneous cognitive load (ECL; Ayres & Sweller, 2014). Consequently, learning outcomes can be impaired (Kalyuga et al., 1999). Augmented Reality (AR) can be used to overcome the split-attention effect by allowing additional information to be virtually integrated into the real-world set-up (Azuma, 1997). A study by Altmeyer et al. (2020) with university students showed that AR-support during experimentation led to a higher conceptual knowledge gain but had no effect on ECL. The current study provides a conceptual replication of Altmeyer et al.’s (2020) research and focuses on three main objectives: First, we aimed at investigating the generalizability of the advantage of AR on experimental learning in a sample of elementary school children. Second, we examined if low prior-knowledge of children even amplifies the split-attention effect, as proposed by Kalyuga et al. (1998). Finally, we focused on obtaining deeper insights into global coherence formation processes during lab work using specific tests and eye tracking measures.}, howpublished = {13th International Cognitive Load Theory Conference}, keywords = {}, pubstate = {published}, tppubtype = {misc} } In science education, hands-on student experiments are used to explore cause and effect relationships. Conventional lab work requires students to interact with physical experimentation objects and observe additional information like measurement values to deduce scientific laws and interrelations. The observable information, however, are usually presented in physical distance to the setting, e.g., on a separate display of a measuring device. The resulting spatial split (Chandler & Sweller, 1991) between representations hampers global coherence formation (Seufert & Brünken, 2004): Mapping processes between the spatially distant sources of information are assumed to lead to an increase in extraneous cognitive load (ECL; Ayres & Sweller, 2014). Consequently, learning outcomes can be impaired (Kalyuga et al., 1999). 
Augmented Reality (AR) can be used to overcome the split-attention effect by allowing additional information to be virtually integrated into the real-world set-up (Azuma, 1997). A study by Altmeyer et al. (2020) with university students showed that AR-support during experimentation led to a higher conceptual knowledge gain but had no effect on ECL. The current study provides a conceptual replication of Altmeyer et al.’s (2020) research and focuses on three main objectives: First, we aimed at investigating the generalizability of the advantage of AR on experimental learning in a sample of elementary school children. Second, we examined if low prior-knowledge of children even amplifies the split-attention effect, as proposed by Kalyuga et al. (1998). Finally, we focused on obtaining deeper insights into global coherence formation processes during lab work using specific tests and eye tracking measures. |
Altmeyer, Kristin; Malone, Sarah; Kapp, Sebastian; Barz, Michael; Lauer, Luisa; Thees, Michael; Kuhn, Jochen; Peschel, Markus; Sonntag, Daniel; Brünken, Roland Augmented Reality zur Förderung globaler Kohärenzbildungsprozesse beim Experimentieren im Sachunterricht Miscellaneous Tagung der Fachgruppe Pädagogische Psychologie, 2021. @misc{11871, title = {Augmented Reality zur Förderung globaler Kohärenzbildungsprozesse beim Experimentieren im Sachunterricht}, author = {Kristin Altmeyer and Sarah Malone and Sebastian Kapp and Michael Barz and Luisa Lauer and Michael Thees and Jochen Kuhn and Markus Peschel and Daniel Sonntag and Roland Brünken}, url = {https://www.dfki.de/fileadmin/user_upload/import/11871_v3_Altmeyer_VR_Symposium_PAEPSY_2021.pdf}, year = {2021}, date = {2021-01-01}, abstract = {Augmented Reality (AR) lässt sich als eine Form virtueller Umgebungen auf einem Realitäts-Virtualitäts-Kontinuum (Milgram & Kishino, 1994) der gemischten Realität zuordnen. AR erweitert die Realität durch die Integration virtueller Objekte. Ein vielversprechendes Anwendungsgebiet für AR im Bildungsbereich bietet das technologiegestützte Experimentieren: Experimente bilden ein wesentliches Merkmal der Naturwissenschaften und werden im MINT-Unterricht eingesetzt, um Zusammenhänge zu untersuchen. Bisherige Forschung deutet darauf hin, dass bereits Kinder im Grundschulalter (natur)wissenschaftliches Denken und die Fähigkeit zum Experimentieren entwickeln können (z.B. Osterhaus et al., 2015). Um Ursache-Wirkung-Beziehungen aus einem Experiment abzuleiten, müssen Lernende meist reale Informationen der Experimentierumgebung mit virtuellen Informationen, wie z.B. Messwerten auf Messwertdisplays, mental verknüpfen. Im Sinne der Cognitive Theory of Multimedia Learning (Mayer, 2005) und der Cognitive Load Theory (Sweller et al., 1998) stellt die Verknüpfung räumlich getrennter Informationen eine besondere Herausforderung an das Arbeitsgedächtnis dar. AR kann dazu genutzt werden, reale und virtuelle Informationen beim Experimentieren integriert darzustellen. Vorausgehende Studienergebnisse (z.B. Altmeyer et al., 2020) implizieren, dass AR die globale Kohärenzbildung (Seufert & Brünken, 2004) unterstützt und zu besseren Lernergebnissen führen kann (Altmeyer et al., 2020). In der vorliegenden Studie wurde der Effekt von AR-Unterstützung beim Experimentieren in einer Stichprobe von Grundschulkindern untersucht. Nach einem Vorwissenstest führten 59 Kinder Experimente zu elektrischen Schaltkreisen durch. Einer Gruppe wurden Echtzeit-Messwerte für die Stromstärke in einer Tabelle auf einem separaten Tabletbildschirm präsentiert. Dagegen sah die AR-unterstützte Gruppe die Messwerte beim Blick durch eine Tabletkamera in die Experimentierumgebung integriert. Während des Experimentierens wurden die Blickbewegungen der Kinder erfasst. Danach bearbeiteten beide Gruppen Posttests, welche in ihren Anforderungen an die globale Kohärenzbildung zwischen realen und virtuellen Elementen beim Experimentieren variierten. Erste Ergebnisse zeigen, dass Kinder insbesondere hinsichtlich Aufgaben, die eine starke globale Kohärenz erfordern, von der AR-Umgebung profitieren. Blickbewegungsanalysen sollen weitere Aufschlüsse über den Prozess der Kohärenzbildung während des Experimentierens in AR geben.}, howpublished = {Tagung der Fachgruppe Pädagogische Psychologie}, keywords = {}, pubstate = {published}, tppubtype = {misc} } 
Augmented reality (AR) can be classified as a form of virtual environment belonging to mixed reality on a reality-virtuality continuum (Milgram & Kishino, 1994). AR extends reality by integrating virtual objects. A promising field of application for AR in education is technology-supported experimentation: experiments are an essential feature of the natural sciences and are used in STEM lessons to investigate relationships. Previous research suggests that children of elementary school age can already develop (natural) scientific thinking and the ability to experiment (e.g., Osterhaus et al., 2015). To derive cause-effect relationships from an experiment, learners usually have to mentally link real information from the experimentation environment with virtual information such as measured values shown on measurement displays. According to the Cognitive Theory of Multimedia Learning (Mayer, 2005) and Cognitive Load Theory (Sweller et al., 1998), linking spatially separated information places particular demands on working memory. AR can be used to present real and virtual information in an integrated way during experimentation. Previous study results (e.g., Altmeyer et al., 2020) imply that AR supports global coherence formation (Seufert & Brünken, 2004) and can lead to better learning outcomes. The present study investigated the effect of AR support during experimentation in a sample of elementary school children. After a prior-knowledge test, 59 children conducted experiments on electric circuits. One group was presented with real-time current measurements in a table on a separate tablet screen, whereas the AR-supported group saw the measurements integrated into the experimentation environment when looking through a tablet camera. The children's eye movements were recorded during experimentation. Afterwards, both groups completed posttests that varied in their demands on global coherence formation between real and virtual elements of the experiment. First results show that children benefit from the AR environment in particular for tasks that require strong global coherence. Eye-movement analyses are expected to provide further insight into the process of coherence formation during experimentation in AR. |
Technical Reports |
Profitlich, Hans-Jürgen; Sonntag, Daniel A Case Study on Pros and Cons of Regular Expression Detection and Dependency Parsing for Negation Extraction from German Medical Documents. Technical Report BMBF Bundesministerium für Bildung und Forschung Kapelle-Ufer 1 D-10117 Berlin, 2021. @techreport{11611, title = {A Case Study on Pros and Cons of Regular Expression Detection and Dependency Parsing for Negation Extraction from German Medical Documents. Technical Report}, author = {Hans-Jürgen Profitlich and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11611_CaseStudy_TR_final.pdf http://arxiv.org/abs/2105.09702}, year = {2021}, date = {2021-05-01}, volume = {1}, pages = {30}, address = {Bundesministerium für Bildung und Forschung Kapelle-Ufer 1 D-10117 Berlin}, institution = {BMBF}, abstract = {We describe our work on information extraction in medical documents written in German, especially detecting negations using an architecture based on the UIMA pipeline. Based on our previous work on software modules to cover medical concepts like diagnoses, examinations, etc. we employ a version of the NegEx regular expression algorithm with a large set of triggers as a baseline. We show how a significantly smaller trigger set is sufficient to achieve similar results, in order to reduce adaptation times to new text types. We elaborate on the question whether dependency parsing (based on the Stanford CoreNLP model) is a good alternative and describe the potentials and shortcomings of both approaches.}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } We describe our work on information extraction in medical documents written in German, especially detecting negations using an architecture based on the UIMA pipeline. Based on our previous work on software modules to cover medical concepts like diagnoses, examinations, etc. we employ a version of the NegEx regular expression algorithm with a large set of triggers as a baseline. We show how a significantly smaller trigger set is sufficient to achieve similar results, in order to reduce adaptation times to new text types. We elaborate on the question whether dependency parsing (based on the Stanford CoreNLP model) is a good alternative and describe the potentials and shortcomings of both approaches. |
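As a toy illustration of the NegEx-style regular-expression baseline mentioned in the abstract, the Python sketch below flags a concept as negated when it appears within a short token window after a German negation trigger. The trigger list, window size, and example sentence are invented for illustration and are not the report's actual trigger set.

    import re

    # Deliberately small, illustrative German trigger set (NOT the report's trigger list).
    TRIGGER_RE = re.compile(r"kein(?:e[mnrs]?)?|ohne|nicht|negativ", re.IGNORECASE)

    def negated_concepts(text: str, concepts: list[str], window: int = 5) -> set[str]:
        """Mark a concept as negated if it occurs within `window` tokens after a trigger."""
        tokens = text.lower().split()
        negated = set()
        for i, tok in enumerate(tokens):
            if TRIGGER_RE.fullmatch(tok):
                scope = " ".join(tokens[i + 1 : i + 1 + window])
                negated.update(c for c in concepts if c.lower() in scope)
        return negated

    print(negated_concepts("Kein Hinweis auf Pneumonie. Der bekannte Pleuraerguss ist unverändert.",
                           ["Pneumonie", "Pleuraerguss"]))  # {'Pneumonie'}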
2020 |
Journal Articles |
Biswas, Rajarshi; Barz, Michael; Sonntag, Daniel Towards Explanatory Interactive Image Captioning Using Top-Down and Bottom-Up Features, Beam Search and Re-ranking Journal Article KI - Künstliche Intelligenz, German Journal on Artificial Intelligence - Organ des Fachbereiches "Künstliche Intelligenz" der Gesellschaft für Informatik e.V., 36 , pp. 1-14, 2020. @article{11236, title = {Towards Explanatory Interactive Image Captioning Using Top-Down and Bottom-Up Features, Beam Search and Re-ranking}, author = {Rajarshi Biswas and Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11236_2021_TOWARDS_EXPLANATORY_INTERACTIVE_IMAGE_CAPTIONING_USING_TOP-DOWN_AND_BOTTOM-UP_FEATURES,_BEAM_SEARCH_AND_RE-RANKING.pdf}, doi = {https://doi.org/10.1007/s13218-020-00679-2}, year = {2020}, date = {2020-07-01}, journal = {KI - Künstliche Intelligenz, German Journal on Artificial Intelligence - Organ des Fachbereiches "Künstliche Intelligenz" der Gesellschaft für Informatik e.V.}, volume = {36}, pages = {1-14}, publisher = {Springer}, abstract = {Image captioning is a challenging multimodal task. Significant improvements could be obtained by deep learning. Yet, captions generated by humans are still considered better, which makes it an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim at improving the performance and explainability of the state-of-the-art method Show, Attend and Tell by augmenting their attention mechanism using additional bottom-up features. We compute visual attention on the joint embedding space formed by the union of high-level features and the low-level features obtained from the object specific salient regions of the input image. We embed the content of bounding boxes from a pre-trained Mask R-CNN model. This delivers state-of-the-art performance, while it provides explanatory features. Further, we discuss how interactive model improvement can be realized through re-ranking caption candidates using beam search decoders and explanatory features. We show that interactive re-ranking of beam search candidates has the potential to outperform the state-of-the-art in image captioning.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Image captioning is a challenging multimodal task. Significant improvements could be obtained by deep learning. Yet, captions generated by humans are still considered better, which makes it an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim at improving the performance and explainability of the state-of-the-art method Show, Attend and Tell by augmenting their attention mechanism using additional bottom-up features. We compute visual attention on the joint embedding space formed by the union of high-level features and the low-level features obtained from the object specific salient regions of the input image. We embed the content of bounding boxes from a pre-trained Mask R-CNN model. This delivers state-of-the-art performance, while it provides explanatory features. Further, we discuss how interactive model improvement can be realized through re-ranking caption candidates using beam search decoders and explanatory features. We show that interactive re-ranking of beam search candidates has the potential to outperform the state-of-the-art in image captioning. |
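The re-ranking idea discussed above can be illustrated with a small, generic Python sketch: beam-search candidates are re-scored by combining the decoder log-probability with an auxiliary feature, here a toy overlap with detected object labels. The weighting and the auxiliary score are placeholders and do not reproduce the paper's exact criterion.

    # Illustrative re-ranking of beam-search caption candidates; not the paper's method.
    from dataclasses import dataclass

    @dataclass
    class Candidate:
        caption: str
        log_prob: float  # sum of token log-probabilities from the beam search decoder

    def object_overlap(caption: str, detected_labels: set[str]) -> float:
        """Fraction of detected object labels mentioned in the caption."""
        words = set(caption.lower().split())
        return len(words & detected_labels) / max(len(detected_labels), 1)

    def rerank(candidates: list[Candidate], detected_labels: set[str],
               alpha: float = 0.5) -> list[Candidate]:
        """Sort candidates by a weighted sum of language-model score and object overlap."""
        def score(c: Candidate) -> float:
            return (1 - alpha) * c.log_prob + alpha * object_overlap(c.caption, detected_labels)
        return sorted(candidates, key=score, reverse=True)

    beam = [Candidate("a man riding a horse", -4.2),
            Candidate("a person on an animal", -3.9)]
    print(rerank(beam, {"man", "horse"})[0].caption)  # "a man riding a horse"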
Heimann-Steinert, A; Latendorf, A; Prange, Alexander; Sonntag, Daniel; Müller-Werdan, U Digital pen technology for conducting cognitive assessments: a cross-over study with older adults Journal Article Psychological Research, 85 , pp. 1-9, 2020. @article{11374, title = {Digital pen technology for conducting cognitive assessments: a cross-over study with older adults}, author = {A Heimann-Steinert and A Latendorf and Alexander Prange and Daniel Sonntag and U Müller-Werdan}, url = {https://www.dfki.de/fileadmin/user_upload/import/11374_Heimann-Steinert-2020-DigitalPenTechnologyForConduct.pdf https://link.springer.com/article/10.1007/s00426-020-01452-8#citeas}, year = {2020}, date = {2020-01-01}, journal = {Psychological Research}, volume = {85}, pages = {1-9}, publisher = {Springer}, abstract = {Many digitalized cognitive assessments exist to increase reliability, standardization, and objectivity. Particularly in older adults, the performance of digitized cognitive assessments can lead to poorer test results if they are unfamiliar with the computer, mouse, keyboard, or touch screen. In a cross-over design study, 40 older adults (age M = 74.4 ± 4.1 years) conducted the Trail Making Test A and B with a digital pen (digital pen tests, DPT) and a regular pencil (pencil tests, PT) to identify differences in performance. Furthermore, the tests conducted with a digital pen were analyzed manually (manual results, MR) and electronically (electronic results, ER) by an automized system algorithm to determine the possibilities of digital pen evaluation. ICC(2,k) showed a good level of agreement for TMT A (ICC(2,k) = 0.668) and TMT B (ICC(2,k) = 0.734) between PT and DPT. When comparing MR and ER, ICC(2,k) showed an excellent level of agreement in TMT A (ICC(2,k) = 0.999) and TMT B (ICC(2,k) = 0.994). The frequency of pen lifting correlates significantly with the execution time in TMT A (r = 0.372, p = 0.030) and TMT B (r = 0.567, p < 0.001). A digital pen can be used to perform the Trail Making Test, as it has been shown that there is no difference in the results due to the type of pen used. With a digital pen, the advantages of digitized testing can be used without having to accept the disadvantages.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Many digitalized cognitive assessments exist to increase reliability, standardization, and objectivity. Particularly in older adults, the performance of digitized cognitive assessments can lead to poorer test results if they are unfamiliar with the computer, mouse, keyboard, or touch screen. In a cross-over design study, 40 older adults (age M = 74.4 ± 4.1 years) conducted the Trail Making Test A and B with a digital pen (digital pen tests, DPT) and a regular pencil (pencil tests, PT) to identify differences in performance. Furthermore, the tests conducted with a digital pen were analyzed manually (manual results, MR) and electronically (electronic results, ER) by an automized system algorithm to determine the possibilities of digital pen evaluation. ICC(2,k) showed a good level of agreement for TMT A (ICC(2,k) = 0.668) and TMT B (ICC(2,k) = 0.734) between PT and DPT. When comparing MR and ER, ICC(2,k) showed an excellent level of agreement in TMT A (ICC(2,k) = 0.999) and TMT B (ICC(2,k) = 0.994). The frequency of pen lifting correlates significantly with the execution time in TMT A (r = 0.372, p = 0.030) and TMT B (r = 0.567, p < 0.001). 
A digital pen can be used to perform the Trail Making Test, as it has been shown that there is no difference in the results due to the type of pen used. With a digital pen, the advantages of digitized testing can be used without having to accept the disadvantages. |
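For readers unfamiliar with the agreement statistic reported above, ICC(2,k) can be computed directly from the two-way ANOVA mean squares. The Python sketch below is a generic implementation with made-up example data; it is not the authors' evaluation script.

    import numpy as np

    def icc_2k(scores: np.ndarray) -> float:
        """ICC(2,k): two-way random effects, absolute agreement, average of k measurements.
        scores: (n_subjects, k_conditions) matrix, e.g. TMT completion times per pen type."""
        n, k = scores.shape
        grand = scores.mean()
        ss_total = ((scores - grand) ** 2).sum()
        ss_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum()   # between subjects
        ss_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum()   # between conditions
        ms_rows = ss_rows / (n - 1)
        ms_cols = ss_cols / (k - 1)
        ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
        return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)

    # Hypothetical completion times (seconds) for 5 participants under two pen conditions.
    times = np.array([[30, 32], [45, 44], [60, 63], [28, 27], [52, 55]], dtype=float)
    print(round(icc_2k(times), 3))  # 0.994 for this toy data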
Inproceedings |
Nunnari, Fabrizio; Ezema, Abraham; Sonntag, Daniel The effects of masking in melanoma image classification with CNNs towards international standards for image preprocessing Inproceedings 2020 EAI International Symposium on Medical Artificial Intelligence, EAI, 2020. @inproceedings{11368, title = {The effects of masking in melanoma image classification with CNNs towards international standards for image preprocessing}, author = {Fabrizio Nunnari and Abraham Ezema and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11368_2020_EAI_MedAI_StudyOnDatasetBias.pdf}, year = {2020}, date = {2020-12-01}, booktitle = {2020 EAI International Symposium on Medical Artificial Intelligence}, publisher = {EAI}, abstract = {The classification of skin lesion images is known to be biased by artifacts of the surrounding skin, but it is still not clear to what extent masking out healthy skin pixels influences classification performances, and why. To better understand this phenomenon, we apply different strategies of image masking (rectangular masks, circular masks, full masking, and image cropping) to three datasets of skin lesion images (ISIC2016, ISIC2018, and MedNode). We train CNN-based classifiers, provide performance metrics through a 10-fold cross-validation, and analyse the behaviour of Grad-CAM saliency maps through an automated visual inspection. Our experiments show that cropping is the best strategy to maintain classification performance and to significantly reduce training times as well. Our analysis through visual inspection shows that CNNs have the tendency to focus on pixels of healthy skin when no malignant features can be identified. This suggests that CNNs have the tendency of "eagerly" looking for pixel areas to justify a classification choice, potentially leading to biased discriminators. To mitigate this effect, and to standardize image preprocessing, we suggest to crop images during dataset construction or before the learning step.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The classification of skin lesion images is known to be biased by artifacts of the surrounding skin, but it is still not clear to what extent masking out healthy skin pixels influences classification performances, and why. To better understand this phenomenon, we apply different strategies of image masking (rectangular masks, circular masks, full masking, and image cropping) to three datasets of skin lesion images (ISIC2016, ISIC2018, and MedNode). We train CNN-based classifiers, provide performance metrics through a 10-fold cross-validation, and analyse the behaviour of Grad-CAM saliency maps through an automated visual inspection. Our experiments show that cropping is the best strategy to maintain classification performance and to significantly reduce training times as well. Our analysis through visual inspection shows that CNNs have the tendency to focus on pixels of healthy skin when no malignant features can be identified. This suggests that CNNs have the tendency of "eagerly" looking for pixel areas to justify a classification choice, potentially leading to biased discriminators. To mitigate this effect, and to standardize image preprocessing, we suggest to crop images during dataset construction or before the learning step. |
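The cropping strategy recommended above can be sketched generically: given a binary lesion mask, keep only the lesion bounding box plus a small margin. The following numpy snippet is an illustration under assumed array shapes, not the authors' preprocessing code.

    # Illustrative "crop" preprocessing: keep the lesion bounding box (plus a margin)
    # from a binary segmentation mask, discarding most of the surrounding healthy skin.
    import numpy as np

    def crop_to_lesion(image: np.ndarray, mask: np.ndarray, margin: int = 16) -> np.ndarray:
        """image: (H, W, 3) RGB array; mask: (H, W) binary lesion mask."""
        ys, xs = np.where(mask > 0)
        if ys.size == 0:                      # no lesion pixels: return the image unchanged
            return image
        y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin + 1, image.shape[0])
        x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin + 1, image.shape[1])
        return image[y0:y1, x0:x1]

    # Toy example: a 256x256 image with a square "lesion" in the centre.
    img = np.zeros((256, 256, 3), dtype=np.uint8)
    msk = np.zeros((256, 256), dtype=np.uint8)
    msk[100:150, 110:160] = 1
    print(crop_to_lesion(img, msk).shape)  # (82, 82, 3)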
Nguyen, Ho Minh Duy; Ezema, Abraham; Nunnari, Fabrizio; Sonntag, Daniel A Visually Explainable Learning System for Skin Lesion Detection Using Multiscale Input with Attention U-Net Inproceedings KI 2020: Advances in Artificial Intelligence, pp. 313-319, Springer, 2020. @inproceedings{11178, title = {A Visually Explainable Learning System for Skin Lesion Detection Using Multiscale Input with Attention U-Net}, author = {Ho Minh Duy Nguyen and Abraham Ezema and Fabrizio Nunnari and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11178_KI_2020.pdf https://link.springer.com/chapter/10.1007/978-3-030-58285-2_28}, year = {2020}, date = {2020-09-01}, booktitle = {KI 2020: Advances in Artificial Intelligence}, volume = {12325}, pages = {313-319}, publisher = {Springer}, abstract = {In this work, we propose a new approach to automatically predict the locations of visual dermoscopic attributes for Task 2 of the ISIC 2018 Challenge. Our method is based on the Attention U-Net with multi-scale images as input. We apply a new strategy based on transfer learning, i.e., training the deep network for feature extraction by adapting the weights of the network trained for segmentation. Our tests show that, first, the proposed algorithm is on par or outperforms the best ISIC 2018 architectures (LeHealth and NMN) in the extraction of two visual features. Secondly, it uses only 1/30 of the training parameters; we observed less computation and memory requirements, which are particularly useful for future implementations on mobile devices. Finally, our approach generates visually explainable behaviour with uncertainty estimations to help doctors in diagnosis and treatment decisions.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In this work, we propose a new approach to automatically predict the locations of visual dermoscopic attributes for Task 2 of the ISIC 2018 Challenge. Our method is based on the Attention U-Net with multi-scale images as input. We apply a new strategy based on transfer learning, i.e., training the deep network for feature extraction by adapting the weights of the network trained for segmentation. Our tests show that, first, the proposed algorithm is on par or outperforms the best ISIC 2018 architectures (LeHealth and NMN) in the extraction of two visual features. Secondly, it uses only 1/30 of the training parameters; we observed less computation and memory requirements, which are particularly useful for future implementations on mobile devices. Finally, our approach generates visually explainable behaviour with uncertainty estimations to help doctors in diagnosis and treatment decisions. |
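To give an idea of the attention mechanism the paper builds on, below is a compact PyTorch sketch of an additive attention gate in the style of Attention U-Net (Oktay et al., 2018). Channel sizes are placeholders and the snippet is not the authors' implementation.

    # Compact additive attention gate, sketched to illustrate the mechanism; not the authors' code.
    import torch
    import torch.nn as nn

    class AttentionGate(nn.Module):
        def __init__(self, gate_ch: int, skip_ch: int, inter_ch: int):
            super().__init__()
            self.w_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)   # gating signal (decoder)
            self.w_x = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)   # skip connection (encoder)
            self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1), nn.Sigmoid())

        def forward(self, g: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
            # g and x are assumed to share the same spatial size here (upsample g otherwise).
            att = self.psi(torch.relu(self.w_g(g) + self.w_x(x)))    # (B, 1, H, W) in [0, 1]
            return x * att                                           # re-weighted skip features

    gate = AttentionGate(gate_ch=128, skip_ch=64, inter_ch=32)
    out = gate(torch.randn(1, 128, 32, 32), torch.randn(1, 64, 32, 32))
    print(out.shape)  # torch.Size([1, 64, 32, 32])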
2022 |
Journal Articles |
The Effect of Auditory-Motor Synchronization in Exergames on the Example of the VR Rhythm Game BeatSaber Journal Article Proceedings of the ACM on Human-Computer Interaction, 6 , pp. 1-26, 2022. |
Inproceedings |
Fine-tuning BERT Models for Summarizing German Radiology Findings Inproceedings Naumann, Tristan; Bethard, Steven; Roberts, Kirk; Rumshisky, Anna (Ed.): Proceedings of the 4th Clinical Natural Language Processing Workshop, Association for Computational Linguistics, 2022. |
Cross-lingual German Biomedical Information Extraction: from Zero-shot to Human-in-the-Loop Inproceedings 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 2022. |
Carina – A Corpus of Aligned German Read Speech Including Annotations Inproceedings ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6157-6161, Institute of Electrical and Electronics Engineers (IEEE), 2022. |
SpiderClip: Towards an Open Source System for Wearable Device Simulation in Virtual Reality Inproceedings CHI EA '22: Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, 2022. |
Interactive Assessment Tool for Gaze-based Machine Learning Models in Information Retrieval Inproceedings ACM SIGIR Conference on Human Information Interaction and Retrieval, Association for Computing Machinery, 2022. |
Encountering Students' Learning Difficulties in Electrics - Didactical Concept and Prototype of Augmented Reality-Toolkit Inproceedings Fostering scientific citizenship in an uncertain world - ESERA 2021 e-Proceedings, University of Minho, 2022. |
LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking Inproceedings Conference on Computer Vision and Pattern Recognition (CVPR) 2022, IEEE/CVF, 2022. |
Annotating sound events through interactive design of interpretable features Inproceedings Proceedings of the First International Conference on Hybrid Human-Machine Intelligence, IOS Press, 2022. |
Interactive design of interpretable features for marine soundscape data annotation Inproceedings Workshop on Human-centered Design of Symbiotic Hybrid Intelligence, HHAI, 2022. |
A survey on improving NLP models with human explanations Inproceedings Proceedings of the First Workshop on Learning with Natural Language Supervision, Association for Computational Linguistics, 2022. |
Development and Validation of a German Version of the Player Experience Inventory (PXI) Inproceedings Proceedings of the Mensch und Computer Conference, ACM, 2022. |
Improving Silent Speech BCI Training Procedures through Transfer from Overt to Silent Speech Inproceedings Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, IEEE, 2022. |
SpellInk: Interactive correction of spelling mistakes in handwritten text Inproceedings Proceedings of the First International Conference on Hybrid Human-Machine Intelligence, pp. 278-280, IOS Press, De Boelelaan 1105, 1081 HV Amsterdam, Netherlands, 2022. |
pEncode: A Tool for Visualizing Pen Signal Encodings in Real-time Inproceedings Proceedings of the First International Conference on Hybrid Human-Machine Intelligence, pp. 281-284, IOS Press, De Boelelaan 1105, 1081 HV Amsterdam, Netherlands, 2022. |
Leveraging Implicit Gaze-Based User Feedback for Interactive Machine Learning Inproceedings Rodermund, Stephanie C; Timm, Ingo J; Malburg, Lukas; Bergmann, Ralph (Ed.): KI 2022: Advances in Artificial Intelligence, pp. 9-16, Springer International Publishing, 2022. |
Miscellaneous |
3D Semantic Label Transfer and Matching in Human-Robot Collaboration Miscellaneous 2022. |
Putting Humans in the Image Captioning Loop Miscellaneous Bridging Human-Computer Interaction and Natural Language Processing (NAACL 2022), 2022. |
Interactive Machine Learning for Image Captioning Miscellaneous The AAAI-22 Workshop on Interactive Machine Learning, 2022. |
2021 |
Journal Articles |
Künstliche Intelligenz in der Medizin und Gynäkologie – Holzweg oder Heilversprechen? Journal Article Der Gynäkologe, 1 , pp. 1-7, 2021. |
ARETT: Augmented Reality Eye Tracking Toolkit for Head Mounted Displays Journal Article Sensors - Open Access Journal, 21 , pp. 18, 2021. |
Minimizing false negative rate in melanoma detection and providing insight into the causes of classification Journal Article Computing Research Repository eprint Journal, abs/2102.09199 , pp. 1-14, 2021. |
Automatic Visual Attention Detection for Mobile Eye Tracking Using Pre-Trained Computer Vision Models and Human Gaze Journal Article Sensors - Open Access Journal, 21 , pp. 21, 2021. |
Investigating the Usability of a Head-Mounted Display Augmented Reality Device in Elementary School Children Journal Article Sensors - Open Access Journal, 21 , pp. 20, 2021. |
Incollections |
Incremental Improvement of a Question Answering System by Re-ranking Answer Candidates Using Machine Learning Incollection Marchi, Erik; Siniscalchi, Sabato Marco; Cumani, Sandro; Salerno, Valerio Mario; Li, Haizhou (Ed.): Increasing Naturalness and Flexibility in Spoken Dialogue Interaction: 10th International Workshop on Spoken Dialogue Systems, pp. 367-379, Springer, 2021. |
Inproceedings |
Improving German Image Captions using Machine Translation and Transfer Learning Inproceedings Espinosa-Anke, Luis; Martin-Vide, Carlos; Spasic, Irena (Ed.): Statistical Language and Speech Processing SLSP 2021, Springer, Council Chamber Glamorgan Building King Edward VII Ave Cathays Park Cardiff CF10 3WT, 2021. |
A Multilingual Benchmark for Probing Negation-Awareness with Minimal Pairs Inproceedings Proceedings of the 25th Conference on Computational Natural Language Learning (CoNLL), pp. 224-257, Association for Computational Linguistics, 2021. |
mDAPT: Multilingual Domain Adaptive Pretraining in a Single Model Inproceedings Findings of the Association for Computational Linguistics - EMNLP 2021, pp. 3404-3418, Association for Computational Linguistics, 2021. |
Live Testing of Flexibilities on Distribution Grid Level – Simulation Setup and Lessons Learned Inproceedings IEEE Electric Power and Energy Conference, IEEE Xplore, IEEE Operations Center 445 Hoes Lane Piscataway, NJ 08854-4141 USA, 2021. |
Automatic Recognition and Augmentation of Attended Objects in Real-Time Using Eye Tracking and a Head-Mounted Display Inproceedings ACM Symposium on Eye Tracking Research and Applications, pp. 4, Association for Computing Machinery, 2021. |
An Attention Mechanism using Multiple Knowledge Sources for COVID-19 Detection from CT Images Inproceedings The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), AAAI, 2021. |
Explainable Automatic Evaluation of the Trail Making Test for Dementia Screening Inproceedings Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, 2021. |
EyeLogin - Calibration-Free Authentication Method for Public Displays Using Eye Gaze Inproceedings ACM Symposium on Eye Tracking Research and Applications, Association for Computing Machinery, 2021. |
A Software Toolbox for Deploying Deep Learning Decision Support Systems with XAI Capabilities Inproceedings Companion of the 2021 ACM SIGCHI Symposium on Engineering Interactive Computing Systems, Association for Computing Machinery, 2021. |
Assessing Cognitive Test Performance Using Automatic Digital Pen Features Analysis Inproceedings Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, Association for Computing Machinery, 2021. |
Self-Supervised Domain Adaptation for Diabetic Retinopathy Grading using Vessel Image Reconstruction Inproceedings Proceedings of the 44th German Conference on Artificial Intelligence, Springer, 2021. |
On the Overlap Between Grad-CAM Saliency Maps and Explainable Visual Features in Skin Cancer Images Inproceedings Holzinger, Andreas; Kieseberg, Peter; Tjoa, Min A; Weippl, Edgar (Ed.): Machine Learning and Knowledge Extraction, pp. 241-253, Springer International Publishing, 2021. |
Anomaly Detection for Skin Lesion Images Using Replicator Neural Networks Inproceedings Holzinger, Andreas; Kieseberg, Peter; Tjoa, Min A; Weippl, Edgar (Ed.): Machine Learning and Knowledge Extraction, pp. 225-240, Springer International Publishing, 2021. |
Crop It, but Not Too Much: The Effects of Masking on the Classification of Melanoma Images Inproceedings Edelkamp, Stefan; Rueckert, Elmar; Möller, Ralf (Ed.): KI 2021: Advances in Artificial Intelligence, pp. 179-193, Springer International Publishing, 2021. |
A Demonstrator for Interactive Image Clustering and Fine-Tuning Neural Networks in Virtual Reality Inproceedings Edelkamp, Stefan; Rueckert, Elmar; Möller, Ralf (Ed.): KI 2021: Advances in Artificial Intelligence, pp. 194-203, Springer International Publishing, 2021. |
Multisensor-Pipeline: A Lightweight, Flexible, and Extensible Framework for Building Multimodal-Multisensor Interfaces Inproceedings Companion Publication of the 2021 International Conference on Multimodal Interaction, pp. 13-18, Association for Computing Machinery, 2021. |
Miscellaneous |
Interaction with Explanations in the XAINES Project Miscellaneous Trustworthy AI in the Wild Workshop 2021, 2021. |
Measuring Intrinsic and Extraneous Cognitive Load in Elementary School Students Using Subjective Ratings and Smart Pen Data Miscellaneous 13th International Cognitive Load Theory Conference, 2021. |
The effect of augmented reality on global coherence formation processes during STEM laboratory work in elementary school children Miscellaneous 13th International Cognitive Load Theory Conference, 2021. |
Augmented Reality zur Förderung globaler Kohärenzbildungsprozesse beim Experimentieren im Sachunterricht Miscellaneous Tagung der Fachgruppe Pädagogische Psychologie, 2021. |
Technical Reports |
A Case Study on Pros and Cons of Regular Expression Detection and Dependency Parsing for Negation Extraction from German Medical Documents. Technical Report BMBF Bundesministerium für Bildung und Forschung Kapelle-Ufer 1 D-10117 Berlin, 2021. |
2020 |
Journal Articles |
Towards Explanatory Interactive Image Captioning Using Top-Down and Bottom-Up Features, Beam Search and Re-ranking Journal Article KI - Künstliche Intelligenz, German Journal on Artificial Intelligence - Organ des Fachbereiches "Künstliche Intelligenz" der Gesellschaft für Informatik e.V., 36 , pp. 1-14, 2020. |
Digital pen technology for conducting cognitive assessments: a cross-over study with older adults Journal Article Psychological Research, 85 , pp. 1-9, 2020. |
Inproceedings |
The effects of masking in melanoma image classification with CNNs towards international standards for image preprocessing Inproceedings 2020 EAI International Symposium on Medical Artificial Intelligence, EAI, 2020. |
A Visually Explainable Learning System for Skin Lesion Detection Using Multiscale Input with Attention U-Net Inproceedings KI 2020: Advances in Artificial Intelligence, pp. 313-319, Springer, 2020. |