% 2021
% Inproceedings
@inproceedings{11616,
  title     = {EyeLogin - Calibration-Free Authentication Method for Public Displays Using Eye Gaze},
  author    = {Omair Shahzad Bhatti and Michael Barz and Daniel Sonntag},
  url       = {https://www.dfki.de/fileadmin/user_upload/import/11616_EyeLogin.pdf},
  doi       = {10.1145/3448018.3458001},
  year      = {2021},
  booktitle = {ACM Symposium on Eye Tracking Research and Applications},
  publisher = {Association for Computing Machinery},
  abstract  = {The usage of interactive public displays has increased, including the number of sensitive applications and, hence, the demand for user authentication methods. In this context, gaze-based authentication was shown to be effective and more secure, but significantly slower than touch- or gesture-based methods. We implement a calibration-free and fast authentication method for situated displays based on saccadic eye movements. In a user study (n = 10), we compare our new method with CueAuth from Khamis et al. (IMWUT’18), an authentication method based on smooth pursuit eye movements. The results show a significant improvement in accuracy from 82.94% to 95.88%. At the same time, our method greatly increases entry speed, reducing the average entry time from 18.28 s to 5.12 s, which is comparable to touch-based input.},
}
@inproceedings{11664,
  title     = {A Software Toolbox for Deploying Deep Learning Decision Support Systems with XAI Capabilities},
  author    = {Fabrizio Nunnari and Daniel Sonntag},
  url       = {https://www.dfki.de/fileadmin/user_upload/import/11664_nunnari21EICS-TIML.pdf},
  doi       = {10.1145/3459926.3464753},
  year      = {2021},
  booktitle = {Companion of the 2021 ACM SIGCHI Symposium on Engineering Interactive Computing Systems},
  publisher = {Association for Computing Machinery},
  abstract  = {We describe the software architecture of a toolbox of reusable components for the configuration of convolutional neural networks (CNNs) for classification and labeling problems. The toolbox architecture has been designed to maximize the reuse of established algorithms and to include domain experts in the development and evaluation process across different projects and challenges. In addition, we implemented easy-to-edit input formats and modules for XAI (eXplainable AI) through visual inspection capabilities. The toolbox is available for the research community to implement applied artificial intelligence projects.},
}
@inproceedings{11703,
  title     = {Assessing Cognitive Test Performance Using Automatic Digital Pen Features Analysis},
  author    = {Alexander Prange and Daniel Sonntag},
  year      = {2021},
  booktitle = {Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization},
  publisher = {Association for Computing Machinery},
  abstract  = {Most cognitive assessments, for example for dementia screening, are conducted with a pen on normal paper. We record these tests with a digital pen as part of a new interactive cognitive assessment tool with automatic analysis of pen input. The clinician can, first, observe the sketching process in real time on a mobile tablet, e.g., in telemedicine settings or to follow Covid-19 distancing regulations. Second, the results of an automatic test analysis are presented to the clinician in real time, thereby reducing manual scoring effort and producing objective reports. The presented research describes the architecture of our cognitive assessment tool and examines how accurately different machine learning (ML) models can automatically score cognitive tests, without a semantic content analysis. Our system uses a set of more than 170 pen features calculated directly from the raw digital pen signal. We evaluate our system with 40 subjects from a geriatrics daycare clinic. Using standard ML techniques, our feature set outperforms previous approaches on the cognitive tests we consider, i.e., the Clock Drawing, the Rey-Osterrieth Complex Figure, and the Trail Making Test, by automatically scoring tests with up to 82% accuracy in a binary classification task.},
}
@inproceedings{11715,
  title     = {Self-Supervised Domain Adaptation for Diabetic Retinopathy Grading using Vessel Image Reconstruction},
  author    = {Ho Minh Duy Nguyen and Truong Thanh-Nhat Mai and Ngoc Trong Tuong Than and Alexander Prange and Daniel Sonntag},
  url       = {https://www.dfki.de/fileadmin/user_upload/import/11715_KI_2021_Self_Supervised_Domain_Adaptation_for_Diabetic_Retinopathy_Grading.pdf},
  year      = {2021},
  booktitle = {Proceedings of the 44th German Conference on Artificial Intelligence},
  publisher = {Springer},
  abstract  = {This paper investigates the problem of domain adaptation for diabetic retinopathy (DR) grading. We learn invariant target-domain features by defining a novel self-supervised task based on retinal vessel image reconstructions, inspired by medical domain knowledge. Then, a benchmark of current state-of-the-art unsupervised domain adaptation methods on the DR problem is provided. It can be shown that our approach outperforms existing domain adaptation strategies. Furthermore, when utilizing the entire training data in the target domain, we are able to compete with several state-of-the-art approaches in final classification accuracy just by applying standard network architectures and using image-level labels.},
}
@inproceedings{11802,
  title     = {On the Overlap Between Grad-CAM Saliency Maps and Explainable Visual Features in Skin Cancer Images},
  author    = {Fabrizio Nunnari and Md Abdul Kadir and Daniel Sonntag},
  editor    = {Andreas Holzinger and Peter Kieseberg and Min A Tjoa and Edgar Weippl},
  url       = {https://www.dfki.de/fileadmin/user_upload/import/11802_2021_CD_MAKE_XAI_and_SkinFeatures.pdf},
  doi       = {10.1007/978-3-030-84060-0_16},
  year      = {2021},
  booktitle = {Machine Learning and Knowledge Extraction},
  volume    = {12844},
  pages     = {241-253},
  publisher = {Springer International Publishing},
  abstract  = {Dermatologists recognize melanomas by inspecting images in which they identify human-comprehensible visual features. In this paper, we investigate to what extent such features correspond to the saliency areas identified on CNNs trained for classification. Our experiments, conducted on two neural architectures characterized by different depth and different resolution of the last convolutional layer, quantify to what extent thresholded Grad-CAM saliency maps can be used to identify visual features of skin cancer. We found that the best threshold value, i.e., the threshold at which we can measure the highest Jaccard index, varies significantly among features, ranging from 0.3 to 0.7. In addition, we measured Jaccard indices as high as 0.143, which is almost 50% of the performance of state-of-the-art architectures specialized in feature mask prediction at the pixel level, such as U-Net. Finally, a breakdown test between malignancy and classification correctness shows that higher-resolution saliency maps could help doctors in spotting wrong classifications.},
}
@inproceedings{11803,
  title     = {Anomaly Detection for Skin Lesion Images Using Replicator Neural Networks},
  author    = {Fabrizio Nunnari and Hasan Md Tusfiqur Alam and Daniel Sonntag},
  editor    = {Andreas Holzinger and Peter Kieseberg and Min A Tjoa and Edgar Weippl},
  url       = {https://www.dfki.de/fileadmin/user_upload/import/11803_2021_CD_MAKE_AnomalyDetection.pdf},
  doi       = {10.1007/978-3-030-84060-0_15},
  year      = {2021},
  booktitle = {Machine Learning and Knowledge Extraction},
  volume    = {12844},
  pages     = {225-240},
  publisher = {Springer International Publishing},
  abstract  = {This paper presents an investigation on the task of anomaly detection for images of skin lesions. The goal is to provide a decision support system with an extra filtering layer to inform users if a classifier should not be used for a given sample. We tested anomaly detectors based on autoencoders and three discrimination methods: feature vector distance, replicator neural networks, and support vector data description fine-tuning. Results show that neural-based detectors can perfectly discriminate between skin lesions and open world images, but class discrimination cannot easily be accomplished and requires further investigation.},
}
@inproceedings{11859,
  title     = {Crop It, but Not Too Much: The Effects of Masking on the Classification of Melanoma Images},
  author    = {Fabrizio Nunnari and Abraham Ezema and Daniel Sonntag},
  editor    = {Stefan Edelkamp and Elmar Rueckert and Ralf Möller},
  url       = {https://www.dfki.de/fileadmin/user_upload/import/11859_2021_KIconference_SkinLesionMasking.pdf https://link.springer.com/chapter/10.1007/978-3-030-87626-5_13},
  year      = {2021},
  booktitle = {KI 2021: Advances in Artificial Intelligence},
  pages     = {179-193},
  publisher = {Springer International Publishing},
  abstract  = {To improve the accuracy of convolutional neural networks in discriminating between nevi and melanomas, we test nine different combinations of masking and cropping on three datasets of skin lesion images (ISIC2016, ISIC2018, and MedNode). Our experiments, confirmed by 10-fold cross-validation, show that cropping increases classification performance, but specificity decreases when cropping is applied together with masking out healthy skin regions. An analysis of Grad-CAM saliency maps shows that our CNN models in fact tend to focus on healthy skin at the border when classifying a nevus.},
}
@inproceedings{11886,
  title     = {A Demonstrator for Interactive Image Clustering and Fine-Tuning Neural Networks in Virtual Reality},
  author    = {Alexander Prange and Daniel Sonntag},
  editor    = {Stefan Edelkamp and Elmar Rueckert and Ralf Möller},
  url       = {https://link.springer.com/chapter/10.1007/978-3-030-87626-5_14},
  year      = {2021},
  booktitle = {KI 2021: Advances in Artificial Intelligence},
  pages     = {194-203},
  publisher = {Springer International Publishing},
  abstract  = {We present a virtual reality (VR) application that enables us to interactively explore and manipulate image clusters based on layer activations of convolutional neural networks (CNNs). We apply dimensionality reduction techniques to project images into the 3D space, where the user can directly interact with the model. The user can change the position of an image by using natural hand gestures. This manipulation triggers additional training steps of the network, based on the new spatial information and new label of the image. After the training step is finished, the visualization is updated according to the new output of the CNN. The goal is to visualize and improve the cluster output of the model and, at the same time, to improve the understanding of the model. We discuss two different approaches for calculating the VR projection: a combined PCA/t-SNE dimensionality reduction approach and a variational autoencoder (VAE) based approach.},
}
@inproceedings{11981,
  title     = {Multisensor-Pipeline: A Lightweight, Flexible, and Extensible Framework for Building Multimodal-Multisensor Interfaces},
  author    = {Michael Barz and Omair Shahzad Bhatti and Bengt Lüers and Alexander Prange and Daniel Sonntag},
  url       = {https://www.dfki.de/fileadmin/user_upload/import/11981_icmi_cr.pdf},
  year      = {2021},
  booktitle = {Companion Publication of the 2021 International Conference on Multimodal Interaction},
  pages     = {13-18},
  publisher = {Association for Computing Machinery},
  abstract  = {We present the multisensor-pipeline (MSP), a lightweight, flexible, and extensible framework for prototyping multimodal-multisensor interfaces based on real-time sensor input. Our open-source framework (available on GitHub) enables researchers and developers to easily integrate multiple sensors or other data streams via source modules, to add stream and event processing capabilities via processor modules, and to connect user interfaces or databases via sink modules in a graph-based processing pipeline. Our framework is implemented in Python with a low number of dependencies, which enables a quick setup process, execution across multiple operating systems, and direct access to cutting-edge machine learning libraries and models. We showcase the functionality and capabilities of MSP through a sample application that connects a mobile eye tracker to classify image patches surrounding the user’s fixation points and visualizes the classification results in real-time.},
}
% Miscellaneous
@misc{11867,
  title        = {Interaction with Explanations in the XAINES Project},
  author       = {Mareike Hartmann and Ivana Kruijff-Korbayová and Daniel Sonntag},
  url          = {https://www.dfki.de/fileadmin/user_upload/import/11867_AI_in_the_wild__Xaines.pdf},
  year         = {2021},
  date         = {2021-09-01},
  howpublished = {Trustworthy AI in the Wild Workshop 2021},
  abstract     = {AI systems are increasingly pervasive, and their large-scale adoption makes it necessary to explain their behaviour, for example to their users who are impacted by their decisions, or to their developers who need to ensure their functionality. This requires, on the one hand, obtaining an accurate representation of the chain of events that caused the system to behave in a certain way (e.g., to make a specific decision). On the other hand, this causal chain needs to be communicated to the users depending on their needs and expectations. In this phase of explanation delivery, allowing interaction between user and model has the potential to improve both model quality and user experience. In this abstract, we present our planned and ongoing work on interaction with explanations as part of the XAINES project. The project investigates the explanation of AI systems through narratives targeted to the needs of a specific audience, and our work focuses on the question of how human-model interaction can enable successful explanation.},
}
@misc{11868,
  title        = {Measuring Intrinsic and Extraneous Cognitive Load in Elementary School Students Using Subjective Ratings and Smart Pen Data},
  author       = {Sarah Malone and Kristin Altmeyer and Michael Barz and Luisa Lauer and Daniel Sonntag and Markus Peschel and Roland Brünken},
  url          = {https://www.dfki.de/fileadmin/user_upload/import/11868_Cl_measurement_in_children.pdf},
  year         = {2021},
  howpublished = {13th International Cognitive Load Theory Conference},
  abstract     = {New methods are constantly being developed to optimize and adapt cognitive load measurement to different contexts (Korbach et al., 2018). It is noteworthy, however, that research on cognitive load measurement in elementary school students is rare. Although there is some evidence that they might be able to report their total cognitive load (Ayres, 2006), there are also reasons to doubt the quality of children’s self-reports (e.g., Chambers & Johnson, 2002). To avoid these issues, behavioral and objective online measures are promising. A novel approach – the use of smartpen data generated by natural use of a pen during task completion – seems particularly encouraging, as these measures proved to be predictive of cognitive load in adults (e.g., Yu, Epps, & Chen, 2011). Moreover, Barz et al. (2020) demonstrated the predictive power of smartpen data for performance in children. The present research addressed two prevailing gaps in research on cognitive load assessment in elementary school students. We developed a subjective rating scale and investigated whether this instrument can provide valid measurements of intrinsic cognitive load (ICL) and extraneous cognitive load (ECL) (Research Question 1). Moreover, we investigated whether smartpen data can be used as a valid process measurement of cognitive load (Research Question 2).},
}
Altmeyer, Kristin; Malone, Sarah; Kapp, Sebastian; Barz, Michael; Lauer, Luisa; Thees, Michael; Kuhn, Jochen; Peschel, Markus; Sonntag, Daniel; Brünken, Roland 13th International Cognitive Load Theory Conference, 2021. @misc{11870, title = {The effect of augmented reality on global coherence formation processes during STEM laboratory work in elementary school children}, author = {Kristin Altmeyer and Sarah Malone and Sebastian Kapp and Michael Barz and Luisa Lauer and Michael Thees and Jochen Kuhn and Markus Peschel and Daniel Sonntag and Roland Brünken}, url = {https://www.dfki.de/fileadmin/user_upload/import/11870_ICLTC_2021_Altmeyer_final.pdf}, year = {2021}, date = {2021-01-01}, abstract = {In science education, hands-on student experiments are used to explore cause and effect relationships. Conventional lab work requires students to interact with physical experimentation objects and observe additional information like measurement values to deduce scientific laws and interrelations. The observable information, however, are usually presented in physical distance to the setting, e.g., on a separate display of a measuring device. The resulting spatial split (Chandler & Sweller, 1991) between representations hampers global coherence formation (Seufert & Brünken, 2004): Mapping processes between the spatially distant sources of information are assumed to lead to an increase in extraneous cognitive load (ECL; Ayres & Sweller, 2014). Consequently, learning outcomes can be impaired (Kalyuga et al., 1999). Augmented Reality (AR) can be used to overcome the split-attention effect by allowing additional information to be virtually integrated into the real-world set-up (Azuma, 1997). A study by Altmeyer et al. (2020) with university students showed that AR-support during experimentation led to a higher conceptual knowledge gain but had no effect on ECL. 
The current study provides a conceptual replication of Altmeyer et al.’s (2020) research and focuses on three main objectives: First, we aimed at investigating the generalizability of the advantage of AR on experimental learning in a sample of elementary school children. Second, we examined if low prior-knowledge of children even amplifies the split-attention effect, as proposed by Kalyuga et al. (1998). Finally, we focused on obtaining deeper insights into global coherence formation processes during lab work using specific tests and eye tracking measures.}, howpublished = {13th International Cognitive Load Theory Conference}, keywords = {}, pubstate = {published}, tppubtype = {misc} } In science education, hands-on student experiments are used to explore cause and effect relationships. Conventional lab work requires students to interact with physical experimentation objects and observe additional information like measurement values to deduce scientific laws and interrelations. The observable information, however, are usually presented in physical distance to the setting, e.g., on a separate display of a measuring device. The resulting spatial split (Chandler & Sweller, 1991) between representations hampers global coherence formation (Seufert & Brünken, 2004): Mapping processes between the spatially distant sources of information are assumed to lead to an increase in extraneous cognitive load (ECL; Ayres & Sweller, 2014). Consequently, learning outcomes can be impaired (Kalyuga et al., 1999). Augmented Reality (AR) can be used to overcome the split-attention effect by allowing additional information to be virtually integrated into the real-world set-up (Azuma, 1997). A study by Altmeyer et al. (2020) with university students showed that AR-support during experimentation led to a higher conceptual knowledge gain but had no effect on ECL. 
The current study provides a conceptual replication of Altmeyer et al.’s (2020) research and focuses on three main objectives: First, we aimed at investigating the generalizability of the advantage of AR on experimental learning in a sample of elementary school children. Second, we examined whether children's low prior knowledge amplifies the split-attention effect, as proposed by Kalyuga et al. (1998). Finally, we focused on obtaining deeper insights into global coherence formation processes during lab work using specific tests and eye tracking measures. |
Altmeyer, Kristin; Malone, Sarah; Kapp, Sebastian; Barz, Michael; Lauer, Luisa; Thees, Michael; Kuhn, Jochen; Peschel, Markus; Sonntag, Daniel; Brünken, Roland Augmented Reality zur Förderung globaler Kohärenzbildungsprozesse beim Experimentieren im Sachunterricht Miscellaneous Tagung der Fachgruppe Pädagogische Psychologie, 2021. @misc{11871, title = {Augmented Reality zur Förderung globaler Kohärenzbildungsprozesse beim Experimentieren im Sachunterricht}, author = {Kristin Altmeyer and Sarah Malone and Sebastian Kapp and Michael Barz and Luisa Lauer and Michael Thees and Jochen Kuhn and Markus Peschel and Daniel Sonntag and Roland Brünken}, url = {https://www.dfki.de/fileadmin/user_upload/import/11871_v3_Altmeyer_VR_Symposium_PAEPSY_2021.pdf}, year = {2021}, date = {2021-01-01}, abstract = {Augmented Reality (AR) lässt sich als eine Form virtueller Umgebungen auf einem Realitäts-Virtualitäts-Kontinuum (Milgram & Kishino, 1994) der gemischten Realität zuordnen. AR erweitert die Realität durch die Integration virtueller Objekte. Ein vielversprechendes Anwendungsgebiet für AR im Bildungsbereich bietet das technologiegestützte Experimentieren: Experimente bilden ein wesentliches Merkmal der Naturwissenschaften und werden im MINT-Unterricht eingesetzt, um Zusammenhänge zu untersuchen. Bisherige Forschung deutet darauf hin, dass bereits Kinder im Grundschulalter (natur)wissenschaftliches Denken und die Fähigkeit zum Experimentieren entwickeln können (z.B. Osterhaus et al., 2015). Um Ursache-Wirkung-Beziehungen aus einem Experiment abzuleiten, müssen Lernende meist reale Informationen der Experimentierumgebung mit virtuellen Informationen, wie z.B. Messwerten auf Messwertdisplays, mental verknüpfen. Im Sinne der Cognitive Theory of Multimedia Learning (Mayer, 2005) und der Cognitive Load Theory (Sweller et al., 1998) stellt die Verknüpfung räumlich getrennter Informationen eine besondere Herausforderung an das Arbeitsgedächtnis dar. 
AR kann dazu genutzt werden, reale und virtuelle Informationen beim Experimentieren integriert darzustellen. Vorausgehende Studienergebnisse (z.B. Altmeyer et al., 2020) implizieren, dass AR die globale Kohärenzbildung (Seufert & Brünken, 2004) unterstützt und zu besseren Lernergebnissen führen kann (Altmeyer et al., 2020). In der vorliegenden Studie wurde der Effekt von AR-Unterstützung beim Experimentieren in einer Stichprobe von Grundschulkindern untersucht. Nach einem Vorwissenstest führten 59 Kinder Experimente zu elektrischen Schaltkreisen durch. Einer Gruppe wurden Echtzeit-Messwerte für die Stromstärke in einer Tabelle auf einem separaten Tabletbildschirm präsentiert. Dagegen sah die AR-unterstützte Gruppe die Messwerte beim Blick durch eine Tabletkamera in die Experimentierumgebung integriert. Während des Experimentierens wurden die Blickbewegungen der Kinder erfasst. Danach bearbeiteten beide Gruppen Posttests, welche in ihren Anforderungen an die globale Kohärenzbildung zwischen realen und virtuellen Elementen beim Experimentieren variierten. Erste Ergebnisse zeigen, dass Kinder insbesondere hinsichtlich Aufgaben, die eine starke globale Kohärenz erfordern, von der AR-Umgebung profitieren. Blickbewegungsanalysen sollen weitere Aufschlüsse über den Prozess der Kohärenzbildung während des Experimentierens in AR geben.}, howpublished = {Tagung der Fachgruppe Pädagogische Psychologie}, keywords = {}, pubstate = {published}, tppubtype = {misc} } Augmented reality (AR) can be placed, as a form of virtual environment, within the mixed-reality range of a reality-virtuality continuum (Milgram & Kishino, 1994). AR extends reality by integrating virtual objects. A promising field of application for AR in education is technology-supported experimentation: experiments are an essential feature of the natural sciences and are used in STEM instruction to investigate relationships. 
Previous research suggests that children can develop scientific reasoning and the ability to experiment as early as elementary school age (e.g., Osterhaus et al., 2015). To derive cause-effect relationships from an experiment, learners usually have to mentally link real information from the experimentation environment with virtual information, such as measurement values shown on displays. In terms of the Cognitive Theory of Multimedia Learning (Mayer, 2005) and Cognitive Load Theory (Sweller et al., 1998), linking spatially separated information places particular demands on working memory. AR can be used to present real and virtual information in an integrated way during experimentation. Previous findings (e.g., Altmeyer et al., 2020) imply that AR supports global coherence formation (Seufert & Brünken, 2004) and can lead to better learning outcomes. The present study examined the effect of AR support during experimentation in a sample of elementary school children. After a prior-knowledge test, 59 children conducted experiments on electric circuits. One group was shown real-time current measurements in a table on a separate tablet screen, whereas the AR-supported group saw the measurements integrated into the experimentation environment when looking through a tablet camera. The children's eye movements were recorded during experimentation. Afterwards, both groups completed posttests that varied in their demands on global coherence formation between the real and virtual elements of the experiment. First results show that children benefit from the AR environment especially on tasks that require strong global coherence. 
Eye-movement analyses are expected to provide further insight into the process of coherence formation during experimentation in AR. |
Technical Reports |
Profitlich, Hans-Jürgen; Sonntag, Daniel A Case Study on Pros and Cons of Regular Expression Detection and Dependency Parsing for Negation Extraction from German Medical Documents Technical Report BMBF Bundesministerium für Bildung und Forschung Kapelle-Ufer 1 D-10117 Berlin, 2021. @techreport{11611, title = {A Case Study on Pros and Cons of Regular Expression Detection and Dependency Parsing for Negation Extraction from German Medical Documents. Technical Report}, author = {Hans-Jürgen Profitlich and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11611_CaseStudy_TR_final.pdf http://arxiv.org/abs/2105.09702}, year = {2021}, date = {2021-05-01}, volume = {1}, pages = {30}, address = {Bundesministerium für Bildung und Forschung Kapelle-Ufer 1 D-10117 Berlin}, institution = {BMBF}, abstract = {We describe our work on information extraction in medical documents written in German, especially detecting negations using an architecture based on the UIMA pipeline. Based on our previous work on software modules to cover medical concepts like diagnoses, examinations, etc., we employ a version of the NegEx regular expression algorithm with a large set of triggers as a baseline. We show how a significantly smaller trigger set is sufficient to achieve similar results, in order to reduce adaptation times to new text types. We elaborate on the question of whether dependency parsing (based on the Stanford CoreNLP model) is a good alternative and describe the potential and shortcomings of both approaches.}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } We describe our work on information extraction in medical documents written in German, especially detecting negations using an architecture based on the UIMA pipeline. Based on our previous work on software modules to cover medical concepts like diagnoses, examinations, etc., we employ a version of the NegEx regular expression algorithm with a large set of triggers as a baseline. We show how a significantly smaller trigger set is sufficient to achieve similar results, in order to reduce adaptation times to new text types. 
We elaborate on the question of whether dependency parsing (based on the Stanford CoreNLP model) is a good alternative and describe the potential and shortcomings of both approaches. |
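The NegEx-style baseline described in this abstract can be illustrated with a minimal sketch. The German triggers and the end-of-sentence scope rule below are illustrative stand-ins, not the trigger set used in the technical report:

```python
import re

# NegEx-style negation detection (illustrative sketch): a small set of
# German pre-negation triggers; a medical concept counts as negated if it
# appears within the trigger's scope (here: the rest of the sentence).
# Triggers and scope rule are examples, not the report's actual resources.
PRE_NEGATION_TRIGGERS = [r"kein(?:e[mnrs]?)?", r"ohne", r"ausschluss von", r"nicht"]
TRIGGER_RE = re.compile(r"\b(?:" + "|".join(PRE_NEGATION_TRIGGERS) + r")\b")

def is_negated(sentence: str, concept: str) -> bool:
    """True if `concept` occurs after a negation trigger in `sentence`."""
    s = sentence.lower()
    match = TRIGGER_RE.search(s)
    return bool(match) and concept.lower() in s[match.end():]
```

A smaller trigger set, as the report argues, mainly trades recall on rare phrasings for faster adaptation to new text types.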
Nguyen, Ho Minh Duy; Nguyen, Thu T; Vu, Huong; Pham, Quang; Nguyen, Manh-Duy; Nguyen, Binh T; Sonntag, Daniel TATL: Task Agnostic Transfer Learning for Skin Attributes Detection Technical Report DFKI, 2021. @techreport{11594, title = {TATL: Task Agnostic Transfer Learning for Skin Attributes Detection}, author = {Ho Minh Duy Nguyen and Thu T Nguyen and Huong Vu and Quang Pham and Manh-Duy Nguyen and Binh T Nguyen and Daniel Sonntag}, url = {https://arxiv.org/pdf/2104.01641.pdf}, year = {2021}, date = {2021-04-01}, volume = {01}, institution = {DFKI}, abstract = {Existing skin attributes detection methods usually initialize with a pre-trained ImageNet network and then fine-tune on the medical target task. However, we argue that such approaches are suboptimal because medical datasets are largely different from ImageNet and often contain limited training samples. In this work, we propose Task Agnostic Transfer Learning (TATL), a novel framework motivated by dermatologists' behaviors in the skincare context. TATL learns an attribute-agnostic segmenter that detects lesion skin regions and then transfers this knowledge to a set of attribute-specific classifiers to detect each particular region's attributes. Since TATL's attribute-agnostic segmenter only detects abnormal skin regions, it enjoys ample data from all attributes, allows transferring knowledge among features, and compensates for the lack of training data from rare attributes. We extensively evaluate TATL on two popular skin attributes detection benchmarks and show that TATL outperforms state-of-the-art methods while enjoying minimal model and computational complexity. We also provide theoretical insights and explanations for why TATL works well in practice.}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } Existing skin attributes detection methods usually initialize with a pre-trained ImageNet network and then fine-tune on the medical target task. 
However, we argue that such approaches are suboptimal because medical datasets are largely different from ImageNet and often contain limited training samples. In this work, we propose Task Agnostic Transfer Learning (TATL), a novel framework motivated by dermatologists' behaviors in the skincare context. TATL learns an attribute-agnostic segmenter that detects lesion skin regions and then transfers this knowledge to a set of attribute-specific classifiers to detect each particular region's attributes. Since TATL's attribute-agnostic segmenter only detects abnormal skin regions, it enjoys ample data from all attributes, allows transferring knowledge among features, and compensates for the lack of training data from rare attributes. We extensively evaluate TATL on two popular skin attributes detection benchmarks and show that TATL outperforms state-of-the-art methods while enjoying minimal model and computational complexity. We also provide theoretical insights and explanations for why TATL works well in practice. |
2020 |
Journal Articles |
Biswas, Rajarshi; Barz, Michael; Sonntag, Daniel Towards Explanatory Interactive Image Captioning Using Top-Down and Bottom-Up Features, Beam Search and Re-ranking Journal Article KI - Künstliche Intelligenz, German Journal on Artificial Intelligence - Organ des Fachbereiches "Künstliche Intelligenz" der Gesellschaft für Informatik e.V., 36 , pp. 1-14, 2020. @article{11236, title = {Towards Explanatory Interactive Image Captioning Using Top-Down and Bottom-Up Features, Beam Search and Re-ranking}, author = {Rajarshi Biswas and Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11236_2021_TOWARDS_EXPLANATORY_INTERACTIVE_IMAGE_CAPTIONING_USING_TOP-DOWN_AND_BOTTOM-UP_FEATURES,_BEAM_SEARCH_AND_RE-RANKING.pdf}, doi = {https://doi.org/10.1007/s13218-020-00679-2}, year = {2020}, date = {2020-07-01}, journal = {KI - Künstliche Intelligenz, German Journal on Artificial Intelligence - Organ des Fachbereiches "Künstliche Intelligenz" der Gesellschaft für Informatik e.V.}, volume = {36}, pages = {1-14}, publisher = {Springer}, abstract = {Image captioning is a challenging multimodal task. Significant improvements could be obtained by deep learning. Yet, captions generated by humans are still considered better, which makes it an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim at improving the performance and explainability of the state-of-the-art method Show, Attend and Tell by augmenting their attention mechanism using additional bottom-up features. We compute visual attention on the joint embedding space formed by the union of high-level features and the low-level features obtained from the object specific salient regions of the input image. We embed the content of bounding boxes from a pre-trained Mask R-CNN model. This delivers state-of-the-art performance, while it provides explanatory features. 
Further, we discuss how interactive model improvement can be realized through re-ranking caption candidates using beam search decoders and explanatory features. We show that interactive re-ranking of beam search candidates has the potential to outperform the state-of-the-art in image captioning.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Image captioning is a challenging multimodal task. Significant improvements could be obtained by deep learning. Yet, captions generated by humans are still considered better, which makes it an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim at improving the performance and explainability of the state-of-the-art method Show, Attend and Tell by augmenting their attention mechanism using additional bottom-up features. We compute visual attention on the joint embedding space formed by the union of high-level features and the low-level features obtained from the object specific salient regions of the input image. We embed the content of bounding boxes from a pre-trained Mask R-CNN model. This delivers state-of-the-art performance, while it provides explanatory features. Further, we discuss how interactive model improvement can be realized through re-ranking caption candidates using beam search decoders and explanatory features. We show that interactive re-ranking of beam search candidates has the potential to outperform the state-of-the-art in image captioning. |
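The interactive re-ranking idea in this abstract, mixing decoder log-probabilities with an external relevance signal, can be illustrated as follows. The candidate captions, the mixing weight, and the keyword-overlap scorer are hypothetical stand-ins for explanatory features or user feedback:

```python
# Re-ranking beam search candidates (sketch): the decoder yields
# (caption, log-probability) pairs; an external relevance score is mixed
# in to pick a new top-ranked caption. Here relevance is a simple
# keyword-overlap count standing in for richer explanatory features.
def rerank(candidates, relevance, alpha=0.5):
    """candidates: list of (caption, log_prob); relevance: caption -> float."""
    return sorted(candidates,
                  key=lambda c: alpha * c[1] + (1 - alpha) * relevance(c[0]),
                  reverse=True)

beam = [("a dog on grass", -1.2), ("a brown dog running on grass", -1.8)]
feedback_terms = {"brown", "running"}
overlap = lambda caption: sum(t in caption.split() for t in feedback_terms)
best_caption = rerank(beam, overlap)[0][0]
```

With feedback favoring "brown" and "running", the longer, initially less probable caption moves to the top, which is the intended effect of interactive re-ranking.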
Heimann-Steinert, A; Latendorf, A; Prange, Alexander; Sonntag, Daniel; Müller-Werdan, U Digital pen technology for conducting cognitive assessments: a cross-over study with older adults Journal Article Psychological Research, 85, pp. 1-9, 2020. @article{11374, title = {Digital pen technology for conducting cognitive assessments: a cross-over study with older adults}, author = {A Heimann-Steinert and A Latendorf and Alexander Prange and Daniel Sonntag and U Müller-Werdan}, url = {https://www.dfki.de/fileadmin/user_upload/import/11374_Heimann-Steinert-2020-DigitalPenTechnologyForConduct.pdf https://link.springer.com/article/10.1007/s00426-020-01452-8#citeas}, year = {2020}, date = {2020-01-01}, journal = {Psychological Research}, volume = {85}, pages = {1-9}, publisher = {Springer}, abstract = {Many digitalized cognitive assessments exist to increase reliability, standardization, and objectivity. Particularly in older adults, the performance of digitized cognitive assessments can lead to poorer test results if they are unfamiliar with the computer, mouse, keyboard, or touch screen. In a cross-over design study, 40 older adults (age M = 74.4 ± 4.1 years) conducted the Trail Making Test A and B with a digital pen (digital pen tests, DPT) and a regular pencil (pencil tests, PT) to identify differences in performance. Furthermore, the tests conducted with a digital pen were analyzed manually (manual results, MR) and electronically (electronic results, ER) by an automated system algorithm to determine the possibilities of digital pen evaluation. ICC(2,k) showed a good level of agreement for TMT A (ICC(2,k) = 0.668) and TMT B (ICC(2,k) = 0.734) between PT and DPT. When comparing MR and ER, ICC(2,k) showed an excellent level of agreement in TMT A (ICC(2,k) = 0.999) and TMT B (ICC(2,k) = 0.994). The frequency of pen lifting correlates significantly with the execution time in TMT A (r = 0.372, p = 0.030) and TMT B (r = 0.567, p < 0.001). 
A digital pen can be used to perform the Trail Making Test, as it has been shown that there is no difference in the results due to the type of pen used. With a digital pen, the advantages of digitized testing can be used without having to accept the disadvantages.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Many digitalized cognitive assessments exist to increase reliability, standardization, and objectivity. Particularly in older adults, the performance of digitized cognitive assessments can lead to poorer test results if they are unfamiliar with the computer, mouse, keyboard, or touch screen. In a cross-over design study, 40 older adults (age M = 74.4 ± 4.1 years) conducted the Trail Making Test A and B with a digital pen (digital pen tests, DPT) and a regular pencil (pencil tests, PT) to identify differences in performance. Furthermore, the tests conducted with a digital pen were analyzed manually (manual results, MR) and electronically (electronic results, ER) by an automated system algorithm to determine the possibilities of digital pen evaluation. ICC(2,k) showed a good level of agreement for TMT A (ICC(2,k) = 0.668) and TMT B (ICC(2,k) = 0.734) between PT and DPT. When comparing MR and ER, ICC(2,k) showed an excellent level of agreement in TMT A (ICC(2,k) = 0.999) and TMT B (ICC(2,k) = 0.994). The frequency of pen lifting correlates significantly with the execution time in TMT A (r = 0.372, p = 0.030) and TMT B (r = 0.567, p < 0.001). A digital pen can be used to perform the Trail Making Test, as it has been shown that there is no difference in the results due to the type of pen used. With a digital pen, the advantages of digitized testing can be used without having to accept the disadvantages. |
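The agreement statistic reported above, ICC(2,k) (two-way random effects, absolute agreement, average of k raters), can be computed from a complete subjects-by-raters table of scores. A compact sketch of the standard formula, assuming no missing ratings:

```python
# ICC(2,k) from two-way ANOVA mean squares (sketch):
#   ICC(2,k) = (MSR - MSE) / (MSR + (MSC - MSE) / n)
# where MSR = between-subjects, MSC = between-raters, MSE = residual
# mean square; n subjects, k raters (e.g., pencil vs. digital pen scores).
def icc2k(data):
    """data: list of rows, one per subject, each with k ratings."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]
    msr = k * sum((rm - grand) ** 2 for rm in row_means) / (n - 1)
    msc = n * sum((cm - grand) ** 2 for cm in col_means) / (k - 1)
    sse = sum((data[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(n) for j in range(k))
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (msc - mse) / n)
```

Perfect agreement between the two "raters" yields 1.0; a constant offset between raters lowers the value, since ICC(2,k) measures absolute agreement rather than mere consistency.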
Inproceedings |
Nunnari, Fabrizio; Ezema, Abraham; Sonntag, Daniel The effects of masking in melanoma image classification with CNNs towards international standards for image preprocessing Inproceedings 2020 EAI International Symposium on Medical Artificial Intelligence, EAI, 2020. @inproceedings{11368, title = {The effects of masking in melanoma image classification with CNNs towards international standards for image preprocessing}, author = {Fabrizio Nunnari and Abraham Ezema and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11368_2020_EAI_MedAI_StudyOnDatasetBias.pdf}, year = {2020}, date = {2020-12-01}, booktitle = {2020 EAI International Symposium on Medical Artificial Intelligence}, publisher = {EAI}, abstract = {The classification of skin lesion images is known to be biased by artifacts of the surrounding skin, but it is still not clear to what extent masking out healthy skin pixels influences classification performance, and why. To better understand this phenomenon, we apply different strategies of image masking (rectangular masks, circular masks, full masking, and image cropping) to three datasets of skin lesion images (ISIC2016, ISIC2018, and MedNode). We train CNN-based classifiers, provide performance metrics through a 10-fold cross-validation, and analyse the behaviour of Grad-CAM saliency maps through an automated visual inspection. Our experiments show that cropping is the best strategy to maintain classification performance and to significantly reduce training times as well. Our analysis through visual inspection shows that CNNs have the tendency to focus on pixels of healthy skin when no malignant features can be identified. This suggests that CNNs have the tendency of "eagerly" looking for pixel areas to justify a classification choice, potentially leading to biased discriminators. 
To mitigate this effect, and to standardize image preprocessing, we suggest cropping images during dataset construction or before the learning step.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The classification of skin lesion images is known to be biased by artifacts of the surrounding skin, but it is still not clear to what extent masking out healthy skin pixels influences classification performance, and why. To better understand this phenomenon, we apply different strategies of image masking (rectangular masks, circular masks, full masking, and image cropping) to three datasets of skin lesion images (ISIC2016, ISIC2018, and MedNode). We train CNN-based classifiers, provide performance metrics through a 10-fold cross-validation, and analyse the behaviour of Grad-CAM saliency maps through an automated visual inspection. Our experiments show that cropping is the best strategy to maintain classification performance and to significantly reduce training times as well. Our analysis through visual inspection shows that CNNs have the tendency to focus on pixels of healthy skin when no malignant features can be identified. This suggests that CNNs have the tendency of "eagerly" looking for pixel areas to justify a classification choice, potentially leading to biased discriminators. To mitigate this effect, and to standardize image preprocessing, we suggest cropping images during dataset construction or before the learning step. |
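The cropping strategy recommended above can be illustrated with a minimal sketch that crops an image to the tight bounding box of a binary lesion mask, removing surrounding healthy skin before training. Plain Python lists stand in for image arrays, and the mask is assumed to be given:

```python
# Crop an image to the tight bounding box of a binary lesion mask
# (sketch). In practice the mask would come from a segmentation model
# or dataset annotation; here both inputs are small nested lists.
def crop_to_mask(image, mask):
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    r0, r1, c0, c1 = min(rows), max(rows), min(cols), max(cols)
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]
```

Cropping, unlike masking, keeps only in-lesion context and shrinks the input, which is consistent with the reported reduction in training time.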
Nguyen, Ho Minh Duy; Ezema, Abraham; Nunnari, Fabrizio; Sonntag, Daniel A Visually Explainable Learning System for Skin Lesion Detection Using Multiscale Input with Attention U-Net Inproceedings KI 2020: Advances in Artificial Intelligence, pp. 313-319, Springer, 2020. @inproceedings{11178, title = {A Visually Explainable Learning System for Skin Lesion Detection Using Multiscale Input with Attention U-Net}, author = {Ho Minh Duy Nguyen and Abraham Ezema and Fabrizio Nunnari and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11178_KI_2020.pdf https://link.springer.com/chapter/10.1007/978-3-030-58285-2_28}, year = {2020}, date = {2020-09-01}, booktitle = {KI 2020: Advances in Artificial Intelligence}, volume = {12325}, pages = {313-319}, publisher = {Springer}, abstract = {In this work, we propose a new approach to automatically predict the locations of visual dermoscopic attributes for Task 2 of the ISIC 2018 Challenge. Our method is based on the Attention U-Net with multi-scale images as input. We apply a new strategy based on transfer learning, i.e., training the deep network for feature extraction by adapting the weights of the network trained for segmentation. Our tests show that, first, the proposed algorithm is on par or outperforms the best ISIC 2018 architectures (LeHealth and NMN) in the extraction of two visual features. Secondly, it uses only 1/30 of the training parameters; we observed less computation and memory requirements, which are particularly useful for future implementations on mobile devices. Finally, our approach generates visually explainable behaviour with uncertainty estimations to help doctors in diagnosis and treatment decisions.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In this work, we propose a new approach to automatically predict the locations of visual dermoscopic attributes for Task 2 of the ISIC 2018 Challenge. 
Our method is based on the Attention U-Net with multi-scale images as input. We apply a new strategy based on transfer learning, i.e., training the deep network for feature extraction by adapting the weights of the network trained for segmentation. Our tests show that, first, the proposed algorithm is on par with or outperforms the best ISIC 2018 architectures (LeHealth and NMN) in the extraction of two visual features. Second, it uses only 1/30 of the training parameters; we observed lower computation and memory requirements, which are particularly useful for future implementations on mobile devices. Finally, our approach generates visually explainable behaviour with uncertainty estimations to help doctors in diagnosis and treatment decisions. |
Barz, Michael; Altmeyer, Kristin; Malone, Sarah; Lauer, Luisa; Sonntag, Daniel Digital Pen Features Predict Task Difficulty and User Performance of Cognitive Tests Inproceedings Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization, ACM, 2020. @inproceedings{10894, title = {Digital Pen Features Predict Task Difficulty and User Performance of Cognitive Tests}, author = {Michael Barz and Kristin Altmeyer and Sarah Malone and Luisa Lauer and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/10894_digital_pen_predicts_task_performance.pdf}, year = {2020}, date = {2020-07-01}, booktitle = {Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization}, publisher = {ACM}, abstract = {Digital pen signals were shown to be predictive for cognitive states, cognitive load and emotion in educational settings. We investigate whether low-level pen-based features can predict the difficulty of tasks in a cognitive test and the learner's performance in these tasks, which is inherently related to cognitive load, without a semantic content analysis. We record data for tasks of varying difficulty in a controlled study with children from elementary school. We include two versions of the Trail Making Test (TMT) and six drawing patterns from the Snijders-Oomen Non-verbal intelligence test (SON) as tasks that feature increasing levels of difficulty. We examine how accurately we can predict the task difficulty and the user performance as a measure for cognitive load using support vector machines and gradient boosted decision trees with different feature selection strategies. The results show that our correlation-based feature selection is beneficial for model training, in particular when samples from TMT and SON are concatenated for joint modelling of difficulty and time. 
Our findings open up opportunities for technology-enhanced adaptive learning.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Digital pen signals were shown to be predictive for cognitive states, cognitive load and emotion in educational settings. We investigate whether low-level pen-based features can predict the difficulty of tasks in a cognitive test and the learner's performance in these tasks, which is inherently related to cognitive load, without a semantic content analysis. We record data for tasks of varying difficulty in a controlled study with children from elementary school. We include two versions of the Trail Making Test (TMT) and six drawing patterns from the Snijders-Oomen Non-verbal intelligence test (SON) as tasks that feature increasing levels of difficulty. We examine how accurately we can predict the task difficulty and the user performance as a measure for cognitive load using support vector machines and gradient boosted decision trees with different feature selection strategies. The results show that our correlation-based feature selection is beneficial for model training, in particular when samples from TMT and SON are concatenated for joint modelling of difficulty and time. Our findings open up opportunities for technology-enhanced adaptive learning. |
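The correlation-based feature selection mentioned above can be sketched as ranking features by the absolute Pearson correlation of each feature with the target (e.g., task completion time) and keeping the top m. The feature names and cutoff below are illustrative, not the paper's actual pen feature set:

```python
# Correlation-based feature selection (sketch): rank candidate pen
# features by |Pearson r| against the target and keep the top m.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy)

def select_features(samples, target, m):
    """samples: dict feature_name -> list of values; target: list of labels."""
    ranked = sorted(samples, key=lambda f: abs(pearson(samples[f], target)),
                    reverse=True)
    return ranked[:m]
```

The selected feature subset would then be fed to the classifiers mentioned in the abstract (support vector machines or gradient boosted trees).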
Barz, Michael; Stauden, Sven; Sonntag, Daniel Visual Search Target Inference in Natural Interaction Settings with Machine Learning Inproceedings Bulling, Andreas; Huckauf, Anke; Jain, Eakta; Radach, Ralph; Weiskopf, Daniel (Ed.): ACM Symposium on Eye Tracking Research and Applications, Association for Computing Machinery, 2020. @inproceedings{10893, title = {Visual Search Target Inference in Natural Interaction Settings with Machine Learning}, author = {Michael Barz and Sven Stauden and Daniel Sonntag}, editor = {Andreas Bulling and Anke Huckauf and Eakta Jain and Ralph Radach and Daniel Weiskopf}, year = {2020}, date = {2020-05-01}, booktitle = {ACM Symposium on Eye Tracking Research and Applications}, publisher = {Association for Computing Machinery}, abstract = {Visual search is a perceptual task in which humans aim at identifying a search target object such as a traffic sign among other objects. Search target inference subsumes computational methods for predicting this target by tracking and analyzing overt behavioral cues of that person, e.g., the human gaze and fixated visual stimuli. We present a generic approach to inferring search targets in natural scenes by predicting the class of the surrounding image segment. Our method encodes visual search sequences as histograms of fixated segment classes determined by SegNet, a deep learning image segmentation model for natural scenes. We compare our sequence encoding and model training (SVM) to a recent baseline from the literature for predicting the target segment. Also, we use a new search target inference dataset. The results show that, first, our new segmentation-based sequence encoding outperforms the method from the literature, and second, that it enables target inference in natural settings.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Visual search is a perceptual task in which humans aim at identifying a search target object such as a traffic sign among other objects. 
Search target inference subsumes computational methods for predicting this target by tracking and analyzing overt behavioral cues of the searching person, e.g., the human gaze and fixated visual stimuli. We present a generic approach to inferring search targets in natural scenes by predicting the class of the surrounding image segment. Our method encodes visual search sequences as histograms of fixated segment classes determined by SegNet, a deep learning image segmentation model for natural scenes. We compare our sequence encoding and model training (SVM) to a recent baseline from the literature for predicting the target segment. Also, we use a new search target inference dataset. The results show that, first, our new segmentation-based sequence encoding outperforms the method from the literature, and second, that it enables target inference in natural settings. |
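The sequence encoding described above, a histogram of fixated segment classes, can be sketched as follows. The class vocabulary is illustrative (not SegNet's actual label set), and the resulting normalized vector would be the input to the SVM:

```python
# Encode a visual search sequence as a normalized histogram over a fixed
# vocabulary of segment classes (sketch). Each fixation contributes the
# class of the image segment it landed on.
CLASSES = ["road", "sign", "building", "vegetation", "sky"]

def encode_fixations(fixated_classes):
    counts = [fixated_classes.count(c) for c in CLASSES]
    total = sum(counts) or 1  # avoid division by zero for empty sequences
    return [c / total for c in counts]

vec = encode_fixations(["sign", "road", "sign", "sky"])
```

A fixed-length vector like this makes variable-length fixation sequences comparable, which is what allows a standard classifier such as an SVM to predict the target segment class.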
Nunnari, Fabrizio; Bhuvaneshwara, Chirag; Ezema, Abraham Obinwanne; Sonntag, Daniel A Study on the Fusion of Pixels and Patient Metadata in CNN-Based Classification of Skin Lesion Images Inproceedings Holzinger, Andreas; Kieseberg, Peter; Tjoa, Min A; Weippl, Edgar (Ed.): Machine Learning and Knowledge Extraction, pp. 191-208, Springer International Publishing, 2020. @inproceedings{11113, title = {A Study on the Fusion of Pixels and Patient Metadata in CNN-Based Classification of Skin Lesion Images}, author = {Fabrizio Nunnari and Chirag Bhuvaneshwara and Abraham Obinwanne Ezema and Daniel Sonntag}, editor = {Andreas Holzinger and Peter Kieseberg and Min A Tjoa and Edgar Weippl}, url = {https://www.dfki.de/fileadmin/user_upload/import/11113_Nunnari20CD-MAKE.pdf https://link.springer.com/chapter/10.1007/978-3-030-57321-8_11}, year = {2020}, date = {2020-01-01}, booktitle = {Machine Learning and Knowledge Extraction}, pages = {191-208}, publisher = {Springer International Publishing}, abstract = {We present a study on the fusion of pixel data and patient metadata (age, gender, and body location) for improving the classification of skin lesion images. The experiments have been conducted with the ISIC 2019 skin lesion classification challenge data set. Taking two plain convolutional neural networks (CNNs) as a baseline, metadata are merged using either non-neural machine learning methods (tree-based and support vector machines) or shallow neural networks. Results show that shallow neural networks outperform other approaches in all overall evaluation measures. However, despite the increase in the classification accuracy (up to +19.1%), interestingly, the average per-class sensitivity decreases in three out of four cases for CNNs, thus suggesting that using metadata penalizes the prediction accuracy for lower represented classes. 
A study on the patient metadata shows that age is the most useful metadatum as a decision criterion, followed by body location and gender.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present a study on the fusion of pixel data and patient metadata (age, gender, and body location) for improving the classification of skin lesion images. The experiments have been conducted with the ISIC 2019 skin lesion classification challenge data set. Taking two plain convolutional neural networks (CNNs) as a baseline, metadata are merged using either non-neural machine learning methods (tree-based and support vector machines) or shallow neural networks. Results show that shallow neural networks outperform other approaches in all overall evaluation measures. However, despite the increase in the classification accuracy (up to +19.1%), interestingly, the average per-class sensitivity decreases in three out of four cases for CNNs, thus suggesting that using metadata penalizes the prediction accuracy for underrepresented classes. A study on the patient metadata shows that age is the most useful metadatum as a decision criterion, followed by body location and gender. |
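Late fusion of CNN image features with patient metadata via a shallow neural network, as studied in this paper, can be sketched as below. All features and labels are synthetic stand-ins (random vectors instead of real CNN embeddings, random metadata instead of ISIC 2019 records); only the fusion-by-concatenation pattern is illustrated.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins: 64-d "CNN embeddings" plus three metadata fields
# (age, gender, body-location code).
n = 200
cnn_feats = rng.normal(size=(n, 64))
metadata = np.column_stack([
    rng.integers(18, 90, n),   # age
    rng.integers(0, 2, n),     # gender
    rng.integers(0, 8, n),     # body location
]).astype(float)

# Toy labels that depend on both pixels and age, so the fusion carries signal.
y = ((cnn_feats[:, 0] + (metadata[:, 0] - 50.0) / 40.0) > 0).astype(int)

# Late fusion by concatenation, classified by a shallow neural network.
X = StandardScaler().fit_transform(np.hstack([cnn_feats, metadata]))
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0).fit(X, y)
train_acc = clf.score(X, y)
print(f"train accuracy: {train_acc:.2f}")
```

Standardizing before the MLP matters here because raw age values would otherwise dominate the unit-scale embedding dimensions.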
Müller, Julia; Sprenger, Max; Franke, Tobias; Lukowicz, Paul; Reidick, Claudia; Herrlich, Marc Game of TUK: deploying a large-scale activity-boosting gamification project in a university context Inproceedings Mensch und Computer, ACM, 2020. @inproceedings{12112, title = {Game of TUK: deploying a large-scale activity-boosting gamification project in a university context}, author = {Julia Müller and Max Sprenger and Tobias Franke and Paul Lukowicz and Claudia Reidick and Marc Herrlich}, url = {https://www.dfki.de/fileadmin/user_upload/import/12112_2020_GAME_OF_TUK-_DEPLOYING_A_LARGE-SCALE_ACTIVITY-BOOSTING_GAMIFICATION_PROJECT_IN_A_UNIVERSITY_CONTEXT.pdf https://dl.acm.org/doi/abs/10.1145/3404983.3410008}, year = {2020}, date = {2020-01-01}, booktitle = {Mensch und Computer}, publisher = {ACM}, abstract = {We present Game of TUK, a gamified mobile app to increase physical activity among students at TU Kaiserslautern. The scale of our project with almost 2,000 players over the course of four weeks is unique for a project in a university context. We present feedback we received and share our insights. Our results show that location-based activities in particular were very popular. In contrast, mini-games included in the app did not contribute as much to user activity as expected.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present Game of TUK, a gamified mobile app to increase physical activity among students at TU Kaiserslautern. The scale of our project with almost 2,000 players over the course of four weeks is unique for a project in a university context. We present feedback we received and share our insights. Our results show that location-based activities in particular were very popular. In contrast, mini-games included in the app did not contribute as much to user activity as expected. |
Technical Reports |
Sonntag, Daniel; Nunnari, Fabrizio; Profitlich, Hans-Jürgen The Skincare project, an interactive deep learning system for differential diagnosis of malignant skin lesions. Technical Report BMBF, H2020 Bundesministerium für Bildung und Forschung Kapelle-Ufer 1 D-10117 Berlin, , 2020. @techreport{10912, title = {The Skincare project, an interactive deep learning system for differential diagnosis of malignant skin lesions.}, author = {Daniel Sonntag and Fabrizio Nunnari and Hans-Jürgen Profitlich}, url = {https://www.dfki.de/fileadmin/user_upload/import/10912_main2.pdf https://arxiv.org/abs/2005.09448}, year = {2020}, date = {2020-05-01}, volume = {1}, address = {Bundesministerium für Bildung und Forschung Kapelle-Ufer 1 D-10117 Berlin}, institution = {BMBF, H2020}, abstract = {A shortage of dermatologists causes long wait times for patients who seek dermatologic care. In addition, the diagnostic accuracy of general practitioners has been reported to be lower than the accuracy of artificial intelligence software. This article describes the Skincare project (H2020, EIT Digital). Contributions include enabling technology for clinical decision support based on interactive machine learning (IML), a reference architecture towards a Digital European Healthcare Infrastructure (also cf. EIT MCPS), technical components for aggregating digitised patient information, and the integration of decision support technology into clinical test-bed environments. However, the main contribution is a diagnostic and decision support system in dermatology for patients and doctors, an interactive deep learning system for differential diagnosis of malignant skin lesions. In this article, we describe its functionalities and the user interfaces to facilitate machine learning from human input. 
The baseline deep learning system, which delivers state-of-the-art results and the potential to augment general practitioners and even dermatologists, was developed and validated using de-identified cases from a dermatology image database (ISIC), which has about 20,000 cases for development and validation, provided by board-certified dermatologists defining the reference standard for every case. ISIC allows for differential diagnosis, a ranked list of eight diagnoses that is used to plan treatments in the common setting of diagnostic ambiguity. We give an overall description of the outcome of the Skincare project, and we focus on the steps to support communication and coordination between humans and machines in IML. This is an integral part of the development of future cognitive assistants in the medical domain, and we describe the necessary intelligent user interfaces.}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } A shortage of dermatologists causes long wait times for patients who seek dermatologic care. In addition, the diagnostic accuracy of general practitioners has been reported to be lower than the accuracy of artificial intelligence software. This article describes the Skincare project (H2020, EIT Digital). Contributions include enabling technology for clinical decision support based on interactive machine learning (IML), a reference architecture towards a Digital European Healthcare Infrastructure (also cf. EIT MCPS), technical components for aggregating digitised patient information, and the integration of decision support technology into clinical test-bed environments. However, the main contribution is a diagnostic and decision support system in dermatology for patients and doctors, an interactive deep learning system for differential diagnosis of malignant skin lesions. In this article, we describe its functionalities and the user interfaces to facilitate machine learning from human input. 
The baseline deep learning system, which delivers state-of-the-art results and the potential to augment general practitioners and even dermatologists, was developed and validated using de-identified cases from a dermatology image database (ISIC), which has about 20,000 cases for development and validation, provided by board-certified dermatologists defining the reference standard for every case. ISIC allows for differential diagnosis, a ranked list of eight diagnoses that is used to plan treatments in the common setting of diagnostic ambiguity. We give an overall description of the outcome of the Skincare project, and we focus on the steps to support communication and coordination between humans and machines in IML. This is an integral part of the development of future cognitive assistants in the medical domain, and we describe the necessary intelligent user interfaces. |
Sonntag, Daniel Künstliche Intelligenz gegen das Coronavirus Technical Report DFKI, BMBF, BMG, 2020. @techreport{10809, title = {Künstliche Intelligenz gegen das Coronavirus}, author = {Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/10809_corona2.pages.pdf}, year = {2020}, date = {2020-01-01}, volume = {1}, institution = {DFKI, BMBF, BMG}, abstract = {Künstliche Intelligenz hat in den letzten Jahren eine neue Reifephase erreicht und entwickelt sich zum Treiber der Digitalisierung in allen Lebensbereichen. Die KI ist eine Querschnittstechnologie, die für alle Bereiche der Medizin mit Bilddaten, Textdaten und Biodaten von großer Bedeutung ist. Es gibt keinen medizinischen Bereich, der nicht von KI beeinflusst werden wird (siehe auch http://www.dfki.de/ MedicalCPS/?p=1111). Hier werden vier Felder gegen das Coronavirus beleuchtet, (1) die Bilddiagnostik, (2) Gensequenzierung, (3) die automatische Auswertung medizinischer Texte und (4) das Katastrophenmanagement.}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } Artificial intelligence has reached a new phase of maturity in recent years and is becoming the driver of digitalization in all areas of life. AI is a cross-cutting technology that is of great importance for all areas of medicine involving image data, text data, and biodata. There is no medical field that will not be affected by AI (see also http://www.dfki.de/ MedicalCPS/?p=1111). Four fields in the fight against the coronavirus are examined here: (1) image-based diagnostics, (2) genome sequencing, (3) the automatic analysis of medical texts, and (4) disaster management. |
Kalimuthu, Marimuthu; Nunnari, Fabrizio; Sonntag, Daniel A Competitive Deep Neural Network Approach for the ImageCLEFmed Caption 2020 Task Technical Report German Research Center for Artificial Intelligence, 2020. @techreport{11188, title = {A Competitive Deep Neural Network Approach for the ImageCLEFmed Caption 2020 Task}, author = {Marimuthu Kalimuthu and Fabrizio Nunnari and Daniel Sonntag}, year = {2020}, date = {2020-01-01}, volume = {o.A.}, institution = {German Research Center for Artificial Intelligence}, abstract = {The aim of the ImageCLEFmed Caption task is to develop a system that automatically labels radiology images with relevant medical concepts. We describe our Deep Neural Network (DNN) based approach for tackling this problem. On the challenge test set of 3,534 radiology images, our system achieves an F1 score of 0.375 and ranks 12th among all systems that were successfully submitted to the challenge. We rely only on the provided data sources and do not use any external medical knowledge or ontologies, or pretrained models from other medical image repositories or application domains.}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } The aim of the ImageCLEFmed Caption task is to develop a system that automatically labels radiology images with relevant medical concepts. We describe our Deep Neural Network (DNN) based approach for tackling this problem. On the challenge test set of 3,534 radiology images, our system achieves an F1 score of 0.375 and ranks 12th among all systems that were successfully submitted to the challenge. We rely only on the provided data sources and do not use any external medical knowledge or ontologies, or pretrained models from other medical image repositories or application domains. |
2019 |
Journal Articles |
Barz, Michael; Sonntag, Daniel Incremental Improvement of a Question Answering System by Re-ranking Answer Candidates using Machine Learning Journal Article Computing Research Repository eprint Journal, abs/1908.10149, pp. 1-13, 2019. @article{10895, title = {Incremental Improvement of a Question Answering System by Re-ranking Answer Candidates using Machine Learning}, author = {Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/10895_1908.10149.pdf https://arxiv.org/abs/1908.10149}, year = {2019}, date = {2019-08-01}, journal = {Computing Research Repository eprint Journal}, volume = {abs/1908.10149}, pages = {1-13}, publisher = {arXiv}, abstract = {We implement a method for re-ranking top-10 results of a state-of-the-art question answering (QA) system. The goal of our re-ranking approach is to improve the answer selection given the user question and the top-10 candidates. We focus on improving deployed QA systems that do not allow re-training or for which re-training comes at a high cost. Our re-ranking approach learns a similarity function using n-gram based features of the query, the answer, and the initial system confidence as input. Our contributions are: (1) we generate a QA training corpus starting from 877 answers from the customer care domain of T-Mobile Austria, (2) we implement a state-of-the-art QA pipeline using neural sentence embeddings that encode queries in the same space as the answer index, and (3) we evaluate the QA pipeline and our re-ranking approach using a separately provided test set. The test set can be considered to be available after deployment of the system, e.g., based on feedback of users. Our results show that the system performance, in terms of top-n accuracy and the mean reciprocal rank, benefits from re-ranking using gradient boosted regression trees. 
On average, the mean reciprocal rank improves by 9.15%.}, keywords = {}, pubstate = {published}, tppubtype = {article} } We implement a method for re-ranking top-10 results of a state-of-the-art question answering (QA) system. The goal of our re-ranking approach is to improve the answer selection given the user question and the top-10 candidates. We focus on improving deployed QA systems that do not allow re-training or for which re-training comes at a high cost. Our re-ranking approach learns a similarity function using n-gram based features of the query, the answer, and the initial system confidence as input. Our contributions are: (1) we generate a QA training corpus starting from 877 answers from the customer care domain of T-Mobile Austria, (2) we implement a state-of-the-art QA pipeline using neural sentence embeddings that encode queries in the same space as the answer index, and (3) we evaluate the QA pipeline and our re-ranking approach using a separately provided test set. The test set can be considered to be available after deployment of the system, e.g., based on feedback of users. Our results show that the system performance, in terms of top-n accuracy and the mean reciprocal rank, benefits from re-ranking using gradient boosted regression trees. On average, the mean reciprocal rank improves by 9.15%. |
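The re-ranking idea from this abstract can be illustrated with a toy sketch: featurize each (query, answer) pair with an n-gram overlap score plus the deployed system's original confidence, then fit a gradient-boosted regression tree to relevance labels. The query, candidates, features, and labels below are all invented; the paper uses a richer n-gram feature set and real customer-care data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def ngram_overlap(q, a, n=2):
    """Toy n-gram based similarity between a query and an answer candidate."""
    grams = lambda s: {tuple(s.split()[i:i + n]) for i in range(len(s.split()) - n + 1)}
    gq, ga = grams(q), grams(a)
    return len(gq & ga) / max(len(gq | ga), 1)

# Hypothetical top-k answers paired with the deployed system's confidences.
query = "how do i reset my voicemail password"
candidates = [
    ("open the app and tap reset voicemail password", 0.41),
    ("our stores are open monday to friday", 0.55),
    ("you can reset your password in account settings", 0.48),
]

# One feature vector per (query, answer) pair: n-gram overlap + initial confidence.
X = np.array([[ngram_overlap(query, a), conf] for a, conf in candidates])
y = np.array([1.0, 0.0, 0.5])  # toy relevance labels, as if collected after deployment

ranker = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)
best = max(zip(candidates, ranker.predict(X)), key=lambda t: t[1])
best_answer = best[0][0]
print(best_answer)
```

Note how the re-ranker can overrule the original confidence: the store-hours candidate had the highest initial score but no lexical overlap with the query.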
Sonntag, Daniel Künstliche Intelligenz in der Medizin -- Holzweg oder Heilversprechen? Journal Article HNO, 67, pp. 343-349, 2019. @article{10833, title = {Künstliche Intelligenz in der Medizin -- Holzweg oder Heilversprechen?}, author = {Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/10833_sonntag-hno-ki-DFKI-repository.pdf}, doi = {https://doi.org/10.1007/s00106-019-0665-z}, year = {2019}, date = {2019-01-01}, journal = {HNO}, volume = {67}, pages = {343-349}, publisher = {Springer}, abstract = {Künstliche Intelligenz (KI) hat in den letzten Jahren eine neue Reifephase erreicht und entwickelt sich zum Treiber der Digitalisierung in allen Lebensbereichen. Die KI ist eine Querschnittstechnologie, die für alle Bereiche der Medizin mit Bilddaten, Textdaten und Biodaten von großer Bedeutung ist. Es gibt keinen medizinischen Bereich, der nicht von KI beeinflusst werden wird. Dabei spielt die klinische Entscheidungsunterstützung eine wichtige Rolle. Gerade beim medizinischen Workflow-Management und bei der Vorhersage des Behandlungserfolgs bzw. Behandlungsergebnisses etablieren sich KI-Methoden. In der Bilddiagnose und im Patientenmanagement können KI-Systeme bereits unterstützen, aber sie können keine kritischen Entscheidungen vorschlagen. Die jeweiligen Präventions- oder Therapiemaßnahmen können mit KI-Unterstützung sinnvoller bewertet werden, allerdings ist die Abdeckung der Krankheiten noch viel zu gering, um robuste Systeme für den klinischen Alltag zu erstellen. Der flächendeckende Einsatz setzt Fortbildungsmaßnahmen für Ärzte voraus, um die Entscheidung treffen zu können, wann auf automatische Entscheidungsunterstützung vertraut werden kann.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Artificial intelligence (AI) has reached a new phase of maturity in recent years and is becoming the driver of digitalization in all areas of life. AI is a cross-cutting technology that is of great importance for all areas of medicine involving image data, text data, and biodata. There is no medical field that will not be affected by AI, and clinical decision support plays an important role in this development. AI methods are becoming established particularly in medical workflow management and in the prediction of treatment success and outcomes. AI systems can already provide support in image-based diagnosis and patient management, but they cannot propose critical decisions. Prevention and therapy measures can be assessed more soundly with AI support; however, the coverage of diseases is still far too low to build robust systems for everyday clinical practice. Widespread adoption requires continuing education for physicians so that they can decide when automatic decision support can be trusted. |
Book Chapters |
Sonntag, Daniel Medical and Health Systems Book Chapter The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions - Volume 3, pp. 423-476, Association for Computing Machinery and Morgan & Claypool, 2019. @inbook{10812, title = {Medical and Health Systems}, author = {Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/10812_Medical-and-Health-Systems.pdf}, doi = {https://doi.org/10.1145/3233795.3233808}, year = {2019}, date = {2019-01-01}, booktitle = {The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions - Volume 3}, pages = {423-476}, publisher = {Association for Computing Machinery and Morgan & Claypool}, abstract = {In this chapter, we discuss the trends of multimodal-multisensor interfaces for medical and health systems. We emphasize the theoretical foundations of multimodal interfaces and systems in the healthcare domain. We aim to provide a basis for motivating and accelerating future interfaces for medical and health systems. Therefore, we provide many examples of existing and futuristic systems. For each of these systems, we define a classification into clinical systems and non-clinical systems, as well as sub-classes of multimodal and multisensor interfaces, to help structure the recent work in this emerging research field of medical and health systems.}, keywords = {}, pubstate = {published}, tppubtype = {inbook} } In this chapter, we discuss the trends of multimodal-multisensor interfaces for medical and health systems. We emphasize the theoretical foundations of multimodal interfaces and systems in the healthcare domain. We aim to provide a basis for motivating and accelerating future interfaces for medical and health systems. Therefore, we provide many examples of existing and futuristic systems. 
For each of these systems, we define a classification into clinical systems and non-clinical systems, as well as sub-classes of multimodal and multisensor interfaces, to help structure the recent work in this emerging research field of medical and health systems. |
Incollections |
Feld, Michael; Neßelrath, Robert; Schwartz, Tim Software Platforms and Toolkits for Building Multimodal Systems and Applications Incollection Oviatt, Sharon; Schuller, Björn; Cohen, Philip R; Potamianos, Gerasimos; Krüger, Antonio; Sonntag, Daniel (Ed.): The Handbook of Multimodal-Multisensor Interfaces, Volume 3 -- Language Processing, Software, Commercialization, and Emerging Directions, #23 , pp. 145-190, Morgan & Claypool Publishers, 2019. @incollection{10492, title = {Software Platforms and Toolkits for Building Multimodal Systems and Applications}, author = {Michael Feld and Robert Neßelrath and Tim Schwartz}, editor = {Sharon Oviatt and Björn Schuller and Philip R Cohen and Gerasimos Potamianos and Antonio Krüger and Daniel Sonntag}, url = {http://www.morganclaypoolpublishers.com/catalog_Orig/product_info.php?products_id=1428}, year = {2019}, date = {2019-07-01}, booktitle = {The Handbook of Multimodal-Multisensor Interfaces, Volume 3 -- Language Processing, Software, Commercialization, and Emerging Directions}, volume = {#23}, pages = {145-190}, publisher = {Morgan & Claypool Publishers}, abstract = {This chapter introduces various concepts needed for the realization of multimodal systems. Alongside an overview of the evolution of multimodal dialogue platform architectures, we give an overview of the major components found in most of today’s architectures: input and output processing; fusion and discourse processing; dialogue management; fission and presentation planning; and middleware. We compare several different dialogue management approaches, look in more detail at how the fusion component works, and introduce dialogue act annotation with communicative functions. We will explain the multimodal reference resolution process and consider the special case of cross-modal references. 
Finally, we present SiAM-dp, an actual multimodal dialogue platform used in a number of research projects and prototypes and highlight some of its particular features.}, keywords = {}, pubstate = {published}, tppubtype = {incollection} } This chapter introduces various concepts needed for the realization of multimodal systems. Alongside an overview of the evolution of multimodal dialogue platform architectures, we give an overview of the major components found in most of today’s architectures: input and output processing; fusion and discourse processing; dialogue management; fission and presentation planning; and middleware. We compare several different dialogue management approaches, look in more detail at how the fusion component works, and introduce dialogue act annotation with communicative functions. We will explain the multimodal reference resolution process and consider the special case of cross-modal references. Finally, we present SiAM-dp, an actual multimodal dialogue platform used in a number of research projects and prototypes and highlight some of its particular features. |
Inproceedings |
Biswas, Rajarshi; Mogadala, Aditya; Barz, Michael; Sonntag, Daniel; Klakow, Dietrich Automatic Judgement of Neural Network-Generated Image Captions Inproceedings Martin-Vide, Carlos; Purver, Matthew; Pollak, Senja (Ed.): Statistical Language and Speech Processing - 7th International Conference, Proceedings, pp. 261-272, Springer, Jamova cesta 39 1000 Ljubljana Slovenia, 2019. @inproceedings{10707, title = {Automatic Judgement of Neural Network-Generated Image Captions}, author = {Rajarshi Biswas and Aditya Mogadala and Michael Barz and Daniel Sonntag and Dietrich Klakow}, editor = {Carlos Martin-Vide and Matthew Purver and Senja Pollak}, url = {https://www.springerprofessional.de/en/automatic-judgement-of-neural-network-generated-image-captions/17214374}, year = {2019}, date = {2019-09-01}, booktitle = {Statistical Language and Speech Processing - 7th International Conference, Proceedings}, volume = {11816}, pages = {261-272}, publisher = {Springer}, address = {Jamova cesta 39 1000 Ljubljana Slovenia}, abstract = {Manual evaluation of individual results of natural language generation tasks is one of the bottlenecks. It is very time-consuming and expensive if it is, for example, crowdsourced. In this work, we address this problem for the specific task of automatic image captioning. We automatically generate human-like judgements on grammatical correctness, image relevance and diversity of the captions obtained from a neural image caption generator. For this purpose, we use pool-based active learning with uncertainty sampling and represent the captions using fixed-size vectors from Google’s Universal Sentence Encoder. In addition, we test common metrics, such as BLEU, ROUGE, METEOR, Levenshtein distance, and n-gram counts, and report the F1 score for the classifiers used under the active learning scheme for this task. 
To the best of our knowledge, our work is the first in this direction and promises to reduce time, cost, and human effort.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Manual evaluation of individual results of natural language generation tasks is one of the bottlenecks. It is very time-consuming and expensive if it is, for example, crowdsourced. In this work, we address this problem for the specific task of automatic image captioning. We automatically generate human-like judgements on grammatical correctness, image relevance and diversity of the captions obtained from a neural image caption generator. For this purpose, we use pool-based active learning with uncertainty sampling and represent the captions using fixed-size vectors from Google’s Universal Sentence Encoder. In addition, we test common metrics, such as BLEU, ROUGE, METEOR, Levenshtein distance, and n-gram counts, and report the F1 score for the classifiers used under the active learning scheme for this task. To the best of our knowledge, our work is the first in this direction and promises to reduce time, cost, and human effort. |
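Pool-based active learning with uncertainty sampling, as used in this paper, can be sketched as follows. Random vectors stand in for the Universal Sentence Encoder caption embeddings, and a synthetic rule plays the role of the human judge; only the query loop itself is the point of the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for fixed-size sentence embeddings of 100 generated captions.
pool = rng.normal(size=(100, 16))
oracle = (pool[:, 0] > 0).astype(int)  # toy judgement, e.g. "grammatical": yes/no

# Seed set containing one example of each class, as if hand-judged up front.
labeled = [int(np.where(oracle == 0)[0][0]), int(np.where(oracle == 1)[0][0])]

for _ in range(10):  # pool-based active learning with uncertainty sampling
    clf = LogisticRegression().fit(pool[labeled], oracle[labeled])
    proba = clf.predict_proba(pool)[:, 1]
    order = np.argsort(np.abs(proba - 0.5))           # closest to 0.5 = most uncertain
    nxt = next(i for i in order if i not in labeled)  # most uncertain unlabeled caption
    labeled.append(int(nxt))                          # ask the judge only for this one

print(f"labeled {len(labeled)} of {len(pool)} captions")
```

The loop spends the annotation budget where the current classifier is least confident, which is what makes crowdsourcing cheaper than labeling the whole pool.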
Kalimuthu, Marimuthu; Barz, Michael; Sonntag, Daniel Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings Inproceedings Proceedings of the Fourth Arabic Natural Language Processing Workshop, pp. 1-10, Association for Computational Linguistics, 2019. @inproceedings{10520, title = {Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings}, author = {Marimuthu Kalimuthu and Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/10520_W19-4601.pdf}, year = {2019}, date = {2019-08-01}, booktitle = {Proceedings of the Fourth Arabic Natural Language Processing Workshop}, pages = {1-10}, publisher = {Association for Computational Linguistics}, abstract = {We study the problem of incremental domain adaptation of a generic neural machine translation model with limited resources (e.g., budget and time) for human translations or model training. In this paper, we propose a novel query strategy for selecting ``unlabeled'' samples from a new domain based on sentence embeddings for Arabic. We accelerate the fine-tuning process of the generic model to the target domain. Specifically, our approach estimates the informativeness of instances from the target domain by comparing the distance of their sentence embeddings to embeddings from the generic domain. We perform machine translation experiments (Ar-to-En direction) for comparing a random sampling baseline with our new approach, similar to active learning, using two small update sets for simulating the work of human translators. For the prescribed setting we can save more than 50% of the annotation costs without loss in quality, demonstrating the effectiveness of our approach.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We study the problem of incremental domain adaptation of a generic neural machine translation model with limited resources (e.g., budget and time) for human translations or model training. 
In this paper, we propose a novel query strategy for selecting ``unlabeled'' samples from a new domain based on sentence embeddings for Arabic. We accelerate the fine-tuning process of the generic model to the target domain. Specifically, our approach estimates the informativeness of instances from the target domain by comparing the distance of their sentence embeddings to embeddings from the generic domain. We perform machine translation experiments (Ar-to-En direction) for comparing a random sampling baseline with our new approach, similar to active learning, using two small update sets for simulating the work of human translators. For the prescribed setting we can save more than 50% of the annotation costs without loss in quality, demonstrating the effectiveness of our approach. |
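The query strategy described in this abstract can be sketched as below. Random vectors stand in for real Arabic sentence embeddings, and the distance to the generic-domain centroid is one simple way to realize the embedding-distance comparison; the paper's exact informativeness estimate may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for sentence embeddings: a generic-domain corpus and a new target domain.
generic = rng.normal(loc=0.0, size=(500, 32))
target = rng.normal(loc=0.6, size=(50, 32))

# Informativeness proxy: how far a target-domain sentence lies from the generic domain.
centroid = generic.mean(axis=0)
dist = np.linalg.norm(target - centroid, axis=1)

# Small update set of the most "novel" sentences, to be human-translated first
# and then used for fine-tuning the generic model.
update_set = np.argsort(dist)[::-1][:10]
print(update_set)
```

Sentences far from the generic centroid are assumed to carry the most new domain information, so translating them first should give the fine-tuning step the most value per annotated sentence.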
Prange, Alexander; Niemann, Mira; Latendorf, Antje; Steinert, Anika; Sonntag, Daniel Multimodal Speech-based Dialogue for the Mini-Mental State Examination Inproceedings Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, pp. CS13:1-CS13:8, ACM, 2019. @inproceedings{10353, title = {Multimodal Speech-based Dialogue for the Mini-Mental State Examination}, author = {Alexander Prange and Mira Niemann and Antje Latendorf and Anika Steinert and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/10353_2019_Multimodal_speech-based_dialogue_for_the_Mini-Mental_State_Examination.pdf http://doi.acm.org/10.1145/3290607.3299040}, year = {2019}, date = {2019-01-01}, booktitle = {Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems}, pages = {CS13:1-CS13:8}, publisher = {ACM}, abstract = {We present a system-initiative multimodal speech-based dialogue system for the Mini-Mental State Examination (MMSE). The MMSE is a questionnaire-based cognitive test, which is traditionally administered by a trained expert using pen and paper and afterwards scored manually to measure cognitive impairment. By using a digital pen and speech dialogue, we implement a multimodal system for the automatic execution and evaluation of the MMSE. User input is evaluated and scored in real-time. We present a user experience study with 15 participants and compare the usability of the proposed system with the traditional approach. Our experiment suggests that both modes perform equally well in terms of usability, but the proposed system has higher novelty ratings. We compare assessment scorings produced by our system with manual scorings made by domain experts.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present a system-initiative multimodal speech-based dialogue system for the Mini-Mental State Examination (MMSE). 
The MMSE is a questionnaire-based cognitive test, which is traditionally administered by a trained expert using pen and paper and afterwards scored manually to measure cognitive impairment. By using a digital pen and speech dialogue, we implement a multimodal system for the automatic execution and evaluation of the MMSE. User input is evaluated and scored in real-time. We present a user experience study with 15 participants and compare the usability of the proposed system with the traditional approach. Our experiment suggests that both modes perform equally well in terms of usability, but the proposed system has higher novelty ratings. We compare assessment scorings produced by our system with manual scorings made by domain experts. |
Prange, Alexander; Sonntag, Daniel Modeling Cognitive Status through Automatic Scoring of a Digital Version of the Clock Drawing Test Inproceedings Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization, pp. 70-77, ACM, 2019. @inproceedings{10518, title = {Modeling Cognitive Status through Automatic Scoring of a Digital Version of the Clock Drawing Test}, author = {Alexander Prange and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/10518_umap.pdf}, doi = {https://doi.org/10.1145/3320435.3320452}, year = {2019}, date = {2019-01-01}, booktitle = {Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization}, pages = {70-77}, publisher = {ACM}, abstract = {The Clock Drawing Test is used as a cognitive assessment tool in geriatrics to detect signs of dementia or to model the progress of stroke recovery. The result is scored manually by a trained professional. We implement the Mendez scoring scheme and create a hierarchy of error categories that model the test characteristics of the clock drawing test, based on a set of impaired clock examples provided by a geriatrics clinic. Using a digital pen we recorded 120 clock samples for evaluating the automatic scoring system, with a total of 2400 error samples distributed over the 20 error classes of the Mendez scoring scheme. Error classes are scored automatically using a handwriting and gesture recognition framework. Results show that we provide a clinically relevant cognitive model for each subject. In addition, we heavily reduce the time spent on manual scoring. We compare manual scoring results with results produced by our automated system.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The Clock Drawing Test is used as a cognitive assessment tool in geriatrics to detect signs of dementia or to model the progress of stroke recovery. The result is scored manually by a trained professional. 
We implement the Mendez scoring scheme and create a hierarchy of error categories that model the test characteristics of the clock drawing test, based on a set of impaired clock examples provided by a geriatrics clinic. Using a digital pen we recorded 120 clock samples for evaluating the automatic scoring system, with a total of 2400 error samples distributed over the 20 error classes of the Mendez scoring scheme. Error classes are scored automatically using a handwriting and gesture recognition framework. Results show that we provide a clinically relevant cognitive model for each subject. In addition, we heavily reduce the time spent on manual scoring. We compare manual scoring results with results produced by our automated system. |
Technical Reports |
Sonntag, Daniel Wie funktionieren neuronale Netze eigentlich? Technical Report DFKI , 2019. @techreport{10725, title = {Wie funktionieren neuronale Netze eigentlich?}, author = {Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/10725_NN-DS.pdf}, year = {2019}, date = {2019-09-01}, volume = {1}, pages = {2}, institution = {DFKI}, abstract = {Wie funktionieren neuronale Netze eigentlich?}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } Wie funktionieren neuronale Netze eigentlich? |
Profitlich, Hans-Jürgen; Sonntag, Daniel Interactivity and Transparency in Medical Risk Assessment with Supersparse Linear Integer Models Technical Report BMBF , 2019. @techreport{11177, title = {Interactivity and Transparency in Medical Risk Assessment with Supersparse Linear Integer Models}, author = {Hans-Jürgen Profitlich and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11177_integer.pdf http://arxiv.org/abs/1911.12119}, year = {2019}, date = {2019-01-01}, journal = {CoRR}, volume = {abs/1911.12119}, publisher = {ArXiv}, institution = {BMBF}, abstract = {Scoring systems are linear classification models that only require users to add or subtract a few small numbers in order to make a prediction. They are used for example by clinicians to assess the risk of medical conditions. This work focuses on our approach to implement an intuitive user interface to allow a clinician to generate such scoring systems interactively, based on the RiskSLIM machine learning library. We describe the technical architecture which allows a medical professional who is not specialised in developing and applying machine learning algorithms to create competitive transparent supersparse linear integer models in an interactive way. We demonstrate our prototype machine learning system in the nephrology domain, where doctors can interactively sub-select datasets to compute models, explore scoring tables that correspond to the learned models, and check the quality of the transparent solutions from a medical perspective.}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } Scoring systems are linear classification models that only require users to add or subtract a few small numbers in order to make a prediction. They are used for example by clinicians to assess the risk of medical conditions. 
This work focuses on our approach to implement an intuitive user interface to allow a clinician to generate such scoring systems interactively, based on the RiskSLIM machine learning library. We describe the technical architecture which allows a medical professional who is not specialised in developing and applying machine learning algorithms to create competitive transparent supersparse linear integer models in an interactive way. We demonstrate our prototype machine learning system in the nephrology domain, where doctors can interactively sub-select datasets to compute models, explore scoring tables that correspond to the learned models, and check the quality of the transparent solutions from a medical perspective. |
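To illustrate the kind of model involved: a supersparse linear integer scoring system reduces prediction to summing a handful of small integer points. The sketch below is a generic, hypothetical scoring table (the rules and point values are invented for illustration and are not taken from RiskSLIM or the report):

```python
def risk_score(patient, scoring_table):
    """Sum the integer points of all rules that apply to a patient."""
    return sum(points for predicate, points in scoring_table
               if predicate(patient))

# hypothetical 3-rule table in the style of a supersparse linear integer model
table = [
    (lambda p: p["age"] >= 60,        2),
    (lambda p: p["creatinine"] > 1.2, 3),
    (lambda p: p["on_dialysis"],     -1),
]

patient = {"age": 67, "creatinine": 1.5, "on_dialysis": False}
print(risk_score(patient, table))  # 5
```

An interactive tool as described would let the clinician re-learn or hand-edit such a table on a sub-selected dataset and inspect the resulting scores.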
2018 |
Journal Articles |
Prange, Alexander; Barz, Michael; Sonntag, Daniel A categorisation and implementation of digital pen features for behaviour characterisation Journal Article Computing Research Repository eprint Journal, abs/1810.03970 , pp. 1-42, 2018. @article{10183, title = {A categorisation and implementation of digital pen features for behaviour characterisation}, author = {Alexander Prange and Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/10183_1810.03970.pdf http://arxiv.org/abs/1810.03970}, year = {2018}, date = {2018-10-01}, journal = {Computing Research Repository eprint Journal}, volume = {abs/1810.03970}, pages = {1-42}, publisher = {arXiv}, abstract = {In this paper we provide a categorisation and implementation of digital ink features for behaviour characterisation. Based on four feature sets taken from literature, we provide a categorisation in different classes of syntactic and semantic features. We implemented a publicly available framework to calculate these features and show its deployment in the use case of analysing cognitive assessments performed using a digital pen.}, keywords = {}, pubstate = {published}, tppubtype = {article} } In this paper we provide a categorisation and implementation of digital ink features for behaviour characterisation. Based on four feature sets taken from literature, we provide a categorisation in different classes of syntactic and semantic features. We implemented a publicly available framework to calculate these features and show its deployment in the use case of analysing cognitive assessments performed using a digital pen. |
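As a hint of what simple syntactic ink features look like: the minimal sketch below computes path length, duration, and mean speed of a single pen stroke. It is a generic illustration, not the published framework's API; the `(x, y, t)` sample format is an assumption.

```python
import math

def stroke_features(samples):
    """Compute simple syntactic ink features from one pen stroke,
    given a list of (x, y, t) samples."""
    # path length: summed Euclidean distance between consecutive samples
    length = sum(math.dist(samples[i][:2], samples[i + 1][:2])
                 for i in range(len(samples) - 1))
    duration = samples[-1][2] - samples[0][2]
    return {"length": length, "duration": duration,
            "speed": length / duration if duration else 0.0}

# toy stroke: three samples, timestamps in arbitrary integer time units
stroke = [(0, 0, 0), (3, 4, 1), (6, 8, 2)]
print(stroke_features(stroke))  # {'length': 10.0, 'duration': 2, 'speed': 5.0}
```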
Zacharias, Jan; Barz, Michael; Sonntag, Daniel A Survey on Deep Learning Toolkits and Libraries for Intelligent User Interfaces Journal Article Computing Research Repository eprint Journal, abs/1803.04818 , pp. 1-10, 2018. @article{9857, title = {A Survey on Deep Learning Toolkits and Libraries for Intelligent User Interfaces}, author = {Jan Zacharias and Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/9857_2018_A_Survey_on_Deep_Learning_Toolkits_and_Libraries_for_Intelligent_User_Interfaces.pdf http://arxiv.org/abs/1803.04818}, year = {2018}, date = {2018-03-01}, journal = {Computing Research Repository eprint Journal}, volume = {abs/1803.04818}, pages = {1-10}, publisher = {arXiv.org}, abstract = {This paper provides an overview of prominent deep learning toolkits and, in particular, reports on recent publications that contributed open source software for implementing tasks that are common in intelligent user interfaces (IUI). We provide a scientific reference for researchers and software engineers who plan to utilise deep learning techniques within their IUI research and development projects.}, keywords = {}, pubstate = {published}, tppubtype = {article} } This paper provides an overview of prominent deep learning toolkits and, in particular, reports on recent publications that contributed open source software for implementing tasks that are common in intelligent user interfaces (IUI). We provide a scientific reference for researchers and software engineers who plan to utilise deep learning techniques within their IUI research and development projects. |
Sonntag, Daniel; Profitlich, Hans-Jürgen An architecture of open-source tools to combine textual information extraction, faceted search and information visualisation Journal Article Computing Research Repository eprint Journal, abs/1810.12627 , pp. 13-28, 2018. @article{11491, title = {An architecture of open-source tools to combine textual information extraction, faceted search and information visualisation}, author = {Daniel Sonntag and Hans-Jürgen Profitlich}, url = {https://www.dfki.de/fileadmin/user_upload/import/11491_1810.12627.pdf http://arxiv.org/abs/1810.12627}, year = {2018}, date = {2018-01-01}, journal = {Computing Research Repository eprint Journal}, volume = {abs/1810.12627}, pages = {13-28}, publisher = {Elsevier}, abstract = {This article presents our steps to integrate complex and partly unstructured medical data into a clinical research database with subsequent decision support. Our main application is an integrated faceted search tool, accompanied by the visualisation of results of automatic information extraction from textual documents. We describe the details of our technical architecture (open-source tools), to be replicated at other universities, research institutes, or hospitals. Our exemplary use cases are nephrology and mammography. The software was first developed in the nephrology domain and then adapted to the mammography use case. We report on these case studies, illustrating how the application can be used by a clinician and which questions can be answered. We show that our architecture and the employed software modules are suitable for both areas of application with a limited amount of adaptations. For example, in nephrology we try to answer questions about the temporal characteristics of event sequences to gain significant insight from the data for cohort selection. 
We present a versatile time-line tool that enables the user to explore relations between a multitude of diagnoses and laboratory values.}, keywords = {}, pubstate = {published}, tppubtype = {article} } This article presents our steps to integrate complex and partly unstructured medical data into a clinical research database with subsequent decision support. Our main application is an integrated faceted search tool, accompanied by the visualisation of results of automatic information extraction from textual documents. We describe the details of our technical architecture (open-source tools), to be replicated at other universities, research institutes, or hospitals. Our exemplary use cases are nephrology and mammography. The software was first developed in the nephrology domain and then adapted to the mammography use case. We report on these case studies, illustrating how the application can be used by a clinician and which questions can be answered. We show that our architecture and the employed software modules are suitable for both areas of application with a limited amount of adaptations. For example, in nephrology we try to answer questions about the temporal characteristics of event sequences to gain significant insight from the data for cohort selection. We present a versatile time-line tool that enables the user to explore relations between a multitude of diagnoses and laboratory values. |
Inproceedings |
Barz, Michael; Büyükdemircioglu, Neslihan; Surya, Rikhu Prasad; Polzehl, Tim; Sonntag, Daniel Device-Type Influence in Crowd-based Natural Language Translation Tasks (short paper) Inproceedings Aroyo, Lora; Dumitrache, Anca; Paritosh, Praveen; Quinn, Alexander J; Welty, Chris; Checco, Alessandro; Demartini, Gianluca; Gadiraju, Ujwal; Sarasua, Cristina (Ed.): Proceedings of the 1st Workshop on Subjectivity, Ambiguity and Disagreement in Crowdsourcing, and Short Paper Proceedings of the 1st Workshop on Disentangling the Relation Between Crowdsourcing and Bias Management (SAD 2018 and CrowdBias 2018), pp. 93-97, CEUR-WS.org, 2018. @inproceedings{10184, title = {Device-Type Influence in Crowd-based Natural Language Translation Tasks (short paper)}, author = {Michael Barz and Neslihan Büyükdemircioglu and Rikhu Prasad Surya and Tim Polzehl and Daniel Sonntag}, editor = {Lora Aroyo and Anca Dumitrache and Praveen Paritosh and Alexander J Quinn and Chris Welty and Alessandro Checco and Gianluca Demartini and Ujwal Gadiraju and Cristina Sarasua}, url = {https://www.dfki.de/fileadmin/user_upload/import/10184_paper12.pdf}, year = {2018}, date = {2018-12-01}, booktitle = {Proceedings of the 1st Workshop on Subjectivity, Ambiguity and Disagreement in Crowdsourcing, and Short Paper Proceedings of the 1st Workshop on Disentangling the Relation Between Crowdsourcing and Bias Management (SAD 2018 and CrowdBias 2018)}, volume = {2276}, pages = {93-97}, publisher = {CEUR-WS.org}, abstract = {The effect of users’ interaction devices and their platform (mobile vs. desktop) should be taken into account when evaluating the performance of translation tasks in crowdsourcing contexts. We investigate the influence of the device type and platform in a crowd-based translation workflow. We implement a crowd translation workflow and use it for translating a subset of the IWSLT parallel corpus from English to Arabic. 
In addition, we consider machine translations from a state-of-the-art machine translation system which can be used as translation candidates in a human computation workflow. The results of our experiment suggest that users with a mobile device judge translations systematically lower than users with a desktop device, when assessing the quality of machine translations. The perceived quality of shorter sentences is generally higher than the perceived quality of longer sentences.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The effect of users’ interaction devices and their platform (mobile vs. desktop) should be taken into account when evaluating the performance of translation tasks in crowdsourcing contexts. We investigate the influence of the device type and platform in a crowd-based translation workflow. We implement a crowd translation workflow and use it for translating a subset of the IWSLT parallel corpus from English to Arabic. In addition, we consider machine translations from a state-of-the-art machine translation system which can be used as translation candidates in a human computation workflow. The results of our experiment suggest that users with a mobile device judge translations systematically lower than users with a desktop device, when assessing the quality of machine translations. The perceived quality of shorter sentences is generally higher than the perceived quality of longer sentences. |
Stauden, Sven; Barz, Michael; Sonntag, Daniel Visual Search Target Inference Using Bag of Deep Visual Words Inproceedings Trollmann, Frank; Turhan, Anni-Yasmin (Ed.): KI 2018: Advances in Artificial Intelligence - 41st German Conference on AI, Springer, 2018. @inproceedings{10896, title = {Visual Search Target Inference Using Bag of Deep Visual Words}, author = {Sven Stauden and Michael Barz and Daniel Sonntag}, editor = {Frank Trollmann and Anni-Yasmin Turhan}, url = {https://www.dfki.de/fileadmin/user_upload/import/10896_2018_Visual_Search_Target_Inference_Using_Bag_of_Deep_Visual_Words.pdf}, year = {2018}, date = {2018-08-01}, booktitle = {KI 2018: Advances in Artificial Intelligence - 41st German Conference on AI}, publisher = {Springer}, abstract = {Visual search target inference subsumes methods for predicting the target object through eye tracking. A person intends to find an object in a visual scene, which we predict based on the fixation behavior. Knowing about the search target can improve intelligent user interaction. In this work, we implement a new feature encoding, the Bag of Deep Visual Words, for search target inference using a pre-trained convolutional neural network (CNN). Our work is based on a recent approach from the literature that uses Bag of Visual Words, common in computer vision applications. We evaluate our method using a gold standard dataset. The results show that our new feature encoding outperforms the baseline from the literature, in particular, when excluding fixations on the target.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Visual search target inference subsumes methods for predicting the target object through eye tracking. A person intends to find an object in a visual scene, which we predict based on the fixation behavior. Knowing about the search target can improve intelligent user interaction. 
In this work, we implement a new feature encoding, the Bag of Deep Visual Words, for search target inference using a pre-trained convolutional neural network (CNN). Our work is based on a recent approach from the literature that uses Bag of Visual Words, common in computer vision applications. We evaluate our method using a gold standard dataset. The results show that our new feature encoding outperforms the baseline from the literature, in particular, when excluding fixations on the target. |
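The underlying bag-of-visual-words encoding can be sketched as follows: each local feature (e.g. a CNN activation of an image patch around a fixation) is assigned to its nearest codebook centroid, and the assignments are aggregated into a normalized histogram. This is a generic illustration assuming a precomputed codebook, not the authors' implementation:

```python
import math
from collections import Counter

def bovw_encode(features, codebook):
    """Encode local feature vectors as an L1-normalized histogram
    over a codebook of visual words (cluster centroids)."""
    def nearest(f):
        # index of the closest centroid by Euclidean distance
        return min(range(len(codebook)),
                   key=lambda i: math.dist(f, codebook[i]))
    counts = Counter(nearest(f) for f in features)
    n = len(features)
    return [counts.get(i, 0) / n for i in range(len(codebook))]

# toy example: four 2-d "deep features", codebook of three visual words
feats = [(0.1, 0.2), (2.9, 3.1), (3.2, 2.8), (-0.2, 0.0)]
codebook = [(0.0, 0.0), (3.0, 3.0), (-3.0, -3.0)]
print(bovw_encode(feats, codebook))  # [0.5, 0.5, 0.0]
```

The resulting fixed-length histogram can then be fed to a classifier that predicts the search target.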
Prange, Alexander; Barz, Michael; Sonntag, Daniel Medical 3D Images in Multimodal Virtual Reality Inproceedings IUI Companion, pp. 19:1-19:2, ACM, 2018. @inproceedings{9655, title = {Medical 3D Images in Multimodal Virtual Reality}, author = {Alexander Prange and Michael Barz and Daniel Sonntag}, year = {2018}, date = {2018-01-01}, booktitle = {IUI Companion}, pages = {19:1-19:2}, publisher = {ACM}, abstract = {We present a multimodal medical 3D image system for radiologists in a virtual reality (VR) environment. Users can walk freely inside the virtual room and interact with the system using speech, browse patient records, and manipulate 3D image data with hand gestures. Medical images are retrieved from the hospital's Picture Archiving and Communication System (PACS) and displayed as 3D objects inside VR. Our system incorporates a dialogue-based decision support system for treatments. A central supervised patient database provides input to our predictive model and allows us, first, to add new examination reports by a pen-based mobile application on-the-fly, and second, to get therapy prediction results in real-time. This demo includes a visualisation of real patient records, 3D DICOM radiology image data, and real-time therapy predictions in VR.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present a multimodal medical 3D image system for radiologists in a virtual reality (VR) environment. Users can walk freely inside the virtual room and interact with the system using speech, browse patient records, and manipulate 3D image data with hand gestures. Medical images are retrieved from the hospital's Picture Archiving and Communication System (PACS) and displayed as 3D objects inside VR. Our system incorporates a dialogue-based decision support system for treatments. 
A central supervised patient database provides input to our predictive model and allows us, first, to add new examination reports by a pen-based mobile application on-the-fly, and second, to get therapy prediction results in real-time. This demo includes a visualisation of real patient records, 3D DICOM radiology image data, and real-time therapy predictions in VR. |
Barz, Michael; Daiber, Florian; Sonntag, Daniel; Bulling, Andreas Error-aware Gaze-based Interfaces for Robust Mobile Gaze Interaction Inproceedings Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications, pp. 24:1-24:10, ACM, 2018. @inproceedings{9818, title = {Error-aware Gaze-based Interfaces for Robust Mobile Gaze Interaction}, author = {Michael Barz and Florian Daiber and Daniel Sonntag and Andreas Bulling}, url = {https://www.dfki.de/fileadmin/user_upload/import/9818_a24-barz.pdf http://doi.acm.org/10.1145/3204493.3204536}, year = {2018}, date = {2018-01-01}, booktitle = {Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications}, pages = {24:1-24:10}, publisher = {ACM}, abstract = {Gaze estimation error can severely hamper usability and performance of mobile gaze-based interfaces given that the error varies constantly for different interaction positions. In this work, we explore error-aware gaze-based interfaces that estimate and adapt to gaze estimation error on-the-fly. We implement a sample error-aware user interface for gaze-based selection and different error compensation methods: a naïve approach that increases component size directly proportional to the absolute error, a recent model by Feit et al. that is based on the two-dimensional error distribution, and a novel predictive model that shifts gaze by a directional error estimate. We evaluate these models in a 12-participant user study and show that our predictive model significantly outperforms the others in terms of selection rate, particularly for small gaze targets. These results underline both the feasibility and potential of next generation error-aware gaze-based user interfaces.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Gaze estimation error can severely hamper usability and performance of mobile gaze-based interfaces given that the error varies constantly for different interaction positions. 
In this work, we explore error-aware gaze-based interfaces that estimate and adapt to gaze estimation error on-the-fly. We implement a sample error-aware user interface for gaze-based selection and different error compensation methods: a naïve approach that increases component size directly proportional to the absolute error, a recent model by Feit et al. that is based on the two-dimensional error distribution, and a novel predictive model that shifts gaze by a directional error estimate. We evaluate these models in a 12-participant user study and show that our predictive model significantly outperforms the others in terms of selection rate, particularly for small gaze targets. These results underline both the feasibility and potential of next generation error-aware gaze-based user interfaces. |
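The two compensation ideas can be sketched in a few lines: the naïve model pads the target proportionally to the absolute error, while the predictive model shifts the raw gaze sample by a directional error estimate before hit-testing. This is a minimal illustration with invented pixel values, not the study's code:

```python
def enlarge_target(width, height, abs_error, gain=1.0):
    """Naive compensation: pad the selection target on every side
    proportionally to the absolute gaze-estimation error."""
    pad = 2 * gain * abs_error  # error applies on both sides of the target
    return (width + pad, height + pad)

def shift_gaze(gaze, error_estimate):
    """Predictive compensation: subtract a directional error estimate
    from the raw gaze sample before hit-testing."""
    (gx, gy), (ex, ey) = gaze, error_estimate
    return (gx - ex, gy - ey)

# a gaze sample with a hypothetical directional error estimate of (15, -5) px
print(shift_gaze((420.0, 310.0), (15.0, -5.0)))  # (405.0, 315.0)
print(enlarge_target(40.0, 40.0, 10.0))          # (60.0, 60.0)
```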
Niemann, Mira; Prange, Alexander; Sonntag, Daniel Towards a Multimodal Multisensory Cognitive Assessment Framework Inproceedings Proceedings of the 30th IEEE International Symposium on Computer-Based Medical Systems, IEEE, 2018. @inproceedings{9856, title = {Towards a Multimodal Multisensory Cognitive Assessment Framework}, author = {Mira Niemann and Alexander Prange and Daniel Sonntag}, year = {2018}, date = {2018-01-01}, booktitle = {Proceedings of the 30th IEEE International Symposium on Computer-Based Medical Systems}, publisher = {IEEE}, abstract = {Traditionally, neurocognitive testing is done using pen and paper, which is both expensive and time consuming and often leads to a biased outcome. In this paper, we present an approach towards selecting and digitizing existing cognitive tests and supporting the assessment of cognitive impairments through automated evaluation of different input modalities recorded during the assessments. Our multimodal multisensory framework currently records and analyzes handwriting input captured using a digital pen and electrodermal activity captured by the BITalino sensor board. Using artificial intelligence methods, we aim at analyzing the multisensory data in order to support objective assessments of cognitive impairments. In this work, we describe the current state of our framework and outline future research objectives.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Traditionally, neurocognitive testing is done using pen and paper, which is both expensive and time consuming and often leads to a biased outcome. In this paper, we present an approach towards selecting and digitizing existing cognitive tests and supporting the assessment of cognitive impairments through automated evaluation of different input modalities recorded during the assessments. 
Our multimodal multisensory framework currently records and analyzes handwriting input captured using a digital pen and electrodermal activity captured by the BITalino sensor board. Using artificial intelligence methods, we aim at analyzing the multisensory data in order to support objective assessments of cognitive impairments. In this work, we describe the current state of our framework and outline future research objectives. |
Miscellaneous |
Barz, Michael; Polzehl, Tim; Sonntag, Daniel Towards Hybrid Human-Machine Translation Services Miscellaneous EasyChair Preprint no. 333, 2018. @misc{9879, title = {Towards Hybrid Human-Machine Translation Services}, author = {Michael Barz and Tim Polzehl and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/9879_CI_2018_paper_22_(3).pdf http://ci.acm.org/2018/program/}, year = {2018}, date = {2018-01-01}, publisher = {EasyChair}, abstract = {Crowdsourcing has recently been used to automate complex tasks when computational systems alone fail. The literature includes several contributions concerning natural language processing, e.g., language translation [Zaidan and Callison-Burch 2011; Minder and Bernstein 2012a; 2012b], also in combination with active learning [Green et al. 2015] and interactive model training [Zacharias et al. 2018]. In this work, we investigate (1) whether a (paid) crowd acquired from a multilingual website’s community is capable of translating coherent content from English to their mother tongue (we consider Arabic native speakers); and (2) in which cases state-of-the-art machine translation models can compete with human translations for automation in order to reduce task completion times and costs. The envisioned goal is a hybrid machine translation service that incrementally adapts machine translation models to new domains by employing human computation to make machine translation more competitive (see Figure 1). Recently, approaches for domain adaptation of neural machine translation systems have been proposed, including filtering of generic corpora based on sentence embeddings of in-domain samples [Wang et al. 2017], fine-tuning with mixed batches containing in-domain and out-of-domain samples [Chu et al. 2017], and fine-tuning with different regularization methods [Barone et al. 2017]. 
As a first step towards this goal, we conduct an experiment using a simple two-stage human computation algorithm for translating a subset of the IWSLT parallel corpus including English transcriptions of TED talks and reference translations in Arabic with a specifically acquired crowd. We compare the output with the state-of-the-art machine translation system Google Translate as a baseline.}, howpublished = {EasyChair Preprint no. 333}, keywords = {}, pubstate = {published}, tppubtype = {misc} } Crowdsourcing has recently been used to automate complex tasks when computational systems alone fail. The literature includes several contributions concerning natural language processing, e.g., language translation [Zaidan and Callison-Burch 2011; Minder and Bernstein 2012a; 2012b], also in combination with active learning [Green et al. 2015] and interactive model training [Zacharias et al. 2018]. In this work, we investigate (1) whether a (paid) crowd acquired from a multilingual website’s community is capable of translating coherent content from English to their mother tongue (we consider Arabic native speakers); and (2) in which cases state-of-the-art machine translation models can compete with human translations for automation in order to reduce task completion times and costs. The envisioned goal is a hybrid machine translation service that incrementally adapts machine translation models to new domains by employing human computation to make machine translation more competitive (see Figure 1). Recently, approaches for domain adaptation of neural machine translation systems have been proposed, including filtering of generic corpora based on sentence embeddings of in-domain samples [Wang et al. 2017], fine-tuning with mixed batches containing in-domain and out-of-domain samples [Chu et al. 2017], and fine-tuning with different regularization methods [Barone et al. 2017]. 
As a first step towards this goal, we conduct an experiment using a simple two-stage human computation algorithm for translating a subset of the IWSLT parallel corpus including English transcriptions of TED talks and reference translations in Arabic with a specifically acquired crowd. We compare the output with the state-of-the-art machine translation system Google Translate as a baseline. |
2017 |
Journal Articles |
Sonntag, Daniel; Barz, Michael; Zacharias, Jan; Stauden, Sven; Rahmani, Vahid; Fóthi, Áron; Lőrincz, András Fine-tuning deep CNN models on specific MS COCO categories Journal Article Computing Research Repository eprint Journal, abs/1709.01476 , pp. 0-3, 2017. @article{9241, title = {Fine-tuning deep CNN models on specific MS COCO categories}, author = {Daniel Sonntag and Michael Barz and Jan Zacharias and Sven Stauden and Vahid Rahmani and Áron Fóthi and András Lőrincz}, url = {https://www.dfki.de/fileadmin/user_upload/import/9241_2017_Fine-tuning_deep_CNN_models_on_specific_MS_COCO_categories.pdf http://arxiv.org/abs/1709.01476}, year = {2017}, date = {2017-09-01}, journal = {Computing Research Repository eprint Journal}, volume = {abs/1709.01476}, pages = {0-3}, publisher = {arXiv.org}, abstract = {Fine-tuning of a deep convolutional neural network (CNN) is often desired. This paper provides an overview of our publicly available py-faster-rcnn-ft software library that can be used to fine-tune the VGG_CNN_M_1024 model on custom subsets of the Microsoft Common Objects in Context (MS COCO) dataset. For example, we improved the procedure so that the user no longer has to search the dataset by hand for suitable image files to use in the demo program. Our implementation randomly selects images that contain at least one object of the categories on which the model is fine-tuned.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Fine-tuning of a deep convolutional neural network (CNN) is often desired. This paper provides an overview of our publicly available py-faster-rcnn-ft software library that can be used to fine-tune the VGG_CNN_M_1024 model on custom subsets of the Microsoft Common Objects in Context (MS COCO) dataset. For example, we improved the procedure so that the user no longer has to search the dataset by hand for suitable image files to use in the demo program. 
Our implementation randomly selects images that contain at least one object of the categories on which the model is fine-tuned. |
Schmidt, Danilo; Budde, Klemens; Sonntag, Daniel; Profitlich, Hans-Jürgen; Ihle, Matthias; Staeck, Oliver A novel tool for the identification of correlations in medical data by faceted search Journal Article Computers in Biology and Medicine - An International Journal, 85 , pp. 98-105, 2017. @article{11493, title = {A novel tool for the identification of correlations in medical data by faceted search}, author = {Danilo Schmidt and Klemens Budde and Daniel Sonntag and Hans-Jürgen Profitlich and Matthias Ihle and Oliver Staeck}, url = {https://www.sciencedirect.com/science/article/pii/S0010482517300975}, year = {2017}, date = {2017-01-01}, journal = {Computers in Biology and Medicine - An International Journal}, volume = {85}, pages = {98-105}, publisher = {Elsevier}, abstract = {This work focuses on the integration of multifaceted extensive data sets (e.g. laboratory values, vital data, medications) and partly unstructured medical data such as discharge letters, diagnostic reports, clinical notes etc. in a research database. Our main application is an integrated faceted search in nephrology based on information extraction results. We describe the details of the application of transplant medicine and the resulting technical architecture of the faceted search application.}, keywords = {}, pubstate = {published}, tppubtype = {article} } This work focuses on the integration of multifaceted extensive data sets (e.g. laboratory values, vital data, medications) and partly unstructured medical data such as discharge letters, diagnostic reports, clinical notes etc. in a research database. Our main application is an integrated faceted search in nephrology based on information extraction results. We describe the details of the application of transplant medicine and the resulting technical architecture of the faceted search application. |
Inproceedings |
Barz, Michael; Poller, Peter; Schneider, Martin; Zillner, Sonja; Sonntag, Daniel; Mařík, Vladimír Human-in-the-Loop Control Processes in Gas Turbine Maintenance Inproceedings Mařík, Vladimír; Strasser, Thomas; Kadera, Petr; Wahlster, Wolfgang (Ed.): Industrial Applications of Holonic and Multi-Agent Systems: 8th International Conference, HoloMAS 2017, Springer International Publishing, 2017. @inproceedings{9218, title = {Human-in-the-Loop Control Processes in Gas Turbine Maintenance}, author = {Michael Barz and Peter Poller and Martin Schneider and Sonja Zillner and Daniel Sonntag and Vladimír Mařík}, editor = {Vladimír Mařík and Thomas Strasser and Petr Kadera and Wolfgang Wahlster}, url = {https://www.dfki.de/fileadmin/user_upload/import/9218_2017_Human-in-the-Loop_Control_Processes_in_Gas_Turbine_Maintenance.pdf}, year = {2017}, date = {2017-08-01}, booktitle = {Industrial Applications of Holonic and Multi-Agent Systems: 8th International Conference, HoloMAS 2017}, publisher = {Springer International Publishing}, abstract = {In this applied research paper, we describe an architecture for seamlessly integrating factory workers in industrial cyber-physical production environments. Our human-in-the-loop control process uses novel input techniques and relies on state-of-the-art industry standards. Our architecture allows for real-time processing of semantically annotated data from multiple sources (e.g., machine sensors, user input devices) and real-time analysis of data for anomaly detection and recovery. We use a semantic knowledge base for storing and querying data (http://www.metaphacts.com) and the Business Process Model and Notation (BPMN) for modelling and controlling the process. We exemplify our industrial solution in the use case of the maintenance of a Siemens gas turbine. We report on this case study and show the advantages of our approach for smart factories. 
An informal evaluation in the gas turbine maintenance use case shows the utility of automated anomaly detection and handling: workers can fill in paper-based incident reports by using a digital pen; the digitised version is stored in metaphacts and linked to semantic knowledge sources such as process models, structure models, business process models, and user models. Subsequently, automatic maintenance and recovery processes that involve human experts are triggered.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In this applied research paper, we describe an architecture for seamlessly integrating factory workers in industrial cyber-physical production environments. Our human-in-the-loop control process uses novel input techniques and relies on state-of-the-art industry standards. Our architecture allows for real-time processing of semantically annotated data from multiple sources (e.g., machine sensors, user input devices) and real-time analysis of data for anomaly detection and recovery. We use a semantic knowledge base for storing and querying data (http://www.metaphacts.com) and the Business Process Model and Notation (BPMN) for modelling and controlling the process. We exemplify our industrial solution in the use case of the maintenance of a Siemens gas turbine. We report on this case study and show the advantages of our approach for smart factories. An informal evaluation in the gas turbine maintenance use case shows the utility of automated anomaly detection and handling: workers can fill in paper-based incident reports by using a digital pen; the digitised version is stored in metaphacts and linked to semantic knowledge sources such as process models, structure models, business process models, and user models. Subsequently, automatic maintenance and recovery processes that involve human experts are triggered. |
Prange, Alexander; Chikobava, Margarita; Poller, Peter; Barz, Michael; Sonntag, Daniel A Multimodal Dialogue System for Medical Decision Support inside Virtual Reality Inproceedings Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pp. 23-26, Association for Computational Linguistics, 2017. @inproceedings{9219, title = {A Multimodal Dialogue System for Medical Decision Support inside Virtual Reality}, author = {Alexander Prange and Margarita Chikobava and Peter Poller and Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/9219_2017_A_Multimodal_Dialogue_System_for_Medical_Decision_Support_in_Virtual_Reality.pdf}, year = {2017}, date = {2017-08-01}, booktitle = {Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue}, pages = {23-26}, publisher = {Association for Computational Linguistics}, abstract = {We present a multimodal dialogue system that allows doctors to interact with a medical decision support system in virtual reality (VR). We integrate an interactive visualization of patient records and radiology image data, as well as therapy predictions. Therapy predictions are computed in real-time using a deep learning model.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present a multimodal dialogue system that allows doctors to interact with a medical decision support system in virtual reality (VR). We integrate an interactive visualization of patient records and radiology image data, as well as therapy predictions. Therapy predictions are computed in real-time using a deep learning model. |
Prange, Alexander; Schmidt, Danilo; Sonntag, Daniel A Digital Pen Based Tool for Instant Digitisation and Digitalisation of Biopsy Protocols Inproceedings 2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS), pp. 773-774, IEEE Xplore, 2017. @inproceedings{11235, title = {A Digital Pen Based Tool for Instant Digitisation and Digitalisation of Biopsy Protocols}, author = {Alexander Prange and Danilo Schmidt and Daniel Sonntag}, year = {2017}, date = {2017-06-01}, booktitle = {2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS)}, pages = {773-774}, publisher = {IEEE Xplore}, abstract = {In order to improve medical processes in nephrology, we present an application that allows doctors to create biopsy protocols by using a digital pen on a tablet. The biopsy protocol app is seamlessly integrated into the existing infrastructure at the hospital (see figure 1). Compared to other reporting tools, we provide (1) real-time hand-writing/gesture recognition and real-time feedback on the recognition results on the screen; (2) a real-time digitisation into structured data and PDF documents; and (3) the mapping of the transcribed contents into concepts of the Banff classification. Our approach combines the benefits of paper with the automatic digitisation and digitalisation of hand-written user input. A fully digital and mobile approach should empower nephrologists to produce high quality data more effectively and in real-time so that it can be directly used in hospital processes.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In order to improve medical processes in nephrology, we present an application that allows doctors to create biopsy protocols by using a digital pen on a tablet. The biopsy protocol app is seamlessly integrated into the existing infrastructure at the hospital (see figure 1). 
Compared to other reporting tools, we provide (1) real-time hand-writing/gesture recognition and real-time feedback on the recognition results on the screen; (2) a real-time digitisation into structured data and PDF documents; and (3) the mapping of the transcribed contents into concepts of the Banff classification. Our approach combines the benefits of paper with the automatic digitisation and digitalisation of hand-written user input. A fully digital and mobile approach should empower nephrologists to produce high quality data more effectively and in real-time so that it can be directly used in hospital processes. |
2021 |
Inproceedings |
EyeLogin - Calibration-Free Authentication Method for Public Displays Using Eye Gaze Inproceedings ACM Symposium on Eye Tracking Research and Applications, Association for Computing Machinery, 2021. |
A Software Toolbox for Deploying Deep Learning Decision Support Systems with XAI Capabilities Inproceedings Companion of the 2021 ACM SIGCHI Symposium on Engineering Interactive Computing Systems, Association for Computing Machinery, 2021. |
Assessing Cognitive Test Performance Using Automatic Digital Pen Features Analysis Inproceedings Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, Association for Computing Machinery, 2021. |
Self-Supervised Domain Adaptation for Diabetic Retinopathy Grading using Vessel Image Reconstruction Inproceedings Proceedings of the 44th German Conference on Artificial Intelligence, Springer, 2021. |
On the Overlap Between Grad-CAM Saliency Maps and Explainable Visual Features in Skin Cancer Images Inproceedings Holzinger, Andreas; Kieseberg, Peter; Tjoa, Min A; Weippl, Edgar (Ed.): Machine Learning and Knowledge Extraction, pp. 241-253, Springer International Publishing, 2021. |
Anomaly Detection for Skin Lesion Images Using Replicator Neural Networks Inproceedings Holzinger, Andreas; Kieseberg, Peter; Tjoa, Min A; Weippl, Edgar (Ed.): Machine Learning and Knowledge Extraction, pp. 225-240, Springer International Publishing, 2021. |
Crop It, but Not Too Much: The Effects of Masking on the Classification of Melanoma Images Inproceedings Edelkamp, Stefan; Rueckert, Elmar; Möller, Ralf (Ed.): KI 2021: Advances in Artificial Intelligence, pp. 179-193, Springer International Publishing, 2021. |
A Demonstrator for Interactive Image Clustering and Fine-Tuning Neural Networks in Virtual Reality Inproceedings Edelkamp, Stefan; Rueckert, Elmar; Möller, Ralf (Ed.): KI 2021: Advances in Artificial Intelligence, pp. 194-203, Springer International Publishing, 2021. |
Multisensor-Pipeline: A Lightweight, Flexible, and Extensible Framework for Building Multimodal-Multisensor Interfaces Inproceedings Companion Publication of the 2021 International Conference on Multimodal Interaction, pp. 13-18, Association for Computing Machinery, 2021. |
Miscellaneous |
Interaction with Explanations in the XAINES Project Miscellaneous Trustworthy AI in the Wild Workshop 2021, 2021. |
Measuring Intrinsic and Extraneous Cognitive Load in Elementary School Students Using Subjective Ratings and Smart Pen Data Miscellaneous 13th International Cognitive Load Theory Conference, 2021. |
Augmented Reality zur Förderung globaler Kohärenzbildungsprozesse beim Experimentieren im Sachunterricht Miscellaneous Tagung der Fachgruppe Pädagogische Psychologie, 2021. |
Technical Reports |
BMBF Bundesministerium für Bildung und Forschung, Kapelle-Ufer 1, D-10117 Berlin, 2021. |
TATL: Task Agnostic Transfer Learning for Skin Attributes Detection Technical Report DFKI , 2021. |
2020 |
Journal Articles |
Towards Explanatory Interactive Image Captioning Using Top-Down and Bottom-Up Features, Beam Search and Re-ranking Journal Article KI - Künstliche Intelligenz, German Journal on Artificial Intelligence - Organ des Fachbereiches "Künstliche Intelligenz" der Gesellschaft für Informatik e.V., 36 , pp. 1-14, 2020. |
Digital pen technology for conducting cognitive assessments: a cross-over study with older adults Journal Article Psychological Research, 85 , pp. 1-9, 2020. |
Inproceedings |
The effects of masking in melanoma image classification with CNNs towards international standards for image preprocessing Inproceedings 2020 EAI International Symposium on Medical Artificial Intelligence, EAI, 2020. |
A Visually Explainable Learning System for Skin Lesion Detection Using Multiscale Input with Attention U-Net Inproceedings KI 2020: Advances in Artificial Intelligence, pp. 313-319, Springer, 2020. |
Digital Pen Features Predict Task Difficulty and User Performance of Cognitive Tests Inproceedings Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization, ACM, 2020. |
Visual Search Target Inference in Natural Interaction Settings with Machine Learning Inproceedings Bulling, Andreas; Huckauf, Anke; Jain, Eakta; Radach, Ralph; Weiskopf, Daniel (Ed.): ACM Symposium on Eye Tracking Research and Applications, Association for Computing Machinery, 2020. |
A Study on the Fusion of Pixels and Patient Metadata in CNN-Based Classification of Skin Lesion Images Inproceedings Holzinger, Andreas; Kieseberg, Peter; Tjoa, Min A; Weippl, Edgar (Ed.): Machine Learning and Knowledge Extraction, pp. 191-208, Springer International Publishing, 2020. |
Game of TUK: deploying a large-scale activity-boosting gamification project in a university context Inproceedings Mensch und Computer, ACM, 2020. |
Technical Reports |
The Skincare project, an interactive deep learning system for differential diagnosis of malignant skin lesions. Technical Report BMBF, H2020 Bundesministerium für Bildung und Forschung, Kapelle-Ufer 1, D-10117 Berlin, 2020. |
Künstliche Intelligenz gegen das Coronavirus Technical Report DFKI, BMBF, BMG , 2020. |
A Competitive Deep Neural Network Approach for the ImageCLEFmed Caption 2020 Task Technical Report German Research Center for Artificial Intelligence , 2020. |
2019 |
Journal Articles |
Incremental Improvement of a Question Answering System by Re-ranking Answer Candidates using Machine Learning Journal Article Computing Research Repository eprint Journal, abs/1908.10149 , pp. 1-13, 2019. |
Künstliche Intelligenz in der Medizin -- Holzweg oder Heilversprechen? Journal Article HNO, 67 , pp. 343-349, 2019. |
Book Chapters |
Medical and Health Systems Book Chapter The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions - Volume 3, pp. 423-476, Association for Computing Machinery and Morgan & Claypool, 2019. |
Incollections |
Software Platforms and Toolkits for Building Multimodal Systems and Applications Incollection Oviatt, Sharon; Schuller, Björn; Cohen, Philip R; Potamianos, Gerasimos; Krüger, Antonio; Sonntag, Daniel (Ed.): The Handbook of Multimodal-Multisensor Interfaces, Volume 3 -- Language Processing, Software, Commercialization, and Emerging Directions, #23 , pp. 145-190, Morgan & Claypool Publishers, 2019. |
Inproceedings |
Automatic Judgement of Neural Network-Generated Image Captions Inproceedings Martin-Vide, Carlos; Purver, Matthew; Pollak, Senja (Ed.): Statistical Language and Speech Processing - 7th International Conference, Proceedings, pp. 261-272, Springer, Jamova cesta 39 1000 Ljubljana Slovenia, 2019. |
Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings Inproceedings Proceedings of the Fourth Arabic Natural Language Processing Workshop, pp. 1-10, Association for Computational Linguistics, 2019. |
Multimodal Speech-based Dialogue for the Mini-Mental State Examination Inproceedings Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, pp. CS13:1-CS13:8, ACM, 2019. |
Modeling Cognitive Status through Automatic Scoring of a Digital Version of the Clock Drawing Test Inproceedings Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization, pp. 70-77, ACM, 2019. |
Technical Reports |
Wie funktionieren neuronale Netze eigentlich? Technical Report DFKI , 2019. |
Interactivity and Transparency in Medical Risk Assessment with Supersparse Linear Integer Models Technical Report BMBF , 2019. |
2018 |
Journal Articles |
A categorisation and implementation of digital pen features for behaviour characterisation Journal Article Computing Research Repository eprint Journal, abs/1810.03970 , pp. 1-42, 2018. |
A Survey on Deep Learning Toolkits and Libraries for Intelligent User Interfaces Journal Article Computing Research Repository eprint Journal, abs/1803.04818 , pp. 1-10, 2018. |
An architecture of open-source tools to combine textual information extraction, faceted search and information visualisation Journal Article Computing Research Repository eprint Journal, abs/1810.12627 , pp. 13-28, 2018. |
Inproceedings |
Device-Type Influence in Crowd-based Natural Language Translation Tasks (short paper) Inproceedings Aroyo, Lora; Dumitrache, Anca; Paritosh, Praveen; Quinn, Alexander J; Welty, Chris; Checco, Alessandro; Demartini, Gianluca; Gadiraju, Ujwal; Sarasua, Cristina (Ed.): Proceedings of the 1st Workshop on Subjectivity, Ambiguity and Disagreement in Crowdsourcing, and Short Paper Proceedings of the 1st Workshop on Disentangling the Relation Between Crowdsourcing and Bias Management (SAD 2018 and CrowdBias 2018), pp. 93-97, CEUR-WS.org, 2018. |
Visual Search Target Inference Using Bag of Deep Visual Words Inproceedings Trollmann, Frank; Turhan, Anni-Yasmin (Ed.): KI 2018: Advances in Artificial Intelligence - 41st German Conference on AI, Springer, 2018. |
Medical 3D Images in Multimodal Virtual Reality Inproceedings IUI Companion, pp. 19:1-19:2, ACM, 2018. |
Error-aware Gaze-based Interfaces for Robust Mobile Gaze Interaction Inproceedings Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications, pp. 24:1-24:10, ACM, 2018. |
Towards a Multimodal Multisensory Cognitive Assessment Framework Inproceedings Proceedings of the 30th IEEE International Symposium on Computer-Based Medical System, IEEE, 2018. |
Miscellaneous |
Towards Hybrid Human-Machine Translation Services Miscellaneous EasyChair Preprint no. 333, 2018. |
2017 |
Journal Articles |
Fine-tuning deep CNN models on specific MS COCO categories Journal Article Computing Research Repository eprint Journal, abs/1709.01476 , pp. 0-3, 2017. |
A novel tool for the identification of correlations in medical data by faceted search Journal Article Computers in Biology and Medicine - An International Journal, 85 , pp. 98-105, 2017. |
Inproceedings |
Human-in-the-Loop Control Processes in Gas Turbine Maintenance Inproceedings Mařík, Vladimír; Strasser, Thomas; Kadera, Petr; Wahlster, Wolfgang (Ed.): Industrial Applications of Holonic and Multi-Agent Systems: 8th International Conference, HoloMAS 2017, Springer International Publishing, 2017. |
A Multimodal Dialogue System for Medical Decision Support inside Virtual Reality Inproceedings Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pp. 23-26, Association for Computational Linguistics, 2017. |
A Digital Pen Based Tool for Instant Digitisation and Digitalisation of Biopsy Protocols Inproceedings 2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS), pp. 773-774, IEEE Xplore, 2017. |