2022

Journal Articles
Ott, Torben; Masset, Paul; Gouvea, Thiago; Kepecs, Adam: Apparent sunk cost effect in rational agents. Journal Article. Science Advances, 8, pp. 1-10, American Association for the Advancement of Science, 2022. URL: https://www.science.org/doi/10.1126/sciadv.abi7004
Abstract: Rational decision makers aim to maximize their gains, but humans and other animals often fail to do so, exhibiting biases and distortions in their choice behavior. In a recent study of economic decisions, humans, mice, and rats were reported to succumb to the sunk cost fallacy, making decisions based on irrecoverable past investments to the detriment of expected future returns. We challenge this interpretation because it is subject to a statistical fallacy, a form of attrition bias, and the observed behavior can be explained without invoking a sunk cost–dependent mechanism. Using a computational model, we illustrate how a rational decision maker with a reward-maximizing decision strategy reproduces the reported behavioral pattern and propose an improved task design to dissociate sunk costs from fluctuations in decision valuation. Similar statistical confounds may be common in analyses of cognitive behaviors, highlighting the need to use causal statistical inference and generative models for interpretation.
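To make the statistical argument concrete, here is a minimal simulation sketch (not the authors' actual model) of how attrition bias can produce an apparent sunk cost effect in an agent whose decision rule ignores past investment; all parameters are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    n_trials, reward_time = 50_000, 10          # steps an agent must wait to be rewarded
    values = rng.normal(0.0, 1.0, n_trials)     # per-trial valuation, fixed within a trial

    waited = np.zeros(n_trials, dtype=int)
    rewarded = np.zeros(n_trials, dtype=bool)
    for i in range(n_trials):
        t = 0
        # The agent ignores sunk time: at each step it stays iff value + noise > 0.
        while t < reward_time and values[i] + rng.normal(0.0, 1.0) > 0:
            t += 1
        waited[i] = t
        rewarded[i] = t == reward_time

    # Apparent sunk cost effect: P(stay until reward | already waited t) rises with t,
    # because trials that survive to large t are selected for high valuation.
    for t in range(reward_time):
        print(t, round(rewarded[waited >= t].mean(), 3))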
Barz, Michael; Bhatti, Omair Shahzad; Sonntag, Daniel: Implicit Estimation of Paragraph Relevance from Eye Movements. Journal Article. Frontiers in Computer Science, 3, pp. 13, Frontiers Media S.A., 2022. URLs: https://www.dfki.de/fileadmin/user_upload/import/12165_fcomp-03-808507.pdf, https://www.frontiersin.org/articles/10.3389/fcomp.2021.808507
Abstract: Eye movements were shown to be an effective source of implicit relevance feedback in constrained search and decision-making tasks. Recent research suggests that gaze-based features, extracted from scanpaths over short news articles (g-REL), can reveal the perceived relevance of read text with respect to a previously shown trigger question. In this work, we aim to confirm this finding and investigate whether it generalizes to multi-paragraph documents from Wikipedia (Google Natural Questions) that require readers to scroll down to read the whole text. We conduct a user study (n=24) in which participants read single- and multi-paragraph articles and rate their relevance at the paragraph level with respect to a trigger question. We model the perceived document relevance using machine learning with features from the literature as input. Our results confirm that eye movements can be used to effectively model the relevance of short news articles, in particular if we exclude difficult cases: documents that are on the topic of the trigger question but irrelevant. However, our results do not clearly show that the modeling approach generalizes to multi-paragraph document settings. We publish our dataset and our code for feature extraction under an open source license to enable future research in the field of gaze-based implicit relevance feedback.
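The modeling pipeline described here boils down to supervised classification over scanpath-level gaze features. A minimal sketch of that setup with scikit-learn, using random stand-ins for features and labels (the real feature set and data come from the authors' released dataset and code):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.random((200, 5))        # 200 read paragraphs, 5 gaze features each
    y = rng.integers(0, 2, 200)     # perceived relevance labels from the study

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    print(cross_val_score(clf, X, y, cv=5, scoring="f1").mean())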
Nguyen, Ho Minh Duy; Nguyen, Thu T; Vu, Huong; Pham, Quang; Nguyen, Manh-Duy; Nguyen, Binh T; Sonntag, Daniel: TATL: Task Agnostic Transfer Learning for Skin Attributes Detection. Journal Article. Medical Image Analysis, 01, pp. 1-27, Elsevier, 2022. URL: https://arxiv.org/pdf/2104.01641.pdf
Abstract: Existing skin attributes detection methods usually initialize with a pre-trained ImageNet network and then fine-tune on a medical target task. However, we argue that such approaches are suboptimal because medical datasets differ greatly from ImageNet and often contain limited training samples. In this work, we propose Task Agnostic Transfer Learning (TATL), a novel framework motivated by dermatologists' behaviors in the skincare context. TATL learns an attribute-agnostic segmenter that detects lesion skin regions and then transfers this knowledge to a set of attribute-specific classifiers to detect each particular attribute. Since TATL's attribute-agnostic segmenter only detects skin attribute regions, it enjoys ample data from all attributes, allows transferring knowledge among features, and compensates for the lack of training data from rare attributes. We conduct extensive experiments to evaluate the proposed TATL transfer learning mechanism with various neural network architectures on two popular skin attributes detection benchmarks. The empirical results show that TATL not only works well with multiple architectures but also achieves state-of-the-art performance while enjoying minimal model and computational complexity. We also provide theoretical insights and explanations for why our transfer learning framework performs well in practice.
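TATL's core transfer step is to reuse the attribute-agnostic segmenter's encoder as the initialization for each attribute-specific model. A minimal PyTorch sketch of that weight transfer, with toy modules standing in for the paper's actual architectures:

    import torch.nn as nn

    class Encoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())

        def forward(self, x):
            return self.body(x)

    class SegModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = Encoder()
            self.head = nn.Conv2d(32, 1, 1)   # binary mask logits

        def forward(self, x):
            return self.head(self.encoder(x))

    # Step 1: train an attribute-agnostic segmenter on the union of all
    # attribute masks, where data is plentiful (training loop omitted).
    agnostic = SegModel()

    # Step 2: initialize each attribute-specific model from that encoder,
    # then fine-tune it on the (possibly rare) attribute's own data.
    attribute_model = SegModel()
    attribute_model.encoder.load_state_dict(agnostic.encoder.state_dict())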
Inproceedings
Valdunciel, Pablo; Bhatti, Omair Shahzad; Barz, Michael; Sonntag, Daniel: Interactive Assessment Tool for Gaze-based Machine Learning Models in Information Retrieval. Inproceedings. In: ACM SIGIR Conference on Human Information Interaction and Retrieval, Association for Computing Machinery, 2022. URL: https://www.dfki.de/fileadmin/user_upload/import/12287_3498366.3505834.pdf
Abstract: Eye movements were shown to be an effective source of implicit relevance feedback in information retrieval tasks. They can be used to, e.g., estimate the relevance of read documents and expand search queries using machine learning. In this paper, we present the Reading Model Assessment tool (ReMA), an interactive tool for assessing gaze-based relevance estimation models. Our tool allows experimenters to easily browse recorded trials, compare the model output to a ground truth, and visualize gaze-based features at the token- and paragraph-level that serve as model input. Our goal is to facilitate the understanding of the relation between eye movements and the human relevance estimation process, to understand the strengths and weaknesses of a model at hand, and, eventually, to enable researchers to build more effective models.
Lauer, Luisa; Javaheri, Hamraz; Altmeyer, Kristin; Malone, Sarah; Grünerbl, Agnes; Barz, Michael; Peschel, Markus; Brünken, Roland; Lukowicz, Paul: Encountering Students' Learning Difficulties in Electrics - Didactical Concept and Prototype of Augmented Reality-Toolkit. Inproceedings. In: Fostering scientific citizenship in an uncertain world - ESERA 2021 e-Proceedings, University of Minho, 2022. URL: https://www.dfki.de/fileadmin/user_upload/import/12121_2022_Encountering_Students'_Learning_Difficulties_in_Electrics_-_Didactical_Concept_and_Prototype_of_Augmented_Reality-Toolkit.pdf
Abstract:
- Real-time visualization of electrical circuit schematics in accordance with the components' semantic connections
- Use of the toolkit may facilitate the acquisition of representational competencies (concerning the matching of components and symbols and the matching of circuits and circuit schematics)
- Usable with either handheld AR devices or head-mounted AR devices
Nguyen, Ho Minh Duy; Henschel, Roberto; Rosenhahn, Bodo; Sonntag, Daniel; Swoboda, Paul: LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking. Inproceedings. In: Conference on Computer Vision and Pattern Recognition (CVPR) 2022, IEEE/CVF, 2022. URL: https://arxiv.org/pdf/2111.11892.pdf
Abstract: Multi-camera multi-object tracking is currently drawing attention in the computer vision field due to its superior performance in real-world applications such as video surveillance in crowded scenes or in vast spaces. In this work, we propose a mathematically elegant multi-camera multiple object tracking approach based on a spatial-temporal lifted multicut formulation. Our model utilizes state-of-the-art tracklets produced by single-camera trackers as proposals. As these tracklets may contain ID-switch errors, we refine them through a novel pre-clustering obtained from 3D geometry projections. As a result, we derive a better tracking graph without ID switches and more precise affinity costs for the data association phase. Tracklets are then matched to multi-camera trajectories by solving a global lifted multicut formulation that incorporates short- and long-range temporal interactions on tracklets located in the same camera as well as inter-camera ones. Experimental results on the WildTrack dataset yield near-perfect results, outperforming state-of-the-art trackers on Campus while being on par on the PETS-09 dataset. We will make our implementations available upon acceptance of the paper.
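One ingredient that lends itself to a short illustration is the geometric pre-clustering: detections from different cameras are projected onto a common ground plane and grouped by proximity. A hedged sketch with placeholder homographies and an illustrative distance threshold (the paper's actual formulation is a lifted multicut, which is not reproduced here):

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    def to_ground_plane(H, xy):
        """Project image points (N, 2) to world coordinates via homography H (3x3)."""
        pts = np.c_[xy, np.ones(len(xy))] @ H.T
        return pts[:, :2] / pts[:, 2:3]

    H_cam = {0: np.eye(3), 1: np.eye(3)}                       # placeholder calibrations
    feet = {0: np.array([[100.0, 400.0], [300.0, 410.0]]),     # detection foot points
            1: np.array([[103.0, 402.0]])}

    world = np.vstack([to_ground_plane(H_cam[c], p) for c, p in feet.items()])
    # Single-linkage clustering: detections closer than a threshold are assumed
    # to show the same person (units depend on the calibration).
    labels = fcluster(linkage(world, method="single"), t=5.0, criterion="distance")
    print(labels)   # e.g. [1 2 1]: cam-0 point 1 and the cam-1 point are merged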
Miscellaneous
Hartmann, Mareike; Anagnostopoulou, Aliki; Sonntag, Daniel: Interactive Machine Learning for Image Captioning. Miscellaneous. The AAAI-22 Workshop on Interactive Machine Learning, 2022. URL: https://www.dfki.de/fileadmin/user_upload/import/12167_interactive_learning_for_image_captioning.pdf
Abstract: We propose an approach for interactive learning for an image captioning model. As human feedback is expensive and modern neural network based approaches often require large amounts of supervised data to be trained, we envision a system that exploits human feedback as well as possible by multiplying the feedback using data augmentation methods, and integrating the resulting training examples into the model in a smart way. This approach has three key components, for which we need to find suitable practical implementations: feedback collection, data augmentation, and model update. We outline our idea and review different possibilities to address these tasks.
Technical Reports
Nguyen, Ho Minh Duy; Henschel, Roberto; Rosenhahn, Bodo; Sonntag, Daniel; Swoboda, Paul: LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking. Technical Report. DFKI, MPI-INF, volume 01, 2022. URL: https://arxiv.org/pdf/2111.11892.pdf
Abstract: Multi-camera multi-object tracking is currently drawing attention in the computer vision field due to its superior performance in real-world applications such as video surveillance in crowded scenes or in vast spaces. In this work, we propose a mathematically elegant multi-camera multiple object tracking approach based on a spatial-temporal lifted multicut formulation. Our model utilizes state-of-the-art tracklets produced by single-camera trackers as proposals. As these tracklets may contain ID-switch errors, we refine them through a novel pre-clustering obtained from 3D geometry projections. As a result, we derive a better tracking graph without ID switches and more precise affinity costs for the data association phase. Tracklets are then matched to multi-camera trajectories by solving a global lifted multicut formulation that incorporates short- and long-range temporal interactions on tracklets located in the same camera as well as inter-camera ones. Experimental results on the WildTrack dataset yield near-perfect results, outperforming state-of-the-art trackers on Campus while being on par on the PETS-09 dataset. We will make our implementations available upon acceptance of the paper.
2021

Journal Articles
Sonntag, Daniel: Künstliche Intelligenz in der Medizin und Gynäkologie – Holzweg oder Heilversprechen? [Artificial intelligence in medicine and gynecology – dead end or promise of a cure?]. Journal Article. Der Gynäkologe, 1, pp. 1-7, Springer, 2021. URL: https://www.dfki.de/fileadmin/user_upload/import/11612_sonntag-gyn.pdf
Abstract: Artificial intelligence (AI) has attained a new level of maturity in recent years and is becoming the driver of digitalization in all areas of life. AI is a cross-sectional technology with great importance for all areas of medicine employing image data, text data and bio-data. There is no medical field that will remain unaffected by AI, with AI-assisted clinical decision-making assuming a particularly important role. AI methods are becoming established in medical workflow management and for prediction of treatment success or treatment outcome. AI systems are already able to lend support to imaging-based diagnosis and patient management, but cannot suggest critical decisions. The corresponding preventive or therapeutic measures can be more rationally assessed with the help of AI, although the number of diseases covered is currently too low to create robust systems for routine clinical use. A prerequisite for the widespread use of AI systems is appropriate training that enables physicians to decide when computer-assisted decision-making can be relied upon.
Kapp, Sebastian; Barz, Michael; Mukhametov, Sergey; Sonntag, Daniel; Kuhn, Jochen: ARETT: Augmented Reality Eye Tracking Toolkit for Head Mounted Displays. Journal Article. Sensors - Open Access Journal, 21, pp. 18, Multidisciplinary Digital Publishing Institute (MDPI), 2021. URLs: https://www.dfki.de/fileadmin/user_upload/import/11528_2021_ARETT-_Augmented_Reality_Eye_Tracking_Toolkit_for_Head_Mounted_Displays.pdf, https://www.mdpi.com/1424-8220/21/6/2234
Abstract: Currently an increasing number of head mounted displays (HMD) for virtual and augmented reality (VR/AR) are equipped with integrated eye trackers. Use cases of these integrated eye trackers include rendering optimization and gaze-based user interaction. In addition, visual attention in VR and AR is interesting for applied research based on eye tracking, for example in cognitive or educational sciences. While some research toolkits for VR already exist, only a few target AR scenarios. In this work, we present an open-source eye tracking toolkit for reliable gaze data acquisition in AR based on Unity 3D and the Microsoft HoloLens 2, as well as an R package for seamless data analysis. Furthermore, we evaluate the spatial accuracy and precision of the integrated eye tracker for fixation targets with different distances and angles to the user (n=21). On average, we found that gaze estimates are reported with an angular accuracy of 0.83 degrees and a precision of 0.27 degrees while the user is resting, which is on par with state-of-the-art mobile eye trackers.
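The two reported metrics can be computed from gaze direction samples in a few lines. A sketch under common definitions (accuracy as the mean angular offset from the target ray, precision as the RMS of sample-to-sample angular deviations); the toolkit's exact implementation may differ:

    import numpy as np

    def angle_deg(u, v):
        """Angle in degrees between unit direction vectors (broadcasts over rows)."""
        cos = np.clip(np.sum(u * v, axis=-1), -1.0, 1.0)
        return np.degrees(np.arccos(cos))

    def accuracy_precision(gaze_dirs, target_dir):
        accuracy = angle_deg(gaze_dirs, target_dir).mean()
        inter = angle_deg(gaze_dirs[1:], gaze_dirs[:-1])
        precision = np.sqrt(np.mean(inter ** 2))
        return accuracy, precision

    # Toy fixation: gaze directions scattered around a target straight ahead.
    rng = np.random.default_rng(0)
    gaze = np.array([0.0, 0.0, 1.0]) + rng.normal(0.0, 0.005, (100, 3))
    gaze /= np.linalg.norm(gaze, axis=1, keepdims=True)
    print(accuracy_precision(gaze, np.array([0.0, 0.0, 1.0])))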
Somfai, Ellák; Baffy, Benjámin; Fenech, Kristian; Guo, Changlu; Hosszú, Rita; Korózs, Dorina; Nunnari, Fabrizio; Pólik, Marcell; Sonntag, Daniel; Ulbert, Attila; Lorincz, András: Minimizing false negative rate in melanoma detection and providing insight into the causes of classification. Journal Article. Computing Research Repository eprint Journal, abs/2102.09199, pp. 1-14, arXiv, 2021. URLs: https://www.dfki.de/fileadmin/user_upload/import/11613_2021_Minimizing_false_negative_rate_in_melanoma_detection_and_providing_insight_into_the_causes_of_classification.pdf, https://arxiv.org/abs/2102.09199
Abstract: Our goal is to bridge human and machine intelligence in melanoma detection. We develop a classification system exploiting a combination of visual pre-processing, deep learning, and ensembling for providing explanations to experts and to minimize the false negative rate while maintaining high accuracy in melanoma detection. Source images are first automatically segmented using a U-net CNN. The result of the segmentation is then used to extract image sub-areas and specific parameters relevant in human evaluation, namely center, border, and asymmetry measures. These data are then processed by tailored neural networks which include structure searching algorithms. Partial results are then ensembled by a committee machine. Our evaluation on ISIC-2019, the largest skin lesion dataset publicly available today, shows improvement in all evaluated metrics over a baseline using the original images only. We also show that indicative scores computed by the feature classifiers can provide useful insight into the various features on which the decision can be based.
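The final ensembling stage is a committee machine over the per-feature classifiers. A toy soft-voting sketch with made-up member probabilities and an illustrative sensitivity-biased threshold (chosen low to keep false negatives down):

    import numpy as np

    def committee_prob(member_probs, weights=None):
        """Soft vote: weighted average of the members' melanoma probabilities."""
        return float(np.average(member_probs, weights=weights))

    # e.g. center-, border- and asymmetry-based networks plus a whole-image CNN
    p = committee_prob(np.array([0.35, 0.62, 0.48, 0.71]))
    threshold = 0.3   # operating point biased toward sensitivity
    print("melanoma" if p > threshold else "benign", round(p, 3))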
Barz, Michael; Sonntag, Daniel: Automatic Visual Attention Detection for Mobile Eye Tracking Using Pre-Trained Computer Vision Models and Human Gaze. Journal Article. Sensors - Open Access Journal, 21, pp. 21, MDPI, 2021. URLs: https://www.dfki.de/fileadmin/user_upload/import/11668_sensors-21-04143-v2.pdf, https://www.mdpi.com/1424-8220/21/12/4143
Abstract: Processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. These stimuli, which are prevalent subjects of diagnostic eye tracking studies, are commonly encoded as rectangular areas of interest (AOIs) per frame. Because it is a tedious manual annotation task, the automatic detection and annotation of visual attention to AOIs can accelerate and objectify eye tracking research, in particular for mobile eye tracking with egocentric video feeds. In this work, we implement two methods to automatically detect visual attention to AOIs using pre-trained deep learning models for image classification and object detection. Furthermore, we develop an evaluation framework based on the VISUS dataset and well-known performance metrics from the field of activity recognition. We systematically evaluate our methods within this framework, discuss potentials and limitations, and propose ways to improve the performance of future automatic visual attention detection methods.
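The object-detection variant of such methods reduces, at its core, to testing whether the current gaze point falls inside a detected object's bounding box. A minimal sketch with a generic (label, box) detector output format (any pre-trained detector could supply it):

    def attended_object(gaze_xy, detections):
        """Label of the smallest detected box containing the gaze point, else None.

        detections: list of (label, (x1, y1, x2, y2)) in pixel coordinates.
        """
        gx, gy = gaze_xy
        hits = [(label, (x2 - x1) * (y2 - y1))
                for label, (x1, y1, x2, y2) in detections
                if x1 <= gx <= x2 and y1 <= gy <= y2]
        return min(hits, key=lambda h: h[1])[0] if hits else None

    boxes = [("monitor", (50, 40, 400, 300)), ("cup", (120, 200, 180, 280))]
    print(attended_object((150, 240), boxes))   # -> cup (smallest enclosing box)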
Lauer, Luisa; Altmeyer, Kristin; Malone, Sarah; Barz, Michael; Brünken, Roland; Sonntag, Daniel; Peschel, Markus: Investigating the Usability of a Head-Mounted Display Augmented Reality Device in Elementary School Children. Journal Article. Sensors - Open Access Journal, 21, pp. 20, MDPI, 2021. URLs: https://www.dfki.de/fileadmin/user_upload/import/11866_sensors-21-06623.pdf, https://www.mdpi.com/1424-8220/21/19/6623
Abstract: Augmenting reality via head-mounted displays (HMD-AR) is an emerging technology in education. The interactivity provided by HMD-AR devices is particularly promising for learning, but presents a challenge to human activity recognition, especially with children. Recent technological advances regarding speech and gesture recognition concerning Microsoft's HoloLens 2 may address this prevailing issue. In a within-subjects study with 47 elementary school children (2nd to 6th grade), we examined the usability of the HoloLens 2 using a standardized tutorial on multimodal interaction in AR. The overall system usability was rated "good". However, several behavioral metrics indicated that specific interaction modes differed in their efficiency. The results are of major importance for the development of learning applications in HMD-AR as they partially deviate from previous findings. In particular, the well-functioning recognition of children's voice commands that we observed represents a novelty. Furthermore, we found different interaction preferences in HMD-AR among the children. We also found the use of HMD-AR to have a positive effect on children's activity-related achievement emotions. Overall, our findings can serve as a basis for determining general requirements, possibilities, and limitations of the implementation of educational HMD-AR environments in elementary school classrooms.
Incollections
Barz, Michael; Sonntag, Daniel: Incremental Improvement of a Question Answering System by Re-ranking Answer Candidates Using Machine Learning. Incollection. In: Marchi, Erik; Siniscalchi, Sabato Marco; Cumani, Sandro; Salerno, Valerio Mario; Li, Haizhou (Ed.): Increasing Naturalness and Flexibility in Spoken Dialogue Interaction: 10th International Workshop on Spoken Dialogue Systems, pp. 367-379, Springer, 2021. URL: https://www.dfki.de/fileadmin/user_upload/import/11522_2019_Incremental_Improvement_of_a_Question_Answering_System_by_Re-ranking_Answer_Candidates_using_Machine_Learning.pdf, DOI: https://doi.org/10.1007/978-981-15-9323-9_34
Abstract: We implement a method for re-ranking the top-10 results of a state-of-the-art question answering (QA) system. The goal of our re-ranking approach is to improve the answer selection given the user question and the top-10 candidates. We focus on improving deployed QA systems that do not allow re-training or where re-training comes at a high cost. Our re-ranking approach learns a similarity function from n-gram based features of the query and the answer, together with the initial system confidence, as input. Our contributions are: (1) we generate a QA training corpus starting from 877 answers from the customer care domain of T-Mobile Austria, (2) we implement a state-of-the-art QA pipeline using neural sentence embeddings that encode queries in the same space as the answer index, and (3) we evaluate the QA pipeline and our re-ranking approach using a separately provided test set. The test set can be considered to be available after deployment of the system, e.g., based on feedback of users. Our results show that the system performance, in terms of top-n accuracy and the mean reciprocal rank, benefits from re-ranking using gradient boosted regression trees. On average, the mean reciprocal rank improves by 9.15%.
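A compact sketch of the re-ranking recipe: n-gram overlap features between query and candidate plus the deployed system's confidence, scored by gradient boosted regression trees (the feature set here is a simplified stand-in for the paper's):

    from sklearn.ensemble import GradientBoostingRegressor

    def ngram_overlap(q, a, n):
        grams = lambda s: {tuple(s.split()[i:i + n])
                           for i in range(len(s.split()) - n + 1)}
        gq, ga = grams(q), grams(a)
        return len(gq & ga) / max(len(gq), 1)

    def features(query, answer, confidence):
        return [ngram_overlap(query, answer, 1),
                ngram_overlap(query, answer, 2), confidence]

    # Training rows: logged (query, top-10 candidate) pairs, labeled 1.0 if the
    # candidate was the correct answer (toy data below).
    X = [features("reset my router", "how to reset the router", 0.7),
         features("reset my router", "roaming charges abroad", 0.9),
         features("cancel contract", "how to cancel your contract", 0.6),
         features("cancel contract", "new tariff options", 0.8)]
    y = [1.0, 0.0, 1.0, 0.0]
    reranker = GradientBoostingRegressor(random_state=0).fit(X, y)

    # At query time: rescore the deployed system's top-10 and sort by score.
    cands = [("roaming charges abroad", 0.9), ("how to reset the router", 0.7)]
    scores = reranker.predict([features("reset my router", a, c) for a, c in cands])
    print(max(zip(scores, cands))[1][0])   # -> how to reset the router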
Inproceedings
Biswas, Rajarshi; Barz, Michael; Hartmann, Mareike; Sonntag, Daniel: Improving German Image Captions using Machine Translation and Transfer Learning. Inproceedings. In: Espinosa-Anke, Luis; Martin-Vide, Carlos; Spasic, Irena (Ed.): Statistical Language and Speech Processing SLSP 2021, Springer, Cardiff, 2021. URL: https://www.dfki.de/fileadmin/user_upload/import/11805_SLSP2021Paper.pdf
Abstract: Image captioning is a complex artificial intelligence task that involves many fundamental questions of data representation, learning, and natural language processing. In addition, most of the work in this domain addresses the English language because of the high availability of annotated training data compared to other languages. Therefore, we investigate methods for image captioning in German that transfer knowledge from English training data. We explore four different methods for generating image captions in German, two baseline methods and two more advanced ones based on transfer learning. The baseline methods are based on a state-of-the-art model which we train using a translated version of the English MS COCO dataset and the smaller German Multi30K dataset, respectively. Both advanced methods are pre-trained using the translated MS COCO dataset and fine-tuned for German on the Multi30K dataset. One of these methods uses an alternative attention mechanism from the literature that showed a good performance in English image captioning. We compare the performance of all methods for the Multi30K test set in German using common automatic evaluation metrics. We show that our advanced method with the alternative attention mechanism presents a new baseline for German BLEU, ROUGE, CIDEr, and SPICE scores, and achieves a relative improvement of 21.2% in BLEU-4 score compared to the current state-of-the-art in German image captioning.
Hartmann, Mareike; de Lhoneux, Miryam; Hershcovich, Daniel; Kementchedjhieva, Yova; Nielsen, Lukas; Qiu, Chen; Søgaard, Anders: A Multilingual Benchmark for Probing Negation-Awareness with Minimal Pairs. Inproceedings. In: Proceedings of the 25th Conference on Computational Natural Language Learning (CoNLL), pp. 224-257, Association for Computational Linguistics, 2021. URLs: https://www.dfki.de/fileadmin/user_upload/import/11846_2021.conll-1.19.pdf, https://aclanthology.org/2021.conll-1.19/
Abstract: Negation is one of the most fundamental concepts in human cognition and language, and several natural language inference (NLI) probes have been designed to investigate pretrained language models' ability to detect and reason with negation. However, the existing probing datasets are limited to English only, and do not enable controlled probing of performance in the absence or presence of negation. In response, we present a multilingual (English, Bulgarian, German, French and Chinese) benchmark collection of NLI examples that are grammatical and correctly labeled, as a result of manual inspection and editing. We use the benchmark to probe the negation-awareness of multilingual language models and find that models that correctly predict examples with negation cues often fail to correctly predict their counterexamples without negation cues, even when the cues are irrelevant for semantic inference.
Jørgensen, Rasmus Kær; Hartmann, Mareike; Dai, Xiang; Elliott, Desmond: mDAPT: Multilingual Domain Adaptive Pretraining in a Single Model. Inproceedings. In: Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 3404-3418, Association for Computational Linguistics, 2021. URL: https://www.dfki.de/fileadmin/user_upload/import/11845_2021.findings-emnlp.290.pdf
Abstract: Domain adaptive pretraining, i.e. the continued unsupervised pretraining of a language model on domain-specific text, improves the modelling of text for downstream tasks within the domain. Numerous real-world applications are based on domain-specific text, e.g. working with financial or biomedical documents, and these applications often need to support multiple languages. However, large-scale domain-specific multilingual pretraining data for such scenarios can be difficult to obtain, due to regulations, legislation, or simply a lack of language- and domain-specific text. One solution is to train a single multilingual model, taking advantage of the data available in as many languages as possible. In this work, we explore the benefits of domain adaptive pretraining with a focus on adapting to multiple languages within a specific domain. We propose different techniques to compose pretraining corpora that enable a language model to both become domain-specific and multilingual. Evaluation on nine domain-specific datasets---for biomedical named entity recognition and financial sentence classification---covering seven different languages show that a single multilingual domain-specific model can outperform the general multilingual model, and performs close to its monolingual counterpart. This finding holds across two different pretraining methods, adapter-based pretraining and full model pretraining.
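A hedged sketch of the core operation, continued masked-language-model pretraining of a multilingual model on pooled domain-specific text, using HuggingFace Transformers; the corpus, model choice, and hyperparameters below are placeholders, not the paper's setup:

    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    name = "bert-base-multilingual-cased"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name)

    # Placeholder corpus: domain text pooled across the target languages.
    texts = ["Der Umsatz stieg im dritten Quartal deutlich.",
             "The company reported stable quarterly revenue."]
    enc = tok(texts, truncation=True, padding=True, max_length=128)
    dataset = [{"input_ids": i, "attention_mask": m}
               for i, m in zip(enc["input_ids"], enc["attention_mask"])]

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="mdapt", num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tok, mlm_probability=0.15),
    )
    trainer.train()   # continued pretraining; fine-tune on the task afterwards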
Erlemeyer, Fabian; Rehtanz, Christian; Hermanns, Annegret; Lüers, Bengt; Nebel-Wenner, Marvin; Eilers, Reef Janes: Live Testing of Flexibilities on Distribution Grid Level – Simulation Setup and Lessons Learned. Inproceedings. In: IEEE Electric Power and Energy Conference, IEEE Xplore, 2021. URL: https://www.dfki.de/fileadmin/user_upload/import/11927_2021199998.pdf
Abstract: In the DESIGNETZ project, real flexibility units were connected to a distribution grid simulation to investigate the integration of decentralized flexibilities for different use cases. The simulation determines the demand for unit flexibility and communicates the demand to the flexibilities. In return, the response of the flexibilities is fed back into the simulation to also account for effects that are not simulated. This paper presents the simulation setup and discusses lessons learned from deploying the simulation into operation.
Barz, Michael; Kapp, Sebastian; Kuhn, Jochen; Sonntag, Daniel: Automatic Recognition and Augmentation of Attended Objects in Real-Time Using Eye Tracking and a Head-Mounted Display. Inproceedings. In: ACM Symposium on Eye Tracking Research and Applications, pp. 4, Association for Computing Machinery, 2021. URL: https://www.dfki.de/fileadmin/user_upload/import/11614_etra_ar_video.pdf, DOI: https://doi.org/10.1145/3450341.3458766
Abstract: Scanning and processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. Adding the ability to observe the scanning behavior and scene processing to intelligent mobile user interfaces can facilitate a new class of cognition-aware user interfaces. As a first step in this direction, we implement an augmented reality (AR) system that classifies objects at the user's point of regard, detects visual attention to them, and augments the real objects with virtual labels that stick to the objects in real-time. We use a head-mounted AR device (Microsoft HoloLens 2) with integrated eye tracking capabilities and a front-facing camera for implementing our prototype.
Nguyen, Ho Minh Duy; Nguyen, Duy M; Vu, Huong; Nguyen, Binh T; Nunnari, Fabrizio; Sonntag, Daniel An Attention Mechanism using Multiple Knowledge Sources for COVID-19 Detection from CT Images Inproceedings The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), AAAI, 2021. @inproceedings{11369, title = {An Attention Mechanism using Multiple Knowledge Sources for COVID-19 Detection from CT Images}, author = {Ho Minh Duy Nguyen and Duy M Nguyen and Huong Vu and Binh T Nguyen and Fabrizio Nunnari and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11369_AAAI_Workshop_TrustworthyHealthcare_v3.pdf}, year = {2021}, date = {2021-01-01}, booktitle = {The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)}, publisher = {AAAI}, abstract = {Besides principal polymerase chain reaction (PCR) tests, automatically identifying positive samples based on computed tomography (CT) scans can present a promising option in the early diagnosis of COVID-19. Recently, there have been increasing efforts to utilize deep networks for COVID-19 diagnosis based on CT scans. While these approaches mostly focus on introducing novel architectures, transfer learning techniques, or the construction of large-scale datasets, we propose a novel strategy to improve several performance baselines by leveraging multiple useful information sources relevant to doctors' judgments. Specifically, infected regions and heat-map features extracted from learned networks are integrated with the global image via an attention mechanism during the learning process. This procedure makes our system more robust to noise and guides the network to focus on local lesion areas. Extensive experiments illustrate the superior performance of our approach compared to recent baselines. Furthermore, our learned network guidance presents an explainable feature to doctors to understand the connection between input and output in a grey-box model.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Besides principal polymerase chain reaction (PCR) tests, automatically identifying positive samples based on computed tomography (CT) scans can present a promising option in the early diagnosis of COVID-19. Recently, there have been increasing efforts to utilize deep networks for COVID-19 diagnosis based on CT scans. While these approaches mostly focus on introducing novel architectures, transfer learning techniques, or the construction of large-scale datasets, we propose a novel strategy to improve several performance baselines by leveraging multiple useful information sources relevant to doctors' judgments. Specifically, infected regions and heat-map features extracted from learned networks are integrated with the global image via an attention mechanism during the learning process. This procedure makes our system more robust to noise and guides the network to focus on local lesion areas. Extensive experiments illustrate the superior performance of our approach compared to recent baselines. Furthermore, our learned network guidance presents an explainable feature to doctors to understand the connection between input and output in a grey-box model. |
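As a rough illustration of the attention-based fusion this abstract describes, the following PyTorch sketch combines a global image embedding with embeddings from auxiliary sources (e.g., infected-region crops and heat-map features). It is a generic soft-attention fusion under assumed feature dimensions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class MultiSourceAttention(nn.Module):
    """Fuse a global image embedding with embeddings of local evidence
    via learned soft attention weights (illustrative, not the paper's model)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one scalar relevance score per source

    def forward(self, sources):                      # (batch, n_sources, dim)
        weights = torch.softmax(self.score(sources), dim=1)  # (batch, n, 1)
        fused = (weights * sources).sum(dim=1)               # (batch, dim)
        return fused, weights

# Usage: stack the feature vectors and fuse them before the final classifier.
batch, dim = 4, 256
global_feat = torch.randn(batch, 1, dim)   # whole-CT-slice features
region_feat = torch.randn(batch, 1, dim)   # infected-region features
heatmap_feat = torch.randn(batch, 1, dim)  # heat-map-derived features
fused, attn = MultiSourceAttention(dim)(
    torch.cat([global_feat, region_feat, heatmap_feat], dim=1))
```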
Prange, Alexander; Barz, Michael; Heimann-Steinert, Anika; Sonntag, Daniel Explainable Automatic Evaluation of the Trail Making Test for Dementia Screening Inproceedings Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, 2021. @inproceedings{11432, title = {Explainable Automatic Evaluation of the Trail Making Test for Dementia Screening}, author = {Alexander Prange and Michael Barz and Anika Heimann-Steinert and Daniel Sonntag}, doi = {https://doi.org/10.1145/3411764.3445046}, year = {2021}, date = {2021-01-01}, booktitle = {Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems}, publisher = {Association for Computing Machinery}, abstract = {The Trail Making Test (TMT) is a frequently used neuropsychological test for assessing cognitive performance. The subject connects a sequence of numbered nodes by using a pen on normal paper. We present an automatic cognitive assessment tool that analyzes samples of the TMT which we record using a digital pen. This enables us to analyze digital pen features that are difficult or impossible to evaluate manually. Our system automatically measures several pen features, including the completion time which is the main performance indicator used by clinicians to score the TMT in practice. In addition, our system provides a structured report of the analysis of the test, for example indicating missed or erroneously connected nodes, thereby offering more objective, transparent and explainable results to the clinician. We evaluate our system with 40 elderly subjects from a geriatrics daycare clinic of a large hospital.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The Trail Making Test (TMT) is a frequently used neuropsychological test for assessing cognitive performance. The subject connects a sequence of numbered nodes by using a pen on normal paper. We present an automatic cognitive assessment tool that analyzes samples of the TMT which we record using a digital pen. This enables us to analyze digital pen features that are difficult or impossible to evaluate manually. Our system automatically measures several pen features, including the completion time which is the main performance indicator used by clinicians to score the TMT in practice. In addition, our system provides a structured report of the analysis of the test, for example indicating missed or erroneously connected nodes, thereby offering more objective, transparent and explainable results to the clinician. We evaluate our system with 40 elderly subjects from a geriatrics daycare clinic of a large hospital. |
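The core of such automatic scoring, deriving the completion time and the visited node sequence from timestamped pen samples, can be condensed as below. The sample format, node layout, and hit radius are hypothetical stand-ins, not the published system.

```python
import math

# Hypothetical TMT-A-style layout: node number -> (x, y) in pixels.
NODES = {1: (100, 100), 2: (400, 120), 3: (250, 300)}
HIT_RADIUS = 30  # px within which a pen sample counts as touching a node

def score_tmt(samples):
    """samples: list of (x, y, timestamp_ms) pen samples.
    Returns completion time, visited node order, out-of-order hits, misses."""
    completion_ms = samples[-1][2] - samples[0][2]
    visited = []
    for x, y, _ in samples:
        for number, (nx, ny) in NODES.items():
            hit = math.hypot(x - nx, y - ny) <= HIT_RADIUS
            if hit and (not visited or visited[-1] != number):
                visited.append(number)
    expected = sorted(NODES)
    out_of_order = [n for i, n in enumerate(visited)
                    if i < len(expected) and n != expected[i]]
    missed = [n for n in expected if n not in visited]
    return completion_ms, visited, out_of_order, missed

samples = [(100, 100, 0), (250, 210, 400), (400, 120, 800), (250, 300, 1300)]
print(score_tmt(samples))  # (1300, [1, 2, 3], [], [])
```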
Bhatti, Omair Shahzad; Barz, Michael; Sonntag, Daniel EyeLogin - Calibration-Free Authentication Method for Public Displays Using Eye Gaze Inproceedings ACM Symposium on Eye Tracking Research and Applications, Association for Computing Machinery, 2021. @inproceedings{11616, title = {EyeLogin - Calibration-Free Authentication Method for Public Displays Using Eye Gaze}, author = {Omair Shahzad Bhatti and Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11616_EyeLogin.pdf}, doi = {https://doi.org/10.1145/3448018.3458001}, year = {2021}, date = {2021-01-01}, booktitle = {ACM Symposium on Eye Tracking Research and Applications}, publisher = {Association for Computing Machinery}, abstract = {The use of interactive public displays has increased, as has the number of sensitive applications they host and, hence, the demand for user authentication methods. In this context, gaze-based authentication was shown to be effective and more secure, but significantly slower than touch- or gesture-based methods. We implement a calibration-free and fast authentication method for situated displays based on saccadic eye movements. In a user study (n = 10), we compare our new method with CueAuth from Khamis et al. (IMWUT’18), an authentication method based on smooth pursuit eye movements. The results show a significant improvement in accuracy from 82.94% to 95.88%. At the same time, we found that entry speed improves considerably with our method, from 18.28 s to 5.12 s on average, which is comparable to touch-based input.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The use of interactive public displays has increased, as has the number of sensitive applications they host and, hence, the demand for user authentication methods. In this context, gaze-based authentication was shown to be effective and more secure, but significantly slower than touch- or gesture-based methods. We implement a calibration-free and fast authentication method for situated displays based on saccadic eye movements. In a user study (n = 10), we compare our new method with CueAuth from Khamis et al. (IMWUT’18), an authentication method based on smooth pursuit eye movements. The results show a significant improvement in accuracy from 82.94% to 95.88%. At the same time, we found that entry speed improves considerably with our method, from 18.28 s to 5.12 s on average, which is comparable to touch-based input. |
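A calibration-free scheme of this kind can tolerate coarse gaze estimates because each saccade only needs to land in the correct screen region. The sketch below maps saccade landing points to keypad digits on a fixed grid; the grid geometry and keypad layout are assumptions for illustration, not the published method.

```python
# Map saccade landing positions to on-screen digits without per-user
# calibration. Screen size and keypad layout are hypothetical.

def digit_at(x, y, screen_w=1920, screen_h=1080, cols=3, rows=4):
    """Map a pixel gaze landing point to a keypad digit (or None)."""
    col = min(cols - 1, int(x / screen_w * cols))
    row = min(rows - 1, int(y / screen_h * rows))
    keypad = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [None, 0, None]]
    return keypad[row][col]

def entered_pin(landing_points):
    """Decode a PIN from a sequence of saccade landing points."""
    return [d for d in (digit_at(x, y) for x, y in landing_points)
            if d is not None]

print(entered_pin([(300, 130), (900, 400), (1500, 700)]))  # [1, 5, 9]
```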
Nunnari, Fabrizio; Sonntag, Daniel A Software Toolbox for Deploying Deep Learning Decision Support Systems with XAI Capabilities Inproceedings Companion of the 2021 ACM SIGCHI Symposium on Engineering Interactive Computing Systems, Association for Computing Machinery, 2021. @inproceedings{11664, title = {A Software Toolbox for Deploying Deep Learning Decision Support Systems with XAI Capabilities}, author = {Fabrizio Nunnari and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11664_nunnari21EICS-TIML.pdf}, doi = {https://doi.org/10.1145/3459926.3464753}, year = {2021}, date = {2021-01-01}, booktitle = {Companion of the 2021 ACM SIGCHI Symposium on Engineering Interactive Computing Systems}, publisher = {Association for Computing Machinery}, abstract = {We describe the software architecture of a toolbox of reusable components for the configuration of convolutional neural networks (CNNs) for classification and labeling problems. The toolbox architecture has been designed to maximize the reuse of established algorithms and to include domain experts in the development and evaluation process across different projects and challenges. In addition, we implemented easy-to-edit input formats and modules for XAI (eXplainable AI) through visual inspection capabilities. The toolbox is available for the research community to implement applied artificial intelligence projects.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We describe the software architecture of a toolbox of reusable components for the configuration of convolutional neural networks (CNNs) for classification and labeling problems. The toolbox architecture has been designed to maximize the reuse of established algorithms and to include domain experts in the development and evaluation process across different projects and challenges. In addition, we implemented easy-to-edit input formats and modules for XAI (eXplainable AI) through visual inspection capabilities. The toolbox is available for the research community to implement applied artificial intelligence projects. |
Prange, Alexander; Sonntag, Daniel Assessing Cognitive Test Performance Using Automatic Digital Pen Features Analysis Inproceedings Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, Association for Computing Machinery, 2021. @inproceedings{11703, title = {Assessing Cognitive Test Performance Using Automatic Digital Pen Features Analysis}, author = {Alexander Prange and Daniel Sonntag}, year = {2021}, date = {2021-01-01}, booktitle = {Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization}, publisher = {Association for Computing Machinery}, abstract = {Most cognitive assessments, for dementia screening for example, are conducted with a pen on normal paper. We record these tests with a digital pen as part of a new interactive cognitive assessment tool with automatic analysis of pen input. The clinician can, first, observe the sketching process in real-time on a mobile tablet, e.g., in telemedicine settings or to follow Covid-19 distancing regulations. Second, the results of an automatic test analysis are presented to the clinician in real-time, thereby reducing manual scoring effort and producing objective reports. The presented research describes the architecture of our cognitive assessment tool and examines how accurately different machine learning (ML) models can automatically score cognitive tests, without a semantic content analysis. Our system uses a set of more than 170 pen features, calculated directly from the raw digital pen signal. We evaluate our system with 40 subjects from a geriatrics daycare clinic. Using standard ML techniques our feature set outperforms previous approaches on the cognitive tests we consider, i.e., the Clock Drawing, the Rey-Osterrieth Complex Figure, and the Trail Making Test, by automatically scoring tests with up to 82% accuracy in a binary classification task.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Most cognitive assessments, for dementia screening for example, are conducted with a pen on normal paper. We record these tests with a digital pen as part of a new interactive cognitive assessment tool with automatic analysis of pen input. The clinician can, first, observe the sketching process in real-time on a mobile tablet, e.g., in telemedicine settings or to follow Covid-19 distancing regulations. Second, the results of an automatic test analysis are presented to the clinician in real-time, thereby reducing manual scoring effort and producing objective reports. The presented research describes the architecture of our cognitive assessment tool and examines how accurately different machine learning (ML) models can automatically score cognitive tests, without a semantic content analysis. Our system uses a set of more than 170 pen features, calculated directly from the raw digital pen signal. We evaluate our system with 40 subjects from a geriatrics daycare clinic. Using standard ML techniques our feature set outperforms previous approaches on the cognitive tests we consider, i.e., the Clock Drawing, the Rey-Osterrieth Complex Figure, and the Trail Making Test, by automatically scoring tests with up to 82% accuracy in a binary classification task. |
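To give a flavor of this approach, the sketch below computes a handful of features directly from raw (x, y, t) pen samples and feeds them to a standard classifier. The feature set is a tiny illustrative subset (the system described above uses more than 170 features), and synthetic strokes stand in for real recordings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pen_features(strokes):
    """Toy feature vector from a list of stroke arrays with columns (x, y, t)."""
    xy = np.vstack([s[:, :2] for s in strokes])
    t = np.hstack([s[:, 2] for s in strokes])
    dists = np.linalg.norm(np.diff(xy, axis=0), axis=1)  # includes lift jumps
    duration = t[-1] - t[0]
    return np.array([
        duration,                          # total writing time
        dists.sum(),                       # total path length
        dists.sum() / max(duration, 1e-9), # mean speed
        len(strokes),                      # pen lifts + 1
    ])

rng = np.random.default_rng(0)
def fake_subject():
    strokes, t0 = [], 0.0
    for _ in range(3):
        t = np.sort(rng.random(50)) * 2 + t0
        strokes.append(np.column_stack([rng.random(50) * 500,
                                        rng.random(50) * 500, t]))
        t0 = t[-1] + 0.3  # pause between strokes (pen lift)
    return strokes

X = np.array([pen_features(fake_subject()) for _ in range(40)])
y = rng.integers(0, 2, size=40)  # binary screening label (toy)
clf = RandomForestClassifier(n_estimators=200).fit(X, y)
```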
Nguyen, Ho Minh Duy; Mai, Truong Thanh-Nhat; Than, Ngoc Trong Tuong; Prange, Alexander; Sonntag, Daniel Self-Supervised Domain Adaptation for Diabetic Retinopathy Grading using Vessel Image Reconstruction Inproceedings Proceedings of the 44th German Conference on Artificial Intelligence, Springer, 2021. @inproceedings{11715, title = {Self-Supervised Domain Adaptation for Diabetic Retinopathy Grading using Vessel Image Reconstruction}, author = {Ho Minh Duy Nguyen and Truong Thanh-Nhat Mai and Ngoc Trong Tuong Than and Alexander Prange and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11715_KI_2021_Self_Supervised_Domain_Adaptation_for_Diabetic_Retinopathy_Grading.pdf}, year = {2021}, date = {2021-01-01}, booktitle = {Proceedings of the 44th German Conference on Artificial Intelligence}, publisher = {Springer}, abstract = {This paper investigates the problem of domain adaptation for diabetic retinopathy (DR) grading. We learn invariant target-domain features by defining a novel self-supervised task based on retinal vessel image reconstructions, inspired by medical domain knowledge. Then, a benchmark of current state-of-the-art unsupervised domain adaptation methods on the DR problem is provided. It can be shown that our approach outperforms existing domain adaptation strategies. Furthermore, when utilizing the entire training data in the target domain, we are able to compete with several state-of-the-art approaches in final classification accuracy just by applying standard network architectures and using image-level labels.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } This paper investigates the problem of domain adaptation for diabetic retinopathy (DR) grading. We learn invariant target-domain features by defining a novel self-supervised task based on retinal vessel image reconstructions, inspired by medical domain knowledge. Then, a benchmark of current state-of-the-art unsupervised domain adaptation methods on the DR problem is provided. It can be shown that our approach outperforms existing domain adaptation strategies. Furthermore, when utilizing the entire training data in the target domain, we are able to compete with several state-of-the-art approaches in final classification accuracy just by applying standard network architectures and using image-level labels. |
Nunnari, Fabrizio; Kadir, Md Abdul; Sonntag, Daniel On the Overlap Between Grad-CAM Saliency Maps and Explainable Visual Features in Skin Cancer Images Inproceedings Holzinger, Andreas; Kieseberg, Peter; Tjoa, Min A; Weippl, Edgar (Ed.): Machine Learning and Knowledge Extraction, pp. 241-253, Springer International Publishing, 2021. @inproceedings{11802, title = {On the Overlap Between Grad-CAM Saliency Maps and Explainable Visual Features in Skin Cancer Images}, author = {Fabrizio Nunnari and Md Abdul Kadir and Daniel Sonntag}, editor = {Andreas Holzinger and Peter Kieseberg and Min A Tjoa and Edgar Weippl}, url = {https://www.dfki.de/fileadmin/user_upload/import/11802_2021_CD_MAKE_XAI_and_SkinFeatures.pdf}, doi = {https://doi.org/10.1007/978-3-030-84060-0_16}, year = {2021}, date = {2021-01-01}, booktitle = {Machine Learning and Knowledge Extraction}, volume = {12844}, pages = {241-253}, publisher = {Springer International Publishing}, abstract = {Dermatologists recognize melanomas by inspecting images in which they identify human-comprehensible visual features. In this paper, we investigate to what extent such features correspond to the saliency areas identified on CNNs trained for classification. Our experiments, conducted on two neural architectures characterized by different depth and different resolution of the last convolutional layer, quantify to what extent thresholded Grad-CAM saliency maps can be used to identify visual features of skin cancer. We found that the best threshold value, i.e., the threshold at which we can measure the highest Jaccard index, varies significantly among features; ranging from 0.3 to 0.7. In addition, we measured Jaccard indices as high as 0.143, which is almost 50% of the performance of state-of-the-art architectures specialized in feature mask prediction at pixel-level, such as U-Net. Finally, a breakdown test between malignancy and classification correctness shows that higher resolution saliency maps could help doctors in spotting wrong classifications.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Dermatologists recognize melanomas by inspecting images in which they identify human-comprehensible visual features. In this paper, we investigate to what extent such features correspond to the saliency areas identified on CNNs trained for classification. Our experiments, conducted on two neural architectures characterized by different depth and different resolution of the last convolutional layer, quantify to what extent thresholded Grad-CAM saliency maps can be used to identify visual features of skin cancer. We found that the best threshold value, i.e., the threshold at which we can measure the highest Jaccard index, varies significantly among features; ranging from 0.3 to 0.7. In addition, we measured Jaccard indices as high as 0.143, which is almost 50% of the performance of state-of-the-art architectures specialized in feature mask prediction at pixel-level, such as U-Net. Finally, a breakdown test between malignancy and classification correctness shows that higher resolution saliency maps could help doctors in spotting wrong classifications. |
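The evaluation procedure described here, thresholding a normalized saliency map and scoring its overlap with an expert feature mask, condenses into a few lines. This is a generic re-implementation of the metric, not the authors' code; the threshold grid and toy inputs are illustrative.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard index between two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def best_threshold(saliency, feature_mask, thresholds=np.linspace(0.1, 0.9, 9)):
    """Threshold a normalized Grad-CAM map at several levels and return the
    threshold with the highest overlap with a dermatologist's feature mask."""
    scores = {t: jaccard(saliency >= t, feature_mask) for t in thresholds}
    return max(scores, key=scores.get), scores

# Toy example with a random map and mask in place of real data:
rng = np.random.default_rng(0)
saliency = rng.random((224, 224))          # normalized Grad-CAM map
feature_mask = rng.random((224, 224)) > 0.7  # expert-annotated feature region
t_best, all_scores = best_threshold(saliency, feature_mask)
print(t_best, all_scores[t_best])
```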
Nunnari, Fabrizio; Alam, Hasan Md Tusfiqur; Sonntag, Daniel Anomaly Detection for Skin Lesion Images Using Replicator Neural Networks Inproceedings Holzinger, Andreas; Kieseberg, Peter; Tjoa, Min A; Weippl, Edgar (Ed.): Machine Learning and Knowledge Extraction, pp. 225-240, Springer International Publishing, 2021. @inproceedings{11803, title = {Anomaly Detection for Skin Lesion Images Using Replicator Neural Networks}, author = {Fabrizio Nunnari and Hasan Md Tusfiqur Alam and Daniel Sonntag}, editor = {Andreas Holzinger and Peter Kieseberg and Min A Tjoa and Edgar Weippl}, url = {https://www.dfki.de/fileadmin/user_upload/import/11803_2021_CD_MAKE_AnomalyDetection.pdf}, doi = {https://doi.org/10.1007/978-3-030-84060-0_15}, year = {2021}, date = {2021-01-01}, booktitle = {Machine Learning and Knowledge Extraction}, volume = {12844}, pages = {225-240}, publisher = {Springer International Publishing}, abstract = {This paper presents an investigation on the task of anomaly detection for images of skin lesions. The goal is to provide a decision support system with an extra filtering layer to inform users if a classifier should not be used for a given sample. We tested anomaly detectors based on autoencoders and three discrimination methods: feature vector distance, replicator neural networks, and support vector data description fine-tuning. Results show that neural-based detectors can perfectly discriminate between skin lesions and open world images, but class discrimination cannot easily be accomplished and requires further investigation.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } This paper presents an investigation on the task of anomaly detection for images of skin lesions. The goal is to provide a decision support system with an extra filtering layer to inform users if a classifier should not be used for a given sample. We tested anomaly detectors based on autoencoders and three discrimination methods: feature vector distance, replicator neural networks, and support vector data description fine-tuning. Results show that neural-based detectors can perfectly discriminate between skin lesions and open world images, but class discrimination cannot easily be accomplished and requires further investigation. |
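The reconstruction-error principle behind autoencoder-based anomaly detection can be sketched as follows: a model trained only on skin lesion images should reconstruct in-distribution inputs well and open-world images poorly. The architecture, input size, and threshold below are illustrative placeholders, and the model is untrained here, so the flag is meaningless until the autoencoder is fitted to lesion data.

```python
import torch
import torch.nn as nn

# Tiny convolutional autoencoder for 64x64 RGB inputs (illustrative only).
autoencoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
    nn.Conv2d(16, 8, 3, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
    nn.ConvTranspose2d(8, 16, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
    nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # 32 -> 64
)

def is_anomaly(image, threshold=0.05):
    """Flag inputs the downstream classifier should not be used on:
    high reconstruction error suggests an out-of-distribution image."""
    with torch.no_grad():
        error = nn.functional.mse_loss(autoencoder(image), image)
    return error.item() > threshold

x = torch.rand(1, 3, 64, 64)
print(is_anomaly(x))
```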
Nunnari, Fabrizio; Ezema, Abraham; Sonntag, Daniel Crop It, but Not Too Much: The Effects of Masking on the Classification of Melanoma Images Inproceedings Edelkamp, Stefan; Rueckert, Elmar; Möller, Ralf (Ed.): KI 2021: Advances in Artificial Intelligence, pp. 179-193, Springer International Publishing, 2021. @inproceedings{11859, title = {Crop It, but Not Too Much: The Effects of Masking on the Classification of Melanoma Images}, author = {Fabrizio Nunnari and Abraham Ezema and Daniel Sonntag}, editor = {Stefan Edelkamp and Elmar Rueckert and Ralf Möller}, url = {https://www.dfki.de/fileadmin/user_upload/import/11859_2021_KIconference_SkinLesionMasking.pdf https://link.springer.com/chapter/10.1007/978-3-030-87626-5_13}, year = {2021}, date = {2021-01-01}, booktitle = {KI 2021: Advances in Artificial Intelligence}, pages = {179-193}, publisher = {Springer International Publishing}, abstract = {To improve the accuracy of convolutional neural networks in discriminating between nevi and melanomas, we test nine different combinations of masking and cropping on three datasets of skin lesion images (ISIC2016, ISIC2018, and MedNode). Our experiments, confirmed by 10-fold cross-validation, show that cropping increases classification performance, but specificity decreases when cropping is applied together with masking out healthy skin regions. An analysis of Grad-CAM saliency maps shows that our CNN models in fact tend to focus on healthy skin at the border when a nevus is classified.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } To improve the accuracy of convolutional neural networks in discriminating between nevi and melanomas, we test nine different combinations of masking and cropping on three datasets of skin lesion images (ISIC2016, ISIC2018, and MedNode). Our experiments, confirmed by 10-fold cross-validation, show that cropping increases classification performance, but specificity decreases when cropping is applied together with masking out healthy skin regions. An analysis of Grad-CAM saliency maps shows that our CNN models in fact tend to focus on healthy skin at the border when a nevus is classified. |
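A minimal version of the cropping strategy, keeping the lesion's bounding box plus a margin of surrounding skin rather than masking the skin out entirely, might look like this; the margin value is an assumption for illustration, not a setting from the paper.

```python
import numpy as np

def crop_to_lesion(image, mask, margin=0.1):
    """Crop an image to the bounding box of a lesion segmentation mask,
    keeping a margin of surrounding healthy skin."""
    ys, xs = np.nonzero(mask)
    h, w = mask.shape
    pad_y = int((ys.max() - ys.min()) * margin)
    pad_x = int((xs.max() - xs.min()) * margin)
    y0, y1 = max(0, ys.min() - pad_y), min(h, ys.max() + pad_y + 1)
    x0, x1 = max(0, xs.min() - pad_x), min(w, xs.max() + pad_x + 1)
    return image[y0:y1, x0:x1]

# Toy usage with a synthetic image and a rectangular "lesion":
img = np.zeros((128, 128, 3))
mask = np.zeros((128, 128), dtype=bool)
mask[40:80, 50:90] = True
patch = crop_to_lesion(img, mask)
print(patch.shape)  # bounding box plus a 10% margin on each side
```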
Prange, Alexander; Sonntag, Daniel A Demonstrator for Interactive Image Clustering and Fine-Tuning Neural Networks in Virtual Reality Inproceedings Edelkamp, Stefan; Rueckert, Elmar; Möller, Ralf (Ed.): KI 2021: Advances in Artificial Intelligence, pp. 194-203, Springer International Publishing, 2021. @inproceedings{11886, title = {A Demonstrator for Interactive Image Clustering and Fine-Tuning Neural Networks in Virtual Reality}, author = {Alexander Prange and Daniel Sonntag}, editor = {Stefan Edelkamp and Elmar Rueckert and Ralf Möller}, url = {https://link.springer.com/chapter/10.1007/978-3-030-87626-5_14}, year = {2021}, date = {2021-01-01}, booktitle = {KI 2021: Advances in Artificial Intelligence}, pages = {194-203}, publisher = {Springer International Publishing}, abstract = {We present a virtual reality (VR) application that enables us to interactively explore and manipulate image clusters based on layer activations of convolutional neural networks (CNNs). We apply dimensionality reduction techniques to project images into the 3D space, where the user can directly interact with the model. The user can change the position of an image by using natural hand gestures. This manipulation triggers additional training steps of the network, based on the new spatial information and new label of the image. After the training step is finished, the visualization is updated according to the new output of the CNN. The goal is to visualize and improve the cluster output of the model, and at the same time, to improve the understanding of the model. We discuss two different approaches for calculating the VR projection, a combined PCA/t-SNE dimensionality reduction based approach and a variational auto-encoder (VAE) based approach.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present a virtual reality (VR) application that enables us to interactively explore and manipulate image clusters based on layer activations of convolutional neural networks (CNNs). We apply dimensionality reduction techniques to project images into the 3D space, where the user can directly interact with the model. The user can change the position of an image by using natural hand gestures. This manipulation triggers additional training steps of the network, based on the new spatial information and new label of the image. After the training step is finished, the visualization is updated according to the new output of the CNN. The goal is to visualize and improve the cluster output of the model, and at the same time, to improve the understanding of the model. We discuss two different approaches for calculating the VR projection, a combined PCA/t-SNE dimensionality reduction based approach and a variational auto-encoder (VAE) based approach. |
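The two-stage projection mentioned for the first approach is straightforward with scikit-learn: PCA first reduces the activations to a moderate dimensionality, then t-SNE produces the three coordinates used to place images in the VR space. The dimensions and perplexity below are illustrative defaults, and random data stands in for real CNN layer activations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# CNN layer activations for N images (synthetic stand-ins here).
activations = np.random.default_rng(0).random((200, 2048))

# Step 1: PCA to a moderate dimensionality to denoise and speed up t-SNE.
reduced = PCA(n_components=50).fit_transform(activations)

# Step 2: t-SNE down to the 3 coordinates used to place images in VR.
positions_3d = TSNE(n_components=3, perplexity=30).fit_transform(reduced)
print(positions_3d.shape)  # (200, 3)
```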
Barz, Michael; Bhatti, Omair Shahzad; Lüers, Bengt; Prange, Alexander; Sonntag, Daniel Multisensor-Pipeline: A Lightweight, Flexible, and Extensible Framework for Building Multimodal-Multisensor Interfaces Inproceedings Companion Publication of the 2021 International Conference on Multimodal Interaction, pp. 13-18, Association for Computing Machinery, 2021. @inproceedings{11981, title = {Multisensor-Pipeline: A Lightweight, Flexible, and Extensible Framework for Building Multimodal-Multisensor Interfaces}, author = {Michael Barz and Omair Shahzad Bhatti and Bengt Lüers and Alexander Prange and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11981_icmi_cr.pdf}, year = {2021}, date = {2021-01-01}, booktitle = {Companion Publication of the 2021 International Conference on Multimodal Interaction}, pages = {13-18}, publisher = {Association for Computing Machinery}, abstract = {We present the multisensor-pipeline (MSP), a lightweight, flexible, and extensible framework for prototyping multimodal-multisensor interfaces based on real-time sensor input. Our open-source framework (available on GitHub) enables researchers and developers to easily integrate multiple sensors or other data streams via source modules, to add stream and event processing capabilities via processor modules, and to connect user interfaces or databases via sink modules in a graph-based processing pipeline. Our framework is implemented in Python with a low number of dependencies, which enables a quick setup process, execution across multiple operating systems, and direct access to cutting-edge machine learning libraries and models. We showcase the functionality and capabilities of MSP through a sample application that connects a mobile eye tracker to classify image patches surrounding the user’s fixation points and visualizes the classification results in real-time.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present the multisensor-pipeline (MSP), a lightweight, flexible, and extensible framework for prototyping multimodal-multisensor interfaces based on real-time sensor input. Our open-source framework (available on GitHub) enables researchers and developers to easily integrate multiple sensors or other data streams via source modules, to add stream and event processing capabilities via processor modules, and to connect user interfaces or databases via sink modules in a graph-based processing pipeline. Our framework is implemented in Python with a low number of dependencies, which enables a quick setup process, execution across multiple operating systems, and direct access to cutting-edge machine learning libraries and models. We showcase the functionality and capabilities of MSP through a sample application that connects a mobile eye tracker to classify image patches surrounding the user’s fixation points and visualizes the classification results in real-time. |
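The source/processor/sink decomposition can be illustrated with plain Python queues and threads. Note that this shows only the underlying pattern, not the multisensor-pipeline API itself (which is available on GitHub); all names below are made up for the sketch.

```python
import queue
import threading

def source(out_q):                  # e.g. wraps an eye tracker
    for i in range(5):
        out_q.put({"gaze": (i * 0.1, 0.5)})
    out_q.put(None)                 # end-of-stream marker

def processor(in_q, out_q):         # e.g. classifies a patch at the gaze point
    while (item := in_q.get()) is not None:
        item["label"] = "object" if item["gaze"][0] > 0.2 else "background"
        out_q.put(item)
    out_q.put(None)                 # propagate shutdown downstream

def sink(in_q):                     # e.g. a live visualization or database
    while (item := in_q.get()) is not None:
        print(item)

q1, q2 = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=f, args=a) for f, a in
           [(source, (q1,)), (processor, (q1, q2)), (sink, (q2,))]]
for t in threads:
    t.start()
for t in threads:
    t.join()
```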
Miscellaneous |
Hartmann, Mareike; Kruijff-Korbayová, Ivana; Sonntag, Daniel Interaction with Explanations in the XAINES Project Miscellaneous Trustworthy AI in the Wild Workshop 2021, 2021. @misc{11867, title = {Interaction with Explanations in the XAINES Project}, author = {Mareike Hartmann and Ivana Kruijff-Korbayová and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11867_AI_in_the_wild__Xaines.pdf}, year = {2021}, date = {2021-09-01}, booktitle = {Trustworthy AI in the wild}, publisher = {-}, abstract = {AI systems are increasingly pervasive and their large-scale adoption makes it necessary to explain their behaviour, for example to their users who are impacted by their decisions, or to their developers who need to ensure their functionality. This requires, on the one hand, to obtain an accurate representation of the chain of events that caused the system to behave in a certain way (e.g., to make a specific decision). On the other hand, this causal chain needs to be communicated to the users depending on their needs and expectations. In this phase of explanation delivery, allowing interaction between user and model has the potential to improve both model quality and user experience. In this abstract, we present our planned and on-going work on the interaction with explanations as part of the XAINES project. The project investigates the explanation of AI systems through narratives targeted to the needs of a specific audience, and our work focuses on the question of how and in which way human-model interaction can enable successful explanation.}, howpublished = {Trustworthy AI in the Wild Workshop 2021}, keywords = {}, pubstate = {published}, tppubtype = {misc} } AI systems are increasingly pervasive and their large-scale adoption makes it necessary to explain their behaviour, for example to their users who are impacted by their decisions, or to their developers who need to ensure their functionality. This requires, on the one hand, to obtain an accurate representation of the chain of events that caused the system to behave in a certain way (e.g., to make a specific decision). On the other hand, this causal chain needs to be communicated to the users depending on their needs and expectations. In this phase of explanation delivery, allowing interaction between user and model has the potential to improve both model quality and user experience. In this abstract, we present our planned and on-going work on the interaction with explanations as part of the XAINES project. The project investigates the explanation of AI systems through narratives targeted to the needs of a specific audience, and our work focuses on the question of how and in which way human-model interaction can enable successful explanation. |
Malone, Sarah; Altmeyer, Kristin; Barz, Michael; Lauer, Luisa; Sonntag, Daniel; Peschel, Markus; Brünken, Roland Measuring Intrinsic and Extraneous Cognitive Load in Elementary School Students Using Subjective Ratings and Smart Pen Data Miscellaneous 13th International Cognitive Load Theory Conference, 2021. @misc{11868, title = {Measuring Intrinsic and Extraneous Cognitive Load in Elementary School Students Using Subjective Ratings and Smart Pen Data}, author = {Sarah Malone and Kristin Altmeyer and Michael Barz and Luisa Lauer and Daniel Sonntag and Markus Peschel and Roland Brünken}, url = {https://www.dfki.de/fileadmin/user_upload/import/11868_Cl_measurement_in_children.pdf}, year = {2021}, date = {2021-01-01}, abstract = {New methods are constantly being developed to optimize and adapt cognitive load measurement to different contexts (Korbach et al., 2018). It is noteworthy, however, that research on cognitive load measurement in elementary school students is rare. Although there is some evidence that they might be able to report their total cognitive load (Ayres, 2006), there are also reasons to doubt the quality of children’s self-reports (e.g., Chambers & Johnson, 2002). To avoid these issues, behavioral and objective online-measures are promising. A novel approach – the use of smartpen data generated by natural use of a pen during task completion – seems particularly encouraging as these measures proved to be predictive of cognitive load in adults (e.g., Yu, Epps, & Chen, 2011). Moreover, Barz et al. (2020) demonstrated the predictive power of smartpen data for performance in children. The present research addressed two prevailing gaps in research on cognitive load assessment in elementary school students. We developed a subjective rating scale and investigated whether this instrument can provide valid measurements of ICL and ECL (Research Question 1). Moreover, we researched whether smartpen data can be used as a valid process measurement of cognitive load (Research Question 2).}, howpublished = {13th International Cognitive Load Theory Conference}, keywords = {}, pubstate = {published}, tppubtype = {misc} } New methods are constantly being developed to optimize and adapt cognitive load measurement to different contexts (Korbach et al., 2018). It is noteworthy, however, that research on cognitive load measurement in elementary school students is rare. Although there is some evidence that they might be able to report their total cognitive load (Ayres, 2006), there are also reasons to doubt the quality of children’s self-reports (e.g., Chambers & Johnson, 2002). To avoid these issues, behavioral and objective online-measures are promising. A novel approach – the use of smartpen data generated by natural use of a pen during task completion – seems particularly encouraging as these measures proved to be predictive of cognitive load in adults (e.g., Yu, Epps, & Chen, 2011). Moreover, Barz et al. (2020) demonstrated the predictive power of smartpen data for performance in children. The present research addressed two prevailing gaps in research on cognitive load assessment in elementary school students. We developed a subjective rating scale and investigated whether this instrument can provide valid measurements of ICL and ECL (Research Question 1). Moreover, we researched whether smartpen data can be used as a valid process measurement of cognitive load (Research Question 2). |
Altmeyer, Kristin; Malone, Sarah; Kapp, Sebastian; Barz, Michael; Lauer, Luisa; Thees, Michael; Kuhn, Jochen; Peschel, Markus; Sonntag, Daniel; Brünken, Roland The effect of augmented reality on global coherence formation processes during STEM laboratory work in elementary school children Miscellaneous 13th International Cognitive Load Theory Conference, 2021. @misc{11870, title = {The effect of augmented reality on global coherence formation processes during STEM laboratory work in elementary school children}, author = {Kristin Altmeyer and Sarah Malone and Sebastian Kapp and Michael Barz and Luisa Lauer and Michael Thees and Jochen Kuhn and Markus Peschel and Daniel Sonntag and Roland Brünken}, url = {https://www.dfki.de/fileadmin/user_upload/import/11870_ICLTC_2021_Altmeyer_final.pdf}, year = {2021}, date = {2021-01-01}, abstract = {In science education, hands-on student experiments are used to explore cause and effect relationships. Conventional lab work requires students to interact with physical experimentation objects and observe additional information like measurement values to deduce scientific laws and interrelations. The observable information, however, is usually presented at a physical distance from the setting, e.g., on a separate display of a measuring device. The resulting spatial split (Chandler & Sweller, 1991) between representations hampers global coherence formation (Seufert & Brünken, 2004): Mapping processes between the spatially distant sources of information are assumed to lead to an increase in extraneous cognitive load (ECL; Ayres & Sweller, 2014). Consequently, learning outcomes can be impaired (Kalyuga et al., 1999). Augmented Reality (AR) can be used to overcome the split-attention effect by allowing additional information to be virtually integrated into the real-world set-up (Azuma, 1997). A study by Altmeyer et al. (2020) with university students showed that AR-support during experimentation led to a higher conceptual knowledge gain but had no effect on ECL. The current study provides a conceptual replication of Altmeyer et al.’s (2020) research and focuses on three main objectives: First, we aimed at investigating the generalizability of the advantage of AR on experimental learning in a sample of elementary school children. Second, we examined if low prior-knowledge of children even amplifies the split-attention effect, as proposed by Kalyuga et al. (1998). Finally, we focused on obtaining deeper insights into global coherence formation processes during lab work using specific tests and eye tracking measures.}, howpublished = {13th International Cognitive Load Theory Conference}, keywords = {}, pubstate = {published}, tppubtype = {misc} } In science education, hands-on student experiments are used to explore cause and effect relationships. Conventional lab work requires students to interact with physical experimentation objects and observe additional information like measurement values to deduce scientific laws and interrelations. The observable information, however, is usually presented at a physical distance from the setting, e.g., on a separate display of a measuring device. The resulting spatial split (Chandler & Sweller, 1991) between representations hampers global coherence formation (Seufert & Brünken, 2004): Mapping processes between the spatially distant sources of information are assumed to lead to an increase in extraneous cognitive load (ECL; Ayres & Sweller, 2014). Consequently, learning outcomes can be impaired (Kalyuga et al., 1999). 
Augmented Reality (AR) can be used to overcome the split-attention effect by allowing additional information to be virtually integrated into the real-world set-up (Azuma, 1997). A study by Altmeyer et al. (2020) with university students showed that AR-support during experimentation led to a higher conceptual knowledge gain but had no effect on ECL. The current study provides a conceptual replication of Altmeyer et al.’s (2020) research and focuses on three main objectives: First, we aimed at investigating the generalizability of the advantage of AR on experimental learning in a sample of elementary school children. Second, we examined if low prior-knowledge of children even amplifies the split-attention effect, as proposed by Kalyuga et al. (1998). Finally, we focused on obtaining deeper insights into global coherence formation processes during lab work using specific tests and eye tracking measures. |
Altmeyer, Kristin; Malone, Sarah; Kapp, Sebastian; Barz, Michael; Lauer, Luisa; Thees, Michael; Kuhn, Jochen; Peschel, Markus; Sonntag, Daniel; Brünken, Roland Augmented Reality zur Förderung globaler Kohärenzbildungsprozesse beim Experimentieren im Sachunterricht [Augmented reality for fostering global coherence formation processes during experimentation in elementary science class] Miscellaneous Tagung der Fachgruppe Pädagogische Psychologie, 2021. @misc{11871, title = {Augmented Reality zur Förderung globaler Kohärenzbildungsprozesse beim Experimentieren im Sachunterricht}, author = {Kristin Altmeyer and Sarah Malone and Sebastian Kapp and Michael Barz and Luisa Lauer and Michael Thees and Jochen Kuhn and Markus Peschel and Daniel Sonntag and Roland Brünken}, url = {https://www.dfki.de/fileadmin/user_upload/import/11871_v3_Altmeyer_VR_Symposium_PAEPSY_2021.pdf}, year = {2021}, date = {2021-01-01}, abstract = {Augmented reality (AR) can be characterized as a form of virtual environment that falls within mixed reality on a reality-virtuality continuum (Milgram & Kishino, 1994). AR extends reality by integrating virtual objects. A promising field of application for AR in education is technology-supported experimentation: experiments are an essential feature of the natural sciences and are used in STEM lessons to investigate relationships. Previous research suggests that children of elementary school age can already develop scientific thinking and the ability to experiment (e.g., Osterhaus et al., 2015). To derive cause-effect relationships from an experiment, learners usually have to mentally link real information from the experimentation environment with virtual information, such as measurement values on displays. In terms of the Cognitive Theory of Multimedia Learning (Mayer, 2005) and Cognitive Load Theory (Sweller et al., 1998), linking spatially separated information places particular demands on working memory. AR can be used to present real and virtual information in an integrated fashion during experimentation. Previous findings (e.g., Altmeyer et al., 2020) imply that AR supports global coherence formation (Seufert & Brünken, 2004) and can lead to better learning outcomes. The present study examined the effect of AR support during experimentation in a sample of elementary school children. After a prior-knowledge test, 59 children conducted experiments on electrical circuits. One group was shown real-time current measurements in a table on a separate tablet screen. The AR-supported group, in contrast, saw the measurements integrated into the experimentation environment when looking through the tablet camera. The children's eye movements were recorded while they experimented. Afterwards, both groups completed posttests that varied in their demands on global coherence formation between the real and virtual elements of the experiment. First results show that children benefit from the AR environment especially on tasks that require strong global coherence. Eye movement analyses are expected to shed further light on the process of coherence formation during experimentation in AR.}, howpublished = {Tagung der Fachgruppe Pädagogische Psychologie}, keywords = {}, pubstate = {published}, tppubtype = {misc} } Augmented reality (AR) can be characterized as a form of virtual environment that falls within mixed reality on a reality-virtuality continuum (Milgram & Kishino, 1994). AR extends reality by integrating virtual objects. A promising field of application for AR in education is technology-supported experimentation: experiments are an essential feature of the natural sciences and are used in STEM lessons to investigate relationships. Previous research suggests that children of elementary school age can already develop scientific thinking and the ability to experiment (e.g., Osterhaus et al., 2015). To derive cause-effect relationships from an experiment, learners usually have to mentally link real information from the experimentation environment with virtual information, such as measurement values on displays. In terms of the Cognitive Theory of Multimedia Learning (Mayer, 2005) and Cognitive Load Theory (Sweller et al., 1998), linking spatially separated information places particular demands on working memory. AR can be used to present real and virtual information in an integrated fashion during experimentation. Previous findings (e.g., Altmeyer et al., 2020) imply that AR supports global coherence formation (Seufert & Brünken, 2004) and can lead to better learning outcomes. The present study examined the effect of AR support during experimentation in a sample of elementary school children. After a prior-knowledge test, 59 children conducted experiments on electrical circuits. One group was shown real-time current measurements in a table on a separate tablet screen. The AR-supported group, in contrast, saw the measurements integrated into the experimentation environment when looking through the tablet camera. The children's eye movements were recorded while they experimented. Afterwards, both groups completed posttests that varied in their demands on global coherence formation between the real and virtual elements of the experiment. First results show that children benefit from the AR environment especially on tasks that require strong global coherence. Eye movement analyses are expected to shed further light on the process of coherence formation during experimentation in AR. |
Technical Reports |
Profitlich, Hans-Jürgen; Sonntag, Daniel A Case Study on Pros and Cons of Regular Expression Detection and Dependency Parsing for Negation Extraction from German Medical Documents Technical Report BMBF, Berlin, 2021. @techreport{11611, title = {A Case Study on Pros and Cons of Regular Expression Detection and Dependency Parsing for Negation Extraction from German Medical Documents. Technical Report}, author = {Hans-Jürgen Profitlich and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11611_CaseStudy_TR_final.pdf http://arxiv.org/abs/2105.09702}, year = {2021}, date = {2021-05-01}, volume = {1}, pages = {30}, address = {Bundesministerium für Bildung und Forschung, Kapelle-Ufer 1, D-10117 Berlin}, institution = {BMBF}, abstract = {We describe our work on information extraction in medical documents written in German, especially detecting negations using an architecture based on the UIMA pipeline. Based on our previous work on software modules to cover medical concepts like diagnoses, examinations, etc. we employ a version of the NegEx regular expression algorithm with a large set of triggers as a baseline. We show how a significantly smaller trigger set is sufficient to achieve similar results, in order to reduce adaptation times to new text types. We elaborate on the question of whether dependency parsing (based on the Stanford CoreNLP model) is a good alternative and describe the potentials and shortcomings of both approaches.}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } We describe our work on information extraction in medical documents written in German, especially detecting negations using an architecture based on the UIMA pipeline. Based on our previous work on software modules to cover medical concepts like diagnoses, examinations, etc. we employ a version of the NegEx regular expression algorithm with a large set of triggers as a baseline. We show how a significantly smaller trigger set is sufficient to achieve similar results, in order to reduce adaptation times to new text types. We elaborate on the question of whether dependency parsing (based on the Stanford CoreNLP model) is a good alternative and describe the potentials and shortcomings of both approaches. |
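The baseline idea, a negation trigger that negates medical concepts within a fixed token window, can be condensed into a few lines. The trigger list and window size below are illustrative; the report's baseline uses a much larger German trigger set inside a full UIMA pipeline.

```python
# Minimal NegEx-style sketch: a negation trigger negates concepts that
# appear within a fixed window of tokens after it.

TRIGGERS = {"kein", "keine", "ohne", "nicht"}  # tiny German trigger set
WINDOW = 5                                      # tokens after the trigger

def negated_concepts(text, concepts):
    """Return the subset of `concepts` that falls in a negation scope."""
    tokens = text.lower().split()
    negated = set()
    for i, tok in enumerate(tokens):
        if tok in TRIGGERS:
            scope = tokens[i + 1 : i + 1 + WINDOW]
            negated |= {c for c in concepts if c.lower() in scope}
    return negated

print(negated_concepts("Patient zeigt keine Anzeichen von Fieber",
                       {"Fieber"}))  # {'Fieber'}
```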
Nguyen, Ho Minh Duy; Nguyen, Thu T; Vu, Huong; Pham, Quang; Nguyen, Manh-Duy; Nguyen, Binh T; Sonntag, Daniel TATL: Task Agnostic Transfer Learning for Skin Attributes Detection Technical Report DFKI, 2021. @techreport{11594, title = {TATL: Task Agnostic Transfer Learning for Skin Attributes Detection}, author = {Ho Minh Duy Nguyen and Thu T Nguyen and Huong Vu and Quang Pham and Manh-Duy Nguyen and Binh T Nguyen and Daniel Sonntag}, url = {https://arxiv.org/pdf/2104.01641.pdf}, year = {2021}, date = {2021-04-01}, volume = {01}, institution = {DFKI}, abstract = {Existing skin attributes detection methods usually initialize with a network pre-trained on ImageNet and then fine-tune it on the medical target task. However, we argue that such approaches are suboptimal because medical datasets are largely different from ImageNet and often contain limited training samples. In this work, we propose Task Agnostic Transfer Learning (TATL), a novel framework motivated by dermatologists' behaviors in the skincare context. TATL learns an attribute-agnostic segmenter that detects lesion skin regions and then transfers this knowledge to a set of attribute-specific classifiers to detect each particular region's attributes. Since TATL's attribute-agnostic segmenter only detects abnormal skin regions, it enjoys ample data from all attributes, allows transferring knowledge among features, and compensates for the lack of training data from rare attributes. We extensively evaluate TATL on two popular skin attributes detection benchmarks and show that TATL outperforms state-of-the-art methods while enjoying minimal model and computational complexity. We also provide theoretical insights and explanations for why TATL works well in practice.}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } Existing skin attributes detection methods usually initialize with a network pre-trained on ImageNet and then fine-tune it on the medical target task. However, we argue that such approaches are suboptimal because medical datasets are largely different from ImageNet and often contain limited training samples. In this work, we propose Task Agnostic Transfer Learning (TATL), a novel framework motivated by dermatologists' behaviors in the skincare context. TATL learns an attribute-agnostic segmenter that detects lesion skin regions and then transfers this knowledge to a set of attribute-specific classifiers to detect each particular region's attributes. Since TATL's attribute-agnostic segmenter only detects abnormal skin regions, it enjoys ample data from all attributes, allows transferring knowledge among features, and compensates for the lack of training data from rare attributes. We extensively evaluate TATL on two popular skin attributes detection benchmarks and show that TATL outperforms state-of-the-art methods while enjoying minimal model and computational complexity. We also provide theoretical insights and explanations for why TATL works well in practice. |
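The transfer step, initializing each attribute-specific classifier from the attribute-agnostic segmenter's encoder, can be sketched in PyTorch as follows. The layer sizes and attribute names are placeholders, not the TATL architecture.

```python
import copy
import torch.nn as nn

# Shared encoder, first trained as part of the attribute-agnostic segmenter.
encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(8), nn.Flatten())

segmenter_head = nn.Linear(16 * 8 * 8, 64 * 64)  # coarse lesion-mask logits
# ... train encoder + segmenter_head on lesion-region segmentation here ...

def make_attribute_classifier(encoder):
    """Give each attribute its own copy of the encoder, initialized from
    the segmenter's weights, plus a fresh binary classification head."""
    enc = copy.deepcopy(encoder)
    return nn.Sequential(enc, nn.Linear(16 * 8 * 8, 2))

classifiers = {attr: make_attribute_classifier(encoder)
               for attr in ["pigment_network", "streaks"]}  # hypothetical
```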
2020 |
Journal Articles |
Biswas, Rajarshi; Barz, Michael; Sonntag, Daniel Towards Explanatory Interactive Image Captioning Using Top-Down and Bottom-Up Features, Beam Search and Re-ranking Journal Article KI - Künstliche Intelligenz, German Journal on Artificial Intelligence - Organ des Fachbereiches "Künstliche Intelligenz" der Gesellschaft für Informatik e.V., 36, pp. 1-14, 2020. @article{11236, title = {Towards Explanatory Interactive Image Captioning Using Top-Down and Bottom-Up Features, Beam Search and Re-ranking}, author = {Rajarshi Biswas and Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11236_2021_TOWARDS_EXPLANATORY_INTERACTIVE_IMAGE_CAPTIONING_USING_TOP-DOWN_AND_BOTTOM-UP_FEATURES,_BEAM_SEARCH_AND_RE-RANKING.pdf}, doi = {https://doi.org/10.1007/s13218-020-00679-2}, year = {2020}, date = {2020-07-01}, journal = {KI - Künstliche Intelligenz, German Journal on Artificial Intelligence - Organ des Fachbereiches "Künstliche Intelligenz" der Gesellschaft für Informatik e.V.}, volume = {36}, pages = {1-14}, publisher = {Springer}, abstract = {Image captioning is a challenging multimodal task. Significant improvements have been obtained with deep learning. Yet, captions generated by humans are still considered better, which makes it an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim at improving the performance and explainability of the state-of-the-art method Show, Attend and Tell by augmenting their attention mechanism using additional bottom-up features. We compute visual attention on the joint embedding space formed by the union of high-level features and the low-level features obtained from the object specific salient regions of the input image. We embed the content of bounding boxes from a pre-trained Mask R-CNN model. This delivers state-of-the-art performance, while it provides explanatory features. Further, we discuss how interactive model improvement can be realized through re-ranking caption candidates using beam search decoders and explanatory features. We show that interactive re-ranking of beam search candidates has the potential to outperform the state-of-the-art in image captioning.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Image captioning is a challenging multimodal task. Significant improvements have been obtained with deep learning. Yet, captions generated by humans are still considered better, which makes it an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim at improving the performance and explainability of the state-of-the-art method Show, Attend and Tell by augmenting their attention mechanism using additional bottom-up features. We compute visual attention on the joint embedding space formed by the union of high-level features and the low-level features obtained from the object specific salient regions of the input image. We embed the content of bounding boxes from a pre-trained Mask R-CNN model. This delivers state-of-the-art performance, while it provides explanatory features. Further, we discuss how interactive model improvement can be realized through re-ranking caption candidates using beam search decoders and explanatory features. We show that interactive re-ranking of beam search candidates has the potential to outperform the state-of-the-art in image captioning. |
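Interactive re-ranking of beam search candidates reduces to reordering (caption, log-probability) pairs by a combined score. In the sketch below, the secondary scoring function is a stand-in for whatever signal is available (user feedback, explanatory features); the linear weighting scheme is an assumption, not the paper's formulation.

```python
def rerank(candidates, secondary_score, alpha=0.5):
    """candidates: list of (caption, log_prob) pairs from a beam search
    decoder; alpha trades decoder confidence against the secondary score."""
    return sorted(
        candidates,
        key=lambda c: (1 - alpha) * c[1] + alpha * secondary_score(c[0]),
        reverse=True,
    )

beam = [("a dog on grass", -2.1),
        ("a brown dog runs on the grass", -2.4)]
prefers_detail = lambda caption: 0.1 * len(caption.split())
# With a high alpha, the longer, more detailed caption wins the re-ranking.
print(rerank(beam, prefers_detail, alpha=0.7)[0][0])
```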
Heimann-Steinert, A; Latendorf, A; Prange, Alexander; Sonntag, Daniel; Müller-Werdan, U Digital pen technology for conducting cognitive assessments: a cross-over study with older adults Journal Article Psychological Research, 85 , pp. 1-9, 2020. @article{11374, title = {Digital pen technology for conducting cognitive assessments: a cross-over study with older adults}, author = {A Heimann-Steinert and A Latendorf and Alexander Prange and Daniel Sonntag and U Müller-Werdan}, url = {https://www.dfki.de/fileadmin/user_upload/import/11374_Heimann-Steinert-2020-DigitalPenTechnologyForConduct.pdf https://link.springer.com/article/10.1007/s00426-020-01452-8#citeas}, year = {2020}, date = {2020-01-01}, journal = {Psychological Research}, volume = {85}, pages = {1-9}, publisher = {Springer}, abstract = {Many digitized cognitive assessments exist to increase reliability, standardization, and objectivity. Particularly in older adults, the performance of digitized cognitive assessments can lead to poorer test results if they are unfamiliar with the computer, mouse, keyboard, or touch screen. In a cross-over design study, 40 older adults (age M = 74.4 ± 4.1 years) conducted the Trail Making Test A and B with a digital pen (digital pen tests, DPT) and a regular pencil (pencil tests, PT) to identify differences in performance. Furthermore, the tests conducted with a digital pen were analyzed manually (manual results, MR) and electronically (electronic results, ER) by an automated scoring algorithm to determine the possibilities of digital pen evaluation. ICC(2,k) showed a good level of agreement for TMT A (ICC(2,k) = 0.668) and TMT B (ICC(2,k) = 0.734) between PT and DPT. When comparing MR and ER, ICC(2,k) showed an excellent level of agreement in TMT A (ICC(2,k) = 0.999) and TMT B (ICC(2,k) = 0.994). The frequency of pen lifting correlates significantly with the execution time in TMT A (r = 0.372, p = 0.030) and TMT B (r = 0.567, p < 0.001). A digital pen can be used to perform the Trail Making Test, as it has been shown that there is no difference in the results due to the type of pen used. With a digital pen, the advantages of digitized testing can be used without having to accept the disadvantages.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Many digitized cognitive assessments exist to increase reliability, standardization, and objectivity. Particularly in older adults, the performance of digitized cognitive assessments can lead to poorer test results if they are unfamiliar with the computer, mouse, keyboard, or touch screen. In a cross-over design study, 40 older adults (age M = 74.4 ± 4.1 years) conducted the Trail Making Test A and B with a digital pen (digital pen tests, DPT) and a regular pencil (pencil tests, PT) to identify differences in performance. Furthermore, the tests conducted with a digital pen were analyzed manually (manual results, MR) and electronically (electronic results, ER) by an automated scoring algorithm to determine the possibilities of digital pen evaluation. ICC(2,k) showed a good level of agreement for TMT A (ICC(2,k) = 0.668) and TMT B (ICC(2,k) = 0.734) between PT and DPT. When comparing MR and ER, ICC(2,k) showed an excellent level of agreement in TMT A (ICC(2,k) = 0.999) and TMT B (ICC(2,k) = 0.994). The frequency of pen lifting correlates significantly with the execution time in TMT A (r = 0.372, p = 0.030) and TMT B (r = 0.567, p < 0.001). 
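For reference, ICC(2,k), the two-way random-effects, absolute-agreement, average-measures coefficient reported here, can be computed from the ANOVA mean squares following Shrout and Fleiss, as in the sketch below (synthetic data stands in for the study's scores).

```python
import numpy as np

def icc_2k(ratings):
    """ICC(2,k) after Shrout & Fleiss: two-way random effects, absolute
    agreement, average measures. `ratings` is (subjects x raters/conditions)."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-subject means
    col_means = ratings.mean(axis=0)   # per-condition means
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # between subjects
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # between conditions
    sse = ((ratings - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))                        # residual
    return (msr - mse) / (msr + (msc - mse) / n)

# Synthetic example: 40 subjects, two conditions (e.g. pencil vs digital pen).
rng = np.random.default_rng(0)
ability = rng.normal(60, 15, size=(40, 1))          # TMT-A-like base times
ratings = ability + rng.normal(0, 5, size=(40, 2))  # two noisy measurements
print(icc_2k(ratings))
```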
A digital pen can be used to perform the Trail Making Test, as it has been shown that there is no difference in the results due to the type of pen used. With a digital pen, the advantages of digitized testing can be used without having to accept the disadvantages. |
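For readers unfamiliar with the agreement statistic reported above, here is a generic sketch of ICC(2,k) (two-way random effects, absolute agreement, average of k raters, after Shrout and Fleiss) for a subjects-by-conditions score matrix. The synthetic data are an assumption for illustration; this is not the study's analysis code.

import numpy as np

def icc2k(scores: np.ndarray) -> float:
    """ICC(2,k) for an (n_subjects, k_raters) matrix, e.g., TMT completion
    times under the pencil and digital pen conditions."""
    n, k = scores.shape
    grand = scores.mean()
    ms_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # subjects
    ms_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # raters/conditions
    resid = (scores - scores.mean(axis=1, keepdims=True)
                    - scores.mean(axis=0, keepdims=True) + grand)
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)

# Synthetic example: 40 subjects, 2 conditions, strong subject effect.
rng = np.random.default_rng(0)
subject_effect = rng.normal(60.0, 15.0, size=(40, 1))
scores = subject_effect + rng.normal(0.0, 5.0, size=(40, 2))
print(round(icc2k(scores), 3))  # high agreement, close to 1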
Inproceedings |
Nunnari, Fabrizio; Ezema, Abraham; Sonntag, Daniel The effects of masking in melanoma image classification with CNNs towards international standards for image preprocessing Inproceedings 2020 EAI International Symposium on Medical Artificial Intelligence, EAI, 2020. @inproceedings{11368, title = {The effects of masking in melanoma image classification with CNNs towards international standards for image preprocessing}, author = {Fabrizio Nunnari and Abraham Ezema and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11368_2020_EAI_MedAI_StudyOnDatasetBias.pdf}, year = {2020}, date = {2020-12-01}, booktitle = {2020 EAI International Symposium on Medical Artificial Intelligence}, publisher = {EAI}, abstract = {The classification of skin lesion images is known to be biased by artifacts of the surrounding skin, but it is still not clear to what extent masking out healthy skin pixels influences classification performance, and why. To better understand this phenomenon, we apply different strategies of image masking (rectangular masks, circular masks, full masking, and image cropping) to three datasets of skin lesion images (ISIC2016, ISIC2018, and MedNode). We train CNN-based classifiers, provide performance metrics through a 10-fold cross-validation, and analyse the behaviour of Grad-CAM saliency maps through an automated visual inspection. Our experiments show that cropping is the best strategy to maintain classification performance and to significantly reduce training times as well. Our analysis through visual inspection shows that CNNs have the tendency to focus on pixels of healthy skin when no malignant features can be identified. This suggests that CNNs tend to "eagerly" look for pixel areas to justify a classification choice, potentially leading to biased discriminators. To mitigate this effect, and to standardize image preprocessing, we suggest cropping images during dataset construction or before the learning step.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The classification of skin lesion images is known to be biased by artifacts of the surrounding skin, but it is still not clear to what extent masking out healthy skin pixels influences classification performance, and why. To better understand this phenomenon, we apply different strategies of image masking (rectangular masks, circular masks, full masking, and image cropping) to three datasets of skin lesion images (ISIC2016, ISIC2018, and MedNode). We train CNN-based classifiers, provide performance metrics through a 10-fold cross-validation, and analyse the behaviour of Grad-CAM saliency maps through an automated visual inspection. Our experiments show that cropping is the best strategy to maintain classification performance and to significantly reduce training times as well. Our analysis through visual inspection shows that CNNs have the tendency to focus on pixels of healthy skin when no malignant features can be identified. This suggests that CNNs tend to "eagerly" look for pixel areas to justify a classification choice, potentially leading to biased discriminators. To mitigate this effect, and to standardize image preprocessing, we suggest cropping images during dataset construction or before the learning step. |
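As a minimal sketch of the cropping strategy recommended above (not the paper's preprocessing code), the following crops a lesion image to the bounding box of its binary segmentation mask; the margin of surrounding skin kept for context is an assumed parameter.

import numpy as np

def crop_to_mask(image: np.ndarray, mask: np.ndarray, margin: int = 16) -> np.ndarray:
    """Crop an (H, W, C) image to the bounding box of a binary (H, W) lesion
    mask, keeping `margin` pixels of surrounding skin as context."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:  # nothing segmented: leave the image unchanged
        return image
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin + 1, image.shape[0])
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin + 1, image.shape[1])
    return image[y0:y1, x0:x1]

# Example: a synthetic 256x256 image with a rectangular "lesion" mask.
img = np.zeros((256, 256, 3), dtype=np.uint8)
msk = np.zeros((256, 256), dtype=bool)
msk[96:160, 80:176] = True
print(crop_to_mask(img, msk).shape)  # -> (96, 128, 3)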
Nguyen, Ho Minh Duy; Ezema, Abraham; Nunnari, Fabrizio; Sonntag, Daniel A Visually Explainable Learning System for Skin Lesion Detection Using Multiscale Input with Attention U-Net Inproceedings KI 2020: Advances in Artificial Intelligence, pp. 313-319, Springer, 2020. @inproceedings{11178, title = {A Visually Explainable Learning System for Skin Lesion Detection Using Multiscale Input with Attention U-Net}, author = {Ho Minh Duy Nguyen and Abraham Ezema and Fabrizio Nunnari and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11178_KI_2020.pdf https://link.springer.com/chapter/10.1007/978-3-030-58285-2_28}, year = {2020}, date = {2020-09-01}, booktitle = {KI 2020: Advances in Artificial Intelligence}, volume = {12325}, pages = {313-319}, publisher = {Springer}, abstract = {In this work, we propose a new approach to automatically predict the locations of visual dermoscopic attributes for Task 2 of the ISIC 2018 Challenge. Our method is based on the Attention U-Net with multi-scale images as input. We apply a new strategy based on transfer learning, i.e., training the deep network for feature extraction by adapting the weights of the network trained for segmentation. Our tests show that, first, the proposed algorithm is on par with or outperforms the best ISIC 2018 architectures (LeHealth and NMN) in the extraction of two visual features. Second, it uses only 1/30 of the training parameters; we observed lower computation and memory requirements, which are particularly useful for future implementations on mobile devices. Finally, our approach generates visually explainable behaviour with uncertainty estimations to help doctors in diagnosis and treatment decisions.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In this work, we propose a new approach to automatically predict the locations of visual dermoscopic attributes for Task 2 of the ISIC 2018 Challenge. Our method is based on the Attention U-Net with multi-scale images as input. We apply a new strategy based on transfer learning, i.e., training the deep network for feature extraction by adapting the weights of the network trained for segmentation. Our tests show that, first, the proposed algorithm is on par with or outperforms the best ISIC 2018 architectures (LeHealth and NMN) in the extraction of two visual features. Second, it uses only 1/30 of the training parameters; we observed lower computation and memory requirements, which are particularly useful for future implementations on mobile devices. Finally, our approach generates visually explainable behaviour with uncertainty estimations to help doctors in diagnosis and treatment decisions. |
Barz, Michael; Altmeyer, Kristin; Malone, Sarah; Lauer, Luisa; Sonntag, Daniel Digital Pen Features Predict Task Difficulty and User Performance of Cognitive Tests Inproceedings Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization, ACM, 2020. @inproceedings{10894, title = {Digital Pen Features Predict Task Difficulty and User Performance of Cognitive Tests}, author = {Michael Barz and Kristin Altmeyer and Sarah Malone and Luisa Lauer and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/10894_digital_pen_predicts_task_performance.pdf}, year = {2020}, date = {2020-07-01}, booktitle = {Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization}, publisher = {ACM}, abstract = {Digital pen signals were shown to be predictive for cognitive states, cognitive load and emotion in educational settings. We investigate whether low-level pen-based features can predict the difficulty of tasks in a cognitive test and the learner's performance in these tasks, which is inherently related to cognitive load, without a semantic content analysis. We record data for tasks of varying difficulty in a controlled study with children from elementary school. We include two versions of the Trail Making Test (TMT) and six drawing patterns from the Snijders-Oomen Non-verbal intelligence test (SON) as tasks that feature increasing levels of difficulty. We examine how accurately we can predict the task difficulty and the user performance as a measure for cognitive load using support vector machines and gradient boosted decision trees with different feature selection strategies. The results show that our correlation-based feature selection is beneficial for model training, in particular when samples from TMT and SON are concatenated for joint modelling of difficulty and time. Our findings open up opportunities for technology-enhanced adaptive learning.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Digital pen signals were shown to be predictive for cognitive states, cognitive load and emotion in educational settings. We investigate whether low-level pen-based features can predict the difficulty of tasks in a cognitive test and the learner's performance in these tasks, which is inherently related to cognitive load, without a semantic content analysis. We record data for tasks of varying difficulty in a controlled study with children from elementary school. We include two versions of the Trail Making Test (TMT) and six drawing patterns from the Snijders-Oomen Non-verbal intelligence test (SON) as tasks that feature increasing levels of difficulty. We examine how accurately we can predict the task difficulty and the user performance as a measure for cognitive load using support vector machines and gradient boosted decision trees with different feature selection strategies. The results show that our correlation-based feature selection is beneficial for model training, in particular when samples from TMT and SON are concatenated for joint modelling of difficulty and time. Our findings open up opportunities for technology-enhanced adaptive learning. |
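A minimal sketch of the kind of pipeline described above, assuming synthetic stand-ins for the low-level pen features and task labels (the study's actual features, selection thresholds, and models differ): correlation-based feature selection followed by a gradient boosted classifier.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def select_by_correlation(X: np.ndarray, y: np.ndarray, top_k: int) -> np.ndarray:
    """Indices of the top_k features with the highest absolute correlation
    to the target, a simple correlation-based selection strategy."""
    corrs = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.argsort(-corrs)[:top_k]

# Synthetic stand-ins for pen features (speed, pressure, pen lifts, ...).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 40))
y = (X[:, 3] + 0.5 * X[:, 7] + rng.normal(scale=0.5, size=200) > 0).astype(int)  # easy vs. hard

idx = select_by_correlation(X, y, top_k=5)
clf = GradientBoostingClassifier().fit(X[:, idx], y)
print(sorted(idx.tolist()), round(clf.score(X[:, idx], y), 2))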
Barz, Michael; Stauden, Sven; Sonntag, Daniel Visual Search Target Inference in Natural Interaction Settings with Machine Learning Inproceedings Bulling, Andreas; Huckauf, Anke; Jain, Eakta; Radach, Ralph; Weiskopf, Daniel (Ed.): ACM Symposium on Eye Tracking Research and Applications, Association for Computing Machinery, 2020. @inproceedings{10893, title = {Visual Search Target Inference in Natural Interaction Settings with Machine Learning}, author = {Michael Barz and Sven Stauden and Daniel Sonntag}, editor = {Andreas Bulling and Anke Huckauf and Eakta Jain and Ralph Radach and Daniel Weiskopf}, year = {2020}, date = {2020-05-01}, booktitle = {ACM Symposium on Eye Tracking Research and Applications}, publisher = {Association for Computing Machinery}, abstract = {Visual search is a perceptual task in which humans aim at identifying a search target object such as a traffic sign among other objects. Search target inference subsumes computational methods for predicting this target by tracking and analyzing overt behavioral cues of the searcher, e.g., the human gaze and fixated visual stimuli. We present a generic approach to inferring search targets in natural scenes by predicting the class of the surrounding image segment. Our method encodes visual search sequences as histograms of fixated segment classes determined by SegNet, a deep learning image segmentation model for natural scenes. We compare our sequence encoding and model training (SVM) to a recent baseline from the literature for predicting the target segment. Also, we use a new search target inference dataset. The results show that, first, our new segmentation-based sequence encoding outperforms the method from the literature, and second, that it enables target inference in natural settings.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Visual search is a perceptual task in which humans aim at identifying a search target object such as a traffic sign among other objects. Search target inference subsumes computational methods for predicting this target by tracking and analyzing overt behavioral cues of the searcher, e.g., the human gaze and fixated visual stimuli. We present a generic approach to inferring search targets in natural scenes by predicting the class of the surrounding image segment. Our method encodes visual search sequences as histograms of fixated segment classes determined by SegNet, a deep learning image segmentation model for natural scenes. We compare our sequence encoding and model training (SVM) to a recent baseline from the literature for predicting the target segment. Also, we use a new search target inference dataset. The results show that, first, our new segmentation-based sequence encoding outperforms the method from the literature, and second, that it enables target inference in natural settings. |
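A minimal sketch of the sequence encoding described above, with an assumed class inventory and synthetic scanpaths (the segmentation model itself is omitted): each scanpath becomes a normalized histogram over the classes of fixated segments and is classified with an SVM.

import numpy as np
from sklearn.svm import SVC

N_CLASSES = 12  # assumed number of semantic segment classes

def encode_scanpath(fixated_classes) -> np.ndarray:
    """Normalized histogram of the segment classes hit by the fixations."""
    hist = np.bincount(fixated_classes, minlength=N_CLASSES).astype(float)
    return hist / max(hist.sum(), 1.0)

# Synthetic scanpaths: searchers fixate the target's segment class more often.
rng = np.random.default_rng(2)
X, y = [], []
for _ in range(300):
    target = int(rng.integers(N_CLASSES))
    fixations = rng.integers(N_CLASSES, size=15).tolist() + [target] * 10
    X.append(encode_scanpath(fixations))
    y.append(target)
clf = SVC(kernel="rbf").fit(X, y)
print(round(clf.score(X, y), 2))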
Nunnari, Fabrizio; Bhuvaneshwara, Chirag; Ezema, Abraham Obinwanne; Sonntag, Daniel A Study on the Fusion of Pixels and Patient Metadata in CNN-Based Classification of Skin Lesion Images Inproceedings Holzinger, Andreas; Kieseberg, Peter; Tjoa, Min A; Weippl, Edgar (Ed.): Machine Learning and Knowledge Extraction, pp. 191-208, Springer International Publishing, 2020. @inproceedings{11113, title = {A Study on the Fusion of Pixels and Patient Metadata in CNN-Based Classification of Skin Lesion Images}, author = {Fabrizio Nunnari and Chirag Bhuvaneshwara and Abraham Obinwanne Ezema and Daniel Sonntag}, editor = {Andreas Holzinger and Peter Kieseberg and Min A Tjoa and Edgar Weippl}, url = {https://www.dfki.de/fileadmin/user_upload/import/11113_Nunnari20CD-MAKE.pdf https://link.springer.com/chapter/10.1007/978-3-030-57321-8_11}, year = {2020}, date = {2020-01-01}, booktitle = {Machine Learning and Knowledge Extraction}, pages = {191-208}, publisher = {Springer International Publishing}, abstract = {We present a study on the fusion of pixel data and patient metadata (age, gender, and body location) for improving the classification of skin lesion images. The experiments have been conducted with the ISIC 2019 skin lesion classification challenge data set. Taking two plain convolutional neural networks (CNNs) as a baseline, metadata are merged using either non-neural machine learning methods (tree-based and support vector machines) or shallow neural networks. Results show that shallow neural networks outperform other approaches in all overall evaluation measures. However, despite the increase in the classification accuracy (up to +19.1%), interestingly, the average per-class sensitivity decreases in three out of four cases for CNNs, thus suggesting that using metadata penalizes the prediction accuracy for underrepresented classes. A study on the patient metadata shows that age is the most useful metadatum as a decision criterion, followed by body location and gender.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present a study on the fusion of pixel data and patient metadata (age, gender, and body location) for improving the classification of skin lesion images. The experiments have been conducted with the ISIC 2019 skin lesion classification challenge data set. Taking two plain convolutional neural networks (CNNs) as a baseline, metadata are merged using either non-neural machine learning methods (tree-based and support vector machines) or shallow neural networks. Results show that shallow neural networks outperform other approaches in all overall evaluation measures. However, despite the increase in the classification accuracy (up to +19.1%), interestingly, the average per-class sensitivity decreases in three out of four cases for CNNs, thus suggesting that using metadata penalizes the prediction accuracy for underrepresented classes. A study on the patient metadata shows that age is the most useful metadatum as a decision criterion, followed by body location and gender. |
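A minimal PyTorch sketch of the shallow-network fusion described above, with assumed feature dimensions and a simplified metadata encoding (the paper's exact architecture differs): CNN image features are concatenated with encoded patient metadata and classified by a small fully connected head.

import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Concatenate CNN image features with encoded metadata (here simplified
    to three numeric inputs: age, gender id, body location id)."""
    def __init__(self, img_dim: int = 2048, meta_dim: int = 16, n_classes: int = 8):
        super().__init__()
        self.meta = nn.Sequential(nn.Linear(3, meta_dim), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(img_dim + meta_dim, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, n_classes),
        )

    def forward(self, img_feat: torch.Tensor, meta: torch.Tensor) -> torch.Tensor:
        return self.head(torch.cat([img_feat, self.meta(meta)], dim=1))

# Forward pass with random stand-ins for a batch of four cases.
model = LateFusion()
logits = model(torch.randn(4, 2048), torch.randn(4, 3))
print(logits.shape)  # -> torch.Size([4, 8])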
Müller, Julia; Sprenger, Max; Franke, Tobias; Lukowicz, Paul; Reidick, Claudia; Herrlich, Marc Game of TUK: deploying a large-scale activity-boosting gamification project in a university context Inproceedings Mensch und Computer, ACM, 2020. @inproceedings{12112, title = {Game of TUK: deploying a large-scale activity-boosting gamification project in a university context}, author = {Julia Müller and Max Sprenger and Tobias Franke and Paul Lukowicz and Claudia Reidick and Marc Herrlich}, url = {https://www.dfki.de/fileadmin/user_upload/import/12112_2020_GAME_OF_TUK-_DEPLOYING_A_LARGE-SCALE_ACTIVITY-BOOSTING_GAMIFICATION_PROJECT_IN_A_UNIVERSITY_CONTEXT.pdf https://dl.acm.org/doi/abs/10.1145/3404983.3410008}, year = {2020}, date = {2020-01-01}, booktitle = {Mensch und Computer}, publisher = {ACM}, abstract = {We present Game of TUK, a gamified mobile app to increase physical activity among students at TU Kaiserslautern. The scale of our project with almost 2,000 players over the course of four weeks is unique for a project in a university context. We present feedback we received and share our insights. Our results show that location-based activities in particular were very popular. In contrast, mini-games included in the app did not contribute as much to user activity as expected.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present Game of TUK, a gamified mobile app to increase physical activity among students at TU Kaiserslautern. The scale of our project with almost 2,000 players over the course of four weeks is unique for a project in a university context. We present feedback we received and share our insights. Our results show that location-based activities in particular were very popular. In contrast, mini-games included in the app did not contribute as much to user activity as expected. |
Technical Reports |
Sonntag, Daniel; Nunnari, Fabrizio; Profitlich, Hans-Jürgen The Skincare project, an interactive deep learning system for differential diagnosis of malignant skin lesions. Technical Report BMBF, H2020 Bundesministerium für Bildung und Forschung Kapelle-Ufer 1 D-10117 Berlin, , 2020. @techreport{10912, title = {The Skincare project, an interactive deep learning system for differential diagnosis of malignant skin lesions.}, author = {Daniel Sonntag and Fabrizio Nunnari and Hans-Jürgen Profitlich}, url = {https://www.dfki.de/fileadmin/user_upload/import/10912_main2.pdf https://arxiv.org/abs/2005.09448}, year = {2020}, date = {2020-05-01}, volume = {1}, address = {Bundesministerium für Bildung und Forschung Kapelle-Ufer 1 D-10117 Berlin}, institution = {BMBF, H2020}, abstract = {A shortage of dermatologists causes long wait times for patients who seek dermatologic care. In addition, the diagnostic accuracy of general practitioners has been reported to be lower than the accuracy of artificial intelligence software. This article describes the Skincare project (H2020, EIT Digital). Contributions include enabling technology for clinical decision support based on interactive machine learning (IML), a reference architecture towards a Digital European Healthcare Infrastructure (also cf. EIT MCPS), technical components for aggregating digitised patient information, and the integration of decision support technology into clinical test-bed environments. However, the main contribution is a diagnostic and decision support system in dermatology for patients and doctors, an interactive deep learning system for differential diagnosis of malignant skin lesions. In this article, we describe its functionalities and the user interfaces to facilitate machine learning from human input. The baseline deep learning system, which delivers state-of-the-art results and has the potential to augment general practitioners and even dermatologists, was developed and validated using de-identified cases from a dermatology image database (ISIC), which has about 20,000 cases for development and validation, provided by board-certified dermatologists defining the reference standard for every case. ISIC allows for differential diagnosis, a ranked list of eight diagnoses, which is used to plan treatments in the common setting of diagnostic ambiguity. We give an overall description of the outcome of the Skincare project, and we focus on the steps to support communication and coordination between humans and machines in IML. This is an integral part of the development of future cognitive assistants in the medical domain, and we describe the necessary intelligent user interfaces.}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } A shortage of dermatologists causes long wait times for patients who seek dermatologic care. In addition, the diagnostic accuracy of general practitioners has been reported to be lower than the accuracy of artificial intelligence software. This article describes the Skincare project (H2020, EIT Digital). Contributions include enabling technology for clinical decision support based on interactive machine learning (IML), a reference architecture towards a Digital European Healthcare Infrastructure (also cf. EIT MCPS), technical components for aggregating digitised patient information, and the integration of decision support technology into clinical test-bed environments. However, the main contribution is a diagnostic and decision support system in dermatology for patients and doctors, an interactive deep learning system for differential diagnosis of malignant skin lesions. In this article, we describe its functionalities and the user interfaces to facilitate machine learning from human input. The baseline deep learning system, which delivers state-of-the-art results and has the potential to augment general practitioners and even dermatologists, was developed and validated using de-identified cases from a dermatology image database (ISIC), which has about 20,000 cases for development and validation, provided by board-certified dermatologists defining the reference standard for every case. ISIC allows for differential diagnosis, a ranked list of eight diagnoses, which is used to plan treatments in the common setting of diagnostic ambiguity. We give an overall description of the outcome of the Skincare project, and we focus on the steps to support communication and coordination between humans and machines in IML. This is an integral part of the development of future cognitive assistants in the medical domain, and we describe the necessary intelligent user interfaces. |
Sonntag, Daniel Künstliche Intelligenz gegen das Coronavirus Technical Report DFKI, BMBF, BMG , 2020. @techreport{10809, title = {Künstliche Intelligenz gegen das Coronavirus}, author = {Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/10809_corona2.pages.pdf}, year = {2020}, date = {2020-01-01}, volume = {1}, institution = {DFKI, BMBF, BMG}, abstract = {Artificial intelligence has reached a new stage of maturity in recent years and is becoming the driver of digitalization in all areas of life. AI is a cross-cutting technology that is of great importance for all areas of medicine involving image data, text data, and biodata. There is no medical field that will not be affected by AI (see also http://www.dfki.de/MedicalCPS/?p=1111). Here, four fields of application against the coronavirus are examined: (1) image-based diagnostics, (2) gene sequencing, (3) the automatic analysis of medical texts, and (4) disaster management.}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } Artificial intelligence has reached a new stage of maturity in recent years and is becoming the driver of digitalization in all areas of life. AI is a cross-cutting technology that is of great importance for all areas of medicine involving image data, text data, and biodata. There is no medical field that will not be affected by AI (see also http://www.dfki.de/MedicalCPS/?p=1111). Here, four fields of application against the coronavirus are examined: (1) image-based diagnostics, (2) gene sequencing, (3) the automatic analysis of medical texts, and (4) disaster management. |
Kalimuthu, Marimuthu; Nunnari, Fabrizio; Sonntag, Daniel A Competitive Deep Neural Network Approach for the ImageCLEFmed Caption 2020 Task Technical Report German Research Center for Artificial Intelligence , 2020. @techreport{11188, title = {A Competitive Deep Neural Network Approach for the ImageCLEFmed Caption 2020 Task}, author = {Marimuthu Kalimuthu and Fabrizio Nunnari and Daniel Sonntag}, year = {2020}, date = {2020-01-01}, volume = {o.A.}, institution = {German Research Center for Artificial Intelligence}, abstract = {The aim of the ImageCLEFmed Caption task is to develop a system that automatically labels radiology images with relevant medical concepts. We describe our Deep Neural Network (DNN) based approach for tackling this problem. On the challenge test set of 3,534 radiology images, our system achieves an F1 score of 0.375, ranking 12th among all systems successfully submitted to the challenge. We rely only on the provided data sources and use neither external medical knowledge or ontologies nor models pretrained on other medical image repositories or application domains.}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } The aim of the ImageCLEFmed Caption task is to develop a system that automatically labels radiology images with relevant medical concepts. We describe our Deep Neural Network (DNN) based approach for tackling this problem. On the challenge test set of 3,534 radiology images, our system achieves an F1 score of 0.375, ranking 12th among all systems successfully submitted to the challenge. We rely only on the provided data sources and use neither external medical knowledge or ontologies nor models pretrained on other medical image repositories or application domains. |
2019 |
Journal Articles |
Barz, Michael; Sonntag, Daniel Incremental Improvement of a Question Answering System by Re-ranking Answer Candidates using Machine Learning Journal Article Computing Research Repository eprint Journal, abs/1908.10149 , pp. 1-13, 2019. @article{10895, title = {Incremental Improvement of a Question Answering System by Re-ranking Answer Candidates using Machine Learning}, author = {Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/10895_1908.10149.pdf https://arxiv.org/abs/1908.10149}, year = {2019}, date = {2019-08-01}, journal = {Computing Research Repository eprint Journal}, volume = {abs/1908.10149}, pages = {1-13}, publisher = {arXiv}, abstract = {We implement a method for re-ranking the top-10 results of a state-of-the-art question answering (QA) system. The goal of our re-ranking approach is to improve the answer selection given the user question and the top-10 candidates. We focus on improving deployed QA systems that do not allow re-training or for which re-training comes at a high cost. Our re-ranking approach learns a similarity function from n-gram based features of the query, the answer, and the initial system confidence. Our contributions are: (1) we generate a QA training corpus starting from 877 answers from the customer care domain of T-Mobile Austria, (2) we implement a state-of-the-art QA pipeline using neural sentence embeddings that encode queries in the same space as the answer index, and (3) we evaluate the QA pipeline and our re-ranking approach using a separately provided test set. The test set can be considered to be available after deployment of the system, e.g., based on feedback of users. Our results show that the system performance, in terms of top-n accuracy and the mean reciprocal rank, benefits from re-ranking using gradient boosted regression trees. On average, the mean reciprocal rank improves by 9.15%.}, keywords = {}, pubstate = {published}, tppubtype = {article} } We implement a method for re-ranking the top-10 results of a state-of-the-art question answering (QA) system. The goal of our re-ranking approach is to improve the answer selection given the user question and the top-10 candidates. We focus on improving deployed QA systems that do not allow re-training or for which re-training comes at a high cost. Our re-ranking approach learns a similarity function from n-gram based features of the query, the answer, and the initial system confidence. Our contributions are: (1) we generate a QA training corpus starting from 877 answers from the customer care domain of T-Mobile Austria, (2) we implement a state-of-the-art QA pipeline using neural sentence embeddings that encode queries in the same space as the answer index, and (3) we evaluate the QA pipeline and our re-ranking approach using a separately provided test set. The test set can be considered to be available after deployment of the system, e.g., based on feedback of users. Our results show that the system performance, in terms of top-n accuracy and the mean reciprocal rank, benefits from re-ranking using gradient boosted regression trees. On average, the mean reciprocal rank improves by 9.15%. |
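A minimal sketch of such a re-ranker, with toy n-gram overlap features and invented customer-care examples (the paper's feature set and training corpus are richer): a gradient boosted regressor predicts a relevance score from the query, a candidate answer, and the deployed system's confidence, and the candidate list is re-sorted by that score.

from sklearn.ensemble import GradientBoostingRegressor

def ngram_overlap(q: str, a: str, n: int) -> float:
    """Fraction of the query's word n-grams that also occur in the answer."""
    grams = lambda s: {tuple(s.split()[i:i + n]) for i in range(len(s.split()) - n + 1)}
    gq = grams(q)
    return len(gq & grams(a)) / max(len(gq), 1)

def features(query: str, answer: str, confidence: float) -> list:
    return [ngram_overlap(query, answer, 1), ngram_overlap(query, answer, 2), confidence]

# Tiny invented training pairs: (features, relevance label).
query = "how do I top up my prepaid card"
X_train = [features(query, "you can top up your prepaid card online", 0.6),
           features(query, "roaming charges apply in the EU", 0.7)]
reranker = GradientBoostingRegressor().fit(X_train, [1.0, 0.0])

# Re-sort the deployed system's top candidates by predicted relevance.
top_candidates = [("roaming charges apply in the EU", 0.7),
                  ("you can top up your prepaid card online", 0.6)]
ranked = sorted(top_candidates,
                key=lambda c: reranker.predict([features(query, c[0], c[1])])[0],
                reverse=True)
print([answer for answer, _ in ranked])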
Sonntag, Daniel Künstliche Intelligenz in der Medizin -- Holzweg oder Heilversprechen? Journal Article HNO, 67 , pp. 343-349, 2019. @article{10833, title = {Künstliche Intelligenz in der Medizin -- Holzweg oder Heilversprechen?}, author = {Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/10833_sonntag-hno-ki-DFKI-repository.pdf}, doi = {https://doi.org/10.1007/s00106-019-0665-z}, year = {2019}, date = {2019-01-01}, journal = {HNO}, volume = {67}, pages = {343-349}, publisher = {Springer}, abstract = {Artificial intelligence (AI) has reached a new stage of maturity in recent years and is becoming the driver of digitalization in all areas of life. AI is a cross-cutting technology that is of great importance for all areas of medicine involving image data, text data, and biodata. There is no medical field that will not be affected by AI. Clinical decision support plays an important role here. AI methods are becoming established especially in medical workflow management and in predicting treatment success and outcomes. In image-based diagnostics and patient management, AI systems can already provide support, but they cannot propose critical decisions. Prevention and therapy measures can be assessed more soundly with AI support; however, the coverage of diseases is still far too low to build robust systems for everyday clinical practice. Widespread adoption requires continuing education for physicians, so that they can decide when automatic decision support can be trusted.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Artificial intelligence (AI) has reached a new stage of maturity in recent years and is becoming the driver of digitalization in all areas of life. AI is a cross-cutting technology that is of great importance for all areas of medicine involving image data, text data, and biodata. There is no medical field that will not be affected by AI. Clinical decision support plays an important role here. AI methods are becoming established especially in medical workflow management and in predicting treatment success and outcomes. In image-based diagnostics and patient management, AI systems can already provide support, but they cannot propose critical decisions. Prevention and therapy measures can be assessed more soundly with AI support; however, the coverage of diseases is still far too low to build robust systems for everyday clinical practice. Widespread adoption requires continuing education for physicians, so that they can decide when automatic decision support can be trusted. |
Book Chapters |
Sonntag, Daniel Medical and Health Systems Book Chapter The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions - Volume 3, pp. 423-476, Association for Computing Machinery and Morgan & Claypool, 2019. @inbook{10812, title = {Medical and Health Systems}, author = {Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/10812_Medical-and-Health-Systems.pdf}, doi = {https://doi.org/10.1145/3233795.3233808}, year = {2019}, date = {2019-01-01}, booktitle = {The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions - Volume 3}, pages = {423-476}, publisher = {Association for Computing Machinery and Morgan & Claypool}, abstract = {In this chapter, we discuss the trends of multimodal-multisensor interfaces for medical and health systems. We emphasize the theoretical foundations of multimodal interfaces and systems in the healthcare domain. We aim to provide a basis for motivating and accelerating future interfaces for medical and health systems. Therefore, we provide many examples of existing and futuristic systems. For each of these systems, we define a classification into clinical systems and non-clinical systems, as well as sub-classes of multimodal and multisensor interfaces, to help structure the recent work in this emerging research field of medical and health systems.}, keywords = {}, pubstate = {published}, tppubtype = {inbook} } In this chapter, we discuss the trends of multimodal-multisensor interfaces for medical and health systems. We emphasize the theoretical foundations of multimodal interfaces and systems in the healthcare domain. We aim to provide a basis for motivating and accelerating future interfaces for medical and health systems. Therefore, we provide many examples of existing and futuristic systems. For each of these systems, we define a classification into clinical systems and non-clinical systems, as well as sub-classes of multimodal and multisensor interfaces, to help structure the recent work in this emerging research field of medical and health systems. |