2020
Inproceedings
Barz, Michael; Altmeyer, Kristin; Malone, Sarah; Lauer, Luisa; Sonntag, Daniel: Digital Pen Features Predict Task Difficulty and User Performance of Cognitive Tests. In: Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization, ACM, 2020. URL: https://www.dfki.de/fileadmin/user_upload/import/10894_digital_pen_predicts_task_performance.pdf
Digital pen signals were shown to be predictive for cognitive states, cognitive load and emotion in educational settings. We investigate whether low-level pen-based features can predict the difficulty of tasks in a cognitive test and the learner's performance in these tasks, which is inherently related to cognitive load, without a semantic content analysis. We record data for tasks of varying difficulty in a controlled study with children from elementary school. We include two versions of the Trail Making Test (TMT) and six drawing patterns from the Snijders-Oomen Non-verbal intelligence test (SON) as tasks that feature increasing levels of difficulty. We examine how accurately we can predict the task difficulty and the user performance as a measure for cognitive load using support vector machines and gradient boosted decision trees with different feature selection strategies. The results show that our correlation-based feature selection is beneficial for model training, in particular when samples from TMT and SON are concatenated for joint modelling of difficulty and time. Our findings open up opportunities for technology-enhanced adaptive learning.
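To make the modelling step concrete, here is a minimal sketch of correlation-based feature selection followed by an SVM, in the spirit of the pipeline the abstract describes; the feature matrix, labels, and selection threshold are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))      # 200 samples of 50 low-level pen features (synthetic)
y = rng.integers(0, 3, size=200)    # assumed difficulty labels: easy / medium / hard

# Correlation-based selection: keep features whose absolute Pearson
# correlation with the label exceeds a threshold.
corr = np.array([abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(X.shape[1])])
X_sel = X[:, corr > 0.1]            # threshold chosen for illustration only

clf = SVC(kernel="rbf")
print(cross_val_score(clf, X_sel, y, cv=5).mean())
```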
Barz, Michael; Stauden, Sven; Sonntag, Daniel: Visual Search Target Inference in Natural Interaction Settings with Machine Learning. In: Bulling, Andreas; Huckauf, Anke; Jain, Eakta; Radach, Ralph; Weiskopf, Daniel (Eds.): ACM Symposium on Eye Tracking Research and Applications, Association for Computing Machinery, 2020.
Visual search is a perceptual task in which humans aim at identifying a search target object such as a traffic sign among other objects. Search target inference subsumes computational methods for predicting this target by tracking and analyzing overt behavioral cues of that person, e.g., the human gaze and fixated visual stimuli. We present a generic approach to inferring search targets in natural scenes by predicting the class of the surrounding image segment. Our method encodes visual search sequences as histograms of fixated segment classes determined by SegNet, a deep learning image segmentation model for natural scenes. We compare our sequence encoding and model training (SVM) to a recent baseline from the literature for predicting the target segment. Also, we use a new search target inference dataset. The results show that, first, our new segmentation-based sequence encoding outperforms the method from the literature, and second, that it enables target inference in natural settings.
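As an illustration of this sequence encoding, the sketch below (with synthetic data, and assuming a fixed number of segment classes from a SegNet-style segmenter) turns each fixation sequence into a normalized histogram of fixated classes and trains an SVM on these vectors.

```python
import numpy as np
from sklearn.svm import SVC

N_CLASSES = 12  # assumed number of segment classes produced by the segmenter

def encode_sequence(fixated_classes, n_classes=N_CLASSES):
    """Normalized histogram over the segment classes fixated in one search."""
    hist = np.bincount(fixated_classes, minlength=n_classes).astype(float)
    return hist / max(hist.sum(), 1.0)

rng = np.random.default_rng(1)
sequences = [rng.integers(0, N_CLASSES, size=int(rng.integers(5, 30)))
             for _ in range(300)]
targets = rng.integers(0, N_CLASSES, size=300)  # synthetic target-class labels

X = np.stack([encode_sequence(s) for s in sequences])
clf = SVC().fit(X, targets)
print(clf.predict(X[:3]))
```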
Nunnari, Fabrizio; Bhuvaneshwara, Chirag; Ezema, Abraham Obinwanne; Sonntag, Daniel: A Study on the Fusion of Pixels and Patient Metadata in CNN-Based Classification of Skin Lesion Images. In: Holzinger, Andreas; Kieseberg, Peter; Tjoa, Min A; Weippl, Edgar (Eds.): Machine Learning and Knowledge Extraction, pp. 191-208, Springer International Publishing, 2020. URLs: https://www.dfki.de/fileadmin/user_upload/import/11113_Nunnari20CD-MAKE.pdf; https://link.springer.com/chapter/10.1007/978-3-030-57321-8_11
We present a study on the fusion of pixel data and patient metadata (age, gender, and body location) for improving the classification of skin lesion images. The experiments have been conducted with the ISIC 2019 skin lesion classification challenge data set. Taking two plain convolutional neural networks (CNNs) as a baseline, metadata are merged using either non-neural machine learning methods (tree-based and support vector machines) or shallow neural networks. Results show that shallow neural networks outperform the other approaches in all overall evaluation measures. However, despite the increase in classification accuracy (up to +19.1%), the average per-class sensitivity interestingly decreases in three out of four cases for CNNs, suggesting that using metadata penalizes the prediction accuracy for underrepresented classes. A study on the patient metadata shows that age is the most useful metadatum as a decision criterion, followed by body location and gender.
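A minimal PyTorch sketch of the shallow-network fusion variant described above: pooled CNN image features are concatenated with encoded patient metadata and passed through a small classifier. Dimensions, layer sizes, and the number of classes are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, img_dim=2048, meta_dim=16, n_classes=8):
        super().__init__()
        # Shallow branch for the metadata (e.g., one-hot age bin, gender, body location).
        self.meta_net = nn.Sequential(nn.Linear(meta_dim, 32), nn.ReLU())
        # Classification head over the concatenated representation.
        self.head = nn.Sequential(
            nn.Linear(img_dim + 32, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, n_classes),
        )

    def forward(self, img_feat, meta):
        return self.head(torch.cat([img_feat, self.meta_net(meta)], dim=1))

model = FusionClassifier()
img_feat = torch.randn(4, 2048)     # stand-in for pooled CNN features
meta = torch.randn(4, 16)           # stand-in for encoded metadata
print(model(img_feat, meta).shape)  # torch.Size([4, 8])
```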
Müller, Julia; Sprenger, Max; Franke, Tobias; Lukowicz, Paul; Reidick, Claudia; Herrlich, Marc: Game of TUK: deploying a large-scale activity-boosting gamification project in a university context. In: Mensch und Computer, ACM, 2020. URLs: https://www.dfki.de/fileadmin/user_upload/import/12112_2020_GAME_OF_TUK-_DEPLOYING_A_LARGE-SCALE_ACTIVITY-BOOSTING_GAMIFICATION_PROJECT_IN_A_UNIVERSITY_CONTEXT.pdf; https://dl.acm.org/doi/abs/10.1145/3404983.3410008
We present Game of TUK, a gamified mobile app to increase physical activity among students at TU Kaiserslautern. The scale of our project, with almost 2,000 players over the course of four weeks, is unique for a project in a university context. We present the feedback we received and share our insights. Our results show that location-based activities in particular were very popular. In contrast, mini-games included in the app did not contribute as much to user activity as expected.
Technical Reports
Sonntag, Daniel; Nunnari, Fabrizio; Profitlich, Hans-Jürgen: The Skincare project, an interactive deep learning system for differential diagnosis of malignant skin lesions. Technical Report, BMBF / H2020, Bundesministerium für Bildung und Forschung, Kapelle-Ufer 1, D-10117 Berlin, 2020. URLs: https://www.dfki.de/fileadmin/user_upload/import/10912_main2.pdf; https://arxiv.org/abs/2005.09448
A shortage of dermatologists causes long wait times for patients who seek dermatologic care. In addition, the diagnostic accuracy of general practitioners has been reported to be lower than the accuracy of artificial intelligence software. This article describes the Skincare project (H2020, EIT Digital). Contributions include enabling technology for clinical decision support based on interactive machine learning (IML), a reference architecture towards a Digital European Healthcare Infrastructure (also cf. EIT MCPS), technical components for aggregating digitised patient information, and the integration of decision support technology into clinical test-bed environments. However, the main contribution is a diagnostic and decision support system in dermatology for patients and doctors, an interactive deep learning system for differential diagnosis of malignant skin lesions. In this article, we describe its functionalities and the user interfaces that facilitate machine learning from human input. The baseline deep learning system, which delivers state-of-the-art results and the potential to augment general practitioners and even dermatologists, was developed and validated using de-identified cases from a dermatology image database (ISIC), which provides about 20,000 cases for development and validation, supplied by board-certified dermatologists defining the reference standard for every case. ISIC allows for differential diagnosis, i.e., a ranked list of eight diagnoses, which is used to plan treatments in the common setting of diagnostic ambiguity. We give an overall description of the outcome of the Skincare project, and we focus on the steps to support communication and coordination between humans and machines in IML. This is an integral part of the development of future cognitive assistants in the medical domain, and we describe the necessary intelligent user interfaces.
Sonntag, Daniel: Künstliche Intelligenz gegen das Coronavirus. Technical Report, DFKI, BMBF, BMG, 2020. URL: https://www.dfki.de/fileadmin/user_upload/import/10809_corona2.pages.pdf
Artificial intelligence has reached a new level of maturity in recent years and is becoming a driver of digitalisation in all areas of life. AI is a cross-cutting technology that is of great importance for all areas of medicine involving image data, text data and biodata. There is no medical field that will not be influenced by AI (see also http://www.dfki.de/MedicalCPS/?p=1111). Here, four fields of application against the coronavirus are examined: (1) image diagnostics, (2) gene sequencing, (3) automatic analysis of medical texts, and (4) disaster management.
Kalimuthu, Marimuthu; Nunnari, Fabrizio; Sonntag, Daniel: A Competitive Deep Neural Network Approach for the ImageCLEFmed Caption 2020 Task. Technical Report, German Research Center for Artificial Intelligence (DFKI), 2020.
The aim of the ImageCLEFmed Caption task is to develop a system that automatically labels radiology images with relevant medical concepts. We describe our deep neural network (DNN) based approach for tackling this problem. On the challenge test set of 3,534 radiology images, our system achieves an F1 score of 0.375 and ranks 12th among all systems that were successfully submitted to the challenge, whereby we rely only on the provided data sources and do not use any external medical knowledge or ontologies, or models pretrained on other medical image repositories or application domains.
2019
Journal Articles
Barz, Michael; Sonntag, Daniel: Incremental Improvement of a Question Answering System by Re-ranking Answer Candidates using Machine Learning. Computing Research Repository (CoRR), abs/1908.10149, pp. 1-13, 2019. URLs: https://www.dfki.de/fileadmin/user_upload/import/10895_1908.10149.pdf; https://arxiv.org/abs/1908.10149
We implement a method for re-ranking the top-10 results of a state-of-the-art question answering (QA) system. The goal of our re-ranking approach is to improve the answer selection given the user question and the top-10 candidates. We focus on improving deployed QA systems that do not allow re-training or for which re-training comes at a high cost. Our re-ranking approach learns a similarity function using n-gram based features of the query, the answer, and the initial system confidence as input. Our contributions are: (1) we generate a QA training corpus starting from 877 answers from the customer care domain of T-Mobile Austria, (2) we implement a state-of-the-art QA pipeline using neural sentence embeddings that encode queries in the same space as the answer index, and (3) we evaluate the QA pipeline and our re-ranking approach using a separately provided test set. The test set can be considered to be available after deployment of the system, e.g., based on feedback of users. Our results show that the system performance, in terms of top-n accuracy and the mean reciprocal rank, benefits from re-ranking using gradient boosted regression trees. On average, the mean reciprocal rank improves by 9.15%.
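A compact sketch of the re-ranking idea with synthetic data: a gradient boosted regressor scores each (query, answer) candidate from a few features (stand-ins for the n-gram overlap and initial system confidence mentioned above), candidates are re-sorted by score, and a helper computes the mean reciprocal rank used for evaluation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
X_train = rng.normal(size=(500, 3))   # per-pair features, e.g. n-gram overlap,
y_train = rng.uniform(size=500)       # length ratio, system confidence -> relevance
reranker = GradientBoostingRegressor().fit(X_train, y_train)

def rerank(candidates, features):
    """Sort the top-10 candidates by the learned relevance score."""
    order = np.argsort(-reranker.predict(features))
    return [candidates[i] for i in order]

def mean_reciprocal_rank(ranked_lists, gold_answers):
    """MRR over queries; assumes the gold answer appears in each ranked list."""
    return float(np.mean([1.0 / (lst.index(g) + 1)
                          for lst, g in zip(ranked_lists, gold_answers)]))

candidates = [f"answer_{i}" for i in range(10)]
features = rng.normal(size=(10, 3))
print(rerank(candidates, features)[:3])
```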
Sonntag, Daniel: Künstliche Intelligenz in der Medizin -- Holzweg oder Heilversprechen? HNO, 67, pp. 343-349, Springer, 2019. URL: https://www.dfki.de/fileadmin/user_upload/import/10833_sonntag-hno-ki-DFKI-repository.pdf; DOI: https://doi.org/10.1007/s00106-019-0665-z
Artificial intelligence (AI) has reached a new level of maturity in recent years and is becoming the driver of digitalisation in all areas of life. AI is a cross-cutting technology that is of great importance for all areas of medicine involving image data, text data and biodata. There is no medical field that will not be influenced by AI, and clinical decision support plays an important role here. AI methods are becoming established especially in medical workflow management and in predicting treatment success and treatment outcome. In image diagnostics and patient management, AI systems can already provide support, but they cannot propose critical decisions. Prevention and therapy measures can be assessed more sensibly with AI support; however, the coverage of diseases is still far too low to build robust systems for everyday clinical practice. Widespread use presupposes continuing education for physicians, so that they can decide when automatic decision support can be trusted.
Book Chapters
Sonntag, Daniel: Medical and Health Systems. In: The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions - Volume 3, pp. 423-476, Association for Computing Machinery and Morgan & Claypool, 2019. URL: https://www.dfki.de/fileadmin/user_upload/import/10812_Medical-and-Health-Systems.pdf; DOI: https://doi.org/10.1145/3233795.3233808
In this chapter, we discuss the trends of multimodal-multisensor interfaces for medical and health systems. We emphasize the theoretical foundations of multimodal interfaces and systems in the healthcare domain. We aim to provide a basis for motivating and accelerating future interfaces for medical and health systems. Therefore, we provide many examples of existing and futuristic systems. For each of these systems, we define a classification into clinical systems and non-clinical systems, as well as sub-classes of multimodal and multisensor interfaces, to help structure the recent work in this emerging research field of medical and health systems.
Incollections
Feld, Michael; Neßelrath, Robert; Schwartz, Tim: Software Platforms and Toolkits for Building Multimodal Systems and Applications. In: Oviatt, Sharon; Schuller, Björn; Cohen, Philip R; Potamianos, Gerasimos; Krüger, Antonio; Sonntag, Daniel (Eds.): The Handbook of Multimodal-Multisensor Interfaces, Volume 3 -- Language Processing, Software, Commercialization, and Emerging Directions, pp. 145-190, Morgan & Claypool Publishers, 2019. URL: http://www.morganclaypoolpublishers.com/catalog_Orig/product_info.php?products_id=1428
This chapter introduces various concepts needed for the realization of multimodal systems. Alongside an overview of the evolution of multimodal dialogue platform architectures, we give an overview of the major components found in most of today’s architectures: input and output processing; fusion and discourse processing; dialogue management; fission and presentation planning; and middleware. We compare several different dialogue management approaches, look in more detail at how the fusion component works, and introduce dialogue act annotation with communicative functions. We explain the multimodal reference resolution process and consider the special case of cross-modal references. Finally, we present SiAM-dp, an actual multimodal dialogue platform used in a number of research projects and prototypes, and highlight some of its particular features.
Inproceedings
Biswas, Rajarshi; Mogadala, Aditya; Barz, Michael; Sonntag, Daniel; Klakow, Dietrich: Automatic Judgement of Neural Network-Generated Image Captions. In: Martin-Vide, Carlos; Purver, Matthew; Pollak, Senja (Eds.): Statistical Language and Speech Processing - 7th International Conference, Proceedings, vol. 11816, pp. 261-272, Springer, 2019. URL: https://www.springerprofessional.de/en/automatic-judgement-of-neural-network-generated-image-captions/17214374
Manual evaluation of individual results of natural language generation tasks is one of the bottlenecks. It is very time consuming and expensive if it is, for example, crowdsourced. In this work, we address this problem for the specific task of automatic image captioning. We automatically generate human-like judgements on grammatical correctness, image relevance and diversity of the captions obtained from a neural image caption generator. For this purpose, we use pool-based active learning with uncertainty sampling and represent the captions using fixed size vectors from Google’s Universal Sentence Encoder. In addition, we test common metrics, such as BLEU, ROUGE, METEOR, Levenshtein distance, and n-gram counts and report the F1 score for the classifiers used under the active learning scheme for this task. To the best of our knowledge, our work is the first in this direction and promises to reduce time, cost, and human effort.
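The following toy loop illustrates pool-based active learning with uncertainty sampling as described above; random vectors stand in for Universal Sentence Encoder embeddings, and a simulated oracle provides the labels a crowd worker would otherwise supply.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
pool = rng.normal(size=(1000, 512))        # stand-ins for caption embeddings
oracle = rng.integers(0, 2, size=1000)     # simulated human judgements
oracle[:10] = np.tile([0, 1], 5)           # ensure the seed set has both classes

labeled = list(range(10))                  # small labeled seed set
for _ in range(5):                         # a few annotation rounds
    clf = LogisticRegression(max_iter=1000).fit(pool[labeled], oracle[labeled])
    probs = clf.predict_proba(pool)[:, 1]
    uncertainty = -np.abs(probs - 0.5)     # closest to the decision boundary
    uncertainty[labeled] = -np.inf         # never re-query labeled items
    labeled.append(int(np.argmax(uncertainty)))

print(len(labeled), "items labeled")
```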
Kalimuthu, Marimuthu; Barz, Michael; Sonntag, Daniel: Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings. In: Proceedings of the Fourth Arabic Natural Language Processing Workshop, pp. 1-10, Association for Computational Linguistics, 2019. URL: https://www.dfki.de/fileadmin/user_upload/import/10520_W19-4601.pdf
We study the problem of incremental domain adaptation of a generic neural machine translation model with limited resources (e.g., budget and time) for human translations or model training. In this paper, we propose a novel query strategy for selecting "unlabeled" samples from a new domain based on sentence embeddings for Arabic. We accelerate the fine-tuning process of the generic model to the target domain. Specifically, our approach estimates the informativeness of instances from the target domain by comparing the distance of their sentence embeddings to embeddings from the generic domain. We perform machine translation experiments (Ar-to-En direction) for comparing a random sampling baseline with our new approach, similar to active learning, using two small update sets for simulating the work of human translators. For the prescribed setting we can save more than 50% of the annotation costs without loss in quality, demonstrating the effectiveness of our approach.
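A sketch of a distance-based query strategy of this kind (not necessarily the paper's exact scoring): new-domain sentences are ranked by the distance of their embeddings to the generic-domain centroid, and the most distant ones are selected for human translation within a fixed budget.

```python
import numpy as np

rng = np.random.default_rng(4)
generic = rng.normal(0.0, 1.0, size=(5000, 300))    # generic-domain sentence embeddings
new_domain = rng.normal(0.5, 1.0, size=(200, 300))  # candidate new-domain sentences

centroid = generic.mean(axis=0)
dist = np.linalg.norm(new_domain - centroid, axis=1)  # one simple distance choice

budget = 50                             # assumed annotation budget
query_ids = np.argsort(-dist)[:budget]  # most distant (most "informative") first
print(query_ids[:10])
```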
Prange, Alexander; Niemann, Mira; Latendorf, Antje; Steinert, Anika; Sonntag, Daniel: Multimodal Speech-based Dialogue for the Mini-Mental State Examination. In: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, pp. CS13:1-CS13:8, ACM, 2019. URLs: https://www.dfki.de/fileadmin/user_upload/import/10353_2019_Multimodal_speech-based_dialogue_for_the_Mini-Mental_State_Examination.pdf; http://doi.acm.org/10.1145/3290607.3299040
We present a system-initiative multimodal speech-based dialogue system for the Mini-Mental State Examination (MMSE). The MMSE is a questionnaire-based cognitive test, which is traditionally administered by a trained expert using pen and paper and afterwards scored manually to measure cognitive impairment. By using a digital pen and speech dialogue, we implement a multimodal system for the automatic execution and evaluation of the MMSE. User input is evaluated and scored in real-time. We present a user experience study with 15 participants and compare the usability of the proposed system with the traditional approach. Our experiment suggests that both modes perform equally well in terms of usability, but the proposed system has higher novelty ratings. We compare assessment scorings produced by our system with manual scorings made by domain experts.
Prange, Alexander; Sonntag, Daniel: Modeling Cognitive Status through Automatic Scoring of a Digital Version of the Clock Drawing Test. In: Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization, pp. 70-77, ACM, 2019. URL: https://www.dfki.de/fileadmin/user_upload/import/10518_umap.pdf; DOI: https://doi.org/10.1145/3320435.3320452
The Clock Drawing Test is used as a cognitive assessment tool in geriatrics to detect signs of dementia or to model the progress of stroke recovery. The result is scored manually by a trained professional. We implement the Mendez scoring scheme and create a hierarchy of error categories that model the test characteristics of the clock drawing test, based on a set of impaired clock examples provided by a geriatrics clinic. Using a digital pen we recorded 120 clock samples for evaluating the automatic scoring system, with a total of 2400 error samples distributed over the 20 error classes of the Mendez scoring scheme. Error classes are scored automatically using a handwriting and gesture recognition framework. Results show that we provide a clinically relevant cognitive model for each subject. In addition, we heavily reduce the time spent on manual scoring. We compare manual scoring results with results produced by our automated system.
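To illustrate how per-error-class recognizer outputs can be aggregated into a test score, here is a toy aggregation; the criterion names and the pass/fail logic are placeholders for illustration, not the clinical Mendez definitions.

```python
# Hypothetical per-criterion flags from a handwriting/gesture recognizer;
# names and scoring rule are placeholders, not the Mendez scheme itself.
detected_errors = {
    "numbers_missing": False,
    "hands_missing": True,
    "numbers_outside_circle": False,
    "hand_length_incorrect": True,
    # ... one flag per error class, 20 in total in the Mendez scheme
}

score = sum(not flagged for flagged in detected_errors.values())
print(f"clock score: {score}/{len(detected_errors)} criteria passed")
```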
Technical Reports
Sonntag, Daniel: Wie funktionieren neuronale Netze eigentlich? Technical Report, DFKI, 2019, 2 pages. URL: https://www.dfki.de/fileadmin/user_upload/import/10725_NN-DS.pdf
How do neural networks actually work?
Profitlich, Hans-Jürgen; Sonntag, Daniel: Interactivity and Transparency in Medical Risk Assessment with Supersparse Linear Integer Models. Technical Report, BMBF, 2019. Also available as CoRR abs/1911.12119. URLs: https://www.dfki.de/fileadmin/user_upload/import/11177_integer.pdf; http://arxiv.org/abs/1911.12119
Scoring systems are linear classification models that only require users to add or subtract a few small numbers in order to make a prediction. They are used for example by clinicians to assess the risk of medical conditions. This work focuses on our approach to implement an intuitive user interface to allow a clinician to generate such scoring systems interactively, based on the RiskSLIM machine learning library. We describe the technical architecture which allows a medical professional who is not specialised in developing and applying machine learning algorithms to create competitive transparent supersparse linear integer models in an interactive way. We demonstrate our prototype machine learning system in the nephrology domain, where doctors can interactively sub-select datasets to compute models, explore scoring tables that correspond to the learned models, and check the quality of the transparent solutions from a medical perspective.
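For intuition, this is a toy example of the kind of sparse integer scoring system RiskSLIM learns: a clinician sums a few small integer points and maps the total through a logistic link. The risk factors, point values, and intercept are invented for illustration, not a learned model.

```python
import math

score_table = {          # hypothetical integer points per binary risk factor
    "age_over_60": 2,
    "hypertension": 1,
    "proteinuria": 3,
}
intercept = -4           # hypothetical offset term

def predicted_risk(patient):
    total = intercept + sum(points for factor, points in score_table.items()
                            if patient.get(factor))
    return 1.0 / (1.0 + math.exp(-total))   # logistic link on the integer score

print(predicted_risk({"age_over_60": True, "proteinuria": True}))  # ~0.73
```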
2018
Journal Articles
Prange, Alexander; Barz, Michael; Sonntag, Daniel: A categorisation and implementation of digital pen features for behaviour characterisation. Computing Research Repository (CoRR), abs/1810.03970, pp. 1-42, 2018. URLs: https://www.dfki.de/fileadmin/user_upload/import/10183_1810.03970.pdf; http://arxiv.org/abs/1810.03970
In this paper we provide a categorisation and implementation of digital ink features for behaviour characterisation. Based on four feature sets taken from the literature, we provide a categorisation into different classes of syntactic and semantic features. We implemented a publicly available framework to calculate these features and show its deployment in the use case of analysing cognitive assessments performed with a digital pen.
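A minimal example of the kind of low-level ink features such a framework computes from raw pen samples; the concrete feature set here (path length, speed, and pressure statistics) is illustrative rather than the framework's full catalogue.

```python
import numpy as np

def stroke_features(xy, t, pressure):
    """xy: (n, 2) pen positions, t: (n,) timestamps [s], pressure: (n,) values."""
    seg = np.diff(xy, axis=0)
    dist = np.linalg.norm(seg, axis=1)
    dt = np.maximum(np.diff(t), 1e-6)       # guard against zero time deltas
    speed = dist / dt
    return {
        "path_length": float(dist.sum()),
        "mean_speed": float(speed.mean()),
        "speed_std": float(speed.std()),
        "mean_pressure": float(pressure.mean()),
    }

xy = np.array([[0, 0], [1, 0], [1, 1], [2, 1]], dtype=float)
t = np.array([0.0, 0.1, 0.2, 0.4])
pressure = np.array([0.5, 0.6, 0.7, 0.6])
print(stroke_features(xy, t, pressure))
```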
Zacharias, Jan; Barz, Michael; Sonntag, Daniel: A Survey on Deep Learning Toolkits and Libraries for Intelligent User Interfaces. Computing Research Repository (CoRR), abs/1803.04818, pp. 1-10, 2018. URLs: https://www.dfki.de/fileadmin/user_upload/import/9857_2018_A_Survey_on_Deep_Learning_Toolkits_and_Libraries_for_Intelligent_User_Interfaces.pdf; http://arxiv.org/abs/1803.04818
This paper provides an overview of prominent deep learning toolkits and, in particular, reports on recent publications that contributed open source software for implementing tasks that are common in intelligent user interfaces (IUI). We provide a scientific reference for researchers and software engineers who plan to utilise deep learning techniques within their IUI research and development projects.
Sonntag, Daniel; Profitlich, Hans-Jürgen: An architecture of open-source tools to combine textual information extraction, faceted search and information visualisation. Computing Research Repository (CoRR), abs/1810.12627, pp. 13-28, 2018. URLs: https://www.dfki.de/fileadmin/user_upload/import/11491_1810.12627.pdf; http://arxiv.org/abs/1810.12627
This article presents our steps to integrate complex and partly unstructured medical data into a clinical research database with subsequent decision support. Our main application is an integrated faceted search tool, accompanied by the visualisation of results of automatic information extraction from textual documents. We describe the details of our technical architecture (open-source tools), to be replicated at other universities, research institutes, or hospitals. Our exemplary use cases are nephrology and mammography. The software was first developed in the nephrology domain and then adapted to the mammography use case. We report on these case studies, illustrating how the application can be used by a clinician and which questions can be answered. We show that our architecture and the employed software modules are suitable for both areas of application with a limited amount of adaptations. For example, in nephrology we try to answer questions about the temporal characteristics of event sequences to gain significant insight from the data for cohort selection. We present a versatile time-line tool that enables the user to explore relations between a multitude of diagnoses and laboratory values.
Inproceedings
Barz, Michael; Büyükdemircioglu, Neslihan; Surya, Rikhu Prasad; Polzehl, Tim; Sonntag, Daniel: Device-Type Influence in Crowd-based Natural Language Translation Tasks (short paper). In: Aroyo, Lora; Dumitrache, Anca; Paritosh, Praveen; Quinn, Alexander J; Welty, Chris; Checco, Alessandro; Demartini, Gianluca; Gadiraju, Ujwal; Sarasua, Cristina (Eds.): Proceedings of the 1st Workshop on Subjectivity, Ambiguity and Disagreement in Crowdsourcing, and Short Paper Proceedings of the 1st Workshop on Disentangling the Relation Between Crowdsourcing and Bias Management (SAD 2018 and CrowdBias 2018), vol. 2276, pp. 93-97, CEUR-WS.org, 2018. URL: https://www.dfki.de/fileadmin/user_upload/import/10184_paper12.pdf
The effect of users’ interaction devices and their platform (mobile vs. desktop) should be taken into account when evaluating the performance of translation tasks in crowdsourcing contexts. We investigate the influence of the device type and platform in a crowd-based translation workflow. We implement a crowd translation workflow and use it for translating a subset of the IWSLT parallel corpus from English to Arabic. In addition, we consider machine translations from a state-of-the-art machine translation system which can be used as translation candidates in a human computation workflow. The results of our experiment suggest that users with a mobile device judge translations systematically lower than users with a desktop device, when assessing the quality of machine translations. The perceived quality of shorter sentences is generally higher than the perceived quality of longer sentences.
Stauden, Sven; Barz, Michael; Sonntag, Daniel: Visual Search Target Inference Using Bag of Deep Visual Words. In: Trollmann, Frank; Turhan, Anni-Yasmin (Eds.): KI 2018: Advances in Artificial Intelligence - 41st German Conference on AI, Springer, 2018. URL: https://www.dfki.de/fileadmin/user_upload/import/10896_2018_Visual_Search_Target_Inference_Using_Bag_of_Deep_Visual_Words.pdf
Visual search target inference subsumes methods for predicting the target object through eye tracking. A person intends to find an object in a visual scene, and we predict this target based on the fixation behavior. Knowing about the search target can improve intelligent user interaction. In this work, we implement a new feature encoding, the Bag of Deep Visual Words, for search target inference using a pre-trained convolutional neural network (CNN). Our work is based on a recent approach from the literature that uses Bag of Visual Words, common in computer vision applications. We evaluate our method using a gold standard dataset. The results show that our new feature encoding outperforms the baseline from the literature, in particular, when excluding fixations on the target.
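A sketch of a Bag-of-Deep-Visual-Words encoding under simplified assumptions: CNN descriptors of fixated image patches (random vectors here) are quantized against a k-means codebook, and each scanpath is pooled into a normalized word histogram that a downstream classifier can consume.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
train_desc = rng.normal(size=(2000, 128))   # CNN descriptors of many image patches
codebook = KMeans(n_clusters=64, n_init=4, random_state=0).fit(train_desc)

def encode(patch_descriptors):
    """Quantize patch descriptors and pool them into a normalized histogram."""
    words = codebook.predict(patch_descriptors)   # nearest visual word per patch
    hist = np.bincount(words, minlength=64).astype(float)
    return hist / max(hist.sum(), 1.0)

scanpath_desc = rng.normal(size=(15, 128))  # descriptors of one fixation sequence
print(encode(scanpath_desc).shape)          # (64,)
```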
Prange, Alexander; Barz, Michael; Sonntag, Daniel Medical 3D Images in Multimodal Virtual Reality Inproceedings IUI Companion, pp. 19:1-19:2, ACM, 2018. @inproceedings{9655, title = {Medical 3D Images in Multimodal Virtual Reality}, author = {Alexander Prange and Michael Barz and Daniel Sonntag}, year = {2018}, date = {2018-01-01}, booktitle = {IUI Companion}, pages = {19:1-19:2}, publisher = {ACM}, abstract = {We present a multimodal medical 3D image system for radiologists in a virtual reality (VR) environment. Users can walk freely inside the virtual room, interact with the system using speech, browse patient records, and manipulate 3D image data with hand gestures. Medical images are retrieved from the hospital's Picture Archiving and Communication System (PACS) and displayed as 3D objects inside VR. Our system incorporates a dialogue-based decision support system for treatments. A central supervised patient database provides input to our predictive model and allows us, first, to add new examination reports on-the-fly via a pen-based mobile application and, second, to get therapy prediction results in real-time. This demo includes a visualisation of real patient records, 3D DICOM radiology image data, and real-time therapy predictions in VR.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present a multimodal medical 3D image system for radiologists in a virtual reality (VR) environment. Users can walk freely inside the virtual room, interact with the system using speech, browse patient records, and manipulate 3D image data with hand gestures. Medical images are retrieved from the hospital's Picture Archiving and Communication System (PACS) and displayed as 3D objects inside VR. Our system incorporates a dialogue-based decision support system for treatments. A central supervised patient database provides input to our predictive model and allows us, first, to add new examination reports on-the-fly via a pen-based mobile application and, second, to get therapy prediction results in real-time. This demo includes a visualisation of real patient records, 3D DICOM radiology image data, and real-time therapy predictions in VR. |
Barz, Michael; Daiber, Florian; Sonntag, Daniel; Bulling, Andreas Error-aware Gaze-based Interfaces for Robust Mobile Gaze Interaction Inproceedings Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications, pp. 24:1-24:10, ACM, 2018. @inproceedings{9818, title = {Error-aware Gaze-based Interfaces for Robust Mobile Gaze Interaction}, author = {Michael Barz and Florian Daiber and Daniel Sonntag and Andreas Bulling}, url = {https://www.dfki.de/fileadmin/user_upload/import/9818_a24-barz.pdf http://doi.acm.org/10.1145/3204493.3204536}, year = {2018}, date = {2018-01-01}, booktitle = {Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications}, pages = {24:1-24:10}, publisher = {ACM}, abstract = {Gaze estimation error can severely hamper usability and performance of mobile gaze-based interfaces given that the error varies constantly for different interaction positions. In this work, we explore error-aware gaze-based interfaces that estimate and adapt to gaze estimation error on-the-fly. We implement a sample error-aware user interface for gaze-based selection and different error compensation methods: a naïve approach that increases component size directly proportional to the absolute error, a recent model by Feit et al. that is based on the two-dimensional error distribution, and a novel predictive model that shifts gaze by a directional error estimate. We evaluate these models in a 12-participant user study and show that our predictive model significantly outperforms the others in terms of selection rate, particularly for small gaze targets. These results underline both the feasibility and potential of next generation error-aware gaze-based user interfaces.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Gaze estimation error can severely hamper usability and performance of mobile gaze-based interfaces given that the error varies constantly for different interaction positions. In this work, we explore error-aware gaze-based interfaces that estimate and adapt to gaze estimation error on-the-fly. We implement a sample error-aware user interface for gaze-based selection and different error compensation methods: a naïve approach that increases component size directly proportional to the absolute error, a recent model by Feit et al. that is based on the two-dimensional error distribution, and a novel predictive model that shifts gaze by a directional error estimate. We evaluate these models in a 12-participant user study and show that our predictive model significantly outperforms the others in terms of selection rate, particularly for small gaze targets. These results underline both the feasibility and potential of next generation error-aware gaze-based user interfaces. |
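The abstract above names three compensation strategies; the sketch below illustrates the two simplest ones in Python under stated assumptions (pixel units, a square hit area, and an externally supplied error estimate). The paper's actual models, including the distribution-based one by Feit et al., are not reproduced here.

```python
import numpy as np

def naive_component_size(base_size, abs_error, k=1.0):
    # Naive compensation: enlarge a selectable component proportionally to
    # the absolute gaze estimation error at the current position (units are
    # assumed to match base_size, e.g., pixels; k is an assumed gain).
    return base_size + k * abs_error

def shift_corrected_gaze(measured_gaze, predicted_error):
    # Predictive compensation: subtract a directional error estimate from
    # the measured gaze point so the corrected point lands closer to the
    # true fixation location.
    return np.asarray(measured_gaze, float) - np.asarray(predicted_error, float)

def hit_test(gaze_xy, center_xy, size):
    # Square hit test used for gaze-based selection in this sketch.
    return bool(np.all(np.abs(np.asarray(gaze_xy) - np.asarray(center_xy)) <= size / 2))
```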
Niemann, Mira; Prange, Alexander; Sonntag, Daniel Towards a Multimodal Multisensory Cognitive Assessment Framework Inproceedings Proceedings of the 30th IEEE International Symposium on Computer-Based Medical System, IEEE, 2018. @inproceedings{9856, title = {Towards a Multimodal Multisensory Cognitive Assessment Framework}, author = {Mira Niemann and Alexander Prange and Daniel Sonntag}, year = {2018}, date = {2018-01-01}, booktitle = {Proceedings of the 30th IEEE International Symposium on Computer-Based Medical System}, publisher = {IEEE}, abstract = {Traditionally, neurocognitive testing is done using pen and paper, which is both expensive and time consuming and often leads to a biased outcome. In this paper, we present an approach towards selecting and digitizing existing cognitive tests and supporting the assessment of cognitive impairments through automated evaluation of different input modalities recorded during the assessments. Our multimodal multisensory framework currently records and analyzes handwriting input captured using a digital pen and electrodermal activity captured by the BITalino sensor board. Using artificial intelligence methods, we aim at analyzing the multisensory data in order to support objective assessments of cognitive impairments. In this work, we describe the current state of our framework and outline future research objectives.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Traditionally, neurocognitive testing is done using pen and paper, which is both expensive and time consuming and often leads to a biased outcome. In this paper, we present an approach towards selecting and digitizing existing cognitive tests and supporting the assessment of cognitive impairments through automated evaluation of different input modalities recorded during the assessments. Our multimodal multisensory framework currently records and analyzes handwriting input captured using a digital pen and electrodermal activity captured by the BITalino sensor board. Using artificial intelligence methods, we aim at analyzing the multisensory data in order to support objective assessments of cognitive impairments. In this work, we describe the current state of our framework and outline future research objectives. |
Miscellaneous |
Barz, Michael; Polzehl, Tim; Sonntag, Daniel Towards Hybrid Human-Machine Translation Services Miscellaneous EasyChair Preprint no. 333, 2018. @misc{9879, title = {Towards Hybrid Human-Machine Translation Services}, author = {Michael Barz and Tim Polzehl and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/9879_CI_2018_paper_22_(3).pdf http://ci.acm.org/2018/program/}, year = {2018}, date = {2018-01-01}, publisher = {EasyChair}, abstract = {Crowdsourcing has recently been used to automate complex tasks when computational systems alone fail. The literature includes several contributions concerning natural language processing, e.g., language translation [Zaidan and Callison-Burch 2011; Minder and Bernstein 2012a; 2012b], also in combination with active learning [Green et al. 2015] and interactive model training [Zacharias et al. 2018]. In this work, we investigate (1) whether a (paid) crowd, acquired from a multilingual website’s community, is capable of translating coherent content from English into their mother tongue (we consider Arabic native speakers); and (2) in which cases state-of-the-art machine translation models can compete with human translations for automation in order to reduce task completion times and costs. The envisioned goal is a hybrid machine translation service that incrementally adapts machine translation models to new domains by employing human computation to make machine translation more competitive (see Figure 1). Recently, approaches for domain adaptation of neural machine translation systems have been proposed, including the filtering of generic corpora based on sentence embeddings of in-domain samples [Wang et al. 2017], fine-tuning with mixed batches containing in-domain and out-of-domain samples [Chu et al. 2017], and fine-tuning with different regularization methods [Barone et al. 2017]. As a first step towards this goal, we conduct an experiment using a simple two-staged human computation algorithm for translating a subset of the IWSLT parallel corpus, which includes English transcriptions of TED talks and reference translations in Arabic, with a specifically acquired crowd. We compare the output with the state-of-the-art machine translation system Google Translate as a baseline.}, howpublished = {EasyChair Preprint no. 333}, keywords = {}, pubstate = {published}, tppubtype = {misc} } Crowdsourcing has recently been used to automate complex tasks when computational systems alone fail. The literature includes several contributions concerning natural language processing, e.g., language translation [Zaidan and Callison-Burch 2011; Minder and Bernstein 2012a; 2012b], also in combination with active learning [Green et al. 2015] and interactive model training [Zacharias et al. 2018]. In this work, we investigate (1) whether a (paid) crowd, acquired from a multilingual website’s community, is capable of translating coherent content from English into their mother tongue (we consider Arabic native speakers); and (2) in which cases state-of-the-art machine translation models can compete with human translations for automation in order to reduce task completion times and costs. The envisioned goal is a hybrid machine translation service that incrementally adapts machine translation models to new domains by employing human computation to make machine translation more competitive (see Figure 1). 
Recently, approaches for domain adaptation of neural machine translation systems have been proposed, including the filtering of generic corpora based on sentence embeddings of in-domain samples [Wang et al. 2017], fine-tuning with mixed batches containing in-domain and out-of-domain samples [Chu et al. 2017], and fine-tuning with different regularization methods [Barone et al. 2017]. As a first step towards this goal, we conduct an experiment using a simple two-staged human computation algorithm for translating a subset of the IWSLT parallel corpus, which includes English transcriptions of TED talks and reference translations in Arabic, with a specifically acquired crowd. We compare the output with the state-of-the-art machine translation system Google Translate as a baseline. |
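One building block mentioned here, the filtering of a generic corpus by sentence embeddings of in-domain samples [Wang et al. 2017], can be sketched in a few lines of Python; the embeddings are assumed to be precomputed by some sentence encoder, and top_k is an arbitrary illustrative cutoff.

```python
import numpy as np

def select_pseudo_in_domain(generic_embs, in_domain_embs, top_k=10000):
    # generic_embs: (N, D) sentence embeddings of the generic corpus;
    # in_domain_embs: (M, D) embeddings of in-domain seed sentences.
    # Rank generic sentences by cosine similarity to the in-domain centroid
    # and keep the top_k indices as pseudo in-domain fine-tuning data.
    centroid = in_domain_embs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    g = generic_embs / np.linalg.norm(generic_embs, axis=1, keepdims=True)
    scores = g @ centroid
    return np.argsort(-scores)[:top_k]
```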
2017 |
Journal Articles |
Sonntag, Daniel; Barz, Michael; Zacharias, Jan; Stauden, Sven; Rahmani, Vahid; Fóthi, Áron; Lőrincz, András Fine-tuning deep CNN models on specific MS COCO categories Journal Article Computing Research Repository eprint Journal, abs/1709.01476, pp. 0-3, 2017. @article{9241, title = {Fine-tuning deep CNN models on specific MS COCO categories}, author = {Daniel Sonntag and Michael Barz and Jan Zacharias and Sven Stauden and Vahid Rahmani and Áron Fóthi and András Lőrincz}, url = {https://www.dfki.de/fileadmin/user_upload/import/9241_2017_Fine-tuning_deep_CNN_models_on_specific_MS_COCO_categories.pdf http://arxiv.org/abs/1709.01476}, year = {2017}, date = {2017-09-01}, journal = {Computing Research Repository eprint Journal}, volume = {abs/1709.01476}, pages = {0-3}, publisher = {arXiv.org}, abstract = {Fine-tuning of a deep convolutional neural network (CNN) is often desired. This paper provides an overview of our publicly available py-faster-rcnn-ft software library that can be used to fine-tune the VGG_CNN_M_1024 model on custom subsets of the Microsoft Common Objects in Context (MS COCO) dataset. For example, we improved the procedure so that the user no longer has to search the dataset by hand for suitable image files to use in the demo program. Our implementation randomly selects images that contain at least one object of the categories on which the model is fine-tuned.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Fine-tuning of a deep convolutional neural network (CNN) is often desired. This paper provides an overview of our publicly available py-faster-rcnn-ft software library that can be used to fine-tune the VGG_CNN_M_1024 model on custom subsets of the Microsoft Common Objects in Context (MS COCO) dataset. For example, we improved the procedure so that the user no longer has to search the dataset by hand for suitable image files to use in the demo program. Our implementation randomly selects images that contain at least one object of the categories on which the model is fine-tuned. |
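The described selection of MS COCO images by category can be illustrated with the official pycocotools API; the function name, sample size, and annotation file path below are illustrative, and the internals of py-faster-rcnn-ft itself are not reproduced here.

```python
import random
from pycocotools.coco import COCO

def sample_images_with_categories(annotation_file, category_names,
                                  sample_size=500, seed=0):
    # Collect every image that contains at least one object of the requested
    # categories, then draw a reproducible random subset for fine-tuning.
    coco = COCO(annotation_file)
    img_ids = set()
    for cat_id in coco.getCatIds(catNms=category_names):
        img_ids.update(coco.getImgIds(catIds=[cat_id]))
    random.seed(seed)
    return random.sample(sorted(img_ids), min(sample_size, len(img_ids)))

# e.g. sample_images_with_categories("instances_train2014.json", ["dog", "cat"])
```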
Schmidt, Danilo; Budde, Klemens; Sonntag, Daniel; Profitlich, Hans-Jürgen; Ihle, Matthias; Staeck, Oliver A novel tool for the identification of correlations in medical data by faceted search Journal Article Computers in Biology and Medicine - An International Journal, 85, pp. 98-105, 2017. @article{11493, title = {A novel tool for the identification of correlations in medical data by faceted search}, author = {Danilo Schmidt and Klemens Budde and Daniel Sonntag and Hans-Jürgen Profitlich and Matthias Ihle and Oliver Staeck}, url = {https://www.sciencedirect.com/science/article/pii/S0010482517300975}, year = {2017}, date = {2017-01-01}, journal = {Computers in Biology and Medicine - An International Journal}, volume = {85}, pages = {98-105}, publisher = {Elsevier}, abstract = {This work focuses on the integration of extensive, multifaceted data sets (e.g. laboratory values, vital data, medications) and partly unstructured medical data such as discharge letters, diagnostic reports, clinical notes etc. in a research database. Our main application is an integrated faceted search in nephrology based on information extraction results. We describe the details of the application in transplant medicine and the resulting technical architecture of the faceted search application.}, keywords = {}, pubstate = {published}, tppubtype = {article} } This work focuses on the integration of extensive, multifaceted data sets (e.g. laboratory values, vital data, medications) and partly unstructured medical data such as discharge letters, diagnostic reports, clinical notes etc. in a research database. Our main application is an integrated faceted search in nephrology based on information extraction results. We describe the details of the application in transplant medicine and the resulting technical architecture of the faceted search application. |
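As a rough illustration of the faceted search interaction pattern (not the tool's actual architecture, which the paper details), here is a self-contained Python sketch over extracted document attributes; the field names are hypothetical examples.

```python
from collections import Counter

def faceted_search(records, facet_fields, active_filters=None):
    # records: one dict of extracted attributes per document;
    # active_filters restricts the hit set, and facet value counts are
    # then computed over the remaining hits for display in the UI.
    active_filters = active_filters or {}
    hits = [r for r in records
            if all(r.get(f) == v for f, v in active_filters.items())]
    facets = {f: Counter(r[f] for r in hits if f in r) for f in facet_fields}
    return hits, facets

# hits, facets = faceted_search(docs, ["diagnosis", "medication"],
#                               {"diagnosis": "nephropathy"})
```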
Inproceedings |
Barz, Michael; Poller, Peter; Schneider, Martin; Zillner, Sonja; Sonntag, Daniel; Mařík, Vladimír Human-in-the-Loop Control Processes in Gas Turbine Maintenance Inproceedings Mařík, Vladimír; Strasser, Thomas; Kadera, Petr; Wahlster, Wolfgang (Ed.): Industrial Applications of Holonic and Multi-Agent Systems: 8th International Conference, HoloMAS 2017, Springer International Publishing, 2017. @inproceedings{9218, title = {Human-in-the-Loop Control Processes in Gas Turbine Maintenance}, author = {Michael Barz and Peter Poller and Martin Schneider and Sonja Zillner and Daniel Sonntag and Vladimír Mařík}, editor = {Vladimír Mařík and Thomas Strasser and Petr Kadera and Wolfgang Wahlster}, url = {https://www.dfki.de/fileadmin/user_upload/import/9218_2017_Human-in-the-Loop_Control_Processes_in_Gas_Turbine_Maintenance.pdf}, year = {2017}, date = {2017-08-01}, booktitle = {Industrial Applications of Holonic and Multi-Agent Systems: 8th International Conference, HoloMAS 2017}, publisher = {Springer International Publishing}, abstract = {In this applied research paper, we describe an architecture for seamlessly integrating factory workers in industrial cyber-physical production environments. Our human-in-the-loop control process uses novel input techniques and relies on state-of-the-art industry standards. Our architecture allows for real-time processing of semantically annotated data from multiple sources (e.g., machine sensors, user input devices) and real-time analysis of data for anomaly detection and recovery. We use a semantic knowledge base for storing and querying data (http://www.metaphacts.com) and the Business Process Model and Notation (BPMN) for modelling and controlling the process. We exemplify our industrial solution in the use case of the maintenance of a Siemens gas turbine. We report on this case study and show the advantages of our approach for smart factories. An informal evaluation in the gas turbine maintenance use case shows the utility of automated anomaly detection and handling: workers can fill in paper-based incident reports by using a digital pen; the digitised version is stored in metaphacts and linked to semantic knowledge sources such as process models, structure models, business process models, and user models. Subsequently, automatic maintenance and recovery processes that involve human experts are triggered.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In this applied research paper, we describe an architecture for seamlessly integrating factory workers in industrial cyber-physical production environments. Our human-in-the-loop control process uses novel input techniques and relies on state-of-the-art industry standards. Our architecture allows for real-time processing of semantically annotated data from multiple sources (e.g., machine sensors, user input devices) and real-time analysis of data for anomaly detection and recovery. We use a semantic knowledge base for storing and querying data (http://www.metaphacts.com) and the Business Process Model and Notation (BPMN) for modelling and controlling the process. We exemplify our industrial solution in the use case of the maintenance of a Siemens gas turbine. We report on this case study and show the advantages of our approach for smart factories. 
An informal evaluation in the gas turbine maintenance use case shows the utility of automated anomaly detection and handling: workers can fill in paper-based incident reports by using a digital pen; the digitised version is stored in metaphacts and linked to semantic knowledge sources such as process models, structure models, business process models, and user models. Subsequently, automatic maintenance and recovery processes that involve human experts are triggered. |
Prange, Alexander; Chikobava, Margarita; Poller, Peter; Barz, Michael; Sonntag, Daniel A Multimodal Dialogue System for Medical Decision Support inside Virtual Reality Inproceedings Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pp. 23-26, Association for Computational Linguistics, 2017. @inproceedings{9219, title = {A Multimodal Dialogue System for Medical Decision Support inside Virtual Reality}, author = {Alexander Prange and Margarita Chikobava and Peter Poller and Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/9219_2017_A_Multimodal_Dialogue_System_for_Medical_Decision_Support_in_Virtual_Reality.pdf}, year = {2017}, date = {2017-08-01}, booktitle = {Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue}, pages = {23-26}, publisher = {Association for Computational Linguistics}, abstract = {We present a multimodal dialogue system that allows doctors to interact with a medical decision support system in virtual reality (VR). We integrate an interactive visualization of patient records and radiology image data, as well as therapy predictions. Therapy predictions are computed in real-time using a deep learning model.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present a multimodal dialogue system that allows doctors to interact with a medical decision support system in virtual reality (VR). We integrate an interactive visualization of patient records and radiology image data, as well as therapy predictions. Therapy predictions are computed in real-time using a deep learning model. |
Prange, Alexander; Schmidt, Danilo; Sonntag, Daniel A Digital Pen Based Tool for Instant Digitisation and Digitalisation of Biopsy Protocols Inproceedings 2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS), pp. 773-774, IEEE Xplore, 2017. @inproceedings{11235, title = {A Digital Pen Based Tool for Instant Digitisation and Digitalisation of Biopsy Protocols}, author = {Alexander Prange and Danilo Schmidt and Daniel Sonntag}, year = {2017}, date = {2017-06-01}, booktitle = {2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS)}, pages = {773-774}, publisher = {IEEE Xplore}, abstract = {In order to improve medical processes in nephrology, we present an application that allows doctors to create biopsy protocols by using a digital pen on a tablet. The biopsy protocol app is seamlessly integrated into the existing infrastructure at the hospital (see figure 1). Compared to other reporting tools, we provide (1) real-time hand-writing/gesture recognition and real-time feedback on the recognition results on the screen; (2) a real-time digitisation into structured data and PDF documents; and (3) the mapping of the transcribed contents into concepts of the Banff classification. Our approach combines the benefits of paper with the automatic digitisation and digitalisation of hand-written user input. A fully digital and mobile approach should empower nephrologists to produce high quality data more effectively and in real-time so that it can be directly used in hospital processes.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In order to improve medical processes in nephrology, we present an application that allows doctors to create biopsy protocols by using a digital pen on a tablet. The biopsy protocol app is seamlessly integrated into the existing infrastructure at the hospital (see figure 1). Compared to other reporting tools, we provide (1) real-time hand-writing/gesture recognition and real-time feedback on the recognition results on the screen; (2) a real-time digitisation into structured data and PDF documents; and (3) the mapping of the transcribed contents into concepts of the Banff classification. Our approach combines the benefits of paper with the automatic digitisation and digitalisation of hand-written user input. A fully digital and mobile approach should empower nephrologists to produce high quality data more effectively and in real-time so that it can be directly used in hospital processes. |
Barz, Michael; Poller, Peter; Sonntag, Daniel Evaluating Remote and Head-worn Eye Trackers in Multi-modal Speech-based HRI Inproceedings Mutlu, Bilge; Tscheligi, Manfred; Weiss, Astrid; Young, James E (Ed.): Companion of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, pp. 79-80, ACM, 2017. @inproceedings{8991, title = {Evaluating Remote and Head-worn Eye Trackers in Multi-modal Speech-based HRI}, author = {Michael Barz and Peter Poller and Daniel Sonntag}, editor = {Bilge Mutlu and Manfred Tscheligi and Astrid Weiss and James E Young}, url = {https://www.dfki.de/fileadmin/user_upload/import/8991_2017_Evaluating_Remote_and_Head-worn_Eye_Trackers_in_Multi-modal_Speech-based_HRI.pdf http://doi.acm.org/10.1145/3029798.3038367}, year = {2017}, date = {2017-03-01}, booktitle = {Companion of the 2017 ACM/IEEE International Conference on Human-Robot Interaction}, pages = {79-80}, publisher = {ACM}, abstract = {Gaze is known to be a dominant modality for conveying spatial information, and it has been used for grounding in human-robot dialogues. In this work, we present the prototype of a gaze-supported multi-modal dialogue system that enhances two core tasks in human-robot collaboration: 1) our robot is able to learn new objects and their location from user instructions involving gaze, and 2) it can instruct the user to move objects and passively track this movement by interpreting the user's gaze. We performed a user study to investigate the impact of different eye trackers on user performance. In particular, we compare a head-worn device and an RGB-based remote eye tracker. Our results show that the head-mounted eye tracker outperforms the remote device in terms of task completion time and the required number of utterances due to its higher precision.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Gaze is known to be a dominant modality for conveying spatial information, and it has been used for grounding in human-robot dialogues. In this work, we present the prototype of a gaze-supported multi-modal dialogue system that enhances two core tasks in human-robot collaboration: 1) our robot is able to learn new objects and their location from user instructions involving gaze, and 2) it can instruct the user to move objects and passively track this movement by interpreting the user's gaze. We performed a user study to investigate the impact of different eye trackers on user performance. In particular, we compare a head-worn device and an RGB-based remote eye tracker. Our results show that the head-mounted eye tracker outperforms the remote device in terms of task completion time and the required number of utterances due to its higher precision. |
Barz, Michael; Poller, Peter; Sonntag, Daniel Evaluating Remote and Head-worn Eye Trackers in Multi-modal Speech-based HRI (Demo) Inproceedings Mutlu, Bilge; Tscheligi, Manfred; Weiss, Astrid; Young, James E (Ed.): Companion of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, ACM, 2017. @inproceedings{8992, title = {Evaluating Remote and Head-worn Eye Trackers in Multi-modal Speech-based HRI (Demo)}, author = {Michael Barz and Peter Poller and Daniel Sonntag}, editor = {Bilge Mutlu and Manfred Tscheligi and Astrid Weiss and James E Young}, url = {https://www.dfki.de/fileadmin/user_upload/import/8992_2017_Evaluating_Remote_and_Head-worn_Eye_Trackers_in_Multi-modal_Speech-based_HRI_(Demo)_.pdf}, doi = {https://doi.org/10.1145/3029798.3036665}, year = {2017}, date = {2017-01-01}, booktitle = {Companion of the 2017 ACM/IEEE International Conference on Human-Robot Interaction}, publisher = {ACM}, abstract = {Gaze is known to be a dominant modality for conveying spatial information, and it has been used for grounding in human-robot dialogues. In this work, we present the prototype of a gaze-supported multi-modal dialogue system that enhances two core tasks in human-robot collaboration: 1) our robot is able to learn new objects and their location from user instructions involving gaze, and 2) it can instruct the user to move objects and passively track this movement by interpreting the user's gaze. We performed a user study to investigate the impact of different eye trackers on user performance. In particular, we compare a head-worn device and an RGB-based remote eye tracker. Our results show that the head-mounted eye tracker outperforms the remote device in terms of task completion time and the required number of utterances due to its higher precision.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Gaze is known to be a dominant modality for conveying spatial information, and it has been used for grounding in human-robot dialogues. In this work, we present the prototype of a gaze-supported multi-modal dialogue system that enhances two core tasks in human-robot collaboration: 1) our robot is able to learn new objects and their location from user instructions involving gaze, and 2) it can instruct the user to move objects and passively track this movement by interpreting the user's gaze. We performed a user study to investigate the impact of different eye trackers on user performance. In particular, we compare a head-worn device and an RGB-based remote eye tracker. Our results show that the head-mounted eye tracker outperforms the remote device in terms of task completion time and the required number of utterances due to its higher precision. |
Prange, Alexander; Barz, Michael; Sonntag, Daniel Speech-based Medical Decision Support in VR using a Deep Neural Network (Demonstration) Inproceedings Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 5241-5242, IJCAI, 2017. @inproceedings{9198, title = {Speech-based Medical Decision Support in VR using a Deep Neural Network (Demonstration)}, author = {Alexander Prange and Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/9198_2017_Speech-based_Medical_Decision_Support_in_VR_using_a_Deep_Neural_Network.pdf}, doi = {https://doi.org/10.24963/ijcai.2017/777}, year = {2017}, date = {2017-01-01}, booktitle = {Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17}, pages = {5241-5242}, publisher = {IJCAI}, abstract = {We present a speech dialogue system that facilitates medical decision support for doctors in a virtual reality (VR) application. The therapy prediction is based on a recurrent neural network model that incorporates the examination history of patients. A central supervised patient database provides input to our predictive model and allows us, first, to add new examination reports by a pen-based mobile application on-the-fly, and second, to get therapy prediction results in real-time. This demo includes a visualisation of patient records, radiology image data, and the therapy prediction results in VR.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present a speech dialogue system that facilitates medical decision support for doctors in a virtual reality (VR) application. The therapy prediction is based on a recurrent neural network model that incorporates the examination history of patients. A central supervised patient database provides input to our predictive model and allows us, first, to add new examination reports by a pen-based mobile application on-the-fly, and second, to get therapy prediction results in real-time. This demo includes a visualisation of patient records, radiology image data, and the therapy prediction results in VR. |
Sonntag, Daniel; Profitlich, Hans-Jürgen Integrated Decision Support by Combining Textual Information Extraction, Facetted Search and Information Visualisation Inproceedings 2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS), pp. 95-100, IEEE, 2017. @inproceedings{11492, title = {Integrated Decision Support by Combining Textual Information Extraction, Facetted Search and Information Visualisation}, author = {Daniel Sonntag and Hans-Jürgen Profitlich}, url = {https://ieeexplore.ieee.org/document/8104164}, year = {2017}, date = {2017-01-01}, booktitle = {2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS)}, pages = {95-100}, publisher = {IEEE}, abstract = {This work focusses on our integration steps of complex and partly unstructured medical data into a clinical research database with subsequent decision support. Our main application is an integrated facetted search tool, followed by information visualisation based on automatic information extraction results from textual documents. We describe the details of our technical architecture (open-source tools), to be replicated at other universities, research institutes, or hospitals. Our exemplary use case is nephrology, where we try to answer questions about the temporal characteristics of sequences and gain significant insight from the data for cohort selection. We report on this case study, illustrating how the application can be used by a clinician and which questions can be answered.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } This work focusses on our integration steps of complex and partly unstructured medical data into a clinical research database with subsequent decision support. Our main application is an integrated facetted search tool, followed by information visualisation based on automatic information extraction results from textual documents. We describe the details of our technical architecture (open-source tools), to be replicated at other universities, research institutes, or hospitals. Our exemplary use case is nephrology, where we try to answer questions about the temporal characteristics of sequences and gain significant insight from the data for cohort selection. We report on this case study, illustrating how the application can be used by a clinician and which questions can be answered. |
Miscellaneous |
Sonntag, Daniel Interakt - A Multimodal Multisensory Interactive Cognitive Assessment Tool Miscellaneous 2017. @misc{11489, title = {Interakt - A Multimodal Multisensory Interactive Cognitive Assessment Tool}, author = {Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11489_1709.01796.pdf http://arxiv.org/abs/1709.01796}, year = {2017}, date = {2017-01-01}, volume = {abs/1709.01796}, pages = {4}, abstract = {Cognitive assistance may be valuable in applications for doctors and therapists that reduce costs and improve quality in healthcare systems. Use cases and scenarios include the assessment of dementia. In this paper, we present our approach to the (semi-)automatic assessment of dementia.}, keywords = {}, pubstate = {published}, tppubtype = {misc} } Cognitive assistance may be valuable in applications for doctors and therapists that reduce costs and improve quality in healthcare systems. Use cases and scenarios include the assessment of dementia. In this paper, we present our approach to the (semi-)automatic assessment of dementia. |
2016 |
Journal Articles |
Sonntag, Daniel; Tresp, Volker; Zillner, Sonja; Cavallaro, Alexander; Hammon, Matthias; Reis, André; Fasching, Peter A; Sedlmayr, Martin; Ganslandt, Thomas; Prokosch, Hans-Ulrich; Budde, Klemens; Schmidt, Danilo; Hinrichs, Carl; Wittenberg, Thomas; Daumke, Philipp; Oppelt, Patricia G The Clinical Data Intelligence Project - A smart data initiative Journal Article Informatik Spektrum, 39, pp. 290-300, 2016. @article{11490, title = {The Clinical Data Intelligence Project - A smart data initiative}, author = {Daniel Sonntag and Volker Tresp and Sonja Zillner and Alexander Cavallaro and Matthias Hammon and André Reis and Peter A Fasching and Martin Sedlmayr and Thomas Ganslandt and Hans-Ulrich Prokosch and Klemens Budde and Danilo Schmidt and Carl Hinrichs and Thomas Wittenberg and Philipp Daumke and Patricia G Oppelt}, doi = {https://doi.org/10.1007/s00287-015-0913-x}, year = {2016}, date = {2016-01-01}, journal = {Informatik Spektrum}, volume = {39}, pages = {290-300}, publisher = {Springer}, abstract = {This article is about a new project that combines clinical data intelligence and smart data. It provides an introduction to the “Klinische Datenintelligenz” (KDI) project, which is funded by the Federal Ministry for Economic Affairs and Energy (BMWi); we transfer research and development (R&D) results from the analysis of data generated in the clinical routine of specific medical domains. We present the project structure and goals, how patient care should be improved, and the joint efforts of data and knowledge engineering, information extraction (from textual and other unstructured data), statistical machine learning, decision support, and their integration into special use cases moving towards individualised medicine. In particular, we describe some details of our medical use cases and cooperation with two major German university hospitals.}, keywords = {}, pubstate = {published}, tppubtype = {article} } This article is about a new project that combines clinical data intelligence and smart data. It provides an introduction to the “Klinische Datenintelligenz” (KDI) project, which is funded by the Federal Ministry for Economic Affairs and Energy (BMWi); we transfer research and development (R&D) results from the analysis of data generated in the clinical routine of specific medical domains. We present the project structure and goals, how patient care should be improved, and the joint efforts of data and knowledge engineering, information extraction (from textual and other unstructured data), statistical machine learning, decision support, and their integration into special use cases moving towards individualised medicine. In particular, we describe some details of our medical use cases and cooperation with two major German university hospitals. |
Inproceedings |
Moniri, Mehdi; Luxenburger, Andreas; Schuffert, Winfried; Sonntag, Daniel Real-Time 3D Peripheral View Analysis Inproceedings Reiners, Dirk; Iwai, Daisuke; Steinicke, Frank (Ed.): Proceedings of the International Conference on Artificial Reality and Telexistence and Eurographics Symposium on Virtual Environments 2016, pp. 37-44, The Eurographics Association, 2016. @inproceedings{8850, title = {Real-Time 3D Peripheral View Analysis}, author = {Mehdi Moniri and Andreas Luxenburger and Winfried Schuffert and Daniel Sonntag}, editor = {Dirk Reiners and Daisuke Iwai and Frank Steinicke}, url = {https://www.dfki.de/fileadmin/user_upload/import/8850_2016_Real-Time_3D_Peripheral_View_Analysis.pdf}, year = {2016}, date = {2016-12-01}, booktitle = {Proceedings of the International Conference on Artificial Reality and Telexistence and Eurographics Symposium on Virtual Environments 2016}, pages = {37-44}, publisher = {The Eurographics Association}, abstract = {Human peripheral vision suffers from several limitations that differ among various regions of the visual field. Since these limitations result in natural visual impairments, many interesting intelligent user interfaces based on eye tracking could benefit from peripheral view calculations that aim to compensate for events occurring outside the very center of gaze. We present a general peripheral view calculation model which extends previous work on attention-based user interfaces that use eye gaze. An intuitive, two-dimensional visibility measure based on the concept of solid angle is developed for determining to what extent an object of interest observed by a user intersects with each region of the underlying visual field model. The results are weighted considering the visual acuity in each visual field region to determine the total visibility of the object. We exemplify the proposed model in a virtual reality car simulation application incorporating a head-mounted display with integrated eye tracking functionality. In this context, we provide a quantitative evaluation in terms of a runtime analysis of the different steps of our approach. We also provide several example applications, including an interactive web application which visualizes the concepts and calculations presented in this paper.}, howpublished = {ICAT-EGVE2016}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Human peripheral vision suffers from several limitations that differ among various regions of the visual field. Since these limitations result in natural visual impairments, many interesting intelligent user interfaces based on eye tracking could benefit from peripheral view calculations that aim to compensate for events occurring outside the very center of gaze. We present a general peripheral view calculation model which extends previous work on attention-based user interfaces that use eye gaze. An intuitive, two-dimensional visibility measure based on the concept of solid angle is developed for determining to what extent an object of interest observed by a user intersects with each region of the underlying visual field model. The results are weighted considering the visual acuity in each visual field region to determine the total visibility of the object. We exemplify the proposed model in a virtual reality car simulation application incorporating a head-mounted display with integrated eye tracking functionality. In this context, we provide a quantitative evaluation in terms of a runtime analysis of the different steps of our approach. 
We also provide several example applications, including an interactive web application which visualizes the concepts and calculations presented in this paper. |
Barz, Michael; Sonntag, Daniel Gaze-guided Object Classification Using Deep Neural Networks for Attention-based Computing Inproceedings Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct, pp. 253-256, ACM, 2016. @inproceedings{8936, title = {Gaze-guided Object Classification Using Deep Neural Networks for Attention-based Computing}, author = {Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/8936_2016_Gaze-guided_object_classification_using_deep_neural_networks_for_attention-based_computing.pdf}, doi = {https://doi.org/10.1145/2968219.2971389}, year = {2016}, date = {2016-09-01}, booktitle = {Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct}, pages = {253-256}, publisher = {ACM}, abstract = {Recent advances in eye tracking technologies opened the way to design novel attention-based user interfaces. This is promising for pro-active and assistive technologies for cyber-physical systems in the domains of, e.g., healthcare and industry 4.0. Prior approaches to recognize a user's attention are usually limited to the raw gaze signal or sensors in instrumented environments. We propose a system that (1) incorporates the gaze signal and the egocentric camera of the eye tracker to identify the objects the user focuses on; (2) employs object classification based on deep learning which we recompiled for our purposes on a GPU-based image classification server; (3) detects whether the user actually draws attention to that object; and (4) combines these modules for constructing episodic memories of egocentric events in real-time.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Recent advances in eye tracking technologies opened the way to design novel attention-based user interfaces. This is promising for pro-active and assistive technologies for cyber-physical systems in the domains of, e.g., healthcare and industry 4.0. Prior approaches to recognize a user's attention are usually limited to the raw gaze signal or sensors in instrumented environments. We propose a system that (1) incorporates the gaze signal and the egocentric camera of the eye tracker to identify the objects the user focuses on; (2) employs object classification based on deep learning which we recompiled for our purposes on a GPU-based image classification server; (3) detects whether the user actually draws attention to that object; and (4) combines these modules for constructing episodic memories of egocentric events in real-time. |
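The core of step (1) and (2), classifying the image region around the current fixation, can be sketched in Python; the off-the-shelf ResNet stands in for the paper's GPU-based image classification server, and the model choice and patch size are assumptions for illustration.

```python
import torch
from torchvision import models
from PIL import Image

# Off-the-shelf classifier standing in for the paper's classification server.
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights).eval()
preprocess = weights.transforms()

def classify_fixated_object(frame: Image.Image, fix_x: int, fix_y: int,
                            half: int = 112) -> str:
    # Crop a patch around the fixation point in the egocentric camera frame
    # and return the predicted class label for the fixated object.
    patch = frame.crop((fix_x - half, fix_y - half, fix_x + half, fix_y + half))
    with torch.no_grad():
        logits = model(preprocess(patch).unsqueeze(0))
    return weights.meta["categories"][int(logits.argmax(dim=1))]
```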
Prange, Alexander; Sonntag, Daniel Digital PI-RADS: Smartphone Sketches for Instant Knowledge Acquisition in Prostate Cancer Detection Inproceedings 2016 IEEE 29th International Symposium on Computer-Based Medical Systems (CBMS), pp. 13-18, IEEE Xplore, 2016. @inproceedings{11234, title = {Digital PI-RADS: Smartphone Sketches for Instant Knowledge Acquisition in Prostate Cancer Detection}, author = {Alexander Prange and Daniel Sonntag}, year = {2016}, date = {2016-06-01}, booktitle = {2016 IEEE 29th International Symposium on Computer-Based Medical Systems (CBMS)}, pages = {13-18}, publisher = {IEEE Xplore}, abstract = {In order to improve reporting practices for the detection of prostate cancer, we present an application that allows urologists to create structured reports by using a digital pen on a smartphone. In this domain, printed documents cannot be easily replaced by computer systems because they contain free-form sketches and textual annotations, and the acceptance of traditional PC reporting tools is rather low among the doctors. Our approach provides an instant knowledge acquisition system by automatically interpreting the written strokes, texts, and sketches. We have incorporated this structured reporting system for MRI of the prostate (PI-RADS). Our system imposes only minimal overhead on traditional form-filling processes and provides for a direct, ontology-based structuring of the user input for semantic search and retrieval applications.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In order to improve reporting practices for the detection of prostate cancer, we present an application that allows urologists to create structured reports by using a digital pen on a smartphone. In this domain, printed documents cannot be easily replaced by computer systems because they contain free-form sketches and textual annotations, and the acceptance of traditional PC reporting tools is rather low among the doctors. Our approach provides an instant knowledge acquisition system by automatically interpreting the written strokes, texts, and sketches. We have incorporated this structured reporting system for MRI of the prostate (PI-RADS). Our system imposes only minimal overhead on traditional form-filling processes and provides for a direct, ontology-based structuring of the user input for semantic search and retrieval applications. |
Barz, Michael; Daiber, Florian; Bulling, Andreas Prediction of Gaze Estimation Error for Error-Aware Gaze-Based Interfaces Inproceedings Proceedings of the Symposium on Eye Tracking Research and Applications, ACM, 2016. @inproceedings{8242, title = {Prediction of Gaze Estimation Error for Error-Aware Gaze-Based Interfaces}, author = {Michael Barz and Florian Daiber and Andreas Bulling}, year = {2016}, date = {2016-01-01}, booktitle = {Proceedings of the Symposium on Eye Tracking Research and Applications}, publisher = {ACM}, abstract = {Gaze estimation error is inherent in head-mounted eye trackers and seriously impacts performance, usability, and user experience of gaze-based interfaces. Particularly in mobile settings, this error varies constantly as users move in front of and look at different parts of a display. We envision a new class of gaze-based interfaces that are aware of the gaze estimation error and adapt to it in real time. As a first step towards this vision we introduce an error model that is able to predict the gaze estimation error. Our method covers major building blocks of mobile gaze estimation, specifically mapping of pupil positions to scene camera coordinates, marker-based display detection, and mapping of gaze from scene camera to on-screen coordinates. We develop our model through a series of principled measurements of a state-of-the-art head-mounted eye tracker.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Gaze estimation error is inherent in head-mounted eye trackers and seriously impacts performance, usability, and user experience of gaze-based interfaces. Particularly in mobile settings, this error varies constantly as users move in front of and look at different parts of a display. We envision a new class of gaze-based interfaces that are aware of the gaze estimation error and adapt to it in real time. As a first step towards this vision we introduce an error model that is able to predict the gaze estimation error. Our method covers major building blocks of mobile gaze estimation, specifically mapping of pupil positions to scene camera coordinates, marker-based display detection, and mapping of gaze from scene camera to on-screen coordinates. We develop our model through a series of principled measurements of a state-of-the-art head-mounted eye tracker. |
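Predicting the error at arbitrary display positions from a finite set of measurements can be approximated with plain interpolation, as in the sketch below; the calibration measurements are an assumed input, and the paper develops a more principled model over the full gaze estimation pipeline rather than this shortcut.

```python
import numpy as np
from scipy.interpolate import griddata

def predict_gaze_error(measured_pos, measured_err, query_pos):
    # measured_pos: (N, 2) on-screen positions probed during measurement;
    # measured_err: (N, 2) gaze error vectors observed at those positions;
    # query_pos: (M, 2) positions for which to predict the error vector.
    ex = griddata(measured_pos, measured_err[:, 0], query_pos, method="linear")
    ey = griddata(measured_pos, measured_err[:, 1], query_pos, method="linear")
    return np.stack([ex, ey], axis=-1)
```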
Lessel, Pascal; Altmeyer, Maximilian; Kerber, Frederic; Barz, Michael; Leidinger, Cornelius; Krüger, Antonio WaterCoaster: A Device to Encourage People in a Playful Fashion to Reach Their Daily Water Intake Level Inproceedings Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pp. 1813-1820, ACM, 2016. @inproceedings{8342, title = {WaterCoaster: A Device to Encourage People in a Playful Fashion to Reach Their Daily Water Intake Level}, author = {Pascal Lessel and Maximilian Altmeyer and Frederic Kerber and Michael Barz and Cornelius Leidinger and Antonio Krüger}, url = {https://umtl.cs.uni-saarland.de/paper_preprints/paper_watercoaster_lessel.pdf http://doi.acm.org/10.1145/2851581.2892498}, year = {2016}, date = {2016-01-01}, booktitle = {Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems}, pages = {1813-1820}, publisher = {ACM}, abstract = {In this paper, we present WaterCoaster, a mobile device and a mobile application to motivate people to drink beverages more often and more regularly. The WaterCoaster measures the amount drunk and reminds the user to consume more if necessary. The app is designed as a game in which the user needs to take care of a virtual character living in a fish tank; the tank's water level drops if the user does not consume beverages in a healthy way. We report results of a three-week pilot study (N=17) suggesting that our approach is appreciated and subjectively influences participants. Based on the results, we look forward to evaluating the system in a long-term study in the next iteration.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In this paper, we present WaterCoaster, a mobile device and a mobile application to motivate people to drink beverages more often and more regularly. The WaterCoaster measures the amount drunk and reminds the user to consume more if necessary. The app is designed as a game in which the user needs to take care of a virtual character living in a fish tank; the tank's water level drops if the user does not consume beverages in a healthy way. We report results of a three-week pilot study (N=17) suggesting that our approach is appreciated and subjectively influences participants. Based on the results, we look forward to evaluating the system in a long-term study in the next iteration. |
Barz, Michael; Moniri, Mehdi; Weber, Markus; Sonntag, Daniel Multimodal multisensor activity annotation tool Inproceedings Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct, pp. 17-20, ACM, 2016. @inproceedings{8770, title = {Multimodal multisensor activity annotation tool}, author = {Michael Barz and Mehdi Moniri and Markus Weber and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/8770_2016_Multimodal_multisensor_activity_annotation_tool.pdf http://dl.acm.org/citation.cfm?id=2971459}, year = {2016}, date = {2016-01-01}, booktitle = {Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct}, pages = {17-20}, publisher = {ACM}, abstract = {In this paper we describe a multimodal-multisensor annotation tool for physiological computing; for example mobile gesture-based interaction devices or health monitoring devices can be connected. It should be used as an expert authoring tool to annotate multiple video-based sensor streams for domain-specific activities. Resulting datasets can be used as supervised datasets for new machine learning tasks. Our tool provides connectors to commercially available sensor systems (e.g., Intel RealSense F200 3D camera, Leap Motion, and Myo) and a graphical user interface for annotation.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In this paper we describe a multimodal-multisensor annotation tool for physiological computing; for example mobile gesture-based interaction devices or health monitoring devices can be connected. It should be used as an expert authoring tool to annotate multiple video-based sensor streams for domain-specific activities. Resulting datasets can be used as supervised datasets for new machine learning tasks. Our tool provides connectors to commercially available sensor systems (e.g., Intel RealSense F200 3D camera, Leap Motion, and Myo) and a graphical user interface for annotation. |
Moniri, Mehdi; Luxenburger, Andreas; Sonntag, Daniel Peripheral View Calculation in Virtual Reality Applications Inproceedings Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct, pp. 333-336, ACM, 2016. @inproceedings{8769, title = {Peripheral View Calculation in Virtual Reality Applications}, author = {Mehdi Moniri and Andreas Luxenburger and Daniel Sonntag}, url = {http://doi.acm.org/10.1145/2968219.2971391}, year = {2016}, date = {2016-01-01}, booktitle = {Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct}, pages = {333-336}, publisher = {ACM}, abstract = {We present an application based on a general peripheral view calculation model which extends previous work on attention-based user interfaces that use eye gaze. An intuitive, two dimensional visibility measure based on the concept of solid angle is developed. We determine to which extent an object of interest, observed by a user, intersects with each region of the underlying visual field model. The results are weighted (thereby considering the visual acuity in each visual field) to determine the total visibility of the object. As a proof of concept, we exemplify the proposed model in a virtual reality application which incorporates a head-mounted display with integrated eye tracking functionality. In this context, we implement several proactive system behaviors including contextual information presentation with an adaptive level of detail and attention guidance; the latter is implemented by detecting visual acuity limitations or attention drifts.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present an application based on a general peripheral view calculation model which extends previous work on attention-based user interfaces that use eye gaze. An intuitive, two dimensional visibility measure based on the concept of solid angle is developed. We determine to which extent an object of interest, observed by a user, intersects with each region of the underlying visual field model. The results are weighted (thereby considering the visual acuity in each visual field) to determine the total visibility of the object. As a proof of concept, we exemplify the proposed model in a virtual reality application which incorporates a head-mounted display with integrated eye tracking functionality. In this context, we implement several proactive system behaviors including contextual information presentation with an adaptive level of detail and attention guidance; the latter is implemented by detecting visual acuity limitations or attention drifts. |
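The acuity-weighted visibility idea described in this and the preceding entry can be approximated in a few lines of Python; the region boundaries and acuity weights below are illustrative placeholders rather than the papers' values, and the model's per-region intersection of object extents is reduced here to a single direction vector for brevity.

```python
import numpy as np

# Illustrative (eccentricity limit in degrees, relative acuity) pairs.
REGIONS = [(2.0, 1.0), (30.0, 0.5), (60.0, 0.2), (110.0, 0.05)]

def eccentricity_deg(gaze_dir, object_dir):
    # Angle between the gaze direction and the direction towards the object.
    g = np.asarray(gaze_dir, float); g /= np.linalg.norm(g)
    o = np.asarray(object_dir, float); o /= np.linalg.norm(o)
    return np.degrees(np.arccos(np.clip(g @ o, -1.0, 1.0)))

def visibility(gaze_dir, object_dir):
    # Map the eccentricity to a visual field region and return its acuity
    # weight; objects beyond the outermost region are treated as invisible.
    ecc = eccentricity_deg(gaze_dir, object_dir)
    return next((acuity for limit, acuity in REGIONS if ecc <= limit), 0.0)
```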
Luxenburger, Andreas; Prange, Alexander; Moniri, Mehdi; Sonntag, Daniel MedicalVR: Towards Medical Remote Collaboration Using Virtual Reality Inproceedings Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct, pp. 321-324, ACM, 2016. @inproceedings{8771, title = {MedicalVR: Towards Medical Remote Collaboration Using Virtual Reality}, author = {Andreas Luxenburger and Alexander Prange and Mehdi Moniri and Daniel Sonntag}, url = {http://doi.acm.org/10.1145/2968219.2971392}, year = {2016}, date = {2016-01-01}, booktitle = {Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct}, pages = {321-324}, publisher = {ACM}, abstract = {We present a virtual reality framework and assistive tool for design practices for new medical environments, grounded on human visual perception, attention and action. This includes an interactive visualization of shared electronic patient records, previously acquired with a remote tablet device, in a virtual environment incorporating hand tracking, eye tracking and a vision-based peripheral view monitoring. The goal is to influence medical environments' affordances, especially for e-health and m-health applications as well as user experience and design conception for tele-medicine.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } We present a virtual reality framework and assistive tool for design practices for new medical environments, grounded on human visual perception, attention and action. This includes an interactive visualization of shared electronic patient records, previously acquired with a remote tablet device, in a virtual environment incorporating hand tracking, eye tracking and a vision-based peripheral view monitoring. The goal is to influence medical environments' affordances, especially for e-health and m-health applications as well as user experience and design conception for tele-medicine. |
Luxenburger, Andreas; Sonntag, Daniel Immersive Virtual Reality Games for Persuasion Inproceedings Meschtscherjakov, Alexander; Ruyter, Boris De; Fuchsberger, Verena; Murer, Martin; Tscheligi, Manfred (Ed.): Proceedings of the 11th International Conference on Persuasive Technology 2016: Adjunct, pp. 110-111, o.A., 2016. @inproceedings{8930, title = {Immersive Virtual Reality Games for Persuasion}, author = {Andreas Luxenburger and Daniel Sonntag}, editor = {Alexander Meschtscherjakov and Boris De Ruyter and Verena Fuchsberger and Martin Murer and Manfred Tscheligi}, url = {https://www.dfki.de/fileadmin/user_upload/import/8930_DC_Persuasive'16_Luxenburger.pdf}, year = {2016}, date = {2016-01-01}, booktitle = {Proceedings of the 11th International Conference on Persuasive Technology 2016: Adjunct}, pages = {110-111}, publisher = {o.A.}, abstract = {Virtual reality (VR) can create stunning and memorable experiences and has been used in many different areas such as entertainment and simulation. Immersion is a key factor in this context. Fusing immersive virtual environments with persuasive technology (PT) in a game setting paves the way for creating interactive platforms aiming at user-oriented behavioral change. This work outlines important aspects for designing an immersive VR game platform for persuasion. Our future research aims at investigating how and to what extent recent advances in intelligent user interfaces (IUIs) can benefit immersion and persuasion. In particular, this includes how interactions with a persuasive VR game platform are influenced by contextual or individual conditions and how associated designs can be adapted to specific target audiences, such as in medical or educational contexts.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Virtual reality (VR) can create stunning and memorable experiences and has been used in many different areas such as entertainment and simulation. Immersion is a key factor in this context. Fusing immersive virtual environments with persuasive technology (PT) in a game setting paves the way for creating interactive platforms aiming at user-oriented behavioral change. This work outlines important aspects for designing an immersive VR game platform for persuasion. Our future research aims at investigating how and to what extent recent advances in intelligent user interfaces (IUIs) can benefit immersion and persuasion. In particular, this includes how interactions with a persuasive VR game platform are influenced by contextual or individual conditions and how associated designs can be adapted to specific target audiences, such as in medical or educational contexts. |
2015 |
Inproceedings |
Orlosky, Jason; Weber, Markus; Gu, Yecheng; Sonntag, Daniel; Sosnovsky, Sergey An Interactive Pedestrian Environment Simulator for Cognitive Monitoring and Evaluation Inproceedings Proceedings of the 20th International Conference on Intelligent User Interfaces Companion, pp. 57-60, ACM, 2015. @inproceedings{7676, title = {An Interactive Pedestrian Environment Simulator for Cognitive Monitoring and Evaluation}, author = {Jason Orlosky and Markus Weber and Yecheng Gu and Daniel Sonntag and Sergey Sosnovsky}, url = {https://www.dfki.de/fileadmin/user_upload/import/7676_2015_An_Interactive_Pedestrian_Environment_Simulator_for_Cognitive_Monitoring_and_Evaluation.pdf}, year = {2015}, date = {2015-01-01}, booktitle = {Proceedings of the 20th International Conference on Intelligent User Interfaces Companion}, pages = {57-60}, publisher = {ACM}, abstract = {Recent advances in virtual and augmented reality have led to the development of a number of simulations for different applications. In particular, simulations for monitoring, evaluation, training, and education have started to emerge for the consumer market due to the availability and affordability of immersive display technology. In this work, we introduce a virtual reality environment that provides an immersive traffic simulation designed to observe behavior and monitor relevant skills and abilities of pedestrians who may be at risk, such as elderly persons with cognitive impairments. The system provides basic reactive functionality, such as display of navigation instructions and notifications of dangerous obstacles during navigation tasks. Methods for interaction using hand and arm gestures are also implemented to allow users to explore the environment in a more natural manner.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Recent advances in virtual and augmented reality have led to the development of a number of simulations for different applications. In particular, simulations for monitoring, evaluation, training, and education have started to emerge for the consumer market due to the availability and affordability of immersive display technology. In this work, we introduce a virtual reality environment that provides an immersive traffic simulation designed to observe behavior and monitor relevant skills and abilities of pedestrians who may be at risk, such as elderly persons with cognitive impairments. The system provides basic reactive functionality, such as display of navigation instructions and notifications of dangerous obstacles during navigation tasks. Methods for interaction using hand and arm gestures are also implemented to allow users to explore the environment in a more natural manner. |
Prange, Alexander; Sonntag, Daniel Easy Deployment of Spoken Dialogue Technology on Smartwatches for Mental Healthcare Inproceedings Pervasive Computing Paradigms for Mental Health - 5th International Conference, MindCare 2015, Milan, Italy, September 24-25, 2015, Revised Selected Papers, pp. 150-156, Springer, 2015. @inproceedings{11231, title = {Easy Deployment of Spoken Dialogue Technology on Smartwatches for Mental Healthcare}, author = {Alexander Prange and Daniel Sonntag}, doi = {https://doi.org/10.1007/978-3-319-32270-4_15}, year = {2015}, date = {2015-01-01}, booktitle = {Pervasive Computing Paradigms for Mental Health - 5th International Conference, MindCare 2015, Milan, Italy, September 24-25, 2015, Revised Selected Papers}, volume = {604}, pages = {150-156}, publisher = {Springer}, abstract = {Smartwatches are becoming increasingly sophisticated and popular as several major smartphone manufacturers, including Apple, have released their new models recently. We believe that these devices can serve as smart objects for people suffering from mental disorders such as memory loss. In this paper, we describe how to utilise smartwatches to create intelligent user interfaces that can be used to provide cognitive assistance in daily life situations of dementia patients. By using automatic speech recognisers and text-to-speech synthesis, we create a dialogue application that allows patients to interact through natural language. We compare several available libraries for Android and show an example of integrating a smartwatch application into an existing healthcare infrastructure.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Smartwatches are becoming increasingly sophisticated and popular as several major smartphone manufacturers, including Apple, have released their new models recently. We believe that these devices can serve as smart objects for people suffering from mental disorders such as memory loss. In this paper, we describe how to utilise smartwatches to create intelligent user interfaces that can be used to provide cognitive assistance in daily life situations of dementia patients. By using automatic speech recognisers and text-to-speech synthesis, we create a dialogue application that allows patients to interact through natural language. We compare several available libraries for Android and show an example of integrating a smartwatch application into an existing healthcare infrastructure. |
Prange, Alexander; Sandrala, Indra Praveen; Weber, Markus; Sonntag, Daniel Robot Companions and Smartpens for Improved Social Communication of Dementia Patients Inproceedings Proceedings of the 20th International Conference on Intelligent User Interfaces Companion, Association for Computing Machinery, 2015. @inproceedings{11232, title = {Robot Companions and Smartpens for Improved Social Communication of Dementia Patients}, author = {Alexander Prange and Indra Praveen Sandrala and Markus Weber and Daniel Sonntag}, doi = {https://doi.org/10.1145/2732158.2732174}, year = {2015}, date = {2015-01-01}, booktitle = {Proceedings of the 20th International Conference on Intelligent User Interfaces Companion}, publisher = {Association for Computing Machinery}, abstract = {In this demo paper we describe how a digital pen and a humanoid robot companion can improve the social communication of a dementia patient. We propose the use of NAO, a humanoid robot, as a companion to the dementia patient in order to continuously monitor his or her activities and provide cognitive assistance in daily life situations. For example, patients can communicate with NAO through natural language via the speech dialogue functionality we integrated. Most importantly, to improve communication, i.e., sending digital messages (texting, emails), we propose the use of a smartpen, with which patients write messages on normal paper with an invisible dot pattern to initiate handwriting and sketch recognition in real time. The smartpen application is embedded into the human-robot speech dialogue.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In this demo paper we describe how a digital pen and a humanoid robot companion can improve the social communication of a dementia patient. We propose the use of NAO, a humanoid robot, as a companion to the dementia patient in order to continuously monitor his or her activities and provide cognitive assistance in daily life situations. For example, patients can communicate with NAO through natural language via the speech dialogue functionality we integrated. Most importantly, to improve communication, i.e., sending digital messages (texting, emails), we propose the use of a smartpen, with which patients write messages on normal paper with an invisible dot pattern to initiate handwriting and sketch recognition in real time. The smartpen application is embedded into the human-robot speech dialogue. |
Prange, Alexander; Toyama, Takumi; Sonntag, Daniel Towards Gaze and Gesture Based Human-Robot Interaction for Dementia Patients Inproceedings 2015 AAAI Fall Symposia, Arlington, Virginia, USA, November 12-14, 2015, pp. 111-113, AAAI Press, 2015. @inproceedings{11233, title = {Towards Gaze and Gesture Based Human-Robot Interaction for Dementia Patients}, author = {Alexander Prange and Takumi Toyama and Daniel Sonntag}, url = {http://www.aaai.org/ocs/index.php/FSS/FSS15/paper/view/11696}, year = {2015}, date = {2015-01-01}, booktitle = {2015 AAAI Fall Symposia, Arlington, Virginia, USA, November 12-14, 2015}, pages = {111-113}, publisher = {AAAI Press}, abstract = {Gaze and gestures are important modalities in human-human interactions and hence important to human-robot interaction. We describe how to use human gaze and robot pointing gestures to disambiguate and extend a human-robot speech dialogue developed for aiding people suffering from dementia.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Gaze and gestures are important modalities in human-human interactions and hence important to human-robot interaction. We describe how to use human gaze and robot pointing gestures to disambiguate and extend a human-robot speech dialogue developed for aiding people suffering from dementia. |