2024 |
Journal Articles |
Selim, Abdulrahman Mohamed; Barz, Michael; Bhatti, Omair Shahzad; Alam, Hasan Md Tusfiqur; Sonntag, Daniel: A review of machine learning in scanpath analysis for passive gaze-based interaction. Journal Article. Frontiers in Artificial Intelligence, 7, pp. 1-28, Frontiers Media SA, 2024 (editor: Maria Chiara Caschera). URL: https://www.dfki.de/fileadmin/user_upload/import/14976_frai-07-1391745.pdf
Abstract: The scanpath is an important concept in eye tracking. It refers to a person's eye movements over a period of time, commonly represented as a series of alternating fixations and saccades. Machine learning has been increasingly used for the automatic interpretation of scanpaths over the past few years, particularly in research on passive gaze-based interaction, i.e., interfaces that implicitly observe and interpret human eye movements, with the goal of improving the interaction. This literature review investigates research on machine learning applications in scanpath analysis for passive gaze-based interaction between 2012 and 2022, starting from 2,425 publications and focussing on 77 publications. We provide insights on research domains and common learning tasks in passive gaze-based interaction and present common machine learning practices from data collection and preparation to model selection and evaluation. We discuss commonly followed practices and identify gaps and challenges, especially concerning emerging machine learning topics, to guide future research in the field. |
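The review defines a scanpath as an alternating sequence of fixations and saccades that learning algorithms then consume. As a purely illustrative sketch (all field names and feature choices below are our own, not taken from the paper), such a sequence might be encoded and summarized before classification like this:

```python
# Minimal, hypothetical scanpath encoding: alternating fixations and saccades,
# reduced to aggregate features for a downstream classifier.
from dataclasses import dataclass
from typing import List

@dataclass
class Fixation:
    x: float              # gaze position in screen coordinates
    y: float
    duration_ms: float

@dataclass
class Saccade:
    amplitude_deg: float  # movement size in degrees of visual angle
    duration_ms: float

scanpath: List[object] = [
    Fixation(x=512, y=384, duration_ms=220),
    Saccade(amplitude_deg=4.1, duration_ms=35),
    Fixation(x=700, y=390, duration_ms=180),
]

def summary_features(events):
    """Aggregate features of the kind often used in scanpath classification."""
    fix = [e for e in events if isinstance(e, Fixation)]
    sac = [e for e in events if isinstance(e, Saccade)]
    return {
        "n_fixations": len(fix),
        "mean_fixation_ms": sum(f.duration_ms for f in fix) / max(len(fix), 1),
        "mean_saccade_amp": sum(s.amplitude_deg for s in sac) / max(len(sac), 1),
    }

print(summary_features(scanpath))
```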
Bengler, Klaus; Damm, Werner; Luedtke, Andreas; Rieger, Jochem; Austel, Benedikt; Biebl, Bianca; Fränzle, Martin; Hagemann, Willem; Held, Moritz; Hess, David; Ihme, Klas; Kacianka, Severin; Kerscher, Alyssa J; Laine, Forrest; Lehnhoff, Sebastian; Pretschner, Alexander; Rakow, Astrid; Sonntag, Daniel; Sztipanovits, Janos; Schwammberger, Maike; Schweda, Mark; Unni, Anirudh; Veith, Eric: A Reference Architecture for Human Cyber-Physical Systems, Part II: Fundamental Design Principles for Human-CPS Interaction. Journal Article. ACM Transactions on Cyber-Physical Systems, 8, pp. 1-27, ACM, 2024. URL: https://www.dfki.de/fileadmin/user_upload/import/14704_a_references_architecture_for_human_cyber_physical_systems_2.pdf
Abstract: As automation increases qualitatively and quantitatively in safety-critical human cyber-physical systems, it is becoming more and more challenging to increase the probability of, or to ensure, that human operators still perceive key artifacts and comprehend their roles in the system. In the companion paper, we proposed an abstract reference architecture capable of expressing all classes of system-level interactions in human cyber-physical systems. Here we demonstrate how this reference architecture supports the analysis of levels of communication between agents and helps to identify the potential for misunderstandings and misconceptions. We then develop a metamodel for safe human-machine interaction. To this end, we ask what type of information exchange must be supported on what level so that humans and systems can cooperate as a team, what the criticality of exchanged information is, what the timing requirements for such interactions are, and how we can communicate highly critical information in a limited time frame in spite of the many sources of a distorted perception. We highlight shared stumbling blocks and illustrate shared design principles, which rest on established ontologies specific to particular application classes. In order to overcome the partial opacity of internal states of agents, we anticipate a key role of virtual twins of both human and technical cooperation partners for designing suitable communication. |
Damm, Werner; Hess, David; Schweda, Mark; Sztipanovits, Janos; Bengler, Klaus; Biebl, Bianca; Fränzle, Martin; Hagemann, Willem; Held, Moritz; Ihme, Klas; Kacianka, Severin; Kerscher, Alyssa J; Lehnhoff, Sebastian; Luedtke, Andreas; Pretschner, Alexander; Rakow, Astrid; Rieger, Jochem; Sonntag, Daniel; Schwammberger, Maike; Austel, Benedikt; Unni, Anirudh; Veith, Eric: A Reference Architecture of Human Cyber-Physical Systems – Part I: Fundamental Concepts. Journal Article. ACM Transactions on Cyber-Physical Systems, 8, pp. 1-32, ACM, 2024. URL: https://www.dfki.de/fileadmin/user_upload/import/14705_a_references_architecture_for_human_cyber_physical_systems_1.pdf
Abstract: We propose a reference architecture of safety-critical or industry-critical human cyber-physical systems (CPSs) capable of expressing essential classes of system-level interactions between CPS and humans relevant for the societal acceptance of such systems. To reach this quality gate, the expressivity of the model must go beyond classical viewpoints such as operational, functional, and architectural views and views used for safety and security analysis. The model does so by incorporating elements of such systems for mutual introspection in situational awareness, capabilities, and intentions to enable a synergetic, trusted relation in the interaction of humans and CPSs, which we see as a prerequisite for their societal acceptance. The reference architecture is represented as a metamodel incorporating conceptual and behavioral semantic aspects. We illustrate the key concepts of the metamodel with examples from cooperative autonomous driving, the operating room of the future, cockpit-tower interaction, and crisis management. |
Damm, Werner; Fränzle, Martin; Kerscher, Alyssa J; Laine, Forrest; Bengler, Klaus; Biebl, Bianca; Hagemann, Willem; Held, Moritz; Hess, David; Ihme, Klas; Kacianka, Severin; Lehnhoff, Sebastian; Lüdtke, Andreas; Pretschner, Alexander; Rakow, Astrid; Rieger, Jochem W; Sonntag, Daniel; Sztipanovits, Janos; Schwammberger, Maike; Schweda, Mark; Trende, Alexander; Unni, Anirudh; Veith, Eric M S P: A Reference Architecture of Human Cyber-Physical Systems - Part III: Semantic Foundations. Journal Article. ACM Transactions on Cyber-Physical Systems, 8, pp. 1-23, ACM, 2024. URL: https://www.dfki.de/fileadmin/user_upload/import/14732_A_Reference_Architecture_of_Human_Cyber-Physical_Systems_Part_III.pdf DOI: https://dl.acm.org/doi/10.1145/3622881
Abstract: The design and analysis of multi-agent human cyber-physical systems in safety-critical or industry-critical domains calls for an adequate semantic foundation capable of exhaustively and rigorously describing all emergent effects in the joint dynamic behavior of the agents that are relevant to their safety and well-behavior. We present such a semantic foundation. This framework extends beyond previous approaches by extending the agent-local dynamic state beyond state components under direct control of the agent and belief about other agents (as previously suggested for understanding cooperative as well as rational behavior) to agent-local evidence and belief about the overall cooperative, competitive, or coopetitive game structure. We argue that this extension is necessary for rigorously analyzing systems of human cyber-physical systems because humans are known to employ cognitive replacement models of system dynamics that are both non-stationary and potentially incongruent. These replacement models induce visible and potentially harmful effects on their joint emergent behavior and the interaction with cyber-physical system components. |
Book Chapters |
Barz, Michael; Karagiannis, Panagiotis; Kildal, Johan; Pinto, Andoni Rivera; de Munain, Judit Ruiz; Rosel, Jesús; Madarieta, Maria; Salagianni, Konstantina; Aivaliotis, Panagiotis; Makris, Sotiris; Sonntag, Daniel: MASTER-XR: Mixed reAlity ecoSystem for TEaching Robotics in manufacturing. Book Chapter. In: Alam, Mohammad-Reza; Fathi, Madjid (Ed.): Integrated Systems: Innovations and Applications: Results of the 8th International Conference on Integrated Systems Design and Technology (ISDT 2023), pp. 1-16, Springer, 2024. URL: https://www.amazon.de/Integrated-Systems-Innovations-Applications-International/dp/3031536517
Abstract: Many industries are transitioning to Industry 4.0 production models by adopting robots in their manufacturing processes. In parallel, Extended Reality (XR) technologies have reached sufficient maturity to enter the industrial applications domain, with early success cases often related to training workers, remote assistance, access to contextual information, and interaction with digital twins. In the future, robots will be increasingly enhanced with XR applications, which requires that industrial workers understand both technologies and use and control hybrid solutions confidently. Specific education and training programs will be essential to this transition, especially for vocational school students and professionals in upskilling. They must learn how to program robots and establish a safe and productive human-robot collaboration. The new EU-funded project MASTER will improve the XR ecosystem for teaching and training robotics in manufacturing by providing an open XR platform that integrates key functionalities like creating safe robotic environments, programming flexible robotic applications, and integrating advanced interaction mechanisms based on eye tracking. It will also provide high-quality training materials for robotics. We report on the project plan, our objectives, and milestones. |
Inproceedings |
Selim, Abdulrahman Mohamed; Rekrut, Maurice; Barz, Michael; Sonntag, Daniel: Speech Imagery BCI Training Using Game with a Purpose. Inproceedings. In: Proceedings of the 2024 International Conference on Advanced Visual Interfaces, pp. 1-5, Association for Computing Machinery, 2024. URL: https://www.dfki.de/fileadmin/user_upload/import/14962_avi2024-4.pdf DOI: https://doi.org/10.1145/3656650.3656654
Abstract: Games are used in multiple fields of brain-computer interface (BCI) research and applications to improve participants' engagement and enjoyment during electroencephalogram (EEG) data collection. However, despite potential benefits, no current studies have reported on implemented games for Speech Imagery BCI. Imagined speech is speech produced without audible sounds or active movement of the articulatory muscles. Collecting imagined speech EEG data is a time-consuming, mentally exhausting, and cumbersome process, which requires participants to read words off a computer screen and produce them as imagined speech. To improve this process for study participants, we implemented a maze-like game in which a participant navigated a virtual robot capable of performing five actions that represented our words of interest while we recorded their EEG data. The study setup was evaluated with 15 participants. Based on their feedback, the game improved their engagement and enjoyment while resulting in a 69.10% average classification accuracy using a random forest classifier. |
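For readers unfamiliar with the final classification step, here is a minimal, hypothetical sketch of training a random forest on per-channel EEG statistics with scikit-learn. The synthetic data and the feature extraction are generic stand-ins, not the authors' pipeline:

```python
# Hedged sketch: random forest over simple per-channel EEG statistics.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_channels, n_samples = 200, 32, 512   # hypothetical EEG shape
X_raw = rng.standard_normal((n_trials, n_channels, n_samples))
y = rng.integers(0, 5, size=n_trials)            # five imagined words/actions

# Generic per-channel statistics as features (mean, std, peak-to-peak).
X = np.concatenate(
    [X_raw.mean(axis=2), X_raw.std(axis=2), np.ptp(X_raw, axis=2)], axis=1
)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```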
Liang, Siting; Sánchez, Pablo Valdunciel; Sonntag, Daniel: Optimizing Relation Extraction in Medical Texts through Active Learning: A Comparative Analysis of Trade-offs. Inproceedings. In: Association for Computational Linguistics, ACL Anthology, 2024. URL: https://www.dfki.de/fileadmin/user_upload/import/14721_Active_Learning_For_Medical_Relation_Extraction_(camera-ready).pdf https://openreview.net/pdf?id=RtQL9gTsqr
Abstract: Our work explores the effectiveness of employing Clinical BERT for Relation Extraction (RE) tasks in medical texts within an Active Learning (AL) framework. Our main objective is to optimize RE in medical texts through AL while examining the trade-offs between performance and computation time, comparing it with alternative methods like Random Forest and BiLSTM networks. Comparisons extend to feature engineering requirements, performance metrics, and considerations of annotation costs, including AL step times and annotation rates. The utilization of AL strategies aligns with our broader goal of enhancing the efficiency of relation classification models, particularly when dealing with the challenges of annotating complex medical texts in a Human-in-the-Loop (HITL) setting. The results indicate that uncertainty-based sampling achieves comparable performance with significantly fewer annotated samples across three categories of supervised learning methods, thereby reducing annotation costs for clinical and biomedical corpora. While Clinical BERT exhibits clear performance advantages across two different corpora, the trade-off involves longer computation times in interactive annotation processes. In real-world applications, where practical feasibility and timely results are crucial, optimizing this trade-off becomes imperative. |
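The uncertainty-based sampling the abstract credits with reducing annotation costs follows a standard pool-based loop. A minimal sketch under synthetic data follows; the logistic-regression model is a placeholder for the paper's Clinical BERT setup, and the "oracle" labels stand in for human annotators:

```python
# Hedged sketch of pool-based active learning with least-confidence sampling.
import numpy as np
from sklearn.linear_model import LogisticRegression

def least_confidence(proba):
    """Uncertainty = 1 - max class probability per sample."""
    return 1.0 - proba.max(axis=1)

rng = np.random.default_rng(1)
X_pool = rng.standard_normal((1000, 20))
y_pool = (X_pool[:, 0] + 0.5 * rng.standard_normal(1000) > 0).astype(int)

labeled = list(range(20))                        # small seed set
unlabeled = [i for i in range(1000) if i not in labeled]

for step in range(10):                           # AL iterations
    model = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
    scores = least_confidence(model.predict_proba(X_pool[unlabeled]))
    query = [unlabeled[i] for i in np.argsort(scores)[-25:]]  # most uncertain
    labeled += query                             # oracle supplies labels here
    unlabeled = [i for i in unlabeled if i not in query]

print("labeled samples used:", len(labeled))
```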
Kath, Hannes; Serafini, Patricia P; Campos, Ivan Braga; Gouvea, Thiago; Sonntag, Daniel: Leveraging Transfer Learning and Active Learning for Sound Event Detection in Passive Acoustic Monitoring of Wildlife. Inproceedings. In: 3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering, 2024. URL: https://www.dfki.de/fileadmin/user_upload/import/14737_Kath_et_al_2024_Leveraging_Transfer_Learning_and_Active_Learning_for_Sound_Event_Detection_in.pdf
Abstract: Passive Acoustic Monitoring (PAM) has emerged as a pivotal technology for wildlife monitoring, generating vast amounts of acoustic data. However, the successful application of machine learning methods for sound event detection in PAM datasets heavily relies on the availability of annotated data, which can be laborious to acquire. In this study, we investigate the effectiveness of transfer learning and active learning techniques to address the data annotation challenge in PAM. Transfer learning allows us to use pre-trained models from related tasks or datasets to bootstrap the learning process for sound event detection. Furthermore, active learning promises strategic selection of the most informative samples for annotation, effectively reducing the annotation cost and improving model performance. We evaluate an approach that combines transfer learning and active learning to efficiently exploit existing annotated data and optimize the annotation process for PAM datasets. Our transfer learning observations show that embeddings produced by BirdNET, a model trained on high signal-to-noise recordings of bird vocalisations, can be effectively used for predicting anurans in PAM data: a linear classifier constructed using these embeddings outperforms the benchmark by 21.7%. Our results indicate that active learning is superior to random sampling, although no clear winner emerges among the strategies employed. The proposed method holds promise for facilitating broader adoption of machine learning techniques in PAM and advancing our understanding of biodiversity dynamics through acoustic data analysis. |
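The transfer learning result rests on a simple recipe: compute fixed embeddings with a pre-trained model and fit a linear classifier on top. A hedged sketch follows; load_embeddings is a hypothetical stand-in for a BirdNET feature extractor, and the data are synthetic:

```python
# Hedged sketch: a linear probe on fixed audio embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def load_embeddings(n=500, dim=1024, seed=2):
    """Hypothetical placeholder for precomputed BirdNET-style embeddings."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, dim))
    y = rng.integers(0, 2, size=n)   # anuran present / absent
    return X, y

X, y = load_embeddings()
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("F1:", f1_score(y_te, probe.predict(X_te)))
```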
Lüers, Bengt; Serafini, Patricia P; Campos, Ivan Braga; Gouvea, Thiago; Sonntag, Daniel: BirdNET-Annotator: AI-Assisted Strong Labelling of Bird Sound Datasets. Inproceedings. In: 3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering, 2024. URL: https://www.dfki.de/fileadmin/user_upload/import/14738_Lueers_et_al_2024_BirdNET-Annotator.pdf
Abstract: Monitoring biodiversity in biosphere reserves is challenging due to the vast regions to be monitored. Thus, conservationists have resorted to employing passive acoustic monitoring (PAM), which automates the audio recording process. PAM can create large, unlabeled datasets, but deriving knowledge from such recordings is usually still done manually. Machine learning enables the detection of vocalizations of species automatically, allowing the biodiversity in an area to be summarized in terms of species richness. While pre-trained neural network models for bird vocalization detection exist, they are often not reliable enough to do away with the need for manual labeling of audio files. In this paper, we present BirdNET-Annotator, a tool for AI-assisted labeling of audio datasets co-developed by ecoacoustics and ML experts. BirdNET-Annotator runs in the cloud free of charge, enabling end users to scale beyond the limitations of their local hardware. We evaluated the performance of our solution in the context of its intended workflow and found a reduction in annotation times. While our results show that our application now meets the user requirements, there are still opportunities for additional performance and usability improvements. Our application illustrates how large, pre-trained neural models can be integrated into the workflow of domain experts when packaged in a user-friendly manner. We observe that although our solution adds a step to the preexisting workflow, the overall annotation speed is significantly improved. This hints at further improvements to be realized in the future by consolidating more steps of the workflow into fewer tools. |
Troshani, Ilira; Gouvea, Thiago; Sonntag, Daniel: Leveraging Sound Collections for Animal Species Classification with Weakly Supervised Learning. Inproceedings. In: 3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering, 2024. URL: https://www.dfki.de/fileadmin/user_upload/import/14739_Troshani_et_al_2024_Leveraging_Sound_Collections_for_Animal_Species_Classification_with_Weakly.pdf
Abstract: The utilization of Passive Acoustic Monitoring (PAM) for wildlife monitoring remains hindered by the challenge of data analysis. While numerous supervised ML algorithms exist, their application is constrained by the scarcity of annotated data. Expert-curated sound collections are valuable knowledge sources that could bridge this gap. However, their utilization is hindered by the sporadic sounds to be identified in these recordings. In this study, we propose a weakly supervised approach to tackle this challenge and assess its performance using the AnuraSet dataset. We employ TALNet, a Convolutional Recurrent Neural Network (CRNN) model, and train it on 60-second sound recordings labeled for the presence of 42 different anuran species. We conduct the evaluation on 1-second segments, enabling precise sound event localization. Furthermore, we investigate the impact of varying the length of the training input and explore the effects of different pooling functions on TALNet's performance on AnuraSet. Our findings demonstrate the effectiveness of TALNet in harnessing weakly annotated sound collections for wildlife monitoring. |
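TALNet-style weak supervision hinges on a pooling function that reduces frame-level probabilities to one clip-level prediction that the 60-second weak label can supervise. The sketch below shows linear-softmax pooling, one common choice in this literature; it is an illustration with made-up tensor shapes, not the paper's exact implementation:

```python
# Hedged sketch: linear-softmax pooling for weakly supervised sound event detection.
import torch

def linear_softmax_pool(frame_probs):
    """frame_probs: (batch, time, classes) -> clip-level probs (batch, classes).

    Each frame's probability is weighted by itself, so confident frames
    dominate the clip-level score: pooled = sum(p^2) / sum(p) per class.
    """
    weights = frame_probs / frame_probs.sum(dim=1, keepdim=True).clamp(min=1e-8)
    return (frame_probs * weights).sum(dim=1)

frame_probs = torch.rand(8, 60, 42)   # e.g. 8 clips, 60 frames, 42 species
clip_probs = linear_softmax_pool(frame_probs)

# The clip-level prediction can now be trained against the weak clip label.
weak_labels = torch.randint(0, 2, (8, 42)).float()
loss = torch.nn.functional.binary_cross_entropy(clip_probs, weak_labels)
print(loss.item())
```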
Liang, Siting; Profitlich, Hans-Jürgen; Klass, Maximilian; Möller-Grell, Niko; Bergmann, Celine-Fabienne; Heim, Simon; Niklas, Christian; Sonntag, Daniel: Building A German Clinical Named Entity Recognition System without In-domain Training Data. Inproceedings. In: Association for Computational Linguistics, ACL Anthology, 2024. URL: https://www.dfki.de/fileadmin/user_upload/import/14838_Building_A_German_NER_without_In_domain_Training_Data__clinical_NLP__(1).pdf
Abstract: Clinical Named Entity Recognition (NER) is essential for extracting important medical insights from clinical narratives. Given the challenges in obtaining expert training datasets for real-world clinical applications related to data protection regulations and the lack of standardised entity types, this work represents a collaborative initiative aimed at building a German clinical NER system with a focus on addressing these obstacles effectively. In response to the challenge of training data scarcity, we propose a Conditional Relevance Learning (CRL) approach in low-resource transfer learning scenarios. CRL effectively leverages a pre-trained language model and domain-specific open resources, enabling the acquisition of a robust base model tailored for clinical NER tasks, particularly in the face of changing label sets. This flexibility empowers the implementation of a Multilayered Semantic Annotation (MSA) schema in our NER system, capable of organizing a diverse array of entity types, thus significantly boosting the NER system's adaptability and utility across various clinical domains. In the case study, we demonstrate how our NER system can be applied to overcome resource constraints and comply with data privacy regulations. Lacking prior training on in-domain data, feedback from expert users in respective domains is essential in identifying areas for system refinement. Future work will focus on the integration of expert feedback to improve system performance in specific clinical contexts. |
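As a loose illustration of what a multilayered annotation schema can enable (the layer and entity-type names below are invented, not the paper's MSA schema), fine-grained entity labels can be grouped into coarser layers so that a system tolerates changing label sets across domains:

```python
# Illustrative only: mapping fine-grained entity types onto coarser layers.
MSA_LAYERS = {
    "Medication": {"Drug", "Dose", "Frequency"},
    "Condition":  {"Diagnosis", "Symptom"},
    "Procedure":  {"Operation", "LabTest"},
}

def to_layer(fine_label: str) -> str:
    """Resolve a fine-grained label to its layer, or 'Other' if unknown."""
    for layer, members in MSA_LAYERS.items():
        if fine_label in members:
            return layer
    return "Other"

assert to_layer("Symptom") == "Condition"
```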
Miscellaneous |
Kadir, Md Abdul; Alam, Hasan Md Tusfiqur; Maul, Pascale; Profitlich, Hans-Jürgen; Wolf, Moritz; Sonntag, Daniel: Modular Deep Active Learning Framework for Image Annotation: A Technical Report for the Ophthalmo-AI Project. Miscellaneous, 2024. URL: https://www.dfki.de/fileadmin/user_upload/import/14772_2403.15143.pdf https://arxiv.org/abs/2403.15143
Abstract: Image annotation is one of the most essential tasks for guaranteeing proper treatment for patients and tracking progress over the course of therapy in the field of medical imaging and disease diagnosis. However, manually annotating large amounts of 2D and 3D imaging data can be extremely tedious. Deep Learning (DL) based segmentation algorithms have completely transformed this process and made it possible to automate image segmentation. By accurately segmenting medical images, these algorithms can greatly minimize the time and effort necessary for manual annotation. Additionally, by incorporating Active Learning (AL) methods, these segmentation algorithms can perform far more effectively with a smaller amount of ground truth data. We introduce MedDeepCyleAL, an end-to-end framework implementing the complete AL cycle. It provides researchers with the flexibility to choose the type of deep learning model they wish to employ and includes an annotation tool that supports the classification and segmentation of medical images. The user-friendly interface allows for easy alteration of the AL and DL model settings through a configuration file, requiring no prior programming experience. While MedDeepCyleAL can be applied to any kind of image data, we have specifically applied it to ophthalmology data in this project. |
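The configuration-driven design described above can be pictured as registries of interchangeable components that a config file selects from. The sketch below is a guess at the general pattern, not MedDeepCyleAL's actual API; all names are hypothetical:

```python
# Hedged sketch: config-selected query strategies for an AL annotation cycle.
import numpy as np

# Registries let a config file choose components without code changes;
# a real framework would register model factories the same way.
STRATEGIES = {
    "entropy": lambda proba: -(proba * np.log(proba + 1e-12)).sum(axis=1),
    "random":  lambda proba: np.random.default_rng(0).random(len(proba)),
}

config = {"strategy": "entropy", "query_size": 8}   # e.g. parsed from a file

def select_batch(proba, cfg):
    """Return indices of the samples the annotator should label next."""
    scores = STRATEGIES[cfg["strategy"]](proba)
    return np.argsort(scores)[-cfg["query_size"]:]  # highest uncertainty

proba = np.random.default_rng(3).dirichlet(np.ones(4), size=100)
print(select_batch(proba, config))
```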
2023 |
Journal Articles |
Tabrizchi, Hamed; Razmara, Jafar; Mosavi, Amirhosein: Thermal prediction for energy management of clouds using a hybrid model based on CNN and stacking multi-layer bi-directional LSTM. Journal Article. Energy Reports, 9, pp. 2253-2268, Elsevier, 2023.
Abstract: The fast advancement of technology and developers' utilization of data centers have dramatically increased energy usage in today's society. Thermal control is a key issue in hyper-scale cloud data centers. Hotspots form when the temperature of the host rises, increasing cooling costs and affecting dependability. Precise estimation of host temperatures is critical for optimal resource management. Thermal changes in the data center make estimating temperature a difficult challenge. Existing temperature estimation algorithms are ineffective due to their processing complexity as well as their lack of accuracy. Given that data-driven approaches seem promising for temperature prediction, this research offers a unique, efficient temperature prediction model. The model uses a combination of convolutional neural networks (CNN) and stacked multi-layer bi-directional long short-term memory (BiLSTM) for thermal prediction. The findings of the experiments reveal that the model successfully anticipates the temperature with the highest accuracy of 97.15%, the lowest RMSE of 0.2892, and an RMAE of 0.5003, which reduces the projection error compared to the other methods. |
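For orientation, a hybrid of a 1D CNN feeding stacked bi-directional LSTM layers can be written in a few lines of PyTorch. The layer sizes and input shape below are invented for illustration and do not reproduce the paper's architecture:

```python
# Hedged sketch: CNN features followed by stacked BiLSTM layers for
# sequence regression (e.g., a temperature estimate per window).
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    def __init__(self, n_features=8, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # two stacked bi-directional LSTM layers
        self.bilstm = nn.LSTM(32, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)    # scalar prediction

    def forward(self, x):                       # x: (batch, time, features)
        z = self.conv(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.bilstm(z)
        return self.head(out[:, -1])            # predict from last time step

model = CNNBiLSTM()
print(model(torch.randn(4, 30, 8)).shape)       # -> torch.Size([4, 1])
```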
Kopácsi, László; Baffy, Benjámin; Baranyi, Gábor; Skaf, Joul; Sörös, Gábor; Szeier, Szilvia; Lőrincz, András; Sonntag, Daniel: Cross-Viewpoint Semantic Mapping: Integrating Human and Robot Perspectives for Improved 3D Semantic Reconstruction. Journal Article. Sensors - Open Access Journal, 23, pp. 1-17, MDPI, 2023. URL: https://www.dfki.de/fileadmin/user_upload/import/14703_sensors-23-05126.pdf DOI: https://doi.org/10.3390/s23115126
Abstract: Allocentric semantic 3D maps are highly useful for a variety of human–machine interaction related tasks since egocentric viewpoints can be derived by the machine for the human partner. Class labels and map interpretations, however, may differ or could be missing for the participants due to the different perspectives, particularly when considering the viewpoint of a small robot, which significantly differs from the viewpoint of a human. In order to overcome this issue, and to establish common ground, we extend an existing real-time 3D semantic reconstruction pipeline with semantic matching across human and robot viewpoints. We use deep recognition networks, which usually perform well from higher (i.e., human) viewpoints but are inferior from lower viewpoints, such as that of a small robot. We propose several approaches for acquiring semantic labels for images taken from unusual perspectives. We start with a partial 3D semantic reconstruction from the human perspective that we transfer and adapt to the small robot's perspective using superpixel segmentation and the geometry of the surroundings. The quality of the reconstruction is evaluated in the Habitat simulator and in a real environment using a robot car with an RGBD camera. We show that the proposed approach provides high-quality semantic segmentation from the robot's perspective, with accuracy comparable to the original one. In addition, we exploit the gained information and improve the recognition performance of the deep network for the lower viewpoints, and show that the small robot alone is capable of generating high-quality semantic maps for the human partner. The computations are close to real-time, so the approach enables interactive applications. |
Sonntag, Daniel: Avoid Predatory Journals. Journal Article. KI - Künstliche Intelligenz, German Journal on Artificial Intelligence - Organ des Fachbereiches "Künstliche Intelligenz" der Gesellschaft für Informatik e.V., 37, pp. 1-3, Springer, 2023. URL: https://www.dfki.de/fileadmin/user_upload/import/14707_avoid_predatory_journals.pdf |
Gholami, Mahsa; Ghanbari-Adivi, Elham; Ehteram, Mohammad; Singh, Vijay P; Ahmed, Ali Najah; Mosavi, Amirhosein; El-Shafie, Ahmed: Predicting longitudinal dispersion coefficient using ensemble models and optimized multi-layer perceptron models. Journal Article. Ain Shams Engineering Journal, 10, pp. 2253-2277, Elsevier, 2023.
Abstract: Prediction of the longitudinal dispersion coefficient (LDC) is essential for river and water resources engineering and environmental management. This study proposes ensemble models for predicting LDC based on multilayer perceptron (MULP) methods and optimization algorithms. The honey badger optimization algorithm (HBOA), salp swarm algorithm (SASA), firefly algorithm (FIFA), and particle swarm optimization algorithm (PASOA) are used to adjust the MULP parameters. Then, the outputs of the MULP-HBOA, MULP-SASA, MULP-PASOA, MULP-FIFA, and MULP models were incorporated into an inclusive multiple model (IMM). For the IMM at the testing level, the mean absolute error (MEAE) was 15, whereas it was 17, 18, 23, 24, and 25 for the MULP-HBOA, MULP-SASA, MULP-FIFA, MULP-PASOA, and MULP models. The study also modified the structure of the MULP models using a goodness factor, which decreased the CPU time: removing redundant neurons reduces CPU time. Thus, the modified ANN model and the suggested IMM model can decrease the computational time and further improve the performance of the models. |
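The "inclusive multiple model" idea, as described, feeds the predictions of several base MLPs into a second-stage combiner. A compact sketch on synthetic data follows; the metaheuristic tuning (HBOA, SASA, FIFA, PASOA) is replaced by varied architectures and seeds purely for illustration:

```python
# Hedged sketch: stacking several MLP regressors under a linear combiner.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
X = rng.standard_normal((300, 6))
y = X[:, 0] * 2 + np.sin(X[:, 1]) + 0.1 * rng.standard_normal(300)

base_models = [
    MLPRegressor(hidden_layer_sizes=(h,), max_iter=2000, random_state=s)
    for h, s in [(16, 0), (32, 1), (64, 2)]
]
# Each base model's predictions become one input column for the combiner.
base_preds = np.column_stack([m.fit(X, y).predict(X) for m in base_models])

imm = LinearRegression().fit(base_preds, y)   # combiner over base outputs
print("IMM R^2:", imm.score(base_preds, y))
```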
Hai, Tao; Sayed, Biju Theruvil; Majdi, Ali; Zhou, Jincheng; Sagban, Rafid; Band, Shahab S; Mosavi, Amirhosein: An integrated GIS-based multivariate adaptive regression splines-cat swarm optimization for improving the accuracy of wildfire susceptibility mapping. Journal Article. Geocarto International, 38, pp. 1-25, Taylor & Francis, 2023.
Abstract: A hybrid machine learning method is proposed for wildfire susceptibility mapping. For modeling, a geographical information system (GIS) database including 11 influencing factors and 262 fire locations from 2013 to 2018 is used to develop an integrated multivariate adaptive regression splines (MARS) model. The cat swarm optimization (CSO) algorithm tunes the parameters of the MARS in order to generate accurate susceptibility maps. From the Pearson correlation results, it is observed that land use, temperature, and slope angle have a strong correlation with the fire severity. The results demonstrate that the prediction capability of the MARS-CSO model outperforms the model tree, reduced error pruning tree, and MARS models. The resulting wildfire risk map using MARS-CSO reveals that 20% of the study area is categorized in the very low wildfire risk class, whereas 40% falls under the very high class of fire hazard. |
Mirhashemi, Hengameh; Heydari, Mehdi; Karami, Omid; Ahmadi, Kourosh; Mosavi, Amirhosein Modeling Climate Change Effects on the Distribution of Oak Forests with Machine Learning Journal Article Forests, 14 , pp. 13220-13233, 2023. @article{13155, title = {Modeling Climate Change Effects on the Distribution of Oak Forests with Machine Learning}, author = {Hengameh Mirhashemi and Mehdi Heydari and Omid Karami and Kourosh Ahmadi and Amirhosein Mosavi}, year = {2023}, date = {2023-01-01}, journal = {Forests}, volume = {14}, pages = {13220-13233}, publisher = {MDPI}, abstract = {The present study models the effect of climate change on the distribution of Persian oak (Quercus brantii Lindl.) in the Zagros forests, located in the west of Iran. The modeling is conducted under current and future climatic conditions by fitting the Bayesian additive regression tree (BART) machine learning method. To anticipate the potential habitats for the Persian oak, two general circulation models (GCMs), CCSM4 and HADGEM2-ES, under the representative concentration pathways (RCPs) 2.6 and 8.5 for 2050 and 2070 are used. The mean temperature of the wettest quarter (bio8), solar radiation, slope, and precipitation of the wettest month (bio13) are reported, in that order, as the most important variables in the modeling. The results indicate that the suitable habitat of Persian oak will decrease significantly under both climate change scenarios, by as much as 75.06% by 2070. The proposed study brings insight into the current condition and further projects the future conditions of the local forests for proper management and protection of endangered ecosystems.}, keywords = {}, pubstate = {published}, tppubtype = {article} } The present study models the effect of climate change on the distribution of Persian oak (Quercus brantii Lindl.) in the Zagros forests, located in the west of Iran. The modeling is conducted under current and future climatic conditions by fitting the Bayesian additive regression tree (BART) machine learning method. To anticipate the potential habitats for the Persian oak, two general circulation models (GCMs), CCSM4 and HADGEM2-ES, under the representative concentration pathways (RCPs) 2.6 and 8.5 for 2050 and 2070 are used. The mean temperature of the wettest quarter (bio8), solar radiation, slope, and precipitation of the wettest month (bio13) are reported, in that order, as the most important variables in the modeling. The results indicate that the suitable habitat of Persian oak will decrease significantly under both climate change scenarios, by as much as 75.06% by 2070. The proposed study brings insight into the current condition and further projects the future conditions of the local forests for proper management and protection of endangered ecosystems. |
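Stripped of domain detail, the species-distribution workflow here is: fit a classifier on presence/absence records against bioclimatic predictors, then re-apply it to projected future predictors and compare the suitable areas. A toy sketch on synthetic data; scikit-learn's gradient boosting stands in for BART, which lacks a canonical Python implementation, and the climate shift is invented rather than taken from a real GCM/RCP projection.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    rng = np.random.default_rng(0)
    # Columns: bio8 (mean temp. of wettest quarter), solar radiation, slope, bio13.
    X_current = rng.normal(size=(500, 4))
    presence = (X_current[:, 0] + 0.5 * X_current[:, 3]
                + rng.normal(0, 0.5, 500) > 0).astype(int)

    sdm = GradientBoostingClassifier().fit(X_current, presence)

    # Future scenario: shift the climate variables as a GCM/RCP projection would.
    X_future = X_current + np.array([1.2, 0.1, 0.0, -0.4])  # toy shift, not real RCP data
    suitable_now = sdm.predict_proba(X_current)[:, 1] > 0.5
    suitable_future = sdm.predict_proba(X_future)[:, 1] > 0.5
    change = (suitable_future.sum() - suitable_now.sum()) / max(suitable_now.sum(), 1)
    print(f"change in suitable habitat: {change:+.1%}")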
Altmeyer, Kristin; Barz, Michael; Lauer, Luisa; Peschel, Markus; Sonntag, Daniel; Brünken, Roland; Malone, Sarah Digital ink and differentiated subjective ratings for cognitive load measurement in middle childhood Journal Article British Journal of Educational Psychology, n/a , pp. 18, 2023. @article{13195, title = {Digital ink and differentiated subjective ratings for cognitive load measurement in middle childhood}, author = {Kristin Altmeyer and Michael Barz and Luisa Lauer and Markus Peschel and Daniel Sonntag and Roland Brünken and Sarah Malone}, url = {https://www.dfki.de/fileadmin/user_upload/import/13195_Brit_J_of_Edu_Psychol_-_2023_-_Altmeyer_-_Digital_ink_and_differentiated_subjective_ratings_for_cognitive_load_measurement.pdf https://bpspsychub.onlinelibrary.wiley.com/doi/abs/10.1111/bjep.12595}, year = {2023}, date = {2023-01-01}, journal = {British Journal of Educational Psychology}, volume = {n/a}, pages = {18}, publisher = {John Wiley & Sons, Ltd}, abstract = {Background New methods are constantly being developed to adapt cognitive load measurement to different contexts. However, research on middle childhood students' cognitive load measurement is rare. Research indicates that the three cognitive load dimensions (intrinsic, extraneous, and germane) can be measured well in adults and teenagers using differentiated subjective rating instruments. Moreover, digital ink recorded by smartpens could serve as an indicator for cognitive load in adults. Aims With the present research, we aimed to investigate the relation between subjective cognitive load ratings, velocity and pressure measures recorded with a smartpen, and performance in standardized sketching tasks in middle childhood students. Sample Thirty-six children (aged 7–12) participated in the study at the university's laboratory. Methods The children performed two standardized sketching tasks, each in two versions. The induced intrinsic cognitive load or the extraneous cognitive load was varied between the versions. Digital ink was recorded while the children drew with a smartpen on real paper, and after each task they were asked to report their perceived intrinsic and extraneous cognitive load using a newly developed 5-item scale. Results Results indicated that cognitive load ratings as well as velocity and pressure measures were substantially related to the induced cognitive load and to performance in both sketching tasks. However, cognitive load ratings and smartpen measures were not substantially related. Conclusions Both subjective rating and digital ink hold potential for cognitive load and performance measurement. However, it is questionable whether they measure the exact same constructs.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Background New methods are constantly being developed to adapt cognitive load measurement to different contexts. However, research on middle childhood students' cognitive load measurement is rare. Research indicates that the three cognitive load dimensions (intrinsic, extraneous, and germane) can be measured well in adults and teenagers using differentiated subjective rating instruments. Moreover, digital ink recorded by smartpens could serve as an indicator for cognitive load in adults. Aims With the present research, we aimed to investigate the relation between subjective cognitive load ratings, velocity and pressure measures recorded with a smartpen, and performance in standardized sketching tasks in middle childhood students. 
Sample Thirty-six children (aged 7–12) participated in the study at the university's laboratory. Methods The children performed two standardized sketching tasks, each in two versions. The induced intrinsic cognitive load or the extraneous cognitive load was varied between the versions. Digital ink was recorded while the children drew with a smartpen on real paper, and after each task they were asked to report their perceived intrinsic and extraneous cognitive load using a newly developed 5-item scale. Results Results indicated that cognitive load ratings as well as velocity and pressure measures were substantially related to the induced cognitive load and to performance in both sketching tasks. However, cognitive load ratings and smartpen measures were not substantially related. Conclusions Both subjective rating and digital ink hold potential for cognitive load and performance measurement. However, it is questionable whether they measure the exact same constructs. |
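The smartpen measures in this study can be recomputed from raw ink samples: velocity from successive pen positions and timestamps, pressure straight from the sensor channel, each then correlated with the subjective ratings. A minimal sketch on synthetic recordings, assuming ink arrives as (x, y, t, pressure) rows.

    import numpy as np
    from scipy.stats import pearsonr

    def stroke_features(samples):
        """samples: (n, 4) array with columns x, y, t (seconds), pressure."""
        xy, t, pressure = samples[:, :2], samples[:, 2], samples[:, 3]
        path_length = np.linalg.norm(np.diff(xy, axis=0), axis=1).sum()
        return path_length / (t[-1] - t[0]), pressure.mean()  # mean velocity, mean pressure

    rng = np.random.default_rng(1)

    def fake_recording(n=200):  # one synthetic ink recording per child
        t = np.cumsum(rng.exponential(0.01, n))           # monotone timestamps
        xy = np.cumsum(rng.normal(0, 1, (n, 2)), axis=0)  # random pen trajectory
        return np.column_stack([xy, t, rng.uniform(0.2, 1.0, n)])

    recordings = [fake_recording() for _ in range(36)]
    ratings = rng.integers(1, 6, size=36)  # stand-ins for 5-item scale scores

    velocities = np.array([stroke_features(r)[0] for r in recordings])
    r, p = pearsonr(velocities, ratings)
    print(f"velocity vs. rating: r={r:.2f}, p={p:.3f}")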
Book Chapters |
Kadir, Md Abdul; Addluri, Gowthamkrishna; Sonntag, Daniel Harmonizing Feature Attributions Across Deep Learning Architectures: Enhancing Interpretability and Consistency Book Chapter German Conference on Artificial Intelligence, pp. 90-97, Springer, Cham, 2023. @inbook{14630, title = {Harmonizing Feature Attributions Across Deep Learning Architectures: Enhancing Interpretability and Consistency}, author = {Md Abdul Kadir and Gowthamkrishna Addluri and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/14630_Harmonizing_Feature_Attributions_Across_Deep_Learning_Architectures__Enhancing_Interpretability_and_Consistency_KI2023.pdf https://rdcu.be/dvrE0}, year = {2023}, date = {2023-01-01}, booktitle = {German Conference on Artificial Intelligence}, pages = {90-97}, publisher = {Springer, Cham}, abstract = {Enhancing the interpretability and consistency of machine learning models is critical to their deployment in real-world applications. Feature attribution methods, which provide local explanations of model predictions by attributing importance to individual input features, have gained significant attention. This study examines the generalization of feature attributions across various deep learning architectures, such as convolutional neural networks (CNNs) and vision transformers. We aim to assess the feasibility of utilizing a feature attribution method as a feature detector and examine how these features can be harmonized across multiple models employing distinct architectures but trained on the same data distribution. By exploring this harmonization, we aim to develop a more coherent and optimistic understanding of feature attributions, enhancing the consistency of local explanations across diverse deep-learning models. Our findings highlight the potential for harmonized feature attribution methods to improve interpretability and foster trust in machine learning applications, regardless of the underlying architecture.}, keywords = {}, pubstate = {published}, tppubtype = {inbook} } Enhancing the interpretability and consistency of machine learning models is critical to their deployment in real-world applications. Feature attribution methods, which provide local explanations of model predictions by attributing importance to individual input features, have gained significant attention. This study examines the generalization of feature attributions across various deep learning architectures, such as convolutional neural networks (CNNs) and vision transformers. We aim to assess the feasibility of utilizing a feature attribution method as a feature detector and examine how these features can be harmonized across multiple models employing distinct architectures but trained on the same data distribution. By exploring this harmonization, we aim to develop a more coherent and optimistic understanding of feature attributions, enhancing the consistency of local explanations across diverse deep-learning models. Our findings highlight the potential for harmonized feature attribution methods to improve interpretability and foster trust in machine learning applications, regardless of the underlying architecture. |
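A toy version of the comparison at the heart of this chapter: compute gradient saliency for one input under two different architectures and measure how strongly the attributions agree. The model choices, the vanilla-gradient attribution, and cosine similarity as the agreement metric are all illustrative assumptions, not the paper's exact setup.

    import torch
    import torch.nn.functional as F
    from torchvision.models import resnet18, mobilenet_v3_small

    def saliency(model, x):
        model.eval()
        x = x.clone().requires_grad_(True)
        logits = model(x)
        logits[0, logits[0].argmax()].backward()  # gradient of the top-scoring class
        return x.grad.abs().amax(dim=1)           # collapse colour channels

    x = torch.randn(1, 3, 224, 224)  # stand-in image; real comparisons need real data
    models = {"resnet18": resnet18(weights=None),        # pretrained weights in practice
              "mobilenet": mobilenet_v3_small(weights=None)}
    maps = [saliency(m, x).flatten() for m in models.values()]
    print("attribution agreement:", F.cosine_similarity(maps[0], maps[1], dim=0).item())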
Inproceedings |
Nguyen, Ho Minh Duy; Nguyen, Hoang; Diep, Nghiem T; Pham, Tan; Cao, Tri; Nguyen, Binh T; Swoboda, Paul; Ho, Nhat; Albarqouni, Shadi; Xie, Pengtao; Sonntag, Daniel; Niepert, Mathias LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching Inproceedings The Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023), Advances in Neural Information Processing Systems, 2023. @inproceedings{14309, title = {LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching}, author = {Ho Minh Duy Nguyen and Hoang Nguyen and Nghiem T Diep and Tan Pham and Tri Cao and Binh T Nguyen and Paul Swoboda and Nhat Ho and Shadi Albarqouni and Pengtao Xie and Daniel Sonntag and Mathias Niepert}, url = {https://www.dfki.de/fileadmin/user_upload/import/14309_LVM-Med_Camera_Version_2.pdf}, year = {2023}, date = {2023-12-01}, booktitle = {The Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023)}, publisher = {Advances in Neural Information Processing Systems}, abstract = {Obtaining large pre-trained models that can be fine-tuned to new tasks with limited annotated samples has remained an open challenge for medical imaging data. While pre-trained deep networks on ImageNet and vision-language foundation models trained on web-scale data are prevailing approaches, their effectiveness on medical tasks is limited due to the significant domain shift between natural and medical images. To bridge this gap, we introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets, covering a large number of organs and modalities such as CT, MRI, X-ray, and Ultrasound. We benchmark several state-of-the-art self-supervised algorithms on this dataset and propose a novel self-supervised contrastive learning algorithm using a graph-matching formulation. The proposed approach makes three contributions: (i) it integrates prior pair-wise image similarity metrics based on local and global information; (ii) it captures the structural constraints of feature embeddings through a loss function constructed via a combinatorial graph-matching objective; and (iii) it can be trained efficiently end-to-end using modern gradient-estimation techniques for black-box solvers. We thoroughly evaluate the proposed LVM-Med on 15 downstream medical tasks ranging from segmentation and classification to object detection, in both in-distribution and out-of-distribution settings. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models. For challenging tasks such as Brain Tumor Classification or Diabetic Retinopathy Grading, LVM-Med improves previous vision-language models trained on 1 billion masks by 6-7% while using only a ResNet-50.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Obtaining large pre-trained models that can be fine-tuned to new tasks with limited annotated samples has remained an open challenge for medical imaging data. While pre-trained deep networks on ImageNet and vision-language foundation models trained on web-scale data are prevailing approaches, their effectiveness on medical tasks is limited due to the significant domain shift between natural and medical images. 
To bridge this gap, we introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets, covering a large number of organs and modalities such as CT, MRI, X-ray, and Ultrasound. We benchmark several state-of-the-art self-supervised algorithms on this dataset and propose a novel self-supervised contrastive learning algorithm using a graph-matching formulation. The proposed approach makes three contributions: (i) it integrates prior pair-wise image similarity metrics based on local and global information; (ii) it captures the structural constraints of feature embeddings through a loss function constructed via a combinatorial graph-matching objective; and (iii) it can be trained efficiently end-to-end using modern gradient-estimation techniques for black-box solvers. We thoroughly evaluate the proposed LVM-Med on 15 downstream medical tasks ranging from segmentation and classification to object detection, in both in-distribution and out-of-distribution settings. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models. For challenging tasks such as Brain Tumor Classification or Diabetic Retinopathy Grading, LVM-Med improves previous vision-language models trained on 1 billion masks by 6-7% while using only a ResNet-50. |
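Since the paper reports strong results with a plain ResNet-50 backbone, the typical downstream use would be: load the released weights into a torchvision ResNet-50, replace the head, and fine-tune. A hedged sketch; the checkpoint filename and the five-class head below are placeholders, and the actual repository may expose a different loading API.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet50

    backbone = resnet50(weights=None)
    # state = torch.load("lvm_med_resnet50.pt")           # hypothetical checkpoint name
    # backbone.load_state_dict(state, strict=False)
    backbone.fc = nn.Linear(backbone.fc.in_features, 5)   # e.g. five retinopathy grades

    optimizer = torch.optim.AdamW(backbone.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()
    images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 5, (8,))
    for _ in range(3):  # a few illustrative steps on dummy data
        optimizer.zero_grad()
        loss = criterion(backbone(images), labels)
        loss.backward()
        optimizer.step()
        print(float(loss))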
Nguyen, Ho Minh Duy; Pham, Tan Ngoc; Diep, Nghiem Tuong; Phan, Nghi Quoc; Pham, Quang; Tong, Vinh; Nguyen, Binh T; Le, Ngan Hoang; Ho, Nhat; Xie, Pengtao; Sonntag, Daniel; Niepert, Mathias On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation Inproceedings The Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023), Advances in Neural Information Processing Systems, 2023. @inproceedings{14499, title = {On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation}, author = {Ho Minh Duy Nguyen and Tan Ngoc Pham and Nghiem Tuong Diep and Nghi Quoc Phan and Quang Pham and Vinh Tong and Binh T Nguyen and Ngan Hoang Le and Nhat Ho and Pengtao Xie and Daniel Sonntag and Mathias Niepert}, url = {https://www.dfki.de/fileadmin/user_upload/import/14499_52_on_the_out_of_distribution_rob.pdf}, year = {2023}, date = {2023-12-01}, booktitle = {The Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023)}, publisher = {Advances in Neural Information Processing Systems}, abstract = {Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in the field of medical imaging. Foundation models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach. They showcase impressive learning abilities across different tasks while requiring only a limited number of annotated samples. While numerous techniques have focused on developing better fine-tuning strategies to adapt these models for specific domains, we instead examine their robustness to domain shifts in the medical image segmentation task. To this end, we compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset and show that foundation-based models enjoy better robustness than other architectures. From here, we further developed a new Bayesian uncertainty estimation method for frozen models and used it as an indicator to characterize the model’s performance on out-of-distribution (OOD) data, proving particularly beneficial for real-world applications. Our experiments not only reveal the limitations of current indicators like accuracy on the line or agreement on the line commonly used in natural image applications but also emphasize the promise of the introduced Bayesian uncertainty. Specifically, predictions with lower uncertainty usually correspond to higher OOD performance.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in the field of medical imaging. Foundation models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach. They showcase impressive learning abilities across different tasks while requiring only a limited number of annotated samples. While numerous techniques have focused on developing better fine-tuning strategies to adapt these models for specific domains, we instead examine their robustness to domain shifts in the medical image segmentation task. 
To this end, we compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset and show that foundation-based models enjoy better robustness than other architectures. From here, we further developed a new Bayesian uncertainty estimation method for frozen models and used it as an indicator to characterize the model’s performance on out-of-distribution (OOD) data, proving particularly beneficial for real-world applications. Our experiments not only reveal the limitations of current indicators like accuracy on the line or agreement on the line commonly used in natural image applications but also emphasize the promise of the introduced Bayesian uncertainty. Specifically, predictions with lower uncertainty usually correspond to higher OOD performance. |
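The paper's estimator is a Bayesian scheme designed for frozen backbones; as a rough stand-in, Monte-Carlo dropout shows how a per-sample uncertainty signal of the kind used as an OOD performance indicator can be produced. Network shape and sample count are illustrative.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 2))

    def mc_dropout_predict(model, x, n_samples=32):
        model.train()  # keep dropout active at inference time
        with torch.no_grad():
            probs = torch.stack([model(x).softmax(-1) for _ in range(n_samples)])
        return probs.mean(0), probs.std(0)  # predictive mean and per-class spread

    x = torch.randn(4, 16)
    mean, spread = mc_dropout_predict(model, x)
    # Higher spread = higher uncertainty; the paper's finding suggests such samples
    # tend to come from regions where OOD performance will be lower.
    print("uncertainty per sample:", spread.max(dim=1).values)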
Johns, Christoph Albert; Barz, Michael; Sonntag, Daniel Interactive Link Prediction as a Downstream Task for Foundational GUI Understanding Models Inproceedings Seipel, Dietmar; Steen, Alexander (Ed.): KI 2023: Advances in Artificial Intelligence, pp. 75-89, Springer Nature Switzerland, 2023. @inproceedings{13988, title = {Interactive Link Prediction as a Downstream Task for Foundational GUI Understanding Models}, author = {Christoph Albert Johns and Michael Barz and Daniel Sonntag}, editor = {Dietmar Seipel and Alexander Steen}, url = {https://www.dfki.de/fileadmin/user_upload/import/13988_KI__23___Link_Prediction_as_a_Downstream_Task_in_GUI_Understanding_(2).pdf}, year = {2023}, date = {2023-09-01}, booktitle = {KI 2023: Advances in Artificial Intelligence}, pages = {75-89}, publisher = {Springer Nature Switzerland}, abstract = {AI models that can recognize and understand the semantics of graphical user interfaces (GUIs) enable a variety of use cases ranging from accessibility to automation. Recent efforts in this domain have pursued the development of a set of foundation models: generic GUI understanding models that can be used off-the-shelf to solve a variety of GUI-related tasks, including ones that they were not trained on. In order to develop such foundation models, meaningful downstream tasks and baselines for GUI-related use cases will be required. In this paper, we present interactive link prediction as a downstream task for GUI understanding models and provide baselines as well as testing tools to effectively and efficiently evaluate predictive GUI understanding models. In interactive link prediction, the task is to predict whether tapping on an element on one screen of a mobile application (source element) navigates the user to a second screen (target screen). If this task is solved sufficiently, it can demonstrate an understanding of the relationship between elements and components across screens and enable various applications in GUI design automation and assistance. To encourage and support research on interactive link prediction, this paper contributes (1) a pre-processed large-scale dataset of links in mobile applications (18,830 links from 5,362 applications) derived from the popular RICO dataset, (2) performance baselines from five heuristic-based and two learning-based GUI understanding models, (3) a small-scale dataset of links in mobile GUI prototypes including ratings from an online study with 36 end-users for out-of-sample testing, and (4) a Figma plugin that can leverage link prediction models to automate and assist mobile GUI prototyping.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } AI models that can recognize and understand the semantics of graphical user interfaces (GUIs) enable a variety of use cases ranging from accessibility to automation. Recent efforts in this domain have pursued the development of a set of foundation models: generic GUI understanding models that can be used off-the-shelf to solve a variety of GUI-related tasks, including ones that they were not trained on. In order to develop such foundation models, meaningful downstream tasks and baselines for GUI-related use cases will be required. In this paper, we present interactive link prediction as a downstream task for GUI understanding models and provide baselines as well as testing tools to effectively and efficiently evaluate predictive GUI understanding models. 
In interactive link prediction, the task is to predict whether tapping on an element on one screen of a mobile application (source element) navigates the user to a second screen (target screen). If this task is solved sufficiently, it can demonstrate an understanding of the relationship between elements and components across screens and enable various applications in GUI design automation and assistance. To encourage and support research on interactive link prediction, this paper contributes (1) a pre-processed large-scale dataset of links in mobile applications (18,830 links from 5,362 applications) derived from the popular RICO dataset, (2) performance baselines from five heuristic-based and two learning-based GUI understanding models, (3) a small-scale dataset of links in mobile GUI prototypes including ratings from an online study with 36 end-users for out-of-sample testing, and (4) a Figma plugin that can leverage link prediction models to automate and assist mobile GUI prototyping. |
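A heuristic baseline for this task can be as simple as scoring a (source element, target screen) pair by textual overlap. The function below is an illustrative guess at such a baseline, not one of the paper's five heuristics, and the inputs are assumptions about the data schema.

    def jaccard(a: str, b: str) -> float:
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

    def predict_link(element_label: str, target_screen_text: str,
                     threshold: float = 0.1) -> bool:
        """Predict whether tapping the element navigates to the target screen."""
        return jaccard(element_label, target_screen_text) >= threshold

    print(predict_link("View cart", "Your cart checkout items total"))  # True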
Kath, Hannes; Lüers, Bengt; Gouvêa, Thiago S; Sonntag, Daniel Lost in Dialogue: A Review and Categorisation of Current Dialogue System Approaches and Technical Solutions Inproceedings Seipel, Dietmar; Steen, Alexander (Ed.): KI 2023: Advances in Artificial Intelligence, pp. 98-113, Springer, 2023. @inproceedings{14648, title = {Lost in Dialogue: A Review and Categorisation of Current Dialogue System Approaches and Technical Solutions}, author = {Hannes Kath and Bengt Lüers and Thiago S Gouvêa and Daniel Sonntag}, editor = {Dietmar Seipel and Alexander Steen}, url = {https://www.dfki.de/fileadmin/user_upload/import/14648_978-3-031-42608-7-seiten-2.pdf https://link.springer.com/chapter/10.1007/978-3-031-42608-7_9}, year = {2023}, date = {2023-09-01}, booktitle = {KI 2023: Advances in Artificial Intelligence}, volume = {14236}, pages = {98-113}, publisher = {Springer}, abstract = {Dialogue systems are an important and very active research area with many practical applications. However, researchers and practitioners new to the field may have difficulty with the categorisation, number and terminology of existing free and commercial systems. Our paper aims to achieve two main objectives. Firstly, based on our structured literature review, we provide a categorisation of dialogue systems according to the objective, modality, domain, architecture, and model, and provide information on the correlations among these categories. Secondly, we summarise and compare frameworks and applications of intelligent virtual assistants, commercial frameworks, research dialogue systems, and large language models according to these categories and provide system recommendations for researchers new to the field.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Dialogue systems are an important and very active research area with many practical applications. However, researchers and practitioners new to the field may have difficulty with the categorisation, number and terminology of existing free and commercial systems. Our paper aims to achieve two main objectives. Firstly, based on our structured literature review, we provide a categorisation of dialogue systems according to the objective, modality, domain, architecture, and model, and provide information on the correlations among these categories. Secondly, we summarise and compare frameworks and applications of intelligent virtual assistants, commercial frameworks, research dialogue systems, and large language models according to these categories and provide system recommendations for researchers new to the field. |
Anagnostopoulou, Aliki; Hartmann, Mareike; Sonntag, Daniel Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory Inproceedings Proceedings of The Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP), Association for Computational Linguistics, 2023. @inproceedings{13420, title = {Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory}, author = {Aliki Anagnostopoulou and Mareike Hartmann and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/13420__SustaiNLP__Interactive_Image_Captioning.pdf https://aclanthology.org/2023.sustainlp-1.19/}, year = {2023}, date = {2023-07-01}, booktitle = {Proceedings of The Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP)}, publisher = {Association for Computational Linguistics}, abstract = {Interactive machine learning (IML) is a beneficial learning paradigm in cases of limited data availability, as human feedback is incrementally integrated into the training process. In this paper, we present an IML pipeline for image captioning which allows us to incrementally adapt a pre-trained image captioning model to a new data distribution based on user input. In order to incorporate user input into the model, we explore the use of a combination of simple data augmentation methods to obtain larger data batches for each newly annotated data instance and implement continual learning methods to prevent catastrophic forgetting from repeated updates. For our experiments, we split a domain-specific image captioning dataset, namely VizWiz, into non-overlapping parts to simulate an incremental input flow for continually adapting the model to new data. We find that, while data augmentation worsens results even when relatively small amounts of data are available, episodic memory is an effective strategy to retain knowledge from previously seen clusters.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Interactive machine learning (IML) is a beneficial learning paradigm in cases of limited data availability, as human feedback is incrementally integrated into the training process. In this paper, we present an IML pipeline for image captioning which allows us to incrementally adapt a pre-trained image captioning model to a new data distribution based on user input. In order to incorporate user input into the model, we explore the use of a combination of simple data augmentation methods to obtain larger data batches for each newly annotated data instance and implement continual learning methods to prevent catastrophic forgetting from repeated updates. For our experiments, we split a domain-specific image captioning dataset, namely VizWiz, into non-overlapping parts to simulate an incremental input flow for continually adapting the model to new data. We find that, while data augmentation worsens results even when relatively small amounts of data are available, episodic memory is an effective strategy to retain knowledge from previously seen clusters. |
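The episodic-memory component boils down to replaying a few stored examples alongside every new user annotation, so updates never happen on new data alone. A minimal sketch with reservoir sampling; the capacity, batch size, and model_step callback are illustrative assumptions, not the paper's exact configuration.

    import random

    class EpisodicMemory:
        def __init__(self, capacity=1000):
            self.buffer, self.capacity, self.seen = [], capacity, 0

        def add(self, example):  # reservoir sampling keeps a uniform subsample
            self.seen += 1
            if len(self.buffer) < self.capacity:
                self.buffer.append(example)
            else:
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.buffer[j] = example

        def replay_batch(self, k=8):
            return random.sample(self.buffer, min(k, len(self.buffer)))

    memory = EpisodicMemory()

    def incremental_update(model_step, new_examples):
        """One IML round: train on user input mixed with replayed memories."""
        model_step(new_examples + memory.replay_batch())  # user-supplied training step
        for ex in new_examples:
            memory.add(ex)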
Bunde, Enrico; Eisenhardt, Daniel; Sonntag, Daniel; Profitlich, Hans-Jürgen; Meske, Christian Giving DIAnA More TIME – Guidance for the Design of XAI-Based Medical Decision Support Systems Inproceedings 18th International Conference on Design Science Research in Information Systems and Technology, DESRIST 2023, Springer Nature Switzerland, 2023. @inproceedings{14706, title = {Giving DIAnA More TIME – Guidance for the Design of XAI-Based Medical Decision Support Systems}, author = {Enrico Bunde and Daniel Eisenhardt and Daniel Sonntag and Hans-Jürgen Profitlich and Christian Meske}, url = {https://www.dfki.de/fileadmin/user_upload/import/14706_PublishedGivingDianamoreTimeDESRIST.pdf https://scholar.google.de/citations?view_op=view_citation&hl=en&user=v7i6Uz4AAAAJ&sortby=pubdate&citation_for_view=v7i6Uz4AAAAJ:anf4URPfarAC}, year = {2023}, date = {2023-05-01}, booktitle = {18th International Conference on Design Science Research in Information Systems and Technology, DESRIST 2023}, publisher = {Springer Nature Switzerland}, abstract = {Future healthcare ecosystems integrating human-centered artificial intelligence (AI) will be indispensable. AI-based healthcare technologies can support diagnosis processes and make healthcare more accessible globally. In this context, we conducted a design science research project intending to introduce design principles for user interfaces (UIs) of explainable AI-based (XAI) medical decision support systems (XAI-based MDSS). We used an archaeological approach to analyze the UI of an existing web-based system in the context of skin lesion classification called DIAnA (Dermatological Images – Analysis and Archiving). One of DIAnA’s unique characteristics is that it should be usable for the stakeholder groups of physicians and patients. We conducted the in-situ analysis with these stakeholders using the think-aloud method and semi-structured interviews. We anchored our interview guide in concepts of the Theory of Interactive Media Effects (TIME), which formulates UI features as causes and user psychology as effects. Based on the results, we derived 20 design requirements and developed nine design principles grounded in TIME for this class of XAI-based MDSS, either associated with the needs of physicians, patients, or both. Regarding evaluation, we first conducted semi-structured interviews with software developers to assess the reusability of our design principles. Afterward, we conducted a survey with user experience/interface designers. The evaluation uncovered that 77% of the participants would adopt the design principles, and 82% would recommend them to colleagues for a suitable project. The findings prove the reusability of the design principles and highlight a positive perception by potential implementers.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Future healthcare ecosystems integrating human-centered artificial intelligence (AI) will be indispensable. AI-based healthcare technologies can support diagnosis processes and make healthcare more accessible globally. In this context, we conducted a design science research project intending to introduce design principles for user interfaces (UIs) of explainable AI-based (XAI) medical decision support systems (XAI-based MDSS). We used an archaeological approach to analyze the UI of an existing web-based system in the context of skin lesion classification called DIAnA (Dermatological Images – Analysis and Archiving). 
One of DIAnA’s unique characteristics is that it should be usable for the stakeholder groups of physicians and patients. We conducted the in-situ analysis with these stakeholders using the think-aloud method and semi-structured interviews. We anchored our interview guide in concepts of the Theory of Interactive Media Effects (TIME), which formulates UI features as causes and user psychology as effects. Based on the results, we derived 20 design requirements and developed nine design principles grounded in TIME for this class of XAI-based MDSS, either associated with the needs of physicians, patients, or both. Regarding evaluation, we first conducted semi-structured interviews with software developers to assess the reusability of our design principles. Afterward, we conducted a survey with user experience/interface designers. The evaluation uncovered that 77% of the participants would adopt the design principles, and 82% would recommend them to colleagues for a suitable project. The findings prove the reusability of the design principles and highlight a positive perception by potential implementers. |
Nguyen, Ho Minh Duy; Nguyen, Hoang; Truong, Mai T N; Cao, Tri; Nguyen, Binh T; Ho, Nhat; Swoboda, Paul; Albarqouni, Shadi; Xie, Pengtao; Sonntag, Daniel Joint Self-Supervised Image-Volume Representation Learning with Intra-Inter Contrastive Clustering Inproceedings Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2023), AAAI Press, 2023. @inproceedings{12923, title = {Joint Self-Supervised Image-Volume Representation Learning with Intra-Inter Contrastive Clustering}, author = {Ho Minh Duy Nguyen and Hoang Nguyen and Mai T N Truong and Tri Cao and Binh T Nguyen and Nhat Ho and Paul Swoboda and Shadi Albarqouni and Pengtao Xie and Daniel Sonntag}, url = {https://arxiv.org/pdf/2212.01893.pdf}, year = {2023}, date = {2023-02-01}, booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2023)}, publisher = {AAAI Press}, abstract = {Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions. In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing a vector embedding for each slice and then assembling a holistic feature through deformable self-attention mechanisms in a Transformer, allowing the model to incorporate long-range dependencies between slices inside 3D volumes. These holistic features are further utilized to define a novel 3D clustering agreement-based SSL task and a masked embedding prediction task inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structures segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D Deep-ClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions. In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. 
Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing a vector embedding for each slice and then assembling a holistic feature through deformable self-attention mechanisms in a Transformer, allowing the model to incorporate long-range dependencies between slices inside 3D volumes. These holistic features are further utilized to define a novel 3D clustering agreement-based SSL task and a masked embedding prediction task inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structures segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D Deep-ClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches. |
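The volume-level pathway in this abstract can be pictured as: embed each 2D slice, then let self-attention aggregate the slice sequence into one holistic volume feature. The paper uses deformable self-attention; a standard TransformerEncoder serves below as a simpler stand-in, with toy dimensions throughout.

    import torch
    import torch.nn as nn

    class VolumeEncoder(nn.Module):
        def __init__(self, slice_dim=128, n_heads=4, n_layers=2):
            super().__init__()
            self.slice_net = nn.Sequential(  # toy per-slice encoder
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, slice_dim))
            layer = nn.TransformerEncoderLayer(d_model=slice_dim, nhead=n_heads,
                                               batch_first=True)
            self.aggregator = nn.TransformerEncoder(layer, num_layers=n_layers)

        def forward(self, volume):  # volume: (batch, slices, 1, H, W)
            b, s = volume.shape[:2]
            z = self.slice_net(volume.flatten(0, 1)).view(b, s, -1)  # per-slice embeddings
            return self.aggregator(z).mean(dim=1)  # holistic volume feature

    feat = VolumeEncoder()(torch.randn(2, 24, 1, 64, 64))
    print(feat.shape)  # torch.Size([2, 128])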
Barz, Michael; Bhatti, Omair Shahzad; Alam, Hasan Md Tusfiqur; Nguyen, Ho Minh Duy; Sonntag, Daniel Interactive Fixation-to-AOI Mapping for Mobile Eye Tracking Data Based on Few-Shot Image Classification Inproceedings Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, pp. 175-178, Association for Computing Machinery, 2023. @inproceedings{13196, title = {Interactive Fixation-to-AOI Mapping for Mobile Eye Tracking Data Based on Few-Shot Image Classification}, author = {Michael Barz and Omair Shahzad Bhatti and Hasan Md Tusfiqur Alam and Ho Minh Duy Nguyen and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/13196_3581754.3584179.pdf}, doi = {https://doi.org/10.1145/3581754.3584179}, year = {2023}, date = {2023-01-01}, booktitle = {Companion Proceedings of the 28th International Conference on Intelligent User Interfaces}, pages = {175-178}, publisher = {Association for Computing Machinery}, abstract = {Mobile eye tracking is an important tool in psychology and human-centred interaction design for understanding how people process visual scenes and user interfaces. However, analysing recordings from mobile eye trackers, which typically include an egocentric video of the scene and a gaze signal, is a time-consuming and largely manual process. To address this challenge, we propose a web-based annotation tool that leverages few-shot image classification and interactive machine learning (IML) to accelerate the annotation process. The tool allows users to efficiently map fixations to areas of interest (AOI) in a video-editing-style interface. It includes an IML component that generates suggestions and learns from user feedback using a few-shot image classification model initialised with a small number of images per AOI. Our goal is to improve the efficiency and accuracy of fixation-to-AOI mapping in mobile eye tracking.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Mobile eye tracking is an important tool in psychology and human-centred interaction design for understanding how people process visual scenes and user interfaces. However, analysing recordings from mobile eye trackers, which typically include an egocentric video of the scene and a gaze signal, is a time-consuming and largely manual process. To address this challenge, we propose a web-based annotation tool that leverages few-shot image classification and interactive machine learning (IML) to accelerate the annotation process. The tool allows users to efficiently map fixations to areas of interest (AOI) in a video-editing-style interface. It includes an IML component that generates suggestions and learns from user feedback using a few-shot image classification model initialised with a small number of images per AOI. Our goal is to improve the efficiency and accuracy of fixation-to-AOI mapping in mobile eye tracking. |
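Prototype-based classification is one common way to realise the few-shot component described here: average the embeddings of the handful of example images per AOI, then assign each fixation crop to the nearest prototype. The backbone, AOI names, and image sizes below are assumptions, not the tool's actual model.

    import torch
    from torchvision.models import resnet18

    backbone = resnet18(weights=None)  # pretrained weights in practice
    backbone.fc = torch.nn.Identity()  # expose 512-d features
    backbone.eval()

    @torch.no_grad()
    def embed(x):
        return torch.nn.functional.normalize(backbone(x), dim=1)

    # A few support crops per AOI -> one prototype (mean embedding) per AOI.
    support = {"instrument_panel": torch.randn(5, 3, 224, 224),
               "windshield": torch.randn(5, 3, 224, 224)}
    prototypes = {aoi: embed(imgs).mean(0) for aoi, imgs in support.items()}

    @torch.no_grad()
    def classify_fixation(crop):  # crop: (1, 3, 224, 224) around the fixation point
        z = embed(crop)[0]
        return max(prototypes, key=lambda aoi: torch.dot(z, prototypes[aoi]).item())

    print(classify_fixation(torch.randn(1, 3, 224, 224)))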
Kopácsi, László; Barz, Michael; Bhatti, Omair Shahzad; Sonntag, Daniel IMETA: An Interactive Mobile Eye Tracking Annotation Method for Semi-Automatic Fixation-to-AOI Mapping Inproceedings Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, pp. 33-36, Association for Computing Machinery, 2023. @inproceedings{13201, title = {IMETA: An Interactive Mobile Eye Tracking Annotation Method for Semi-Automatic Fixation-to-AOI Mapping}, author = {László Kopácsi and Michael Barz and Omair Shahzad Bhatti and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/13201_3581754.3584125.pdf}, doi = {https://doi.org/10.1145/3581754.3584125}, year = {2023}, date = {2023-01-01}, booktitle = {Companion Proceedings of the 28th International Conference on Intelligent User Interfaces}, pages = {33-36}, publisher = {Association for Computing Machinery}, abstract = {Mobile eye tracking studies involve analyzing areas of interest (AOIs) and visual attention to these AOIs to understand how people process visual information. However, accurately annotating the data collected for user studies can be a challenging and time-consuming task. Current approaches for automatically or semi-automatically analyzing head-mounted eye tracking data in mobile eye tracking studies have limitations, including a lack of annotation flexibility or the inability to adapt to specific target domains. To address this problem, we present IMETA, an architecture for semi-automatic fixation-to-AOI mapping. When an annotator assigns an AOI label to a sequence of frames based on the respective fixation points, an interactive video object segmentation method is used to estimate the mask proposal of the AOI. Then, we use the 3D reconstruction of the visual scene created from the eye tracking video to map these AOI masks to 3D. The resulting 3D segmentation of the AOI can be used to suggest labels for the rest of the video, with the suggestions becoming increasingly accurate as more samples are provided by an annotator using interactive machine learning (IML). IMETA has the potential to reduce the annotation workload and speed up the evaluation of mobile eye tracking studies.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Mobile eye tracking studies involve analyzing areas of interest (AOIs) and visual attention to these AOIs to understand how people process visual information. However, accurately annotating the data collected for user studies can be a challenging and time-consuming task. Current approaches for automatically or semi-automatically analyzing head-mounted eye tracking data in mobile eye tracking studies have limitations, including a lack of annotation flexibility or the inability to adapt to specific target domains. To address this problem, we present IMETA, an architecture for semi-automatic fixation-to-AOI mapping. When an annotator assigns an AOI label to a sequence of frames based on the respective fixation points, an interactive video object segmentation method is used to estimate the mask proposal of the AOI. Then, we use the 3D reconstruction of the visual scene created from the eye tracking video to map these AOI masks to 3D. The resulting 3D segmentation of the AOI can be used to suggest labels for the rest of the video, with the suggestions becoming increasingly accurate as more samples are provided by an annotator using interactive machine learning (IML). 
IMETA has the potential to reduce the annotation workload and speed up the evaluation of mobile eye tracking studies. |
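The mapping step at the heart of IMETA can be pictured geometrically: project an AOI's reconstructed 3D points into the scene-camera frame and test whether the current fixation lands inside the projected region. Camera intrinsics, pose, AOI geometry, and the hit radius below are toy values, not the system's calibration.

    import numpy as np

    def project(points_3d, K, R, t):
        cam = (R @ points_3d.T + t[:, None]).T  # world -> camera coordinates
        uv = (K @ cam.T).T
        return uv[:, :2] / uv[:, 2:3]           # perspective divide -> pixels

    K = np.array([[600., 0., 320.], [0., 600., 240.], [0., 0., 1.]])
    R, t = np.eye(3), np.zeros(3)               # identity pose for simplicity
    aoi_points = np.random.rand(500, 3) * [0.3, 0.2, 0.0] + [0.1, 0.1, 2.0]

    uv = project(aoi_points, K, R, t)
    fixation = np.array([350., 260.])           # current fixation in pixels
    hit = (np.linalg.norm(uv - fixation, axis=1) < 40).any()
    print("fixation on AOI:", bool(hit))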
Kadir, Md Abdul; Selim, Abdulrahman Mohamed; Barz, Michael; Sonntag, Daniel A User Interface for Explaining Machine Learning Model Explanations Inproceedings Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, pp. 59–63, Association for Computing Machinery, Sydney, NSW, Australia, 2023, ISBN: 9798400701078. @inproceedings{13200, title = {A User Interface for Explaining Machine Learning Model Explanations}, author = {Md Abdul Kadir and Abdulrahman Mohamed Selim and Michael Barz and Daniel Sonntag}, url = {https://doi.org/10.1145/3581754.3584131}, doi = {10.1145/3581754.3584131}, isbn = {9798400701078}, year = {2023}, date = {2023-01-01}, booktitle = {Companion Proceedings of the 28th International Conference on Intelligent User Interfaces}, pages = {59–63}, publisher = {Association for Computing Machinery}, address = {Sydney, NSW, Australia}, series = {IUI '23 Companion}, abstract = {Explainable Artificial Intelligence (XAI) is an emerging subdiscipline of Machine Learning (ML) and human-computer interaction. Discriminative models need to be understood. An explanation of such ML models is vital when an AI system makes decisions that have significant consequences, such as in healthcare or finance. By providing an input-specific explanation, users can gain confidence in an AI system’s decisions and be more willing to trust and rely on it. One problem is that interpreting example-based explanations for discriminative models, such as saliency maps, can be difficult because it is not always clear how the highlighted features contribute to the model’s overall prediction or decisions. Moreover, saliency maps, which are state-of-the-art visual explanation methods, do not provide concrete information on the influence of particular features. We propose an interactive visualisation tool called EMILE-UI that allows users to evaluate the provided explanations of an image-based classification task, specifically those provided by saliency maps. This tool allows users to evaluate the accuracy of a saliency map by reflecting the true attention or focus of the corresponding model. It visualises the relationship between the ML model and its explanation of input images, making it easier to interpret saliency maps and understand how the ML model actually predicts. Our tool supports a wide range of deep learning image classification models and image data as inputs.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Explainable Artificial Intelligence (XAI) is an emerging subdiscipline of Machine Learning (ML) and human-computer interaction. Discriminative models need to be understood. An explanation of such ML models is vital when an AI system makes decisions that have significant consequences, such as in healthcare or finance. By providing an input-specific explanation, users can gain confidence in an AI system’s decisions and be more willing to trust and rely on it. One problem is that interpreting example-based explanations for discriminative models, such as saliency maps, can be difficult because it is not always clear how the highlighted features contribute to the model’s overall prediction or decisions. Moreover, saliency maps, which are state-of-the-art visual explanation methods, do not provide concrete information on the influence of particular features. 
We propose an interactive visualisation tool called EMILE-UI that allows users to evaluate the provided explanations of an image-based classification task, specifically those provided by saliency maps. This tool allows users to evaluate the accuracy of a saliency map by reflecting the true attention or focus of the corresponding model. It visualises the relationship between the ML model and its explanation of input images, making it easier to interpret saliency maps and understand how the ML model actually predicts. Our tool supports a wide range of deep learning image classification models and image data as inputs. |
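One scriptable core of such an evaluation: binarise the saliency map at a quantile and compare it with a user-marked focus region via intersection-over-union. The metric and threshold are illustrative choices; the tool itself wraps this kind of check in an interactive UI.

    import numpy as np

    def saliency_iou(saliency, focus_mask, q=0.9):
        """IoU between the top-(1-q) saliency pixels and a binary focus mask."""
        sal_mask = saliency >= np.quantile(saliency, q)
        inter = np.logical_and(sal_mask, focus_mask).sum()
        union = np.logical_or(sal_mask, focus_mask).sum()
        return inter / union if union else 0.0

    saliency = np.random.rand(224, 224)   # stand-in for a model's saliency map
    focus = np.zeros((224, 224), dtype=bool)
    focus[60:160, 60:160] = True          # user-marked region of true model focus
    print(round(saliency_iou(saliency, focus), 3))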
Gouvea, Thiago; Kath, Hannes; Troshani, Ilira; Lüers, Bengt; Serafini, Patrícia P; Campos, Ivan B; Afonso, André S; Leandro, Sérgio M F M; Swanepoel, Lourens; Theron, Nicholas; Swemmer, Anthony M; Sonntag, Daniel Interactive Machine Learning Solutions for Acoustic Monitoring of Animal Wildlife in Biosphere Reserves Inproceedings Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence, 2023. @inproceedings{13356, title = {Interactive Machine Learning Solutions for Acoustic Monitoring of Animal Wildlife in Biosphere Reserves}, author = {Thiago Gouvea and Hannes Kath and Ilira Troshani and Bengt Lüers and Patrícia P Serafini and Ivan B Campos and André S Afonso and Sérgio M F M Leandro and Lourens Swanepoel and Nicholas Theron and Anthony M Swemmer and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/13356_IJCAI_ProjectProposal_PAM_in_Biosphere_Reserves.pdf}, year = {2023}, date = {2023-01-01}, booktitle = {Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence}, publisher = {International Joint Conferences on Artificial Intelligence}, abstract = {Biodiversity loss is taking place at accelerated rates globally, and a business-as-usual trajectory will lead to missing internationally established conservation goals. Biosphere reserves are sites designed to be of global significance in terms of both the biodiversity within them and their potential for sustainable development, and are therefore ideal places for the development of local solutions to global challenges. While the protection of biodiversity is a primary goal of biosphere reserves, adequate information on the state and trends of biodiversity remains a critical gap for adaptive management in biosphere reserves. Passive acoustic monitoring (PAM) is an increasingly popular method for continued, reproducible, scalable, and cost-effective monitoring of animal wildlife. PAM adoption is on the rise, but its data management and analysis requirements pose a barrier for adoption for most agencies tasked with monitoring biodiversity. As an interdisciplinary team of machine learning scientists and ecologists experienced with PAM and working at biosphere reserves in marine and terrestrial ecosystems on three different continents, we report on the co-development of interactive machine learning tools for semi-automated assessment of animal wildlife.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Biodiversity loss is taking place at accelerated rates globally, and a business-as-usual trajectory will lead to missing internationally established conservation goals. Biosphere reserves are sites designed to be of global significance in terms of both the biodiversity within them and their potential for sustainable development, and are therefore ideal places for the development of local solutions to global challenges. While the protection of biodiversity is a primary goal of biosphere reserves, adequate information on the state and trends of biodiversity remains a critical gap for adaptive management in biosphere reserves. Passive acoustic monitoring (PAM) is an increasingly popular method for continued, reproducible, scalable, and cost-effective monitoring of animal wildlife. PAM adoption is on the rise, but its data management and analysis requirements pose a barrier for adoption for most agencies tasked with monitoring biodiversity. 
As an interdisciplinary team of machine learning scientists and ecologists experienced with PAM and working at biosphere reserves in marine and terrestrial ecosystems on three different continents, we report on the co-development of interactive machine learning tools for semi-automated assessment of animal wildlife. |
van Zoelen, Emma; Mioch, Tina; Tajaddini, Mani; Fleiner, Christian; Tsaneva, Stefani; Camin, Pietro; Gouvea, Thiago; Baraka, Kim; de Boer, Maaike H T; Neerincx, Mark A Developing Team Design Patterns for Hybrid Intelligence Systems Inproceedings Frontiers in Artificial Intelligence and Applications, IOS Press, 2023. @inproceedings{13357, title = {Developing Team Design Patterns for Hybrid Intelligence Systems}, author = {Emma van Zoelen and Tina Mioch and Mani Tajaddini and Christian Fleiner and Stefani Tsaneva and Pietro Camin and Thiago Gouvea and Kim Baraka and Maaike H T de Boer and Mark A Neerincx}, year = {2023}, date = {2023-01-01}, booktitle = {Frontiers in Artificial Intelligence and Applications}, publisher = {IOS Press}, abstract = {With artificial intelligence (AI) systems entering our working and leisure environments with increasing adaptation and learning capabilities, new opportunities arise for developing hybrid (human-AI) intelligence (HI) systems, comprising new ways of collaboration. However, there is not yet a structured way of specifying design solutions for collaboration in hybrid intelligence (HI) systems and there is a lack of best practices shared across application domains. We address this gap by investigating the generalization of specific design solutions into design patterns that can be shared and applied in different contexts. We present a human-centered bottom-up approach for the specification of design solutions and their abstraction into team design patterns. We apply the proposed approach to 4 concrete HI use cases and show the successful extraction of team design patterns that are generalizable, providing re-usable design components across various domains. This work advances previous research on team design patterns and designing applications of HI systems.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } With artificial intelligence (AI) systems entering our working and leisure environments with increasing adaptation and learning capabilities, new opportunities arise for developing hybrid (human-AI) intelligence (HI) systems, comprising new ways of collaboration. However, there is not yet a structured way of specifying design solutions for collaboration in hybrid intelligence (HI) systems and there is a lack of best practices shared across application domains. We address this gap by investigating the generalization of specific design solutions into design patterns that can be shared and applied in different contexts. We present a human-centered bottom-up approach for the specification of design solutions and their abstraction into team design patterns. We apply the proposed approach to 4 concrete HI use cases and show the successful extraction of team design patterns that are generalizable, providing re-usable design components across various domains. This work advances previous research on team design patterns and designing applications of HI systems. |
Kath, Hannes; Gouvea, Thiago; Sonntag, Daniel A Human-in-the-Loop Tool for Annotating Passive Acoustic Monitoring Datasets Inproceedings Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence. International Joint Conference on Artificial Intelligence (IJCAI-2023), located at IJCAI, August 19-25, Macao, Macao, International Joint Conferences on Artificial Intelligence, 2023. @inproceedings{13395, title = {A Human-in-the-Loop Tool for Annotating Passive Acoustic Monitoring Datasets}, author = {Hannes Kath and Thiago Gouvea and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/13395_A_human-in-the-loop_tool_for_annotating_passive_acoustic_monitoring_datasets_v2_standardisedPDF.pdf}, year = {2023}, date = {2023-01-01}, booktitle = {Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence. International Joint Conference on Artificial Intelligence (IJCAI-2023), located at IJCAI, August 19-25, Macao, Macao}, publisher = {International Joint Conferences on Artificial Intelligence}, abstract = {Deep learning methods are well suited for data analysis in several domains, but application is often limited by technical entry barriers and the availability of large annotated datasets. We present an interactive machine learning tool for annotating passive acoustic monitoring datasets created for wildlife monitoring, which are time-consuming and costly to annotate manually. The tool, designed as a web application, consists of an interactive user interface implementing a human-in-the-loop workflow. Class label annotations provided manually as bounding boxes drawn over a spectrogram are consumed by a deep generative model (DGM) that learns a low-dimensional representation of the input data, as well as the available class labels. The learned low-dimensional representation is displayed as an interactive interface element, where new bounding boxes can be efficiently generated by the user with lasso-selection; alternatively, the DGM can propose new, automatically generated bounding boxes on demand. The user can accept, edit, or reject annotations suggested by the model, thus owning final judgement. Generated annotations can be used to fine-tune the underlying model, thus closing the loop. Investigations of the prediction accuracy and first empirical experiments show promising results on an artificial data set, laying the groundwork for application to real-life scenarios.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Deep learning methods are well suited for data analysis in several domains, but application is often limited by technical entry barriers and the availability of large annotated datasets. We present an interactive machine learning tool for annotating passive acoustic monitoring datasets created for wildlife monitoring, which are time-consuming and costly to annotate manually. The tool, designed as a web application, consists of an interactive user interface implementing a human-in-the-loop workflow. Class label annotations provided manually as bounding boxes drawn over a spectrogram are consumed by a deep generative model (DGM) that learns a low-dimensional representation of the input data, as well as the available class labels. The learned low-dimensional representation is displayed as an interactive interface element, where new bounding boxes can be efficiently generated by the user with lasso-selection; alternatively, the DGM can propose new, automatically generated bounding boxes on demand. The user can accept, edit, or reject annotations suggested by the model, thus owning final judgement. Generated annotations can be used to fine-tune the underlying model, thus closing the loop. Investigations of the prediction accuracy and first empirical experiments show promising results on an artificial data set, laying the groundwork for application to real-life scenarios. |
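The propose-review-fine-tune cycle described in this entry can be sketched in a few lines of Python. The following is a minimal illustration with stub classes; the names (ToyDGM, ToyUser) and the acceptance behaviour are assumptions for demonstration, not the authors' implementation:

```python
import random
from dataclasses import dataclass

@dataclass
class Box:
    start: float
    end: float
    label: str

class ToyDGM:
    """Stand-in for the deep generative model: proposes random candidate boxes."""
    def propose(self, n=3):
        boxes = []
        for _ in range(n):
            t = random.random()
            boxes.append(Box(t, t + 0.1, "call"))
        return boxes
    def fine_tune(self, annotations):
        pass  # in the real tool: gradient updates on the accepted annotations

class ToyUser:
    """Simulated annotator who accepts roughly 70% of proposals."""
    def review(self, box):
        return box if random.random() < 0.7 else None

model, user, accepted = ToyDGM(), ToyUser(), []
for _ in range(5):                    # annotation rounds
    for box in model.propose():
        kept = user.review(box)       # accept / reject; the user owns final judgement
        if kept is not None:
            accepted.append(kept)
    model.fine_tune(accepted)         # close the loop
print(f"{len(accepted)} annotations collected")
```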
Liang, Siting; Hartmann, Mareike; Sonntag, Daniel Cross-domain German Medical Named Entity Recognition using a Pre-Trained Language Model and Unified Medical Semantic Types Inproceedings Association for Computational Linguistics, ACL, 2023. @inproceedings{13402, title = {Cross-domain German Medical Named Entity Recognition using a Pre-Trained Language Model and Unified Medical Semantic Types}, author = {Siting Liang and Mareike Hartmann and Daniel Sonntag}, url = {https://aclanthology.org/2023.clinicalnlp-1.31/}, year = {2023}, date = {2023-01-01}, booktitle = {Association for Computational Linguistics}, publisher = {ACL}, abstract = {Information extraction from clinical text has the potential to facilitate clinical research and personalized clinical care, but annotating large amounts of data for each set of target tasks is prohibitive. We present a German medical Named Entity Recognition (NER) system capable of cross-domain knowledge transfer. The system builds on a pre-trained German language model and a token-level binary classifier, employing semantic types sourced from the Unified Medical Language System (UMLS) as entity labels to identify corresponding entity spans within the input text. To enhance the system’s performance and robustness, we pre-train it using a medical literature corpus that incorporates UMLS semantic term annotations. We evaluate the system’s effectiveness on two German annotated datasets obtained from different clinics in zero- and few-shot settings. The results show that our approach outperforms task-specific Conditional Random Fields (CRF) classifiers in terms of accuracy. Our work contributes to developing robust and transparent German medical NER models that can support the extraction of information from various clinical texts.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Information extraction from clinical text has the potential to facilitate clinical research and personalized clinical care, but annotating large amounts of data for each set of target tasks is prohibitive. We present a German medical Named Entity Recognition (NER) system capable of cross-domain knowledge transfer. The system builds on a pre-trained German language model and a token-level binary classifier, employing semantic types sourced from the Unified Medical Language System (UMLS) as entity labels to identify corresponding entity spans within the input text. To enhance the system’s performance and robustness, we pre-train it using a medical literature corpus that incorporates UMLS semantic term annotations. We evaluate the system’s effectiveness on two German annotated datasets obtained from different clinics in zero- and few-shot settings. The results show that our approach outperforms task-specific Conditional Random Fields (CRF) classifiers in terms of accuracy. Our work contributes to developing robust and transparent German medical NER models that can support the extraction of information from various clinical texts. |
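For readers who want to see the shape of such a system, a minimal token-classification sketch is given below. The German base model and the BIO label set are assumptions; the paper uses token-level binary classifiers per UMLS semantic type and pre-trains on UMLS-annotated literature, whereas this sketch uses a single, untrained multi-class head for brevity:

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Toy subset of UMLS semantic types as BIO labels (illustrative only).
SEMANTIC_TYPES = ["O", "B-DISO", "I-DISO", "B-CHEM", "I-CHEM"]

tok = AutoTokenizer.from_pretrained("bert-base-german-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-german-cased", num_labels=len(SEMANTIC_TYPES)
)  # note: this head is randomly initialised here; the paper pre-trains it
   # on a UMLS-annotated medical corpus before zero-/few-shot transfer

text = "Der Patient erhielt Ibuprofen gegen die Arthritis."
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    pred = model(**inputs).logits.argmax(-1)[0]   # (seq_len,) label ids
for token, label_id in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), pred):
    print(f"{token:12s} {SEMANTIC_TYPES[int(label_id)]}")
```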
Kuznetsov, Konstantin; Barz, Michael; Sonntag, Daniel Detection of Contract Cheating in Pen-and-Paper Exams through the Analysis of Handwriting Style Inproceedings Companion Publication of the 25th International Conference on Multimodal Interaction, pp. 26-30, Association for Computing Machinery, 2023. @inproceedings{14164, title = {Detection of Contract Cheating in Pen-and-Paper Exams through the Analysis of Handwriting Style}, author = {Konstantin Kuznetsov and Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/14164_ICMI23_contract_cheating.pdf}, doi = {https://doi.org/10.1145/3610661.3617162}, year = {2023}, date = {2023-01-01}, booktitle = {Companion Publication of the 25th International Conference on Multimodal Interaction}, pages = {26-30}, publisher = {Association for Computing Machinery}, abstract = {Contract cheating, i.e., when a student employs another person to participate in an exam, appears to be a growing problem in academia. Cases of paid test takers are repeatedly reported in the media, but the number of unreported cases is unclear. Proctoring systems as a countermeasure are typically not appreciated by students and teachers because they may violate the students' privacy and can be imprecise and nontransparent. In this work, we propose to use automatic handwriting analysis based on digital ballpoint pens to identify individuals during exams unobtrusively. We implement a system that enables continuous authentication of the user during exams. We use a deep neural network architecture to model a user's handwriting style. An evaluation based on the large Deepwriting dataset shows that our system can successfully differentiate between the handwriting styles of different authors and hence detect simulated cases of contract cheating. In addition, we conducted a small validation study using digital ballpoint pens to assess the system's reliability in a more realistic environment.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Contract cheating, i.e., when a student employs another person to participate in an exam, appears to be a growing problem in academia. Cases of paid test takers are repeatedly reported in the media, but the number of unreported cases is unclear. Proctoring systems as a countermeasure are typically not appreciated by students and teachers because they may violate the students' privacy and can be imprecise and nontransparent. In this work, we propose to use automatic handwriting analysis based on digital ballpoint pens to identify individuals during exams unobtrusively. We implement a system that enables continuous authentication of the user during exams. We use a deep neural network architecture to model a user's handwriting style. An evaluation based on the large Deepwriting dataset shows that our system can successfully differentiate between the handwriting styles of different authors and hence detect simulated cases of contract cheating. In addition, we conducted a small validation study using digital ballpoint pens to assess the system's reliability in a more realistic environment. |
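A minimal sketch of the verification idea, continuous writer authentication by comparing handwriting-style embeddings, is shown below. The GRU encoder, the (dx, dy, pen_down) stroke encoding, and the 0.7 threshold are illustrative assumptions, not the paper's trained architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StrokeEncoder(nn.Module):
    """Maps a (dx, dy, pen_down) stroke sequence to a unit-length style embedding."""
    def __init__(self, dim=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=3, hidden_size=dim, batch_first=True)
    def forward(self, strokes):                 # strokes: (batch, steps, 3)
        _, h = self.rnn(strokes)
        return F.normalize(h[-1], dim=-1)       # last hidden state, L2-normalised

def same_writer(enc, enrolled, sample, threshold=0.7):
    """Cosine similarity between enrolled and current handwriting style."""
    sim = (enc(enrolled) * enc(sample)).sum(-1)
    return sim > threshold

enc = StrokeEncoder()
enrolled = torch.randn(1, 200, 3)   # strokes captured when the exam starts
sample = torch.randn(1, 200, 3)     # strokes captured later during the exam
print(same_writer(enc, enrolled, sample))
```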
Kadir, Md Abdul; Alam, Hasan Md Tusfiqur; Sonntag, Daniel EdgeAL: An Edge Estimation Based Active Learning Approach for OCT Segmentation Inproceedings International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 79-89, Springer, Cham, 2023. @inproceedings{14635, title = {EdgeAL: An Edge Estimation Based Active Learning Approach for OCT Segmentation}, author = {Md Abdul Kadir and Hasan Md Tusfiqur Alam and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/14635_paper1593.pdf https://rdcu.be/dvx8f}, year = {2023}, date = {2023-01-01}, booktitle = {International Conference on Medical Image Computing and Computer-Assisted Intervention}, pages = {79-89}, publisher = {Springer, Cham}, abstract = {Active learning algorithms have become increasingly popular for training models with limited data. However, selecting data for annotation remains a challenging problem due to the limited information available on unseen data. To address this issue, we propose EdgeAL, which utilizes the edge information of unseen images as a priori information for measuring uncertainty. The uncertainty is quantified by analyzing the divergence and entropy in model predictions across edges. This measure is then used to select superpixels for annotation. We demonstrate the effectiveness of EdgeAL on multi-class Optical Coherence Tomography (OCT) segmentation tasks, where we achieved a 99% dice score while reducing the annotation label cost to 12%, 2.3%, and 3%, respectively, on three publicly available datasets (Duke, AROI, and UMN). The source code is available at https://github.com/Mak-Ta-Reque/EdgeAL.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Active learning algorithms have become increasingly popular for training models with limited data. However, selecting data for annotation remains a challenging problem due to the limited information available on unseen data. To address this issue, we propose EdgeAL, which utilizes the edge information of unseen images as a priori information for measuring uncertainty. The uncertainty is quantified by analyzing the divergence and entropy in model predictions across edges. This measure is then used to select superpixels for annotation. We demonstrate the effectiveness of EdgeAL on multi-class Optical Coherence Tomography (OCT) segmentation tasks, where we achieved a 99% dice score while reducing the annotation label cost to 12%, 2.3%, and 3%, respectively, on three publicly available datasets (Duke, AROI, and UMN). The source code is available at https://github.com/Mak-Ta-Reque/EdgeAL. |
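A toy version of the edge-based acquisition idea follows: rank regions by predictive entropy, weighted by a-priori edge information from the unlabelled image. Sobel edges and square tiles in place of superpixels are simplifications of the paper's formulation:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(4), size=(128, 128))   # mock softmax output, 4 classes
image = rng.random((128, 128))                       # mock OCT slice

# A-priori edge information extracted from the image itself.
edges = np.hypot(ndimage.sobel(image, axis=0), ndimage.sobel(image, axis=1))
edge_mask = edges > np.percentile(edges, 90)

# Per-pixel predictive entropy of the segmentation model.
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)

scores = {}
for i in range(0, 128, 16):
    for j in range(0, 128, 16):                      # crude square "superpixels"
        tile = (slice(i, i + 16), slice(j, j + 16))
        weight = edge_mask[tile].mean() + 1e-6       # emphasise edge regions
        scores[(i, j)] = weight * entropy[tile].mean()

query = max(scores, key=scores.get)
print("next superpixel to annotate:", query)
```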
Nayebi, Maleknaz; Kuznetsov, Konstantin; Zeller, Andreas; Ruhe, Guenther User Driven Functionality Deletion for Mobile Apps Inproceedings IEEE International Requirements Engineering Conference, IEEE, 2023. @inproceedings{14666, title = {User Driven Functionality Deletion for Mobile Apps}, author = {Maleknaz Nayebi and Konstantin Kuznetsov and Andreas Zeller and Guenther Ruhe}, url = {https://ieeexplore.ieee.org/document/10260783/}, year = {2023}, date = {2023-01-01}, booktitle = {IEEE International Requirements Engineering Conference}, publisher = {IEEE}, abstract = {Evolving software with an increasing number of features is harder to understand and thus harder to use. Software release planning has been concerned with planning these additions. Moreover, software of increasing size takes more effort to be maintained. In the domain of mobile apps, too much functionality can easily impact usability, maintainability, and resource consumption. Hence, it is important to understand the extent to which the law of continuous growth applies to mobile apps. Previous work showed that the deletion of functionality is common and sometimes driven by user reviews. However, it is unknown whether these deletions are visible or important to the app users. In this study, we surveyed 297 mobile app users to understand the significance of functionality deletion for them. Our results showed that for most users, the deletion of features corresponds with negative sentiments and change in usage and even churn. Motivated by these preliminary results, we propose Radiation, which takes user reviews as input and recommends whether any functionality should be deleted from an app's User Interface (UI). We evaluate Radiation using historical data and surveying developers' opinions. From the analysis of 190,062 reviews from 115 randomly selected apps, we show that Radiation can recommend functionality deletion with an average F-Score of 74%, provided that sufficiently many negative user reviews suggest it.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Evolving software with an increasing number of features is harder to understand and thus harder to use. Software release planning has been concerned with planning these additions. Moreover, software of increasing size takes more effort to be maintained. In the domain of mobile apps, too much functionality can easily impact usability, maintainability, and resource consumption. Hence, it is important to understand the extent to which the law of continuous growth applies to mobile apps. Previous work showed that the deletion of functionality is common and sometimes driven by user reviews. However, it is unknown whether these deletions are visible or important to the app users. In this study, we surveyed 297 mobile app users to understand the significance of functionality deletion for them. Our results showed that for most users, the deletion of features corresponds with negative sentiments and change in usage and even churn. Motivated by these preliminary results, we propose Radiation, which takes user reviews as input and recommends whether any functionality should be deleted from an app's User Interface (UI). We evaluate Radiation using historical data and surveying developers' opinions. From the analysis of 190,062 reviews from 115 randomly selected apps, we show that Radiation can recommend functionality deletion with an average F-Score of 74%, provided that sufficiently many negative user reviews suggest it. |
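A pipeline in the spirit of Radiation might look like the sketch below: classify reviews that suggest removing something, aggregate votes per UI feature, and recommend deletion above a threshold. The classifier, toy reviews, assumed feature grouping, and the vote threshold are illustrative assumptions only:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny training set: label 1 = the review asks for something to be removed.
train_reviews = ["please remove the ads banner", "love the new dark mode",
                 "get rid of the popup tips", "the popup tips are great"]
train_labels = [1, 0, 1, 0]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(train_reviews), train_labels)

# Incoming reviews grouped by the UI feature they mention (grouping assumed given).
incoming = {
    "ads banner": ["remove the ads banner please", "the ads banner must go, get rid of it"],
    "dark mode": ["dark mode could be a bit darker"],
}
votes = {feat: int(clf.predict(vec.transform(revs)).sum())
         for feat, revs in incoming.items()}
recommend = [feat for feat, n in votes.items() if n >= 2]   # "sufficiently many" reviews
print("deletion candidates:", recommend)
```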
Kadir, Md Abdul; Mosavi, Amir; Sonntag, Daniel Evaluation Metrics for XAI: A Review, Taxonomy, and Practical Applications Inproceedings 2023 IEEE 27th International Conference on Intelligent Engineering Systems (INES), pp. 000111-000124, IEEE, 2023. @inproceedings{14708, title = {Evaluation Metrics for XAI: A Review, Taxonomy, and Practical Applications}, author = {Md Abdul Kadir and Amir Mosavi and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/14708_XAI_Evaluation_Metrics__Taxonomies__Concepts_and_Applications__INES_2023_-7.pdf https://ieeexplore.ieee.org/abstract/document/10297629}, year = {2023}, date = {2023-01-01}, booktitle = {2023 IEEE 27th International Conference on Intelligent Engineering Systems (INES)}, pages = {000111-000124}, publisher = {IEEE}, abstract = {Within the past few years, the accuracy of deep learning and machine learning models has been improving significantly while less attention has been paid to their responsibility, explainability, and interpretability. eXplainable Artificial Intelligence (XAI) methods, guidelines, concepts, and strategies offer the possibility of evaluating models to improve fidelity, faithfulness, and overall explainability. Due to the diversity of data and learning methodologies, there needs to be a clear definition for the validity, reliability, and evaluation metrics of explainability. This article reviews evaluation metrics used for XAI through the PRISMA systematic guideline for a comprehensive and systematic literature review. Based on the results, this study suggests two taxonomies for the evaluation metrics. One taxonomy is based on the applications, and the other is based on the types of evaluation metrics.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Within the past few years, the accuracy of deep learning and machine learning models has been improving significantly while less attention has been paid to their responsibility, explainability, and interpretability. eXplainable Artificial Intelligence (XAI) methods, guidelines, concepts, and strategies offer the possibility of evaluating models to improve fidelity, faithfulness, and overall explainability. Due to the diversity of data and learning methodologies, there needs to be a clear definition for the validity, reliability, and evaluation metrics of explainability. This article reviews evaluation metrics used for XAI through the PRISMA systematic guideline for a comprehensive and systematic literature review. Based on the results, this study suggests two taxonomies for the evaluation metrics. One taxonomy is based on the applications, and the other is based on the types of evaluation metrics. |
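As one concrete member of the faithfulness family of metrics reviewed in this paper, the sketch below implements the standard deletion test: pixels are removed in order of attributed importance and the drop in model confidence is tracked (a steeper drop indicates a more faithful explanation). The model and saliency map here are mocked:

```python
import numpy as np

def deletion_curve(predict, image, saliency, steps=10):
    """Model confidence as the most salient pixels are progressively zeroed out."""
    order = np.argsort(saliency.ravel())[::-1]          # most important first
    x = image.copy().ravel()
    confidences = [predict(x.reshape(image.shape))]
    for chunk in np.array_split(order, steps):
        x[chunk] = 0.0
        confidences.append(predict(x.reshape(image.shape)))
    return np.array(confidences)   # area under this curve is the usual summary

rng = np.random.default_rng(1)
img = rng.random((8, 8))
sal = img.copy()                   # pretend brightness is what the model attends to
conf = deletion_curve(lambda im: im.mean(), img, sal)   # mock "confidence" = mean brightness
print(np.round(conf, 3))           # steadily decreasing: the explanation is faithful
```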
Miscellaneous |
Kadir, Md Abdul; Nunnari, Fabrizio; Sonntag, Daniel Fine-tuning of explainable CNNs for skin lesion classification based on dermatologists' feedback towards increasing trust Miscellaneous 2023. @misc{14709, title = {Fine-tuning of explainable CNNs for skin lesion classification based on dermatologists' feedback towards increasing trust}, author = {Md Abdul Kadir and Fabrizio Nunnari and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/14709_2304.01399.pdf https://arxiv.org/abs/2304.01399}, year = {2023}, date = {2023-01-01}, abstract = {In this paper, we propose a CNN fine-tuning method which enables users to give simultaneous feedback on two outputs: the classification itself and the visual explanation for the classification. We present the effect of this feedback strategy in a skin lesion classification task and measure how CNNs react to the two types of user feedback. To implement this approach, we propose a novel CNN architecture that integrates the Grad-CAM technique for explaining the model's decision in the training loop. Using simulated user feedback, we found that fine-tuning our model on both classification and explanation improves visual explanation while preserving classification accuracy, thus potentially increasing the trust of users in using CNN-based skin lesion classifiers.}, keywords = {}, pubstate = {published}, tppubtype = {misc} } In this paper, we propose a CNN fine-tuning method which enables users to give simultaneous feedback on two outputs: the classification itself and the visual explanation for the classification. We present the effect of this feedback strategy in a skin lesion classification task and measure how CNNs react to the two types of user feedback. To implement this approach, we propose a novel CNN architecture that integrates the Grad-CAM technique for explaining the model's decision in the training loop. Using simulated user feedback, we found that fine-tuning our model on both classification and explanation improves visual explanation while preserving classification accuracy, thus potentially increasing the trust of users in using CNN-based skin lesion classifiers. |
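The two-channel feedback idea, a joint loss on the predicted label and on the visual explanation, can be sketched as follows. For brevity this uses plain CAM (feature maps weighted by the classifier weights) instead of Grad-CAM, and the architecture and loss weighting are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CamCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                      nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Linear(16, n_classes)
    def forward(self, x):
        f = self.features(x)                              # (B, 16, H, W)
        logits = self.head(f.mean(dim=(2, 3)))            # global average pooling
        cam = torch.einsum("bchw,kc->bkhw", f, self.head.weight)  # class activation maps
        return logits, cam

model = CamCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(4, 3, 32, 32)
y = torch.randint(0, 2, (4,))                             # label feedback
mask = torch.rand(4, 32, 32)                              # explanation feedback (user mask)

logits, cam = model(x)
cam_y = cam[torch.arange(4), y]                           # CAM of the given class
loss = F.cross_entropy(logits, y) + 0.5 * F.mse_loss(torch.sigmoid(cam_y), mask)
loss.backward()
opt.step()
print(float(loss))
```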
Technical Reports |
Kath, Hannes; Lüers, Bengt; Gouvea, Thiago; Sonntag, Daniel A Virtual Reality Tool for Representing, Visualizing and Updating Deep Learning Models Technical Report DFKI , 2023. @techreport{14710, title = {A Virtual Reality Tool for Representing, Visualizing and Updating Deep Learning Models}, author = {Hannes Kath and Bengt Lüers and Thiago Gouvea and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/14710_2305.15353.pdf}, doi = {https://doi.org/10.48550/arXiv.2305.15353}, year = {2023}, date = {2023-05-01}, volume = {2305.15353v1}, pages = {8}, institution = {DFKI}, abstract = {Deep learning is ubiquitous, but its lack of transparency limits its impact on several potential application areas. We demonstrate a virtual reality tool for automating the process of assigning data inputs to different categories. A dataset is represented as a cloud of points in virtual space. The user explores the cloud through movement and uses hand gestures to categorise portions of the cloud. This triggers gradual movements in the cloud: points of the same category are attracted to each other, different groups are pushed apart, while points are globally distributed in a way that utilises the entire space. The space, time, and forces observed in virtual reality can be mapped to well-defined machine learning concepts, namely the latent space, the training epochs and the backpropagation. Our tool illustrates how the inner workings of deep neural networks can be made tangible and transparent. We expect this approach to accelerate the autonomous development of deep learning applications by end users in novel areas.}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } Deep learning is ubiquitous, but its lack of transparency limits its impact on several potential application areas. We demonstrate a virtual reality tool for automating the process of assigning data inputs to different categories. A dataset is represented as a cloud of points in virtual space. The user explores the cloud through movement and uses hand gestures to categorise portions of the cloud. This triggers gradual movements in the cloud: points of the same category are attracted to each other, different groups are pushed apart, while points are globally distributed in a way that utilises the entire space. The space, time, and forces observed in virtual reality can be mapped to well-defined machine learning concepts, namely the latent space, the training epochs and the backpropagation. Our tool illustrates how the inner workings of deep neural networks can be made tangible and transparent. We expect this approach to accelerate the autonomous development of deep learning applications by end users in novel areas. |
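The space/time/force analogy can be made concrete with a single force-directed update rule: points of the same category attract, points of different categories repel, and a weak global force keeps the cloud centred and spread. The constants below are illustrative, not taken from the tool:

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.normal(size=(60, 3))                 # point cloud in virtual space
cat = rng.integers(0, 3, size=60)              # user-assigned categories

def step(pos, cat, dt=0.05):
    diff = pos[:, None] - pos[None, :]                           # pairwise displacement
    dist = np.linalg.norm(diff, axis=-1) + 1e-9
    same = (cat[:, None] == cat[None, :]).astype(float)
    attract = -(same[..., None] * diff) * 0.5                    # pull same category together
    repel = ((1 - same)[..., None] * diff) / dist[..., None] ** 2  # push other categories apart
    spread = -pos * 0.1                                          # keep the cloud centred
    return pos + dt * (attract.sum(1) + repel.sum(1) + spread)

for _ in range(100):
    pos = step(pos, cat)
print("mean within-category spread:",
      np.mean([np.linalg.norm(pos[cat == c] - pos[cat == c].mean(0), axis=1).mean()
               for c in range(3)]))
```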
Kath, Hannes; Gouvea, Thiago; Sonntag, Daniel A Deep Generative Model for Interactive Data Annotation through Direct Manipulation in Latent Space Technical Report DFKI , 2023. @techreport{14711, title = {A Deep Generative Model for Interactive Data Annotation through Direct Manipulation in Latent Space}, author = {Hannes Kath and Thiago Gouvea and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/14711_2305.15337.pdf}, doi = {https://doi.org/10.48550/arXiv.2305.15337}, year = {2023}, date = {2023-05-01}, volume = {2305.15337v1}, pages = {7}, institution = {DFKI}, abstract = {The impact of machine learning (ML) in many fields of application is constrained by lack of annotated data. Among existing tools for ML-assisted data annotation, one little explored tool type relies on an analogy between the coordinates of a graphical user interface and the latent space of a neural network for interaction through direct manipulation. In the present work, we 1) expand the paradigm by proposing two new analogies: time and force as reflecting iterations and gradients of network training; 2) propose a network model for learning a compact graphical representation of the data that takes into account both its internal structure and user provided annotations; and 3) investigate the impact of model hyperparameters on the learned graphical representations of the data, identifying candidate model variants for a future user study.}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } The impact of machine learning (ML) in many fields of application is constrained by lack of annotated data. Among existing tools for ML-assisted data annotation, one little explored tool type relies on an analogy between the coordinates of a graphical user interface and the latent space of a neural network for interaction through direct manipulation. In the present work, we 1) expand the paradigm by proposing two new analogies: time and force as reflecting iterations and gradients of network training; 2) propose a network model for learning a compact graphical representation of the data that takes into account both its internal structure and user provided annotations; and 3) investigate the impact of model hyperparameters on the learned graphical representations of the data, identifying candidate model variants for a future user study. |
2022 |
Journal Articles |
Rezaei, Mohammad Amin; Fathollahi, Arman; Rezaei, Sajad; Hu, Jiefeng; Gheisarnejad, Meysam; Teimouri, Ali Reza; Rituraj, Rituraj; Mosavi, Amirhosein; Khooban, Mohammad-Hassan Adaptation of A Real-Time Deep Learning Approach with An Analog Fault Detection Technique for Reliability Forecasting of Capacitor Banks Used in Mobile Vehicles Journal Article IEEE Access, 10 , pp. 132271-132287, 2022. @article{12980, title = {Adaptation of A Real-Time Deep Learning Approach with An Analog Fault Detection Technique for Reliability Forecasting of Capacitor Banks Used in Mobile Vehicles}, author = {Mohammad Amin Rezaei and Arman Fathollahi and Sajad Rezaei and Jiefeng Hu and Meysam Gheisarnejad and Ali Reza Teimouri and Rituraj Rituraj and Amirhosein Mosavi and Mohammad-Hassan Khooban}, year = {2022}, date = {2022-12-01}, journal = {IEEE Access}, volume = {10}, pages = {132271-132287}, publisher = {IEEE}, abstract = {The DC-link capacitor is an essential electronic element which sources or sinks the respective currents. The reliability of DC-link capacitor-banks (CBs) encounters many challenges due to their usage in electric vehicles. Heavy shocks may damage the internal capacitors without shutting down the CB. The fundamental obstacles in CB development are: the failure to consider capacitor degradation in reliability assessment, the impact of unforeseen sudden internal capacitor faults on forecasting CB lifetime, and the consequences of faults for CB degradation. The sudden faults change the CB capacitance, which leads to reliability change. To more accurately estimate the reliability, the type of the fault needs to be detected for predicting the correct post-fault capacitance. To address these practical problems, a new CB model and reliability assessment formula covering all fault types are first presented, then, a new analog fault-detection method is presented, and a combination of an online-learning long short-term memory (LSTM) network and the fault-detection method is subsequently applied, which adapts the LSTM to sudden internal CB faults so that CB degradation is predicted correctly. To confirm correct LSTM operation, the degradation of four capacitors is recorded over 2,000 hours, and the offline fault-free degradation values predicted by the LSTM are compared with the actual data. The experimental findings validate the applicability of the proposed method. The codes and data are provided.}, keywords = {}, pubstate = {published}, tppubtype = {article} } The DC-link capacitor is an essential electronic element which sources or sinks the respective currents. The reliability of DC-link capacitor-banks (CBs) encounters many challenges due to their usage in electric vehicles. Heavy shocks may damage the internal capacitors without shutting down the CB. The fundamental obstacles in CB development are: the failure to consider capacitor degradation in reliability assessment, the impact of unforeseen sudden internal capacitor faults on forecasting CB lifetime, and the consequences of faults for CB degradation. The sudden faults change the CB capacitance, which leads to reliability change. To more accurately estimate the reliability, the type of the fault needs to be detected for predicting the correct post-fault capacitance. To address these practical problems, a new CB model and reliability assessment formula covering all fault types are first presented, then, a new analog fault-detection method is presented, and a combination of an online-learning long short-term memory (LSTM) network and the fault-detection method is subsequently applied, which adapts the LSTM to sudden internal CB faults so that CB degradation is predicted correctly. To confirm correct LSTM operation, the degradation of four capacitors is recorded over 2,000 hours, and the offline fault-free degradation values predicted by the LSTM are compared with the actual data. The experimental findings validate the applicability of the proposed method. The codes and data are provided. |
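A minimal sketch of the forecasting component is given below: an LSTM trained to predict the next value of a degradation series. The synthetic series, window length, and network size are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

# Mock degradation series: slow capacitance drift plus noise, standing in for
# the 2,000-hour measurements.
torch.manual_seed(0)
series = 1.0 - 1e-4 * torch.arange(2000.0) + 1e-3 * torch.randn(2000)

window = 50
X = torch.stack([series[i:i + window] for i in range(1900)]).unsqueeze(-1)
y = series[window:window + 1900]

class Forecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)
    def forward(self, x):
        h, _ = self.lstm(x)                     # h: (batch, steps, hidden)
        return self.out(h[:, -1]).squeeze(-1)   # one-step-ahead prediction

model = Forecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(5):                          # a deliberately short demo run
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
print("training MSE:", float(loss))
```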
Sandhu, Jasminder Kaur; Lilhore, Umesh Kumar; M, Poongodi; Kaur, Navpreet; Band, Shahab S; Hamdi, Mounir; Iwendi, Celestine; Simaiya, Sarita; Kamruzzaman, M M; Mosavi, Amirhosein Predicting the Risk of Heart Failure Based on Clinical Data Journal Article Human-centric Computing and Information Sciences, 12 , pp. 1322-1355, 2022. @article{12981, title = {Predicting the Risk of Heart Failure Based on Clinical Data}, author = {Jasminder Kaur Sandhu and Umesh Kumar Lilhore and Poongodi M and Navpreet Kaur and Shahab S Band and Mounir Hamdi and Celestine Iwendi and Sarita Simaiya and M M Kamruzzaman and Amirhosein Mosavi}, year = {2022}, date = {2022-12-01}, journal = {Human-centric Computing and Information Sciences}, volume = {12}, pages = {1322-1355}, publisher = {Kora Information Processing Soc (KIPS-CSWRG))}, abstract = {The disorder that directly impacts the heart and the blood vessels inside the body is cardiovascular disease (CVD). According to the World Health Organization reports, CVDs are the leading cause of mortality worldwide, claiming the lives of nearly 23.6 million people annually. The categorization of diseases in CVD includes coronary heart disease, strokes and transient ischemic attacks (TIA), peripheral arterial disease, and aortic disease. Most CVD fatalities are caused by strokes and heart attacks, with an estimated one-third of these deaths currently happening before the age of 60. The New York Heart Association (NYHA) categorizes the various stages of heart failure as Class I (with no symptoms), Class II (mild symptoms), Class III (comfortable only when in resting position), Class IV (severe condition or patient is bed-bound), and Class V (unable to determine the class). Machine learning-based methods play an essential role in clinical data analysis. This research presents the importance of various essential attributes related to heart disease based on a hybrid machine learning model. The proposed hybrid model SVM-GA is based on a support vector machine and the genetic algorithm. This research analyzed an online dataset obtainable at the UCI Machine Learning Repository with the medical data of 299 patients who suffered from heart failures and are classified as Class III or IV as per the standard NYHA. This dataset was collected through patients' available follow-up and checkup duration and involved thirteen clinical characteristics. The proposed machine learning models were used to calculate feature importance in this research. The proposed model and existing well-known machine learning-based models, i.e., the Bayesian generalized linear model, ANN, Bagged CART, Bag Earth, and SVM, are implemented using Python, and various performance measures, i.e., accuracy, processing time, precision, recall, and F-measure, are calculated. Experimental analysis shows that the proposed SVM-GA model outperforms existing methods in terms of accuracy, processing time, precision, recall, and F-measure.}, keywords = {}, pubstate = {published}, tppubtype = {article} } The disorder that directly impacts the heart and the blood vessels inside the body is cardiovascular disease (CVD). According to the World Health Organization reports, CVDs are the leading cause of mortality worldwide, claiming the lives of nearly 23.6 million people annually. The categorization of diseases in CVD includes coronary heart disease, strokes and transient ischemic attacks (TIA), peripheral arterial disease, and aortic disease. Most CVD fatalities are caused by strokes and heart attacks, with an estimated one-third of these deaths currently happening before the age of 60. The New York Heart Association (NYHA) categorizes the various stages of heart failure as Class I (with no symptoms), Class II (mild symptoms), Class III (comfortable only when in resting position), Class IV (severe condition or patient is bed-bound), and Class V (unable to determine the class). Machine learning-based methods play an essential role in clinical data analysis. This research presents the importance of various essential attributes related to heart disease based on a hybrid machine learning model. The proposed hybrid model SVM-GA is based on a support vector machine and the genetic algorithm. This research analyzed an online dataset obtainable at the UCI Machine Learning Repository with the medical data of 299 patients who suffered from heart failures and are classified as Class III or IV as per the standard NYHA. This dataset was collected through patients' available follow-up and checkup duration and involved thirteen clinical characteristics. The proposed machine learning models were used to calculate feature importance in this research. The proposed model and existing well-known machine learning-based models, i.e., the Bayesian generalized linear model, ANN, Bagged CART, Bag Earth, and SVM, are implemented using Python, and various performance measures, i.e., accuracy, processing time, precision, recall, and F-measure, are calculated. Experimental analysis shows that the proposed SVM-GA model outperforms existing methods in terms of accuracy, processing time, precision, recall, and F-measure. |
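The SVM-GA idea can be sketched as a small genetic algorithm searching over the SVM hyperparameters C and gamma. The population size, mutation-only reproduction, and the synthetic stand-in for the 299-patient UCI dataset are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 299-patient, 13-feature clinical dataset.
X, y = make_classification(n_samples=299, n_features=13, random_state=0)
rng = np.random.default_rng(0)

def fitness(genome):
    C, gamma = np.exp(genome)                   # genomes live in log-space
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

pop = rng.normal(0, 2, size=(10, 2))            # initial population of (log C, log gamma)
for gen in range(5):
    scores = np.array([fitness(g) for g in pop])
    parents = pop[np.argsort(scores)[-4:]]      # selection: keep the best 4
    pop = parents[rng.integers(0, 4, 10)] + rng.normal(0, 0.3, (10, 2))  # mutate

best = pop[np.argmax([fitness(g) for g in pop])]
print("best C, gamma:", np.exp(best))
```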
Manshadi, Mahsa; Mousavi, Milad; Soltani, M; Mosavi, Amirhosein; Kovacs, Levente Deep Learning for Modeling an Offshore Hybrid Wind–Wave Energy System Journal Article Energies, 15 , pp. 9484-9494, 2022. @article{12990, title = {Deep Learning for Modeling an Offshore Hybrid Wind–Wave Energy System}, author = {Mahsa Manshadi and Milad Mousavi and M Soltani and Amirhosein Mosavi and Levente Kovacs}, year = {2022}, date = {2022-12-01}, journal = {Energies}, volume = {15}, pages = {9484-9494}, publisher = {MDPI}, abstract = {The combination of an offshore wind turbine and a wave energy converter on an integrated platform is an economical solution for the electrical power demand in coastal countries. Due to the high installation cost, a prediction should be used to investigate whether a location is suitable for such sites. For this purpose, this research assesses the feasibility of installing a combined hybrid site at a desired coastal location by predicting the net produced power from the environmental parameters. To combine these two systems, an optimized array of ten turbines and ten wave energy converters is used. The mathematical equations for the net force on the two introduced systems and for the produced power of the wind turbines are proposed. The maximum forces are 4 kN for the turbines and 6 kN for the wave energy converters. Furthermore, a comparison is conducted to find the optimum system, identifying the most effective system for the desired environmental conditions. A number of machine learning and deep learning methods are used to predict key parameters after collecting the dataset. Moreover, a comparative analysis is conducted to find a suitable model. The models’ performance has been studied by generating the confusion matrix and the receiver operating characteristic (ROC) curve of the hybrid site. The deep learning model outperformed the other models, with an approximate accuracy of 0.96.}, keywords = {}, pubstate = {published}, tppubtype = {article} } The combination of an offshore wind turbine and a wave energy converter on an integrated platform is an economical solution for the electrical power demand in coastal countries. Due to the high installation cost, a prediction should be used to investigate whether a location is suitable for such sites. For this purpose, this research assesses the feasibility of installing a combined hybrid site at a desired coastal location by predicting the net produced power from the environmental parameters. To combine these two systems, an optimized array of ten turbines and ten wave energy converters is used. The mathematical equations for the net force on the two introduced systems and for the produced power of the wind turbines are proposed. The maximum forces are 4 kN for the turbines and 6 kN for the wave energy converters. Furthermore, a comparison is conducted to find the optimum system, identifying the most effective system for the desired environmental conditions. A number of machine learning and deep learning methods are used to predict key parameters after collecting the dataset. Moreover, a comparative analysis is conducted to find a suitable model. The models’ performance has been studied by generating the confusion matrix and the receiver operating characteristic (ROC) curve of the hybrid site. The deep learning model outperformed the other models, with an approximate accuracy of 0.96. |
Hartmann, Mareike; Du, Han; Feldhus, Nils; Kruijff-Korbayová, Ivana; Sonntag, Daniel XAINES: Explaining AI with Narratives Journal Article KI - Künstliche Intelligenz, German Journal on Artificial Intelligence - Organ des Fachbereiches "Künstliche Intelligenz" der Gesellschaft für Informatik e.V., 36 , pp. 287-296, 2022. @article{13116, title = {XAINES: Explaining AI with Narratives}, author = {Mareike Hartmann and Han Du and Nils Feldhus and Ivana Kruijff-Korbayová and Daniel Sonntag}, editor = {Ute Schmid and Britta Wrede}, url = {https://www.dfki.de/fileadmin/user_upload/import/13116_s13218-022-00780-8.pdf}, doi = {https://doi.org/10.1007/s13218-022-00780-8}, year = {2022}, date = {2022-12-01}, journal = {KI - Künstliche Intelligenz, German Journal on Artificial Intelligence - Organ des Fachbereiches "Künstliche Intelligenz" der Gesellschaft für Informatik e.V.}, volume = {36}, pages = {287-296}, publisher = {Springer}, abstract = {Artificial Intelligence (AI) systems are increasingly pervasive: Internet of Things, in-car intelligent devices, robots, and virtual assistants, and their large-scale adoption makes it necessary to explain their behaviour, for example to their users who are impacted by their decisions, or to their developers who need to ensure their functionality. This requires, on the one hand, to obtain an accurate representation of the chain of events that caused the system to behave in a certain way (e.g., to make a specific decision). On the other hand, this causal chain needs to be communicated to the users depending on their needs and expectations. In this phase of explanation delivery, allowing interaction between user and model has the potential to improve both model quality and user experience. The XAINES project investigates the explanation of AI systems through narratives targeted to the needs of a specific audience, focusing on two important aspects that are crucial for enabling successful explanation: generating and selecting appropriate explanation content, i.e. the information to be contained in the explanation, and delivering this information to the user in an appropriate way. In this article, we present the project’s roadmap towards enabling the explanation of AI with narratives.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Artificial Intelligence (AI) systems are increasingly pervasive: Internet of Things, in-car intelligent devices, robots, and virtual assistants, and their large-scale adoption makes it necessary to explain their behaviour, for example to their users who are impacted by their decisions, or to their developers who need to ensure their functionality. This requires, on the one hand, to obtain an accurate representation of the chain of events that caused the system to behave in a certain way (e.g., to make a specific decision). On the other hand, this causal chain needs to be communicated to the users depending on their needs and expectations. In this phase of explanation delivery, allowing interaction between user and model has the potential to improve both model quality and user experience. The XAINES project investigates the explanation of AI systems through narratives targeted to the needs of a specific audience, focusing on two important aspects that are crucial for enabling successful explanation: generating and selecting appropriate explanation content, i.e. the information to be contained in the explanation, and delivering this information to the user in an appropriate way. In this article, we present the project’s roadmap towards enabling the explanation of AI with narratives. |
Yan, Shu-Rong; Tian, Manwen; Alattas, Khalid A; Mohamadzadeh, Ardashir; Sabzalian, Mohammad; Mosavi, Amirhosein An Experimental Machine Learning Approach for Mid-Term Energy Demand Forecasting Journal Article IEEE Access, 10 , pp. 118926-118940, 2022. @article{12991, title = {An Experimental Machine Learning Approach for Mid-Term Energy Demand Forecasting}, author = {Shu-Rong Yan and Manwen Tian and Khalid A Alattas and Ardashir Mohamadzadeh and Mohammad Sabzalian and Amirhosein Mosavi}, year = {2022}, date = {2022-11-01}, journal = {IEEE Access}, volume = {10}, pages = {118926-118940}, publisher = {IEEE}, abstract = {In this study, a neural network-based approach is designed for mid-term load forecasting (MTLF). The structure and hyperparameters are tuned to obtain the best forecasting accuracy one year ahead. The suggested approach is practically applied to a region in Iran by the use of real-world data sets of 10 years. Influential factors such as economic, weather, and social factors are investigated, and their impact on accuracy is numerically analyzed. Bad data are detected by a suggested effective method. In addition to the load peak, the 24-hour load pattern is also predicted, which helps for better mid-term planning. The simulations show that the suggested approach is practical, and the accuracy is more than 95%, even when there are drastic weather changes.}, keywords = {}, pubstate = {published}, tppubtype = {article} } In this study, a neural network-based approach is designed for mid-term load forecasting (MTLF). The structure and hyperparameters are tuned to obtain the best forecasting accuracy one year ahead. The suggested approach is practically applied to a region in Iran by the use of real-world data sets of 10 years. Influential factors such as economic, weather, and social factors are investigated, and their impact on accuracy is numerically analyzed. Bad data are detected by a suggested effective method. In addition to the load peak, the 24-hour load pattern is also predicted, which helps for better mid-term planning. The simulations show that the suggested approach is practical, and the accuracy is more than 95%, even when there are drastic weather changes. |
Ott, Torben; Masset, Paul; Gouvea, Thiago; Kepecs, Adam Apparent sunk cost effect in rational agents Journal Article Science Advances, 8 , pp. 1-10, 2022. @article{12243, title = {Apparent sunk cost effect in rational agents}, author = {Torben Ott and Paul Masset and Thiago Gouvea and Adam Kepecs}, url = {https://www.science.org/doi/10.1126/sciadv.abi7004}, year = {2022}, date = {2022-02-01}, journal = {Science Advances}, volume = {8}, pages = {1-10}, publisher = {American Association for the Advancement of Science}, abstract = {Rational decision makers aim to maximize their gains, but humans and other animals often fail to do so, exhibiting biases and distortions in their choice behavior. In a recent study of economic decisions, humans, mice, and rats were reported to succumb to the sunk cost fallacy, making decisions based on irrecoverable past investments to the detriment of expected future returns. We challenge this interpretation because it is subject to a statistical fallacy, a form of attrition bias, and the observed behavior can be explained without invoking a sunk cost–dependent mechanism. Using a computational model, we illustrate how a rational decision maker with a reward-maximizing decision strategy reproduces the reported behavioral pattern and propose an improved task design to dissociate sunk costs from fluctuations in decision valuation. Similar statistical confounds may be common in analyses of cognitive behaviors, highlighting the need to use causal statistical inference and generative models for interpretation.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Rational decision makers aim to maximize their gains, but humans and other animals often fail to do so, exhibiting biases and distortions in their choice behavior. In a recent study of economic decisions, humans, mice, and rats were reported to succumb to the sunk cost fallacy, making decisions based on irrecoverable past investments to the detriment of expected future returns. We challenge this interpretation because it is subject to a statistical fallacy, a form of attrition bias, and the observed behavior can be explained without invoking a sunk cost–dependent mechanism. Using a computational model, we illustrate how a rational decision maker with a reward-maximizing decision strategy reproduces the reported behavioral pattern and propose an improved task design to dissociate sunk costs from fluctuations in decision valuation. Similar statistical confounds may be common in analyses of cognitive behaviors, highlighting the need to use causal statistical inference and generative models for interpretation. |
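The attrition-bias argument can be reproduced in a few lines of simulation: agents follow a purely reward-maximizing rule, yet conditioning the analysis on trials that are still running makes the probability of finishing appear to grow with the time already invested. The distributions below are arbitrary choices, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, max_wait = 100_000, 10
value = rng.normal(size=n_trials)                 # fluctuating valuation of each offer
stays = value > 0                                 # rational rule: persist only if worthwhile
quit_time = rng.integers(1, max_wait, n_trials)   # when low-value trials are abandoned

for t in range(1, max_wait):
    running = stays | (quit_time > t)             # trials still in progress at time t
    p_finish = stays[running].mean()              # rises with t: apparent sunk cost effect
    print(f"already waited {t}: P(finish) = {p_finish:.2f}")
```

Because low-value trials drop out early, the surviving population at later times is enriched in high-value trials, so persistence seems to depend on elapsed time even though no sunk-cost mechanism exists.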
Barz, Michael; Bhatti, Omair Shahzad; Sonntag, Daniel Implicit Estimation of Paragraph Relevance from Eye Movements Journal Article Frontiers in Computer Science, 3 , pp. 13, 2022. @article{12165, title = {Implicit Estimation of Paragraph Relevance from Eye Movements}, author = {Michael Barz and Omair Shahzad Bhatti and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/12165_fcomp-03-808507.pdf https://www.frontiersin.org/articles/10.3389/fcomp.2021.808507}, year = {2022}, date = {2022-01-01}, journal = {Frontiers in Computer Science}, volume = {3}, pages = {13}, publisher = {Frontiers Media S.A.}, abstract = {Eye movements were shown to be an effective source of implicit relevance feedback in constrained search and decision-making tasks. Recent research suggests that gaze-based features, extracted from scanpaths over short news articles (g-REL), can reveal the perceived relevance of read text with respect to a previously shown trigger question. In this work, we aim to confirm this finding and we investigate whether it generalizes to multi-paragraph documents from Wikipedia (Google Natural Questions) that require readers to scroll down to read the whole text. We conduct a user study (n=24) in which participants read single- and multi-paragraph articles and rate their relevance at the paragraph level with respect to a trigger question. We model the perceived document relevance using machine learning and features from the literature as input. Our results confirm that eye movements can be used to effectively model the relevance of short news articles, in particular if we exclude difficult cases: documents that are on the topic of the trigger question but irrelevant. However, our results do not clearly show that the modeling approach generalizes to multi-paragraph document settings. We publish our dataset and our code for feature extraction under an open source license to enable future research in the field of gaze-based implicit relevance feedback.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Eye movements were shown to be an effective source of implicit relevance feedback in constrained search and decision-making tasks. Recent research suggests that gaze-based features, extracted from scanpaths over short news articles (g-REL), can reveal the perceived relevance of read text with respect to a previously shown trigger question. In this work, we aim to confirm this finding and we investigate whether it generalizes to multi-paragraph documents from Wikipedia (Google Natural Questions) that require readers to scroll down to read the whole text. We conduct a user study (n=24) in which participants read single- and multi-paragraph articles and rate their relevance at the paragraph level with respect to a trigger question. We model the perceived document relevance using machine learning and features from the literature as input. Our results confirm that eye movements can be used to effectively model the relevance of short news articles, in particular if we exclude difficult cases: documents that are on the topic of the trigger question but irrelevant. However, our results do not clearly show that the modeling approach generalizes to multi-paragraph document settings. We publish our dataset and our code for feature extraction under an open source license to enable future research in the field of gaze-based implicit relevance feedback. |
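A sketch of the modelling pipeline follows: aggregate per-paragraph fixation features and fit a classifier for perceived relevance. The duration-based features reflect common practice in this literature; the reading data here is synthetic, not the published dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def features(fix_durations):
    """Simple per-paragraph aggregates over fixation durations (ms)."""
    return [len(fix_durations), np.sum(fix_durations),
            np.mean(fix_durations), np.max(fix_durations)]

# Synthetic reading data: relevant paragraphs attract longer fixations on average.
X, y = [], []
for relevant in rng.integers(0, 2, 200):
    n = rng.integers(5, 30)
    durations = rng.gamma(2.0, 120 + 60 * relevant, size=n)
    X.append(features(durations))
    y.append(int(relevant))

score = cross_val_score(RandomForestClassifier(), np.array(X), np.array(y), cv=5).mean()
print("cross-validated accuracy:", round(score, 2))
```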
Nguyen, Ho Minh Duy; Nguyen, Thu T; Vu, Huong; Pham, Quang; Nguyen, Manh-Duy; Nguyen, Binh T; Sonntag, Daniel TATL: Task Agnostic Transfer Learning for Skin Attributes Detection Journal Article Medical Image Analysis, 01 , pp. 1-27, 2022. @article{12216, title = {TATL: Task Agnostic Transfer Learning for Skin Attributes Detection}, author = {Ho Minh Duy Nguyen and Thu T Nguyen and Huong Vu and Quang Pham and Manh-Duy Nguyen and Binh T Nguyen and Daniel Sonntag}, url = {https://arxiv.org/pdf/2104.01641.pdf}, year = {2022}, date = {2022-01-01}, journal = {Medical Image Analysis}, volume = {01}, pages = {1-27}, publisher = {Elsevier}, abstract = {Existing skin attributes detection methods usually initialize with a pre-trained ImageNet network and then fine-tune on a medical target task. However, we argue that such approaches are suboptimal because medical datasets are largely different from ImageNet and often contain limited training samples. In this work, we propose Task Agnostic Transfer Learning (TATL), a novel framework motivated by dermatologists' behaviors in the skincare context. TATL learns an attribute-agnostic segmenter that detects lesion skin regions and then transfers this knowledge to a set of attribute-specific classifiers to detect each particular attribute. Since TATL's attribute-agnostic segmenter only detects skin attribute regions, it enjoys ample data from all attributes, allows transferring knowledge among features, and compensates for the lack of training data from rare attributes. We conduct extensive experiments to evaluate the proposed TATL transfer learning mechanism with various neural network architectures on two popular skin attributes detection benchmarks. The empirical results show that TATL not only works well with multiple architectures but also can achieve state-of-the-art performances, while enjoying minimal model and computational complexities. We also provide theoretical insights and explanations for why our transfer learning framework performs well in practice.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Existing skin attributes detection methods usually initialize with a pre-trained ImageNet network and then fine-tune on a medical target task. However, we argue that such approaches are suboptimal because medical datasets are largely different from ImageNet and often contain limited training samples. In this work, we propose Task Agnostic Transfer Learning (TATL), a novel framework motivated by dermatologists' behaviors in the skincare context. TATL learns an attribute-agnostic segmenter that detects lesion skin regions and then transfers this knowledge to a set of attribute-specific classifiers to detect each particular attribute. Since TATL's attribute-agnostic segmenter only detects skin attribute regions, it enjoys ample data from all attributes, allows transferring knowledge among features, and compensates for the lack of training data from rare attributes. We conduct extensive experiments to evaluate the proposed TATL transfer learning mechanism with various neural network architectures on two popular skin attributes detection benchmarks. The empirical results show that TATL not only works well with multiple architectures but also can achieve state-of-the-art performances, while enjoying minimal model and computational complexities. We also provide theoretical insights and explanations for why our transfer learning framework performs well in practice. |
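The two-step transfer can be sketched as a skeleton: train an attribute-agnostic lesion segmenter, then initialize each attribute-specific segmenter from a copy of its encoder. The placeholder backbone and attribute names are illustrative; the paper evaluates several architectures:

```python
import copy
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Attribute-agnostic feature extractor (placeholder backbone)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
    def forward(self, x):
        return self.net(x)

def seg_head():
    return nn.Conv2d(32, 1, 1)          # 1-channel mask logits

# Step 1: train a lesion-vs-skin segmenter on the union of all attribute
# regions, where data is ample.
encoder = Encoder()
agnostic = nn.Sequential(encoder, seg_head())

# Step 2: each attribute-specific segmenter starts from a copy of the trained
# attribute-agnostic encoder and is fine-tuned on its own, possibly rare, attribute.
attributes = ["pigment_network", "milia_like_cyst", "streaks"]
specific = {a: nn.Sequential(copy.deepcopy(encoder), seg_head()) for a in attributes}

x = torch.randn(2, 3, 64, 64)
print({a: tuple(m(x).shape) for a, m in specific.items()})
```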
Volkmar, Georg; Alexandrovsky, Dmitry; Eilks, Asmus Eike; Queck, Dirk; Herrlich, Marc; Malaka, Rainer Effects of PCG on Creativity in Playful City-Building Environments in VR Journal Article Proceedings of the ACM on Human-Computer Interaction, 6 , pp. 1-20, 2022. @article{12840, title = {Effects of PCG on Creativity in Playful City-Building Environments in VR}, author = {Georg Volkmar and Dmitry Alexandrovsky and Asmus Eike Eilks and Dirk Queck and Marc Herrlich and Rainer Malaka}, year = {2022}, date = {2022-01-01}, journal = {Proceedings of the ACM on Human-Computer Interaction}, volume = {6}, pages = {1-20}, publisher = {Association for Computing Machinery}, abstract = {The use of procedural content generation (PCG) in the context of video games has increased over the years as it provides an economical way to generate game content whilst enhancing their variety and replayability. For city-building games, this approach is often utilized to predefine map layouts, terrains, or cityscapes for the player. One core aspect of facilitating enjoyment in these games comes from creative expressivity. PCG, in this context, may support creativity by lowering the technical complexity for content creation, or it may hinder creativity by taking away control and freedom from the user. To examine these potential effects, this paper investigates if PCG has an impact on players' creativity in the context of VR city-building games. We present a VR prototype that provides varying degrees of procedural content: No PCG, terrain generation, city generation, and full (city + terrain) generation. In a remote user study, these conditions were compared regarding their capability to support creativity. Statistical tests for equivalence revealed that the presence of PCG did not affect creativity in any way. Our work suggests that PCG can be a useful integration into city-building games without notably decreasing players' ability to express themselves creatively.}, keywords = {}, pubstate = {published}, tppubtype = {article} } The use of procedural content generation (PCG) in the context of video games has increased over the years as it provides an economical way to generate game content whilst enhancing their variety and replayability. For city-building games, this approach is often utilized to predefine map layouts, terrains, or cityscapes for the player. One core aspect of facilitating enjoyment in these games comes from creative expressivity. PCG, in this context, may support creativity by lowering the technical complexity for content creation, or it may hinder creativity by taking away control and freedom from the user. To examine these potential effects, this paper investigates if PCG has an impact on players' creativity in the context of VR city-building games. We present a VR prototype that provides varying degrees of procedural content: No PCG, terrain generation, city generation, and full (city + terrain) generation. In a remote user study, these conditions were compared regarding their capability to support creativity. Statistical tests for equivalence revealed that the presence of PCG did not affect creativity in any way. Our work suggests that PCG can be a useful integration into city-building games without notably decreasing players' ability to express themselves creatively. |
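As a minimal example of the kind of procedural generation such games rely on, the sketch below builds a terrain height map from several octaves of value noise. All parameters are arbitrary choices for illustration:

```python
import numpy as np

def value_noise(size, grid, rng):
    """Bilinearly interpolate random values placed on a coarse (grid+1)^2 lattice."""
    coarse = rng.random((grid + 1, grid + 1))
    xs = np.linspace(0, grid, size, endpoint=False)
    i = xs.astype(int)
    f = xs - i
    c00 = coarse[np.ix_(i, i)]
    c10 = coarse[np.ix_(i + 1, i)]
    c01 = coarse[np.ix_(i, i + 1)]
    c11 = coarse[np.ix_(i + 1, i + 1)]
    fx, fy = f[:, None], f[None, :]
    return (c00 * (1 - fx) * (1 - fy) + c10 * fx * (1 - fy)
            + c01 * (1 - fx) * fy + c11 * fx * fy)

# Sum several octaves: coarse octaves shape the landscape, fine octaves add detail.
height = sum(value_noise(128, 2 ** o, np.random.default_rng(o)) / 2 ** o
             for o in range(1, 6))
print(height.shape, round(float(height.min()), 3), round(float(height.max()), 3))
```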
A User Interface for Explaining Machine Learning Model Explanations Inproceedings Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, pp. 59-63, Association for Computing Machinery, Sydney, NSW, Australia, 2023, ISBN: 9798400701078. |
Interactive Machine Learning Solutions for Acoustic Monitoring of Animal Wildlife in Biosphere Reserves Inproceedings Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence, 2023. |
Developing Team Design Patterns for Hybrid Intelligence Systems Inproceedings Frontiers in Artificial Intelligence and Applications, IOS Press, 2023. |
A Human-in-the-Loop Tool for Annotating Passive Acoustic Monitoring Datasets Inproceedings Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-2023), August 19-25, Macao, International Joint Conferences on Artificial Intelligence, 2023. |
Cross-domain German Medical Named Entity Recognition using a Pre-Trained Language Model and Unified Medical Semantic Types Inproceedings Association for Computational Linguistics, ACL, 2023. |
Detection of Contract Cheating in Pen-and-Paper Exams through the Analysis of Handwriting Style Inproceedings Companion Publication of the 25th International Conference on Multimodal Interaction, pp. 26-30, Association for Computing Machinery, 2023. |
EdgeAL: An Edge Estimation Based Active Learning Approach for OCT Segmentation Inproceedings International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 79-89, Springer, Cham, 2023. |
User Driven Functionality Deletion for Mobile Apps Inproceedings IEEE International Requirements Engineering Conference, IEEE, 2023. |
Evaluation Metrics for XAI: A Review, Taxonomy, and Practical Applications Inproceedings 2023 IEEE 27th International Conference on Intelligent Engineering Systems (INES), pp. 111-124, IEEE, 2023. |
Miscellaneous |
Fine-tuning of explainable CNNs for skin lesion classification based on dermatologists' feedback towards increasing trust Miscellaneous 2023. |
Technical Reports |
A Virtual Reality Tool for Representing, Visualizing and Updating Deep Learning Models Technical Report DFKI , 2023. |
A Deep Generative Model for Interactive Data Annotation through Direct Manipulation in Latent Space Technical Report DFKI , 2023. |
2022 |
Journal Articles |
Adaptation of a Real-Time Deep Learning Approach with an Analog Fault Detection Technique for Reliability Forecasting of Capacitor Banks Used in Mobile Vehicles Journal Article IEEE Access, 10 , pp. 132271-132287, 2022. |
Predicting the Risk of Heart Failure Based on Clinical Data Journal Article Human-centric Computing and Information Sciences, 12 , pp. 1322-1355, 2022. |
Deep Learning for Modeling an Offshore Hybrid Wind–Wave Energy System Journal Article Energies, 15 , pp. 9484-9494, 2022. |
XAINES: Explaining AI with Narratives Journal Article KI - Künstliche Intelligenz, German Journal on Artificial Intelligence - Organ des Fachbereiches "Künstliche Intelligenz" der Gesellschaft für Informatik e.V., 36 , pp. 287-296, 2022. |
An Experimental Machine Learning Approach for Mid-Term Energy Demand Forecasting Journal Article IEEE Access, 10 , pp. 118926-118940, 2022. |
Apparent sunk cost effect in rational agents Journal Article Science Advances, 8 , pp. 1-10, 2022. |
Implicit Estimation of Paragraph Relevance from Eye Movements Journal Article Frontiers in Computer Science, 3 , pp. 13, 2022. |
TATL: Task Agnostic Transfer Learning for Skin Attributes Detection Journal Article Medical Image Analysis, 01 , pp. 1-27, 2022. |
Effects of PCG on Creativity in Playful City-Building Environments in VR Journal Article Proceedings of the ACM on Human-Computer Interaction, 6 , pp. 1-20, 2022. |