2023
Journal Articles
Tabrizchi, Hamed; Razmara, Jafar; Mosavi, Amirhosein: Thermal prediction for energy management of clouds using a hybrid model based on CNN and stacking multi-layer bi-directional LSTM. Journal Article. Energy Reports, 9, pp. 2253-2268, 2023.

@article{12997,
  title = {Thermal prediction for energy management of clouds using a hybrid model based on CNN and stacking multi-layer bi-directional LSTM},
  author = {Hamed Tabrizchi and Jafar Razmara and Amirhosein Mosavi},
  year = {2023},
  date = {2023-12-01},
  journal = {Energy Reports},
  volume = {9},
  pages = {2253-2268},
  publisher = {Elsevier},
  abstract = {The fast advancement of technology and the growing use of data centers have dramatically increased energy consumption. Thermal control is a key issue in hyper-scale cloud data centers: hotspots form when host temperatures rise, increasing cooling costs and reducing reliability. Precise estimation of host temperatures is therefore critical for optimal resource management, yet thermal fluctuations in the data center make temperature estimation a difficult challenge, and existing estimation algorithms are ineffective due to their computational complexity and lack of accuracy. Since data-driven approaches are promising for temperature prediction, this research offers an efficient temperature prediction model that combines convolutional neural networks (CNN) with stacked multi-layer bi-directional long short-term memory (BiLSTM) networks. The experimental findings reveal that the model successfully predicts the temperature with a best score of 97.15%, the lowest RMSE of 0.2892, and an RMAE of 0.5003, reducing the prediction error compared with the other methods.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
Gholami, Mahsa; Ghanbari-Adivi, Elham; Ehteram, Mohammad; Singh, Vijay P; Ahmed, Ali Najah; Mosavi, Amirhosein; El-Shafie, Ahmed: Predicting longitudinal dispersion coefficient using ensemble models and optimized multi-layer perceptron models. Journal Article. Ain Shams Engineering Journal, 10, pp. 2253-2277, 2023.

@article{13154,
  title = {Predicting longitudinal dispersion coefficient using ensemble models and optimized multi-layer perceptron models},
  author = {Mahsa Gholami and Elham Ghanbari-Adivi and Mohammad Ehteram and Vijay P Singh and Ali Najah Ahmed and Amirhosein Mosavi and Ahmed El-Shafie},
  year = {2023},
  date = {2023-04-01},
  journal = {Ain Shams Engineering Journal},
  volume = {10},
  pages = {2253-2277},
  publisher = {Elsevier},
  abstract = {Prediction of the longitudinal dispersion coefficient (LDC) is essential for river and water resources engineering and environmental management. This study proposes ensemble models for predicting the LDC based on multilayer perceptron (MULP) methods and optimization algorithms. The honey badger optimization algorithm (HBOA), salp swarm algorithm (SASA), firefly algorithm (FIFA), and particle swarm optimization algorithm (PASOA) are used to adjust the MULP parameters. The outputs of the MULP-HBOA, MULP-SASA, MULP-PASOA, MULP-FIFA, and MULP models were then incorporated into an inclusive multiple model (IMM). At the testing level, the mean absolute error (MEAE) of the IMM was 15, compared with 17, 18, 23, 24, and 25 for the MULP-HBOA, MULP-SASA, MULP-FIFA, MULP-PASOA, and MULP models, respectively. The study also modified the structure of the MULP models using a goodness factor, which decreased CPU time; removing redundant neurons reduces CPU time. Thus, the modified ANN model and the suggested IMM model can decrease the computational time and further improve the performance of the models.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
Hai, Tao; Sayed, Biju Theruvil; Majdi, Ali; Zhou, Jincheng; Sagban, Rafid; Band, Shahab S; Mosavi, Amirhosein: An integrated GIS-based multivariate adaptive regression splines-cat swarm optimization for improving the accuracy of wildfire susceptibility mapping. Journal Article. Geocarto International, 38, pp. 1-25, 2023.

@article{13025,
  title = {An integrated GIS-based multivariate adaptive regression splines-cat swarm optimization for improving the accuracy of wildfire susceptibility mapping},
  author = {Tao Hai and Biju Theruvil Sayed and Ali Majdi and Jincheng Zhou and Rafid Sagban and Shahab S Band and Amirhosein Mosavi},
  year = {2023},
  date = {2023-01-01},
  journal = {Geocarto International},
  volume = {38},
  pages = {1-25},
  publisher = {Taylor & Francis},
  abstract = {A hybrid machine learning method is proposed for wildfire susceptibility mapping. For modeling, a geographical information system (GIS) database including 11 influencing factors and 262 fire locations from 2013 to 2018 is used to develop an integrated multivariate adaptive regression splines (MARS) model. The cat swarm optimization (CSO) algorithm tunes the parameters of the MARS in order to generate accurate susceptibility maps. The Pearson correlation results show that land use, temperature, and slope angle have a strong correlation with fire severity. The results demonstrate that the prediction capability of the MARS-CSO model outperforms the model tree, reduced error pruning tree, and plain MARS models. The resulting wildfire risk map from MARS-CSO reveals that 20% of the study area falls into the very low wildfire risk class, whereas 40% falls into the very high fire hazard class.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
Mirhashemi, Hengameh; Heydari, Mehdi; Karami, Omid; Ahmadi, Kourosh; Mosavi, Amirhosein: Modeling Climate Change Effects on the Distribution of Oak Forests with Machine Learning. Journal Article. Forests, 14, pp. 13220-13233, 2023.

@article{13155,
  title = {Modeling Climate Change Effects on the Distribution of Oak Forests with Machine Learning},
  author = {Hengameh Mirhashemi and Mehdi Heydari and Omid Karami and Kourosh Ahmadi and Amirhosein Mosavi},
  year = {2023},
  date = {2023-01-01},
  journal = {Forests},
  volume = {14},
  pages = {13220-13233},
  publisher = {MDPI},
  abstract = {The present study models the effect of climate change on the distribution of Persian oak (Quercus brantii Lindl.) in the Zagros forests in western Iran. The modeling is conducted under current and future climatic conditions by fitting a Bayesian additive regression tree (BART), a machine learning method. To anticipate potential habitats for the Persian oak, two general circulation models (GCMs), CCSM4 and HADGEM2-ES, are used under the representative concentration pathways (RCPs) 2.6 and 8.5 for 2050 and 2070. The mean temperature (MT) of the wettest quarter (bio8), solar radiation, slope, and precipitation of the wettest month (bio13) are reported as the most important variables in the modeling, in that order. The results indicate that the suitable habitat of Persian oak will decrease significantly under both climate change scenarios, by as much as 75.06% by 2070. The study brings insight into the current condition of the local forests and projects their future conditions for proper management and protection of endangered ecosystems.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
Altmeyer, Kristin; Barz, Michael; Lauer, Luisa; Peschel, Markus; Sonntag, Daniel; Brünken, Roland; Malone, Sarah: Digital ink and differentiated subjective ratings for cognitive load measurement in middle childhood. Journal Article. British Journal of Educational Psychology, n/a, pp. 18, 2023.

@article{13195,
  title = {Digital ink and differentiated subjective ratings for cognitive load measurement in middle childhood},
  author = {Kristin Altmeyer and Michael Barz and Luisa Lauer and Markus Peschel and Daniel Sonntag and Roland Brünken and Sarah Malone},
  url = {https://www.dfki.de/fileadmin/user_upload/import/13195_Brit_J_of_Edu_Psychol_-_2023_-_Altmeyer_-_Digital_ink_and_differentiated_subjective_ratings_for_cognitive_load_measurement.pdf https://bpspsychub.onlinelibrary.wiley.com/doi/abs/10.1111/bjep.12595},
  year = {2023},
  date = {2023-01-01},
  journal = {British Journal of Educational Psychology},
  volume = {n/a},
  pages = {18},
  publisher = {John Wiley & Sons, Ltd},
  abstract = {Background: New methods are constantly being developed to adapt cognitive load measurement to different contexts. However, research on cognitive load measurement in middle childhood students is rare. Research indicates that the three cognitive load dimensions (intrinsic, extraneous, and germane) can be measured well in adults and teenagers using differentiated subjective rating instruments. Moreover, digital ink recorded by smartpens could serve as an indicator of cognitive load in adults. Aims: With the present research, we aimed to investigate the relation between subjective cognitive load ratings, velocity and pressure measures recorded with a smartpen, and performance in standardized sketching tasks in middle childhood students. Sample: Thirty-six children (age 7–12) participated at the university's laboratory. Methods: The children performed two standardized sketching tasks, each in two versions. The induced intrinsic or extraneous cognitive load was varied between the versions. Digital ink was recorded while the children drew with a smartpen on real paper, and after each task they were asked to report their perceived intrinsic and extraneous cognitive load using a newly developed 5-item scale. Results: Cognitive load ratings as well as velocity and pressure measures were substantially related to the induced cognitive load and to performance in both sketching tasks. However, cognitive load ratings and smartpen measures were not substantially related to each other. Conclusions: Both subjective ratings and digital ink hold potential for cognitive load and performance measurement. However, it is questionable whether they measure the exact same constructs.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
Inproceedings
Nguyen, Ho Minh Duy; Nguyen, Hoang; Truong, Mai T N; Cao, Tri; Nguyen, Binh T; Ho, Nhat; Swoboda, Paul; Albarqouni, Shadi; Xie, Pengtao; Sonntag, Daniel: Joint Self-Supervised Image-Volume Representation Learning with Intra-Inter Contrastive Clustering. Inproceedings. Proceedings of the thirty-seventh AAAI Conference on Artificial Intelligence, AAAI Press, 2023.

@inproceedings{12923,
  title = {Joint Self-Supervised Image-Volume Representation Learning with Intra-Inter Contrastive Clustering},
  author = {Ho Minh Duy Nguyen and Hoang Nguyen and Mai T N Truong and Tri Cao and Binh T Nguyen and Nhat Ho and Paul Swoboda and Shadi Albarqouni and Pengtao Xie and Daniel Sonntag},
  url = {https://arxiv.org/pdf/2212.01893.pdf},
  year = {2023},
  date = {2023-02-01},
  booktitle = {Proceedings of the thirty-seventh AAAI Conference on Artificial Intelligence},
  publisher = {AAAI Press},
  abstract = {Collecting large-scale medical datasets with fully annotated samples for training deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions. In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing a vectored embedding for each slice and then assembling a holistic feature through deformable self-attention mechanisms in a Transformer, allowing the incorporation of long-range dependencies between slices within 3D volumes. These holistic features are further utilized to define a novel 3D clustering agreement-based SSL task and a masked embedding prediction task inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structures segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D DeepClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
Barz, Michael; Bhatti, Omair Shahzad; Alam, Hasan Md Tusfiqur; Nguyen, Ho Minh Duy; Sonntag, Daniel: Interactive Fixation-to-AOI Mapping for Mobile Eye Tracking Data Based on Few-Shot Image Classification. Inproceedings. Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, pp. 175-178, Association for Computing Machinery, 2023.

@inproceedings{13196,
  title = {Interactive Fixation-to-AOI Mapping for Mobile Eye Tracking Data Based on Few-Shot Image Classification},
  author = {Michael Barz and Omair Shahzad Bhatti and Hasan Md Tusfiqur Alam and Ho Minh Duy Nguyen and Daniel Sonntag},
  url = {https://www.dfki.de/fileadmin/user_upload/import/13196_3581754.3584179.pdf},
  doi = {10.1145/3581754.3584179},
  year = {2023},
  date = {2023-01-01},
  booktitle = {Companion Proceedings of the 28th International Conference on Intelligent User Interfaces},
  pages = {175-178},
  publisher = {Association for Computing Machinery},
  abstract = {Mobile eye tracking is an important tool in psychology and human-centred interaction design for understanding how people process visual scenes and user interfaces. However, analysing recordings from mobile eye trackers, which typically include an egocentric video of the scene and a gaze signal, is a time-consuming and largely manual process. To address this challenge, we propose a web-based annotation tool that leverages few-shot image classification and interactive machine learning (IML) to accelerate the annotation process. The tool allows users to efficiently map fixations to areas of interest (AOI) in a video-editing-style interface. It includes an IML component that generates suggestions and learns from user feedback using a few-shot image classification model initialised with a small number of images per AOI. Our goal is to improve the efficiency and accuracy of fixation-to-AOI mapping in mobile eye tracking.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
Kopácsi, László; Barz, Michael; Bhatti, Omair Shahzad; Sonntag, Daniel: IMETA: An Interactive Mobile Eye Tracking Annotation Method for Semi-Automatic Fixation-to-AOI Mapping. Inproceedings. Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, pp. 33-36, Association for Computing Machinery, 2023.

@inproceedings{13201,
  title = {IMETA: An Interactive Mobile Eye Tracking Annotation Method for Semi-Automatic Fixation-to-AOI Mapping},
  author = {László Kopácsi and Michael Barz and Omair Shahzad Bhatti and Daniel Sonntag},
  url = {https://www.dfki.de/fileadmin/user_upload/import/13201_3581754.3584125.pdf},
  doi = {10.1145/3581754.3584125},
  year = {2023},
  date = {2023-01-01},
  booktitle = {Companion Proceedings of the 28th International Conference on Intelligent User Interfaces},
  pages = {33-36},
  publisher = {Association for Computing Machinery},
  abstract = {Mobile eye tracking studies involve analyzing areas of interest (AOIs) and visual attention to these AOIs to understand how people process visual information. However, accurately annotating the data collected for user studies can be a challenging and time-consuming task. Current approaches for automatically or semi-automatically analyzing head-mounted eye tracking data in mobile eye tracking studies have limitations, including a lack of annotation flexibility or the inability to adapt to specific target domains. To address this problem, we present IMETA, an architecture for semi-automatic fixation-to-AOI mapping. When an annotator assigns an AOI label to a sequence of frames based on the respective fixation points, an interactive video object segmentation method is used to estimate the mask proposal of the AOI. Then, we use the 3D reconstruction of the visual scene created from the eye tracking video to map these AOI masks to 3D. The resulting 3D segmentation of the AOI can be used to suggest labels for the rest of the video, with the suggestions becoming increasingly accurate as more samples are provided by an annotator using interactive machine learning (IML). IMETA has the potential to reduce the annotation workload and speed up the evaluation of mobile eye tracking studies.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
Kadir, Md Abdul; Selim, Abdulrahman Mohamed; Barz, Michael; Sonntag, Daniel: A User Interface for Explaining Machine Learning Model Explanations. Inproceedings. Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, pp. 59–63, Association for Computing Machinery, Sydney, NSW, Australia, 2023, ISBN: 9798400701078.

@inproceedings{13200,
  title = {A User Interface for Explaining Machine Learning Model Explanations},
  author = {Md Abdul Kadir and Abdulrahman Mohamed Selim and Michael Barz and Daniel Sonntag},
  url = {https://doi.org/10.1145/3581754.3584131},
  doi = {10.1145/3581754.3584131},
  isbn = {9798400701078},
  year = {2023},
  date = {2023-01-01},
  booktitle = {Companion Proceedings of the 28th International Conference on Intelligent User Interfaces},
  pages = {59–63},
  publisher = {Association for Computing Machinery},
  address = {Sydney, NSW, Australia},
  series = {IUI '23 Companion},
  abstract = {Explainable Artificial Intelligence (XAI) is an emerging subdiscipline of Machine Learning (ML) and human-computer interaction. Discriminative models need to be understood. An explanation of such ML models is vital when an AI system makes decisions that have significant consequences, such as in healthcare or finance. By providing an input-specific explanation, users can gain confidence in an AI system's decisions and be more willing to trust and rely on it. One problem is that interpreting example-based explanations for discriminative models, such as saliency maps, can be difficult because it is not always clear how the highlighted features contribute to the model's overall prediction or decisions. Moreover, saliency maps, which are state-of-the-art visual explanation methods, do not provide concrete information on the influence of particular features. We propose an interactive visualisation tool called EMILE-UI that allows users to evaluate the provided explanations of an image-based classification task, specifically those provided by saliency maps. This tool allows users to evaluate the accuracy of a saliency map by reflecting the true attention or focus of the corresponding model. It visualises the relationship between the ML model and its explanation of input images, making it easier to interpret saliency maps and understand how the ML model actually makes its predictions. Our tool supports a wide range of deep learning image classification models and image data as inputs.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
2022
Journal Articles
Rezaei, Mohammad Amin; Fathollahi, Arman; Rezaei, Sajad; Hu, Jiefeng; Gheisarnejad, Meysam; Teimouri, Ali Reza; Rituraj, Rituraj; Mosavi, Amirhosein; Khooban, Mohammad-Hassan Adaptation of A Real-Time Deep Learning Approach with An Analog Fault Detection Technique for Reliability Forecasting of Capacitor Banks Used in Mobile Vehicles Journal Article IEEE Access, 10 , pp. 132271-132287, 2022. @article{12980, title = {Adaptation of A Real-Time Deep Learning Approach with An Analog Fault Detection Technique for Reliability Forecasting of Capacitor Banks Used in Mobile Vehicles}, author = {Mohammad Amin Rezaei and Arman Fathollahi and Sajad Rezaei and Jiefeng Hu and Meysam Gheisarnejad and Ali Reza Teimouri and Rituraj Rituraj and Amirhosein Mosavi and Mohammad-Hassan Khooban}, year = {2022}, date = {2022-12-01}, journal = {IEEE Access}, volume = {10}, pages = {132271-132287}, publisher = {IEEE}, abstract = {The DC-Link capacitor is defined as the essential electronics element which sources or sinks the respective currents. The reliability of DC-link capacitor-banks (CBs) encounters many challenges due to their usage in electric vehicles. Heavy shocks may damage the internal capacitors without shutting down the CB. The fundamental development obstacles of CBs are: lack of considering capacitor degradation in reliability assessment, the impact of unforeseen sudden internal capacitor faults in forecasting CB lifetime, and the faults consequence on CB degradation. The sudden faults change the CB capacitance, which leads to reliability change. To more accurately estimate the reliability, the type of the fault needs to be detected for predicting the correct post-fault capacitance. 
To address these practical problems, a new CB model and reliability assessment formula covering all fault types are first presented, then, a new analog fault-detection method is presented, and a combination of online-learning long short-term memory (LSTM) and fault-detection method is subsequently performed, which adapt the sudden internal CB faults with the LSTM to correctly predict the CB degradation. To confirm the correct LSTM operation, four capacitors degradation is practically recorded for 2000-hours, and the off-line faultless degradation values predicted by the LSTM are compared with the actual data. The experimental findings validate the applicability of the proposed method. The codes and data are provided.}, keywords = {}, pubstate = {published}, tppubtype = {article} } The DC-Link capacitor is defined as the essential electronics element which sources or sinks the respective currents. The reliability of DC-link capacitor-banks (CBs) encounters many challenges due to their usage in electric vehicles. Heavy shocks may damage the internal capacitors without shutting down the CB. The fundamental development obstacles of CBs are: lack of considering capacitor degradation in reliability assessment, the impact of unforeseen sudden internal capacitor faults in forecasting CB lifetime, and the faults consequence on CB degradation. The sudden faults change the CB capacitance, which leads to reliability change. To more accurately estimate the reliability, the type of the fault needs to be detected for predicting the correct post-fault capacitance. To address these practical problems, a new CB model and reliability assessment formula covering all fault types are first presented, then, a new analog fault-detection method is presented, and a combination of online-learning long short-term memory (LSTM) and fault-detection method is subsequently performed, which adapt the sudden internal CB faults with the LSTM to correctly predict the CB degradation. 
To confirm the correct LSTM operation, the degradation of four capacitors is recorded in practice over 2,000 hours, and the off-line faultless degradation values predicted by the LSTM are compared with the actual data. The experimental findings validate the applicability of the proposed method. The codes and data are provided. |
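As a toy illustration of why the detected fault type matters for the post-fault capacitance (a sketch under an assumed series-parallel bank topology, not the paper's CB model), consider a bank of parallel branches, each made of identical capacitors in series:

```python
# Illustrative capacitor-bank model: a short-circuit fault removes a unit
# from a series chain (branch capacitance rises), an open-circuit fault
# disconnects a whole branch (bank capacitance falls). All values invented.

def branch_capacitance(c_unit, m_series, shorted=0):
    """Capacitance of one branch of m_series units of c_unit in series."""
    remaining = m_series - shorted
    if remaining <= 0:
        raise ValueError("branch fully shorted")
    return c_unit / remaining

def bank_capacitance(c_unit, n_parallel, m_series,
                     open_branches=0, shorted_per_branch=None):
    """Equivalent capacitance of the whole bank given detected faults."""
    shorted_per_branch = shorted_per_branch or {}
    total = 0.0
    for b in range(n_parallel - open_branches):
        total += branch_capacitance(c_unit, m_series,
                                    shorted_per_branch.get(b, 0))
    return total

# Healthy 10x4 bank of 100 uF units: 10 * (100/4) = 250 uF
healthy = bank_capacitance(100e-6, 10, 4)
# One shorted unit in branch 0: capacitance increases
after_short = bank_capacitance(100e-6, 10, 4, shorted_per_branch={0: 1})
# One open branch: capacitance decreases
after_open = bank_capacitance(100e-6, 10, 4, open_branches=1)
```

The two fault types move the equivalent capacitance in opposite directions, which is why a reliability estimate needs the fault type, not just the fact that a fault occurred.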
Sandhu, Jasminder Kaur; Lilhore, Umesh Kumar; M, Poongodi; Kaur, Navpreet; Band, Shahab S; Hamdi, Mounir; Iwendi, Celestine; Simaiya, Sarita; Kamruzzaman, M M; Mosavi, Amirhosein Predicting the Risk of Heart Failure Based on Clinical Data Journal Article Human-centric Computing and Information Sciences, 12 , pp. 1322-1355, 2022. @article{12981, title = {Predicting the Risk of Heart Failure Based on Clinical Data}, author = {Jasminder Kaur Sandhu and Umesh Kumar Lilhore and Poongodi M and Navpreet Kaur and Shahab S Band and Mounir Hamdi and Celestine Iwendi and Sarita Simaiya and M M Kamruzzaman and Amirhosein Mosavi}, year = {2022}, date = {2022-12-01}, journal = {Human-centric Computing and Information Sciences}, volume = {12}, pages = {1322-1355}, publisher = {Korea Information Processing Society (KIPS)}, abstract = {The disorder that directly impacts the heart and the blood vessels inside the body is cardiovascular disease (CVD). According to the World Health Organization reports, CVDs are the leading cause of mortality worldwide, claiming the human life of nearly 23.6 million people annually. The categorization of diseases in CVD includes coronary heart disease, strokes, and transient ischemic attacks (TIA), peripheral arterial disease, aortic disease. Most CVD fatalities are caused by strokes and heart attacks, with an estimated one-third of these deaths currently happening before 60. The standard medical organization "New York Heart Association" (NYHA) categorize the various stages of heart failure as Class I (with no symptoms), Class II (mild symptoms), Class III (comfortable only when in resting position), Class IV (severe condition or patient is bed-bound), and Class V (unable to determine the class). Machine learning-based methods play an essential role in clinical data analysis. This research presents the importance of various essential attributes related to heart disease based on a hybrid machine learning model.
The proposed hybrid model SVM-GA is based on a support vector machine and the genetic algorithm. This research analyzed an online dataset obtainable at the UCI Machine Learning Repository with the medical data of 299 patients who suffered from heart failures and are classified as Class III or IV as per the standard NYHA. This dataset was collected through patients' available follow-up and checkup duration and involved thirteen clinical characteristics. The proposed machine learning models were used to calculate feature importance in this research. The proposed model and existing well-known machine learning based-models, i.e., Bayesian generalized linear model, ANN, Bagged CART, Bag Earth, and SVM, are implemented using Python and various performance measuring parameters, i.e., accuracy, processing time, precision, recall, F-measures are calculated. Experimental analysis shows the proposed SVM-GA model strengthens in terms of better accuracy, processing time, precision, recall, F-measures over existing methods.}, keywords = {}, pubstate = {published}, tppubtype = {article} } The disorder that directly impacts the heart and the blood vessels inside the body is cardiovascular disease (CVD). According to the World Health Organization reports, CVDs are the leading cause of mortality worldwide, claiming the human life of nearly 23.6 million people annually. The categorization of diseases in CVD includes coronary heart disease, strokes, and transient ischemic attacks (TIA), peripheral arterial disease, aortic disease. Most CVD fatalities are caused by strokes and heart attacks, with an estimated one-third of these deaths currently happening before 60. The standard medical organization "New York Heart Association" (NYHA) categorize the various stages of heart failure as Class I (with no symptoms), Class II (mild symptoms), Class III (comfortable only when in resting position), Class IV (severe condition or patient is bed-bound), and Class V (unable to determine the class). 
Machine learning-based methods play an essential role in clinical data analysis. This research presents the importance of various essential attributes related to heart disease based on a hybrid machine learning model. The proposed hybrid model SVM-GA is based on a support vector machine and the genetic algorithm. This research analyzed an online dataset obtainable at the UCI Machine Learning Repository with the medical data of 299 patients who suffered from heart failures and are classified as Class III or IV as per the standard NYHA. This dataset was collected through patients' available follow-up and checkup duration and involved thirteen clinical characteristics. The proposed machine learning models were used to calculate feature importance in this research. The proposed model and existing well-known machine learning based-models, i.e., Bayesian generalized linear model, ANN, Bagged CART, Bag Earth, and SVM, are implemented using Python and various performance measuring parameters, i.e., accuracy, processing time, precision, recall, F-measures are calculated. Experimental analysis shows the proposed SVM-GA model strengthens in terms of better accuracy, processing time, precision, recall, F-measures over existing methods. |
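The abstract names the two ingredients of SVM-GA but not their coupling; a common arrangement is a genetic algorithm searching over subsets of the 13 clinical features, with cross-validated SVM accuracy as the fitness. The sketch below follows that pattern with a placeholder fitness function (the `informative` feature indices are invented for illustration, standing in for a real SVM evaluation):

```python
import random

# GA-for-feature-selection sketch in the spirit of an SVM-GA hybrid.
# The fitness is a stand-in: in practice it would be the cross-validated
# accuracy of an SVM trained on the selected clinical features.

N_FEATURES = 13

def fitness(mask, informative={0, 4, 7, 11}):
    # Placeholder: reward (hypothetically) informative features and
    # penalize subset size, mimicking an accuracy/complexity trade-off.
    hits = sum(1 for i in informative if mask[i])
    return hits - 0.05 * sum(mask)

def evolve(pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, N_FEATURES)    # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                # bit-flip mutation
                j = rng.randrange(N_FEATURES)
                child[j] = 1 - child[j]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()   # binary mask over the 13 features
```

Swapping the placeholder for a real cross-validation loop turns this into the usual wrapper-style feature selection.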
Manshadi, Mahsa; Mousavi, Milad; Soltani, M; Mosavi, Amirhosein; Kovacs, Levente Deep Learning for Modeling an Offshore Hybrid Wind–Wave Energy System Journal Article Energies, 15 , pp. 9484-9494, 2022. @article{12990, title = {Deep Learning for Modeling an Offshore Hybrid Wind–Wave Energy System}, author = {Mahsa Manshadi and Milad Mousavi and M Soltani and Amirhosein Mosavi and Levente Kovacs}, year = {2022}, date = {2022-12-01}, journal = {Energies}, volume = {15}, pages = {9484-9494}, publisher = {MDPI}, abstract = {The combination of an offshore wind turbine and a wave energy converter on an integrated platform is an economical solution for the electrical power demand in coastal countries. Due to the expensive installation cost, a prediction should be used to investigate whether the location is suitable for these sites. For this purpose, this research presents the feasibility of installing a combined hybrid site in the desired coastal location by predicting the net produced power due to the environmental parameters. For combining these two systems, an optimized array includes ten turbines and ten wave energy converters. The mathematical equations of the net force on the two introduced systems and the produced power of the wind turbines are proposed. The turbines’ maximum forces are 4 kN, and for the wave energy converters are 6 kN, respectively. Furthermore, the comparison is conducted in order to find the optimum system. The comparison shows that the most effective system of desired environmental condition is introduced. A number of machine learning and deep learning methods are used to predict key parameters after collecting the dataset. Moreover, a comparative analysis is conducted to find a suitable model. The models’ performance has been well studied through generating the confusion matrix and the receiver operating characteristic (ROC) curve of the hybrid site. 
The deep learning model outperformed other models, with an approximate accuracy of 0.96.}, keywords = {}, pubstate = {published}, tppubtype = {article} } The combination of an offshore wind turbine and a wave energy converter on an integrated platform is an economical solution for the electrical power demand in coastal countries. Due to the expensive installation cost, a prediction should be used to investigate whether the location is suitable for these sites. For this purpose, this research presents the feasibility of installing a combined hybrid site in the desired coastal location by predicting the net produced power due to the environmental parameters. For combining these two systems, an optimized array includes ten turbines and ten wave energy converters. The mathematical equations of the net force on the two introduced systems and the produced power of the wind turbines are proposed. The turbines’ maximum forces are 4 kN, and for the wave energy converters are 6 kN, respectively. Furthermore, the comparison is conducted in order to find the optimum system. The comparison shows that the most effective system of desired environmental condition is introduced. A number of machine learning and deep learning methods are used to predict key parameters after collecting the dataset. Moreover, a comparative analysis is conducted to find a suitable model. The models’ performance has been well studied through generating the confusion matrix and the receiver operating characteristic (ROC) curve of the hybrid site. The deep learning model outperformed other models, with an approximate accuracy of 0.96. |
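The entry above predicts net produced power from environmental parameters; as a point of reference for the wind side, the standard wind-power relation P = ½·ρ·A·Cp·v³ can be sketched as below. This is the textbook formula, not the paper's full hybrid model, and the rotor radius and power coefficient are assumed values:

```python
import math

def wind_power(v, rotor_radius, cp=0.4, rho=1.225):
    """Mechanical power (W) captured by one turbine at wind speed v (m/s).
    cp is an assumed power coefficient; rho is sea-level air density."""
    area = math.pi * rotor_radius ** 2     # swept rotor area
    return 0.5 * rho * area * cp * v ** 3

p = wind_power(10.0, rotor_radius=40.0)    # ~1.2 MW for this toy turbine
```

The cubic dependence on wind speed is what makes site selection (and hence power prediction) so sensitive to the environmental data.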
Hartmann, Mareike; Du, Han; Feldhus, Nils; Kruijff-Korbayová, Ivana; Sonntag, Daniel XAINES: Explaining AI with Narratives Journal Article KI - Künstliche Intelligenz, German Journal on Artificial Intelligence - Organ des Fachbereiches "Künstliche Intelligenz" der Gesellschaft für Informatik e.V., 36 , pp. 287-296, 2022. @article{13116, title = {XAINES: Explaining AI with Narratives}, author = {Mareike Hartmann and Han Du and Nils Feldhus and Ivana Kruijff-Korbayová and Daniel Sonntag}, editor = {Ute Schmid and Britta Wrede}, url = {https://www.dfki.de/fileadmin/user_upload/import/13116_s13218-022-00780-8.pdf}, doi = {https://doi.org/10.1007/s13218-022-00780-8}, year = {2022}, date = {2022-12-01}, journal = {KI - Künstliche Intelligenz, German Journal on Artificial Intelligence - Organ des Fachbereiches "Künstliche Intelligenz" der Gesellschaft für Informatik e.V.}, volume = {36}, pages = {287-296}, publisher = {Springer}, abstract = {Artificial Intelligence (AI) systems are increasingly pervasive: Internet of Things, in-car intelligent devices, robots, and virtual assistants, and their large-scale adoption makes it necessary to explain their behaviour, for example to their users who are impacted by their decisions, or to their developers who need to ensure their functionality. This requires, on the one hand, to obtain an accurate representation of the chain of events that caused the system to behave in a certain way (e.g., to make a specific decision). On the other hand, this causal chain needs to be communicated to the users depending on their needs and expectations. In this phase of explanation delivery, allowing interaction between user and model has the potential to improve both model quality and user experience. 
The XAINES project investigates the explanation of AI systems through narratives targeted to the needs of a specific audience, focusing on two important aspects that are crucial for enabling successful explanation: generating and selecting appropriate explanation content, i.e. the information to be contained in the explanation, and delivering this information to the user in an appropriate way. In this article, we present the project’s roadmap towards enabling the explanation of AI with narratives.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Artificial Intelligence (AI) systems are increasingly pervasive: Internet of Things, in-car intelligent devices, robots, and virtual assistants, and their large-scale adoption makes it necessary to explain their behaviour, for example to their users who are impacted by their decisions, or to their developers who need to ensure their functionality. This requires, on the one hand, to obtain an accurate representation of the chain of events that caused the system to behave in a certain way (e.g., to make a specific decision). On the other hand, this causal chain needs to be communicated to the users depending on their needs and expectations. In this phase of explanation delivery, allowing interaction between user and model has the potential to improve both model quality and user experience. The XAINES project investigates the explanation of AI systems through narratives targeted to the needs of a specific audience, focusing on two important aspects that are crucial for enabling successful explanation: generating and selecting appropriate explanation content, i.e. the information to be contained in the explanation, and delivering this information to the user in an appropriate way. In this article, we present the project’s roadmap towards enabling the explanation of AI with narratives. |
Yan, Shu-Rong; Tian, Manwen; Alattas, Khalid A; Mohamadzadeh, Ardashir; Sabzalian, Mohammad; Mosavi, Amirhosein An Experimental Machine Learning Approach for Mid-Term Energy Demand Forecasting Journal Article IEEE Access, 10 , pp. 118926-118940, 2022. @article{12991, title = {An Experimental Machine Learning Approach for Mid-Term Energy Demand Forecasting}, author = {Shu-Rong Yan and Manwen Tian and Khalid A Alattas and Ardashir Mohamadzadeh and Mohammad Sabzalian and Amirhosein Mosavi}, year = {2022}, date = {2022-11-01}, journal = {IEEE Access}, volume = {10}, pages = {118926-118940}, publisher = {IEEE}, abstract = {In this study, a neural network-based approach is designed for mid-term load forecasting (MTLF). The structure and hyperparameters are tuned to obtain the best forecasting accuracy one year ahead. The suggested approach is practically applied to a region in Iran by the use of real-world data sets of 10 years. The influential factors such as economic, weather, and social factors are investigated, and their impact on accuracy is numerically analyzed. The bad data are detected by a suggested effective method. In addition to load peak, the 24-hour load pattern is also predicted, which helps with better mid-term planning. The simulations show that the suggested approach is practical, and the accuracy is more than 95%, even when there are drastic weather changes.}, keywords = {}, pubstate = {published}, tppubtype = {article} } In this study, a neural network-based approach is designed for mid-term load forecasting (MTLF). The structure and hyperparameters are tuned to obtain the best forecasting accuracy one year ahead. The suggested approach is practically applied to a region in Iran by the use of real-world data sets of 10 years. The influential factors such as economic, weather, and social factors are investigated, and their impact on accuracy is numerically analyzed. The bad data are detected by a suggested effective method.
In addition to load peak, the 24-hour load pattern is also predicted, which helps with better mid-term planning. The simulations show that the suggested approach is practical, and the accuracy is more than 95%, even when there are drastic weather changes. |
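The abstract does not spell out the bad-data detection method. One common, simple choice for load time series is flagging points that deviate from a rolling median by more than a few robust standard deviations; the sketch below is an illustrative stand-in of that kind, not the paper's algorithm:

```python
# Robust rolling-median outlier filter for an hourly load series.
# Purely illustrative; thresholds and window size are assumed values.

def detect_bad_data(series, window=5, k=3.0):
    """Return indices of suspected bad samples in the series."""
    bad = []
    half = window // 2
    for i, x in enumerate(series):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        neighborhood = sorted(series[lo:hi])
        median = neighborhood[len(neighborhood) // 2]
        # median absolute deviation as a robust spread estimate
        mad = sorted(abs(v - median) for v in neighborhood)[len(neighborhood) // 2]
        scale = 1.4826 * mad or 1e-9   # avoid division by zero on flat data
        if abs(x - median) / scale > k:
            bad.append(i)
    return bad

load = [100, 102, 98, 101, 990, 99, 103, 100]   # one obvious spike
```

`detect_bad_data(load)` flags only the spiked sample, leaving normal fluctuations untouched.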
Ott, Torben; Masset, Paul; Gouvea, Thiago; Kepecs, Adam Apparent sunk cost effect in rational agents Journal Article Science Advances, 8 , pp. 1-10, 2022. @article{12243, title = {Apparent sunk cost effect in rational agents}, author = {Torben Ott and Paul Masset and Thiago Gouvea and Adam Kepecs}, url = {https://www.science.org/doi/10.1126/sciadv.abi7004}, year = {2022}, date = {2022-02-01}, journal = {Science Advances}, volume = {8}, pages = {1-10}, publisher = {American Association for the Advancement of Science}, abstract = {Rational decision makers aim to maximize their gains, but humans and other animals often fail to do so, exhibiting biases and distortions in their choice behavior. In a recent study of economic decisions, humans, mice, and rats were reported to succumb to the sunk cost fallacy, making decisions based on irrecoverable past investments to the detriment of expected future returns. We challenge this interpretation because it is subject to a statistical fallacy, a form of attrition bias, and the observed behavior can be explained without invoking a sunk cost–dependent mechanism. Using a computational model, we illustrate how a rational decision maker with a reward-maximizing decision strategy reproduces the reported behavioral pattern and propose an improved task design to dissociate sunk costs from fluctuations in decision valuation. Similar statistical confounds may be common in analyses of cognitive behaviors, highlighting the need to use causal statistical inference and generative models for interpretation.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Rational decision makers aim to maximize their gains, but humans and other animals often fail to do so, exhibiting biases and distortions in their choice behavior. 
In a recent study of economic decisions, humans, mice, and rats were reported to succumb to the sunk cost fallacy, making decisions based on irrecoverable past investments to the detriment of expected future returns. We challenge this interpretation because it is subject to a statistical fallacy, a form of attrition bias, and the observed behavior can be explained without invoking a sunk cost–dependent mechanism. Using a computational model, we illustrate how a rational decision maker with a reward-maximizing decision strategy reproduces the reported behavioral pattern and propose an improved task design to dissociate sunk costs from fluctuations in decision valuation. Similar statistical confounds may be common in analyses of cognitive behaviors, highlighting the need to use causal statistical inference and generative models for interpretation. |
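The attrition-bias argument in the entry above can be illustrated with a small simulation (a sketch in the spirit of the abstract, not the authors' actual model): an agent waits for a delayed reward while its value estimate fluctuates, and it quits whenever the estimate drops below zero. No sunk-cost term appears anywhere, yet trials that survive longer are, by selection alone, trials with higher value estimates:

```python
import random

# Attrition-bias demonstration: conditioning on survival makes the mean
# valuation rise with time invested, mimicking a "sunk cost" correlation
# in a purely reward-maximizing agent. All parameters are invented.

def simulate_trials(n_trials=2000, horizon=20, seed=1):
    rng = random.Random(seed)
    # per time step, value estimates among trials still waiting
    surviving_values = [[] for _ in range(horizon)]
    for _ in range(n_trials):
        value = 1.0
        for t in range(horizon):
            value += rng.gauss(0.0, 0.5)   # fluctuating valuation
            if value < 0.0:                # rational quit rule, no sunk cost
                break
            surviving_values[t].append(value)
    return [sum(v) / len(v) for v in surviving_values if v]

means = simulate_trials()
```

The mean valuation among survivors grows with elapsed time even though the underlying drift is zero, which is exactly the statistical confound the paper warns about.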
Barz, Michael; Bhatti, Omair Shahzad; Sonntag, Daniel Implicit Estimation of Paragraph Relevance from Eye Movements Journal Article Frontiers in Computer Science, 3 , pp. 13, 2022. @article{12165, title = {Implicit Estimation of Paragraph Relevance from Eye Movements}, author = {Michael Barz and Omair Shahzad Bhatti and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/12165_fcomp-03-808507.pdf https://www.frontiersin.org/articles/10.3389/fcomp.2021.808507}, year = {2022}, date = {2022-01-01}, journal = {Frontiers in Computer Science}, volume = {3}, pages = {13}, publisher = {Frontiers Media S.A.}, abstract = {Eye movements were shown to be an effective source of implicit relevance feedback in constrained search and decision-making tasks. Recent research suggests that gaze-based features, extracted from scanpaths over short news articles (g-REL), can reveal the perceived relevance of read text with respect to a previously shown trigger question. In this work, we aim to confirm this finding and we investigate whether it generalizes to multi-paragraph documents from Wikipedia (Google Natural Questions) that require readers to scroll down to read the whole text. We conduct a user study (n=24) in which participants read single- and multi-paragraph articles and rate their relevance at the paragraph level with respect to a trigger question. We model the perceived document relevance using machine learning and features from the literature as input. Our results confirm that eye movements can be used to effectively model the relevance of short news articles, in particular if we exclude difficult cases: documents which are on topic of the trigger questions but irrelevant. However, our results do not clearly show that the modeling approach generalizes to multi-paragraph document settings. 
We publish our dataset and our code for feature extraction under an open source license to enable future research in the field of gaze-based implicit relevance feedback.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Eye movements were shown to be an effective source of implicit relevance feedback in constrained search and decision-making tasks. Recent research suggests that gaze-based features, extracted from scanpaths over short news articles (g-REL), can reveal the perceived relevance of read text with respect to a previously shown trigger question. In this work, we aim to confirm this finding and we investigate whether it generalizes to multi-paragraph documents from Wikipedia (Google Natural Questions) that require readers to scroll down to read the whole text. We conduct a user study (n=24) in which participants read single- and multi-paragraph articles and rate their relevance at the paragraph level with respect to a trigger question. We model the perceived document relevance using machine learning and features from the literature as input. Our results confirm that eye movements can be used to effectively model the relevance of short news articles, in particular if we exclude difficult cases: documents which are on topic of the trigger questions but irrelevant. However, our results do not clearly show that the modeling approach generalizes to multi-paragraph document settings. We publish our dataset and our code for feature extraction under an open source license to enable future research in the field of gaze-based implicit relevance feedback. |
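To give a flavor of the gaze-based features this line of work relies on (fixation counts, durations, regressions), here is a simplified extraction sketch; the actual g-REL feature set and thresholds differ, and the regression heuristic below is an assumption:

```python
# Scanpath feature extraction sketch for relevance modeling.

def scanpath_features(fixations):
    """fixations: list of (x, y, duration_ms) in reading order."""
    if not fixations:
        return {"n_fixations": 0, "total_dur": 0.0,
                "mean_dur": 0.0, "regressions": 0}
    durations = [d for _, _, d in fixations]
    # a "regression" here: a fixation jumping back leftward on the
    # same line (vertical tolerance of 20 px is an assumed value)
    regressions = sum(
        1 for (x0, y0, _), (x1, y1, _) in zip(fixations, fixations[1:])
        if x1 < x0 and abs(y1 - y0) < 20
    )
    return {
        "n_fixations": len(fixations),
        "total_dur": float(sum(durations)),
        "mean_dur": sum(durations) / len(durations),
        "regressions": regressions,
    }

demo = scanpath_features([(10, 100, 180), (60, 102, 210),
                          (30, 101, 250), (120, 104, 190)])
```

Feature vectors like this, computed per paragraph, are what a downstream classifier maps to perceived relevance.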
Nguyen, Ho Minh Duy; Nguyen, Thu T; Vu, Huong; Pham, Quang; Nguyen, Manh-Duy; Nguyen, Binh T; Sonntag, Daniel TATL: Task Agnostic Transfer Learning for Skin Attributes Detection Journal Article Medical Image Analysis, 01 , pp. 1-27, 2022. @article{12216, title = {TATL: Task Agnostic Transfer Learning for Skin Attributes Detection}, author = {Ho Minh Duy Nguyen and Thu T Nguyen and Huong Vu and Quang Pham and Manh-Duy Nguyen and Binh T Nguyen and Daniel Sonntag}, url = {https://arxiv.org/pdf/2104.01641.pdf}, year = {2022}, date = {2022-01-01}, journal = {Medical Image Analysis}, volume = {01}, pages = {1-27}, publisher = {Elsevier}, abstract = {Existing skin attributes detection methods usually initialize with a pre-trained Imagenet network and then fine-tune on a medical target task. However, we argue that such approaches are suboptimal because medical datasets are largely different from ImageNet and often contain limited training samples. In this work, we propose Task Agnostic Transfer Learning (TATL), a novel framework motivated by dermatologists' behaviors in the skincare context. TATL learns an attribute-agnostic segmenter that detects lesion skin regions and then transfers this knowledge to a set of attribute-specific classifiers to detect each particular attribute. Since TATL's attribute-agnostic segmenter only detects skin attribute regions, it enjoys ample data from all attributes, allows transferring knowledge among features, and compensates for the lack of training data from rare attributes. We conduct extensive experiments to evaluate the proposed TATL transfer learning mechanism with various neural network architectures on two popular skin attributes detection benchmarks. The empirical results show that TATL not only works well with multiple architectures but also can achieve state-of-the-art performances, while enjoying minimal model and computational complexities. 
We also provide theoretical insights and explanations for why our transfer learning framework performs well in practice.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Existing skin attributes detection methods usually initialize with a pre-trained Imagenet network and then fine-tune on a medical target task. However, we argue that such approaches are suboptimal because medical datasets are largely different from ImageNet and often contain limited training samples. In this work, we propose Task Agnostic Transfer Learning (TATL), a novel framework motivated by dermatologists' behaviors in the skincare context. TATL learns an attribute-agnostic segmenter that detects lesion skin regions and then transfers this knowledge to a set of attribute-specific classifiers to detect each particular attribute. Since TATL's attribute-agnostic segmenter only detects skin attribute regions, it enjoys ample data from all attributes, allows transferring knowledge among features, and compensates for the lack of training data from rare attributes. We conduct extensive experiments to evaluate the proposed TATL transfer learning mechanism with various neural network architectures on two popular skin attributes detection benchmarks. The empirical results show that TATL not only works well with multiple architectures but also can achieve state-of-the-art performances, while enjoying minimal model and computational complexities. We also provide theoretical insights and explanations for why our transfer learning framework performs well in practice. |
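The core TATL transfer step, as described in the abstract, can be sketched as follows: train one attribute-agnostic segmenter, then initialize each attribute-specific model from its encoder weights instead of from ImageNet. Weights are plain dicts here and `pretend_train` is a stand-in for real training; the attribute names are examples, not necessarily the benchmarks' exact label set:

```python
# Weight-transfer sketch: shared encoder from the attribute-agnostic
# segmenter, fresh task-specific head per attribute.

def make_model(encoder=None):
    return {
        "encoder": dict(encoder) if encoder else {"conv1": 0.0, "conv2": 0.0},
        "head": {"out": 0.0},
    }

def pretend_train(model, signal):
    # stand-in for training: shift all weights by a task-dependent amount
    for k in model["encoder"]:
        model["encoder"][k] += signal
    model["head"]["out"] += signal
    return model

# 1) train the attribute-agnostic segmenter on all lesion regions
segmenter = pretend_train(make_model(), signal=1.0)

# 2) each attribute detector starts from the segmenter's encoder,
#    but gets a fresh task-specific head
attributes = ["globules", "milia_like_cyst", "negative_network"]
detectors = {a: make_model(encoder=segmenter["encoder"]) for a in attributes}
```

Because the segmenter sees data from all attributes, even rare-attribute detectors inherit a well-trained encoder, which is the mechanism the paper credits for its gains.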
Volkmar, Georg; Alexandrovsky, Dmitry; Eilks, Asmus Eike; Queck, Dirk; Herrlich, Marc; Malaka, Rainer Effects of PCG on Creativity in Playful City-Building Environments in VR Journal Article Proceedings of the ACM on Human-Computer Interaction, 6 , pp. 1-20, 2022. @article{12840, title = {Effects of PCG on Creativity in Playful City-Building Environments in VR}, author = {Georg Volkmar and Dmitry Alexandrovsky and Asmus Eike Eilks and Dirk Queck and Marc Herrlich and Rainer Malaka}, year = {2022}, date = {2022-01-01}, journal = {Proceedings of the ACM on Human-Computer Interaction}, volume = {6}, pages = {1-20}, publisher = {Association for Computing Machinery}, abstract = {The use of procedural content generation (PCG) in the context of video games has increased over the years as it provides an economical way to generate game content whilst enhancing their variety and replayability. For city-building games, this approach is often utilized to predefine map layouts, terrains, or cityscapes for the player. One core aspect of facilitating enjoyment in these games comes from creative expressivity. PCG, in this context, may support creativity by lowering the technical complexity for content creation, or it may hinder creativity by taking away control and freedom from the user. To examine these potential effects, this paper investigates if PCG has an impact on players' creativity in the context of VR city-building games. We present a VR prototype that provides varying degrees of procedural content: No PCG, terrain generation, city generation, and full (city + terrain) generation. In a remote user study, these conditions were compared regarding their capability to support creativity. Statistical tests for equivalence revealed that the presence of PCG did not affect creativity in any way. 
Our work suggests that PCG can be a useful integration into city-building games without notably decreasing players' ability to express themselves creatively.}, keywords = {}, pubstate = {published}, tppubtype = {article} } The use of procedural content generation (PCG) in the context of video games has increased over the years as it provides an economical way to generate game content whilst enhancing their variety and replayability. For city-building games, this approach is often utilized to predefine map layouts, terrains, or cityscapes for the player. One core aspect of facilitating enjoyment in these games comes from creative expressivity. PCG, in this context, may support creativity by lowering the technical complexity for content creation, or it may hinder creativity by taking away control and freedom from the user. To examine these potential effects, this paper investigates if PCG has an impact on players' creativity in the context of VR city-building games. We present a VR prototype that provides varying degrees of procedural content: No PCG, terrain generation, city generation, and full (city + terrain) generation. In a remote user study, these conditions were compared regarding their capability to support creativity. Statistical tests for equivalence revealed that the presence of PCG did not affect creativity in any way. Our work suggests that PCG can be a useful integration into city-building games without notably decreasing players' ability to express themselves creatively. |
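For readers unfamiliar with PCG, the terrain-generation condition in such a study typically boils down to something like smooth value noise over a grid; the sketch below illustrates that general technique and assumes nothing about the prototype's actual generator:

```python
import random

# Value-noise heightmap: random values on a coarse lattice, smoothly
# interpolated to full resolution. Feature size and seed are arbitrary.

def terrain(width, height, feature_size=4, seed=7):
    rng = random.Random(seed)
    gw, gh = width // feature_size + 2, height // feature_size + 2
    lattice = [[rng.random() for _ in range(gw)] for _ in range(gh)]

    def smooth(t):                       # smoothstep interpolation
        return t * t * (3 - 2 * t)

    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            gx, gy = x / feature_size, y / feature_size
            x0, y0 = int(gx), int(gy)
            tx, ty = smooth(gx - x0), smooth(gy - y0)
            top = lattice[y0][x0] * (1 - tx) + lattice[y0][x0 + 1] * tx
            bot = lattice[y0 + 1][x0] * (1 - tx) + lattice[y0 + 1][x0 + 1] * tx
            row.append(top * (1 - ty) + bot * ty)
        grid.append(row)
    return grid

heightmap = terrain(16, 16)   # 16x16 grid of heights in [0, 1]
```

A city-builder would then threshold or shade such a heightmap into water, plains, and hills before handing the map to the player.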
Albert, Iannis; Burkard, Nicole; Queck, Dirk; Herrlich, Marc The Effect of Auditory-Motor Synchronization in Exergames on the Example of the VR Rhythm Game BeatSaber Journal Article Proceedings of the ACM on Human-Computer Interaction, 6 , pp. 1-26, 2022. @article{12841, title = {The Effect of Auditory-Motor Synchronization in Exergames on the Example of the VR Rhythm Game BeatSaber}, author = {Iannis Albert and Nicole Burkard and Dirk Queck and Marc Herrlich}, year = {2022}, date = {2022-01-01}, journal = {Proceedings of the ACM on Human-Computer Interaction}, volume = {6}, pages = {1-26}, publisher = {Association for Computing Machinery}, abstract = {Physical inactivity and an increasingly sedentary lifestyle constitute a significant public health concern. Exergames try to tackle this problem by combining exercising with motivational gameplay. Another approach in sports science is the use of auditory-motor synchronization, the entrainment of movements to the rhythm of music. There are already commercially successful games making use of the combination of both, such as the popular VR rhythm game BeatSaber. However, unlike traditional exercise settings often relying on periodic movements that can be easily entrained to a rhythmic pulse, exergames typically offer an additional cognitive challenge through their gameplay and might be based more on reaction or memorization. That poses the question as to what extent the effects of auditory-motor synchronization can be transferred to exergames, and if the synchronization of music and gameplay facilitates the playing experience. We conducted a user study (N = 54) to investigate the effects of different degrees of synchronization between music and gameplay using the VR rhythm game BeatSaber. 
Results show significant effects on performance, perceived workload, and player experience between the synchronized and non-synchronized conditions, but the results seem to be strongly mediated by the ability of the participants to consciously perceive the synchronization differences.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Physical inactivity and an increasingly sedentary lifestyle constitute a significant public health concern. Exergames try to tackle this problem by combining exercising with motivational gameplay. Another approach in sports science is the use of auditory-motor synchronization, the entrainment of movements to the rhythm of music. There are already commercially successful games making use of the combination of both, such as the popular VR rhythm game BeatSaber. However, unlike traditional exercise settings often relying on periodic movements that can be easily entrained to a rhythmic pulse, exergames typically offer an additional cognitive challenge through their gameplay and might be based more on reaction or memorization. That poses the question as to what extent the effects of auditory-motor synchronization can be transferred to exergames, and if the synchronization of music and gameplay facilitates the playing experience. We conducted a user study (N = 54) to investigate the effects of different degrees of synchronization between music and gameplay using the VR rhythm game BeatSaber. Results show significant effects on performance, perceived workload, and player experience between the synchronized and non-synchronized conditions, but the results seem to be strongly mediated by the ability of the participants to consciously perceive the synchronization differences. |
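One simple way to quantify auditory-motor synchronization in a rhythm-game log is to snap each hit to the nearest beat of the music and measure the residual offsets; the sketch below is purely illustrative and not the study's analysis code:

```python
# Beat-alignment metric: mean absolute offset of hits from the beat grid.

def sync_offsets(hit_times, bpm):
    """Signed offset (s) of each hit from its nearest beat."""
    beat = 60.0 / bpm
    offsets = []
    for t in hit_times:
        nearest_beat = round(t / beat) * beat
        offsets.append(t - nearest_beat)
    return offsets

def mean_abs_offset(hit_times, bpm):
    offs = sync_offsets(hit_times, bpm)
    return sum(abs(o) for o in offs) / len(offs)

# 120 BPM -> one beat every 0.5 s; hits slightly early/late around beats
hits = [0.48, 1.03, 1.49, 2.02]
```

Comparing this metric between synchronized and non-synchronized conditions gives an objective counterpart to the self-reported perception of synchrony.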
Inproceedings |
Liang, Siting; Kades, Klaus; Fink, Matthias A; Full, Peter M; Weber, Tim F; Kleesiek, Jens; Strube, Michael; Maier-Hein, Klaus Fine-tuning BERT Models for Summarizing German Radiology Findings Inproceedings Naumann, Tristan; Bethard, Steven; Roberts, Kirk; Rumshisky, Anna (Ed.): Proceedings of the 4th Clinical Natural Language Processing Workshop, Association for Computational Linguistics, 2022. @inproceedings{12809, title = {Fine-tuning BERT Models for Summarizing German Radiology Findings}, author = {Siting Liang and Klaus Kades and Matthias A Fink and Peter M Full and Tim F Weber and Jens Kleesiek and Michael Strube and Klaus Maier-Hein}, editor = {Tristan Naumann and Steven Bethard and Kirk Roberts and Anna Rumshisky}, url = {https://www.dfki.de/fileadmin/user_upload/import/12809_2022.clinicalnlp-1.4.pdf}, year = {2022}, date = {2022-07-01}, booktitle = {Proceedings of the 4th Clinical Natural Language Processing Workshop}, publisher = {Association for Computational Linguistics}, abstract = {Writing the conclusion section of radiology reports is essential for communicating the radiology findings and its assessment to physicians in a condensed form. In this work, we employ a transformer-based Seq2Seq model for generating the conclusion section of German radiology reports. The model is initialized with the pre-trained parameters of a German BERT model and fine-tuned in our downstream task on our domain data. We proposed two strategies to improve the factual correctness of the model. In the first method, next to the abstractive learning objective, we introduce an extraction learning objective to train the decoder in the model to both generate one summary sequence and extract the key findings from the source input. The second approach is to integrate the pointer mechanism into the transformer-based Seq2Seq model. 
The pointer network helps the Seq2Seq model to choose between generating tokens from the vocabulary or copying parts from the source input during generation. The results of the automatic and human evaluations show that the enhanced Seq2Seq model is capable of generating human-like radiology conclusions and that the improved models effectively reduce the factual errors in the generations despite the small amount of training data.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Liang, Siting; Hartmann, Mareike; Sonntag, Daniel Cross-lingual German Biomedical Information Extraction: from Zero-shot to Human-in-the-Loop Inproceedings 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 2022. @inproceedings{12839, title = {Cross-lingual German Biomedical Information Extraction: from Zero-shot to Human-in-the-Loop}, author = {Siting Liang and Mareike Hartmann and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/12839_2022_HCI+NLP.3.1.pdf}, year = {2022}, date = {2022-07-01}, booktitle = {2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics}, publisher = {Association for Computational Linguistics}, abstract = {This paper presents our project proposal for extracting biomedical information from German clinical narratives with limited amounts of annotations. We first describe the applied strategies in transfer learning and active learning for solving our problem. After that, we discuss the design of the user interface for both supplying model inspection and obtaining user annotations in the interactive environment.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Queck, Dirk; Albert, Iannis; Burkard, Nicole; Zimmer, Philipp; Volkmar, Georg; Dänekas, Bastian; Malaka, Rainer; Herrlich, Marc SpiderClip: Towards an Open Source System for Wearable Device Simulation in Virtual Reality Inproceedings CHI EA '22: Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, 2022. @inproceedings{12550, title = {SpiderClip: Towards an Open Source System for Wearable Device Simulation in Virtual Reality}, author = {Dirk Queck and Iannis Albert and Nicole Burkard and Philipp Zimmer and Georg Volkmar and Bastian Dänekas and Rainer Malaka and Marc Herrlich}, url = {https://dl.acm.org/doi/abs/10.1145/3491101.3519758#sec-supp}, year = {2022}, date = {2022-04-01}, booktitle = {CHI EA '22: Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems}, publisher = {Association for Computing Machinery}, abstract = {Smartwatches and fitness trackers integrate different sensors from inertial measurement units to heart rate sensors in a very compact and affordable form factor. This makes them interesting and relevant research tools. One potential application domain is virtual reality, e.g., for health related applications such as exergames or simulation approaches. However, commercial devices complicate and limit the collection of raw and real-time data, suffer from privacy issues and are not tailored to using them with VR tracking systems. We address these issues with an open source design to facilitate the construction of VR-enabled wearables for conducting scientific experiments. Our work is motivated by research in mixed realities in pervasive computing environments. We introduce our system and present a proof-of-concept study with 17 participants. 
Our results show that the wearable reliably measures high-quality data comparable to commercially available fitness trackers and that it does not impede movements or interfere with VR tracking.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Valdunciel, Pablo; Bhatti, Omair Shahzad; Barz, Michael; Sonntag, Daniel Interactive Assessment Tool for Gaze-based Machine Learning Models in Information Retrieval Inproceedings ACM SIGIR Conference on Human Information Interaction and Retrieval, Association for Computing Machinery, 2022. @inproceedings{12287, title = {Interactive Assessment Tool for Gaze-based Machine Learning Models in Information Retrieval}, author = {Pablo Valdunciel and Omair Shahzad Bhatti and Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/12287_3498366.3505834.pdf}, year = {2022}, date = {2022-03-01}, booktitle = {ACM SIGIR Conference on Human Information Interaction and Retrieval}, publisher = {Association for Computing Machinery}, abstract = {Eye movements were shown to be an effective source of implicit relevance feedback in information retrieval tasks. They can be used to, e.g., estimate the relevance of read documents and expand search queries using machine learning. In this paper, we present the Reading Model Assessment tool (ReMA), an interactive tool for assessing gaze-based relevance estimation models. Our tool allows experimenters to easily browse recorded trials, compare the model output to a ground truth, and visualize gaze-based features at the token- and paragraph-level that serve as model input. Our goal is to facilitate the understanding of the relation between eye movements and the human relevance estimation process, to understand the strengths and weaknesses of a model at hand, and, eventually, to enable researchers to build more effective models.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Lauer, Luisa; Javaheri, Hamraz; Altmeyer, Kristin; Malone, Sarah; Grünerbl, Agnes; Barz, Michael; Peschel, Markus; Brünken, Roland; Lukowicz, Paul Encountering Students' Learning Difficulties in Electrics - Didactical Concept and Prototype of Augmented Reality-Toolkit Inproceedings Fostering scientific citizenship in an uncertain world - ESERA 2021 e-Proceedings, University of Minho, 2022. @inproceedings{12121, title = {Encountering Students' Learning Difficulties in Electrics - Didactical Concept and Prototype of Augmented Reality-Toolkit}, author = {Luisa Lauer and Hamraz Javaheri and Kristin Altmeyer and Sarah Malone and Agnes Grünerbl and Michael Barz and Markus Peschel and Roland Brünken and Paul Lukowicz}, url = {https://www.dfki.de/fileadmin/user_upload/import/12121_2022_Encountering_Students'_Learning_Difficulties_in_Electrics_-_Didactical_Concept_and_Prototype_of_Augmented_Reality-Toolkit.pdf}, year = {2022}, date = {2022-01-01}, booktitle = {Fostering scientific citizenship in an uncertain world - ESERA 2021 e-Proceedings}, publisher = {University of Minho}, abstract = {• Real-time visualization of electrical circuit schematics in accordance with the components’ semantic connection • Use of the toolkit may facilitate the acquisition of representational competencies (concerning the matching of components and symbols and the matching of circuits and circuit schematics) • Usable with either handheld AR-devices or head-mounted AR-devices}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Nguyen, Ho Minh Duy; Henschel, Roberto; Rosenhahn, Bodo; Sonntag, Daniel; Swoboda, Paul LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking Inproceedings Conference on Computer Vision and Pattern Recognition (CVPR) 2022, IEEE/CVF, 2022. @inproceedings{12286, title = {LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking}, author = {Ho Minh Duy Nguyen and Roberto Henschel and Bodo Rosenhahn and Daniel Sonntag and Paul Swoboda}, url = {https://arxiv.org/pdf/2111.11892.pdf}, year = {2022}, date = {2022-01-01}, booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR) 2022}, publisher = {IEEE/CVF}, abstract = {Multi-Camera Multi-Object Tracking is currently drawing attention in the computer vision field due to its superior performance in real-world applications such as video surveillance with crowded scenes or in vast space. In this work, we propose a mathematically elegant multi-camera multiple object tracking approach based on a spatial-temporal lifted multicut formulation. Our model utilizes state-of-the-art tracklets produced by single-camera trackers as proposals. As these tracklets may contain ID-Switch errors, we refine them through a novel pre-clustering obtained from 3D geometry projections. As a result, we derive a better tracking graph without ID switches and more precise affinity costs for the data association phase. Tracklets are then matched to multi-camera trajectories by solving a global lifted multicut formulation that incorporates short and long-range temporal interactions on tracklets located in the same camera as well as inter-camera ones. Experimental results on the WildTrack dataset yield near-perfect result, outperforming state-of-the-art trackers on Campus while being on par on the PETS-09 dataset. 
We will make our implementations available upon acceptance of the paper.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Gouvea, Thiago; Troshani, Ilira; Herrlich, Marc; Sonntag, Daniel Annotating sound events through interactive design of interpretable features Inproceedings Proceedings of the First International Conference on Hybrid Human-Machine Intelligence, IOS Press, 2022. @inproceedings{12428, title = {Annotating sound events through interactive design of interpretable features}, author = {Thiago Gouvea and Ilira Troshani and Marc Herrlich and Daniel Sonntag}, url = {https://www.hhai-conference.org/wp-content/uploads/2022/06/hhai2022-pd_paper_7726.pdf}, year = {2022}, date = {2022-01-01}, booktitle = {Proceedings of the First International Conference on Hybrid Human-Machine Intelligence}, publisher = {IOS Press}, abstract = {Professionals of all domains of expertise expect to take part in the benefits of the machine learning (ML) revolution, but realisation is often slowed down by lack of training in ML concepts and tools, as well as low availability of annotated data for supervised methods. Inspired by the problem of assessing the impact of human-generated activity on marine ecosystems through passive acoustic monitoring (PAM), we are developing Seadash, an interactive tool for event detection and classification in multivariate time series.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Gouvêa, Thiago S; Troshani, Ilira; Herrlich, Marc; Sonntag, Daniel Interactive design of interpretable features for marine soundscape data annotation Inproceedings Workshop on Human-centered Design of Symbiotic Hybrid Intelligence, HHAI, 2022. @inproceedings{12429, title = {Interactive design of interpretable features for marine soundscape data annotation}, author = {Thiago S Gouvêa and Ilira Troshani and Marc Herrlich and Daniel Sonntag}, year = {2022}, date = {2022-01-01}, booktitle = {Workshop on Human-centered Design of Symbiotic Hybrid Intelligence}, publisher = {HHAI}, abstract = {Machine learning (ML) is increasingly used in different application domains. However, to reach its full potential it is important that experts without extensive ML training be able to create and effectively apply models in their domain. This requires forms of co-learning that need to be facilitated by effective interfaces and interaction paradigms. Inspired by the problem of detecting and classifying sound events in marine soundscapes, we are developing Seadash. Through a rapid, iterative data exploration workflow, the user designs and curates features that capture meaningful structure in the data, and uses these to efficiently annotate the dataset. While the tool is still in early stages, we present the concept and discuss future directions.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Hartmann, Mareike; Sonntag, Daniel A survey on improving NLP models with human explanations Inproceedings Proceedings of the First Workshop on Learning with Natural Language Supervision, Association for Computational Linguistics, 2022. @inproceedings{12519, title = {A survey on improving NLP models with human explanations}, author = {Mareike Hartmann and Daniel Sonntag}, url = {https://aclanthology.org/2022.lnls-1.5.pdf}, year = {2022}, date = {2022-01-01}, booktitle = {Proceedings of the First Workshop on Learning with Natural Language Supervision}, publisher = {Association for Computational Linguistics}, abstract = {Training a model with access to human explanations can improve data efficiency and model performance on in- and out-of-domain data. Adding to these empirical findings, similarity with the process of human learning makes learning from explanations a promising way to establish a fruitful human-machine interaction. Several methods have been proposed for improving natural language processing (NLP) models with human explanations, that rely on different explanation types and mechanism for integrating these explanations into the learning process. These methods are rarely compared with each other, making it hard for practitioners to choose the best combination of explanation type and integration mechanism for a specific use-case. In this paper, we give an overview of different methods for learning from human explanations, and discuss different factors that can inform the decision of which method to choose for a specific use-case.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Graf, Linda; Altmeyer, Maximilian; Emmerich, Katharina; Herrlich, Marc; Krekhov, Andrey; Spiel, Katta Development and Validation of a German Version of the Player Experience Inventory (PXI) Inproceedings Proceedings of the Mensch und Computer Conference, ACM, 2022. @inproceedings{12535, title = {Development and Validation of a German Version of the Player Experience Inventory (PXI)}, author = {Linda Graf and Maximilian Altmeyer and Katharina Emmerich and Marc Herrlich and Andrey Krekhov and Katta Spiel}, url = {https://www.dfki.de/fileadmin/user_upload/import/12535_MuC22__German_PXI_Version.pdf}, year = {2022}, date = {2022-01-01}, booktitle = {Proceedings of the Mensch und Computer Conference}, publisher = {ACM}, abstract = {The Player Experience Inventory (PXI), initially developed by Abeele et al. (2020), measures player experiences among English-speaking players. However, empirically validated translations of the PXI are sparse, limiting the use of the scale among non-English speaking players. In this paper, we address this issue by providing a translated version of the scale in German, the most widely spoken first language in the European Union. After translating the original items, we conducted a confirmatory factor analysis (N=506) to validate the German version of the PXI. Our results confirmed a 10-factor model - which the original authors of the instrument suggested - and show that the German PXI has valid psychometric properties. While model fit, internal consistency and convergent validity were acceptable, there was room for improvement regarding discriminant validity. Based on our results, we advocate for the German PXI as a valid and reliable instrument for assessing player experiences in German-speaking samples.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Rekrut, Maurice; Selim, Abdulrahman Mohamed; Krüger, Antonio Improving Silent Speech BCI Training Procedures through Transfer from Overt to Silent Speech Inproceedings Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, IEEE, 2022. @inproceedings{12619, title = {Improving Silent Speech BCI Training Procedures through Transfer from Overt to Silent Speech}, author = {Maurice Rekrut and Abdulrahman Mohamed Selim and Antonio Krüger}, year = {2022}, date = {2022-01-01}, booktitle = {Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics}, publisher = {IEEE}, abstract = {Silent speech Brain-Computer Interfaces (BCIs) try to decode imagined speech from brain activity. Those BCIs require a tremendous amount of training data usually collected during mentally and physically exhausting sessions in which participants silently repeat words presented on a screen for several hours. Within this work we present an approach to overcome those exhausting sessions by training a silent speech classifier on data recorded while speaking certain words and transferring this classifier to EEG data recorded during silent repetition of the same words. This approach does not only allow for a less mentally and physically exhausting training procedure but also for a more productive one as the overt speech output can be used for interaction while the classifier for silent speech is trained simultaneously. We evaluated our approach in a study in which 15 participants navigated a virtual robot on a screen in a game like scenario through a maze once with 5 overtly spoken and once with the same 5 silently spoken command words. In an offline analysis we trained a classifier on overt speech data and let it predict silent speech data. 
Our classification results not only show successful results for the transfer (61.78%), significantly above chance level, but also comparable results to a standard silent speech classifier (71.48%) trained and tested on the same data. These results illustrate the potential of the method to replace the currently tedious training procedures for silent speech BCIs with a more comfortable, engaging and productive approach by a transfer from overt to silent speech.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Kuznetsov, Konstantin; Barz, Michael; Sonntag, Daniel SpellInk: Interactive correction of spelling mistakes in handwritten text Inproceedings Proceedings of the First International Conference on Hybrid Human-Machine Intelligence, pp. 278-280, IOS Press, De Boelelaan 1105, 1081 HV Amsterdam, Netherlands, 2022. @inproceedings{12621, title = {SpellInk: Interactive correction of spelling mistakes in handwritten text}, author = {Konstantin Kuznetsov and Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/12621_hhai22_demo_spellink.pdf https://www.hhai-conference.org/demos/pd_paper_5349/}, year = {2022}, date = {2022-01-01}, booktitle = {Proceedings of the First International Conference on Hybrid Human-Machine Intelligence}, volume = {354}, pages = {278-280}, publisher = {IOS Press}, address = {De Boelelaan 1105, 1081 HV Amsterdam, Netherlands}, abstract = {Despite the current dominance of typed text, writing by hand remains the most natural means of written communication and information keeping. Still, digital pen input provides limited user experience and lacks flexibility, as most of the manipulations are performed on a digitalized version of the text. In this paper, we present our prototype that enables spellchecking for handwritten text: it allows users to interactively correct misspellings directly in a handwritten script. We plan to study the usability of the proposed user interface and its acceptance by users. Also, we aim to investigate how user feedback can be used to incrementally improve the underlying recognition models.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Céard-Falkenberg, Felix; Kuznetsov, Konstantin; Prange, Alexander; Barz, Michael; Sonntag, Daniel pEncode: A Tool for Visualizing Pen Signal Encodings in Real-time Inproceedings Proceedings of the First International Conference on Hybrid Human-Machine Intelligence, pp. 281-284, IOS Press, De Boelelaan 1105, 1081 HV Amsterdam, Netherlands, 2022. @inproceedings{12622, title = {pEncode: A Tool for Visualizing Pen Signal Encodings in Real-time}, author = {Felix Céard-Falkenberg and Konstantin Kuznetsov and Alexander Prange and Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/12622_hhai22_demo_pencode.pdf https://www.youtube.com/watch?v=t80aa2E5jKo}, year = {2022}, date = {2022-01-01}, booktitle = {Proceedings of the First International Conference on Hybrid Human-Machine Intelligence}, volume = {354}, pages = {281-284}, publisher = {IOS Press}, address = {De Boelelaan 1105, 1081 HV Amsterdam, Netherlands}, abstract = {Many features have been proposed for encoding the input signal from digital pens and touch-based interaction. They are widely used for analyzing and classifying handwritten texts, sketches, or gestures. Although they are well defined mathematically, many features are non-trivial and therefore difficult to understand for a human. In this paper, we present an application that visualizes a subset from 114 digital pen features in real-time while drawing. It provides an easy-to-use interface that allows application developers and machine learning practitioners to learn how digital pen features encode their inputs, helps in the feature selection process, and enables rapid prototyping of sketch and gesture classifiers.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Bhatti, Omair Shahzad; Barz, Michael; Sonntag, Daniel Leveraging Implicit Gaze-Based User Feedback for Interactive Machine Learning Inproceedings Rodermund, Stephanie C; Timm, Ingo J; Malburg, Lukas; Bergmann, Ralph (Ed.): KI 2022: Advances in Artificial Intelligence, pp. 9-16, Springer International Publishing, 2022. @inproceedings{12633, title = {Leveraging Implicit Gaze-Based User Feedback for Interactive Machine Learning}, author = {Omair Shahzad Bhatti and Michael Barz and Daniel Sonntag}, editor = {Stephanie C Rodermund and Ingo J Timm and Lukas Malburg and Ralph Bergmann}, url = {https://www.dfki.de/fileadmin/user_upload/import/12633_Leveraging_implicit_gaze_based_user_feedback_for_interactive_machine_learning__KI_22__Accepted__(6).pdf}, doi = {https://doi.org/10.1007/978-3-031-15791-2}, year = {2022}, date = {2022-01-01}, booktitle = {KI 2022: Advances in Artificial Intelligence}, pages = {9-16}, publisher = {Springer International Publishing}, abstract = {Interactive Machine Learning (IML) systems incorporate humans into the learning process to enable iterative and continuous model improvements. The interactive process can be designed to leverage the expertise of domain experts with no background in machine learning, for instance, through repeated user feedback requests. However, excessive requests can be perceived as annoying and cumbersome and could reduce user trust. Hence, it is mandatory to establish an efficient dialog between a user and a machine learning system. We aim to detect when a domain expert disagrees with the output of a machine learning system by observing its eye movements and facial expressions. 
In this paper, we describe our approach for modelling user disagreement and discuss how such a model could be used for triggering user feedback requests in the context of interactive machine learning.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Interactive Machine Learning (IML) systems incorporate humans into the learning process to enable iterative and continuous model improvements. The interactive process can be designed to leverage the expertise of domain experts with no background in machine learning, for instance, through repeated user feedback requests. However, excessive requests can be perceived as annoying and cumbersome and could reduce user trust. Hence, it is mandatory to establish an efficient dialog between a user and a machine learning system. We aim to detect when a domain expert disagrees with the output of a machine learning system by observing the expert's eye movements and facial expressions. In this paper, we describe our approach for modelling user disagreement and discuss how such a model could be used for triggering user feedback requests in the context of interactive machine learning. |
Miscellaneous |
Szeier, Szilvia; Baffy, Benjámin; Baranyi, Gábor; Skaf, Joul; Kopácsi, László; Sonntag, Daniel; Sörös, Gábor; Lőrincz, András 3D Semantic Label Transfer and Matching in Human-Robot Collaboration Miscellaneous 2022. @misc{12900, title = {3D Semantic Label Transfer and Matching in Human-Robot Collaboration}, author = {Szilvia Szeier and Benjámin Baffy and Gábor Baranyi and Joul Skaf and László Kopácsi and Daniel Sonntag and Gábor Sörös and András Lőrincz}, url = {https://www.dfki.de/fileadmin/user_upload/import/12900_0003_paper.pdf https://learn3dg.github.io/}, year = {2022}, date = {2022-10-01}, publisher = {Learning to Generate 3D Shapes and Scenes, ECCV 2022 Workshop}, abstract = {Semantic 3D maps are highly useful for human-robot collaboration and joint task planning. We build upon an existing real-time 3D semantic reconstruction pipeline and extend it with semantic matching across human and robot viewpoints, which is required if class labels differ or are missing due to different perspectives during collaborative reconstruction. We use deep recognition networks, which usually perform well from higher (human) viewpoints but are inferior from ground robot viewpoints. Therefore, we propose several approaches for acquiring semantic labels for unusual perspectives. We group the pixels from the lower viewpoint, project voxel class labels of the upper perspective to the lower perspective and apply majority voting to obtain labels for the robot. The quality of the reconstruction is evaluated in the Habitat simulator and in a real environment using a robot car equipped with an RGBD camera. The proposed approach can provide high-quality semantic segmentation from the robot perspective with accuracy similar to the human perspective. 
Furthermore, as computations are close to real time, the approach enables interactive applications.}, keywords = {}, pubstate = {published}, tppubtype = {misc} } Semantic 3D maps are highly useful for human-robot collaboration and joint task planning. We build upon an existing real-time 3D semantic reconstruction pipeline and extend it with semantic matching across human and robot viewpoints, which is required if class labels differ or are missing due to different perspectives during collaborative reconstruction. We use deep recognition networks, which usually perform well from higher (human) viewpoints but are inferior from ground robot viewpoints. Therefore, we propose several approaches for acquiring semantic labels for unusual perspectives. We group the pixels from the lower viewpoint, project voxel class labels of the upper perspective to the lower perspective and apply majority voting to obtain labels for the robot. The quality of the reconstruction is evaluated in the Habitat simulator and in a real environment using a robot car equipped with an RGBD camera. The proposed approach can provide high-quality semantic segmentation from the robot perspective with accuracy similar to the human perspective. Furthermore, as computations are close to real time, the approach enables interactive applications. |
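The majority-voting label transfer described in the abstract above can be sketched roughly as follows. This is an illustrative sketch only: the data layout (pixel groups from the robot viewpoint, sparse labels projected from the human viewpoint) and all names are assumptions, not the authors' implementation.

```python
from collections import Counter

def transfer_labels(pixel_groups, projected_labels):
    """Assign each pixel group the most frequent class label projected
    from the other (e.g. human) viewpoint.

    pixel_groups: dict mapping group id -> list of pixel indices
    projected_labels: dict mapping pixel index -> class label (may be sparse,
        since not every pixel receives a projected voxel label)
    Returns: dict mapping group id -> majority label, or None if no votes
    """
    result = {}
    for group_id, pixels in pixel_groups.items():
        # collect the votes that were actually projected onto this group
        votes = [projected_labels[p] for p in pixels if p in projected_labels]
        result[group_id] = Counter(votes).most_common(1)[0][0] if votes else None
    return result
```

For example, a group of three pixels with projected votes `floor, floor, wall` would receive the label `floor`; a group with no projected votes stays unlabeled.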
Anagnostopoulou, Aliki; Hartmann, Mareike; Sonntag, Daniel Putting Humans in the Image Captioning Loop Miscellaneous Bridging Human-Computer Interaction and Natural Language Processing (NAACL 2022), 2022. @misc{12516, title = {Putting Humans in the Image Captioning Loop}, author = {Aliki Anagnostopoulou and Mareike Hartmann and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/12516_5.pdf https://drive.google.com/file/d/1WT1Emfc76Myv_PujMXaWI4ucqF9eegqC/view}, year = {2022}, date = {2022-07-01}, abstract = {Image Captioning (IC) models can highly benefit from human feedback in the training process, especially in cases where data is limited. We present work-in-progress on adapting an IC system to integrate human feedback, with the goal to make it easily adaptable to user-specific data. Our approach builds on a base IC model pre-trained on the MS COCO dataset, which generates captions for unseen images. The user will then be able to offer feedback on the image and the generated/predicted caption, which will be augmented to create additional training instances for the adaptation of the model. The additional instances are integrated into the model using step-wise updates, and a sparse memory replay component is used to avoid catastrophic forgetting. We hope that this approach, while leading to improved results, will also result in customizable IC models.}, howpublished = {Bridging Human-Computer Interaction and Natural Language Processing (NAACL 2022)}, keywords = {}, pubstate = {published}, tppubtype = {misc} } Image Captioning (IC) models can highly benefit from human feedback in the training process, especially in cases where data is limited. We present work-in-progress on adapting an IC system to integrate human feedback, with the goal to make it easily adaptable to user-specific data. Our approach builds on a base IC model pre-trained on the MS COCO dataset, which generates captions for unseen images. 
The user will then be able to offer feedback on the image and the generated/predicted caption, which will be augmented to create additional training instances for the adaptation of the model. The additional instances are integrated into the model using step-wise updates, and a sparse memory replay component is used to avoid catastrophic forgetting. We hope that this approach, while leading to improved results, will also result in customizable IC models. |
Hartmann, Mareike; Anagnostopoulou, Aliki; Sonntag, Daniel Interactive Machine Learning for Image Captioning Miscellaneous The AAAI-22 Workshop on Interactive Machine Learning, 2022. @misc{12167, title = {Interactive Machine Learning for Image Captioning}, author = {Mareike Hartmann and Aliki Anagnostopoulou and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/12167_interactive_learning_for_image_captioning.pdf}, year = {2022}, date = {2022-02-01}, abstract = {We propose an approach for interactive learning for an image captioning model. As human feedback is expensive and modern neural network based approaches often require large amounts of supervised data to be trained, we envision a system that exploits human feedback as good as possible by multiplying the feedback using data augmentation methods, and integrating the resulting training examples into the model in a smart way. This approach has three key components, for which we need to find suitable practical implementations: feedback collection, data augmentation, and model update. We outline our idea and review different possibilities to address these tasks.}, howpublished = {The AAAI-22 Workshop on Interactive Machine Learning}, keywords = {}, pubstate = {published}, tppubtype = {misc} } We propose an approach for interactive learning for an image captioning model. As human feedback is expensive and modern neural network based approaches often require large amounts of supervised data to be trained, we envision a system that exploits human feedback as good as possible by multiplying the feedback using data augmentation methods, and integrating the resulting training examples into the model in a smart way. This approach has three key components, for which we need to find suitable practical implementations: feedback collection, data augmentation, and model update. We outline our idea and review different possibilities to address these tasks. |
Technical Reports |
Nguyen, Ho Minh Duy; Henschel, Roberto; Rosenhahn, Bodo; Sonntag, Daniel; Swoboda, Paul LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking Technical Report DFKI, MPI-INF , 2022. @techreport{12211, title = {LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking}, author = {Ho Minh Duy Nguyen and Roberto Henschel and Bodo Rosenhahn and Daniel Sonntag and Paul Swoboda}, url = {https://arxiv.org/pdf/2111.11892.pdf}, year = {2022}, date = {2022-01-01}, volume = {01}, institution = {DFKI, MPI-INF}, abstract = {Multi-Camera Multi-Object Tracking is currently drawing attention in the computer vision field due to its superior performance in real-world applications such as video surveillance with crowded scenes or in vast space. In this work, we propose a mathematically elegant multi-camera multiple object tracking approach based on a spatial-temporal lifted multicut formulation. Our model utilizes state-of-the-art tracklets produced by single-camera trackers as proposals. As these tracklets may contain ID-Switch errors, we refine them through a novel pre-clustering obtained from 3D geometry projections. As a result, we derive a better tracking graph without ID switches and more precise affinity costs for the data association phase. Tracklets are then matched to multi-camera trajectories by solving a global lifted multicut formulation that incorporates short and long-range temporal interactions on tracklets located in the same camera as well as inter-camera ones. Experimental results on the WildTrack dataset yield near-perfect result, outperforming state-of-the-art trackers on Campus while being on par on the PETS-09 dataset. 
We will make our implementations available upon acceptance of the paper.}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } Multi-Camera Multi-Object Tracking is currently drawing attention in the computer vision field due to its superior performance in real-world applications such as video surveillance with crowded scenes or in vast spaces. In this work, we propose a mathematically elegant multi-camera multiple object tracking approach based on a spatial-temporal lifted multicut formulation. Our model utilizes state-of-the-art tracklets produced by single-camera trackers as proposals. As these tracklets may contain ID-Switch errors, we refine them through a novel pre-clustering obtained from 3D geometry projections. As a result, we derive a better tracking graph without ID switches and more precise affinity costs for the data association phase. Tracklets are then matched to multi-camera trajectories by solving a global lifted multicut formulation that incorporates short and long-range temporal interactions on tracklets located in the same camera as well as inter-camera ones. Experimental results on the WildTrack dataset yield near-perfect results, outperforming state-of-the-art trackers on Campus while being on par on the PETS-09 dataset. We will make our implementations available upon acceptance of the paper. |
2021 |
Journal Articles |
Sonntag, Daniel Künstliche Intelligenz in der Medizin und Gynäkologie – Holzweg oder Heilversprechen? Journal Article Der Gynäkologe, 1 , pp. 1-7, 2021. @article{11612, title = {Künstliche Intelligenz in der Medizin und Gynäkologie – Holzweg oder Heilversprechen?}, author = {Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11612_sonntag-gyn.pdf}, year = {2021}, date = {2021-04-01}, journal = {Der Gynäkologe}, volume = {1}, pages = {1-7}, publisher = {Springer}, abstract = {Künstliche Intelligenz (KI) hat in den letzten Jahren eine neue Reifephase erreicht und entwickelt sich zum Treiber der Digitalisierung in allen Lebensbereichen. Die KI ist eine Querschnittstechnologie, die für alle Bereiche der Medizin mit Bild‑, Text- und Biodaten von großer Bedeutung ist. Es gibt keinen medizinischen Bereich, der nicht von KI beeinflusst werden wird. Dabei spielt die klinische Entscheidungsunterstützung eine wichtige Rolle. KI-Methoden etablieren sich gerade beim medizinischen Workflow-Management und bei der Vorhersage des Behandlungserfolgs bzw. des Behandlungsergebnisses. KI-Systeme können bereits in Bilddiagnose und im Patientenmanagement unterstützen, aber keine kritischen Entscheidungen vorschlagen. Die jeweiligen Präventions- oder Therapiemaßnahmen können mit KI-Unterstützung sinnvoller bewertet werden, allerdings ist die Abdeckung der Krankheiten noch viel zu gering, um robuste Systeme für den klinischen Alltag zu erstellen. Der flächendeckende Einsatz setzt Fortbildungsmaßnahmen für Ärzte voraus, um die Entscheidung treffen zu können, wann auf automatische Entscheidungsunterstützung vertraut werden kann. Artificial intelligence (AI) has attained a new level of maturity in recent years and is becoming the driver of digitalization in all areas of life. AI is a cross-sectional technology with great importance for all areas of medicine employing image data, text data and bio-data. 
There is no medical field that will remain unaffected by AI, with AI-assisted clinical decision-making assuming a particularly important role. AI methods are becoming established in medical workflow management and for prediction of treatment success or treatment outcome. AI systems are already able to lend support to imaging-based diagnosis and patient management, but cannot suggest critical decisions. The corresponding preventive or therapeutic measures can be more rationally assessed with the help of AI, although the number of diseases covered is currently too low to create robust systems for routine clinical use. Prerequisite for the widespread use of AI systems is appropriate training to enable physicians to decide when computer-assisted decision-making can be relied upon.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Künstliche Intelligenz (KI) hat in den letzten Jahren eine neue Reifephase erreicht und entwickelt sich zum Treiber der Digitalisierung in allen Lebensbereichen. Die KI ist eine Querschnittstechnologie, die für alle Bereiche der Medizin mit Bild‑, Text- und Biodaten von großer Bedeutung ist. Es gibt keinen medizinischen Bereich, der nicht von KI beeinflusst werden wird. Dabei spielt die klinische Entscheidungsunterstützung eine wichtige Rolle. KI-Methoden etablieren sich gerade beim medizinischen Workflow-Management und bei der Vorhersage des Behandlungserfolgs bzw. des Behandlungsergebnisses. KI-Systeme können bereits in Bilddiagnose und im Patientenmanagement unterstützen, aber keine kritischen Entscheidungen vorschlagen. Die jeweiligen Präventions- oder Therapiemaßnahmen können mit KI-Unterstützung sinnvoller bewertet werden, allerdings ist die Abdeckung der Krankheiten noch viel zu gering, um robuste Systeme für den klinischen Alltag zu erstellen. 
Der flächendeckende Einsatz setzt Fortbildungsmaßnahmen für Ärzte voraus, um die Entscheidung treffen zu können, wann auf automatische Entscheidungsunterstützung vertraut werden kann. Artificial intelligence (AI) has attained a new level of maturity in recent years and is becoming the driver of digitalization in all areas of life. AI is a cross-sectional technology with great importance for all areas of medicine employing image data, text data and bio-data. There is no medical field that will remain unaffected by AI, with AI-assisted clinical decision-making assuming a particularly important role. AI methods are becoming established in medical workflow management and for prediction of treatment success or treatment outcome. AI systems are already able to lend support to imaging-based diagnosis and patient management, but cannot suggest critical decisions. The corresponding preventive or therapeutic measures can be more rationally assessed with the help of AI, although the number of diseases covered is currently too low to create robust systems for routine clinical use. Prerequisite for the widespread use of AI systems is appropriate training to enable physicians to decide when computer-assisted decision-making can be relied upon. |
Kapp, Sebastian; Barz, Michael; Mukhametov, Sergey; Sonntag, Daniel; Kuhn, Jochen ARETT: Augmented Reality Eye Tracking Toolkit for Head Mounted Displays Journal Article Sensors - Open Access Journal, 21 , pp. 18, 2021. @article{11528, title = {ARETT: Augmented Reality Eye Tracking Toolkit for Head Mounted Displays}, author = {Sebastian Kapp and Michael Barz and Sergey Mukhametov and Daniel Sonntag and Jochen Kuhn}, url = {https://www.dfki.de/fileadmin/user_upload/import/11528_2021_ARETT-_Augmented_Reality_Eye_Tracking_Toolkit_for_Head_Mounted_Displays.pdf https://www.mdpi.com/1424-8220/21/6/2234}, year = {2021}, date = {2021-01-01}, journal = {Sensors - Open Access Journal}, volume = {21}, pages = {18}, publisher = {Multidisciplinary Digital Publishing Institute (MDPI)}, abstract = {Currently an increasing number of head mounted displays (HMD) for virtual and augmented reality (VR/AR) are equipped with integrated eye trackers. Use cases of these integrated eye trackers include rendering optimization and gaze-based user interaction. In addition, visual attention in VR and AR is interesting for applied research based on eye tracking in cognitive or educational sciences for example. While some research toolkits for VR already exist, only a few target AR scenarios. In this work, we present an open-source eye tracking toolkit for reliable gaze data acquisition in AR based on Unity 3D and the Microsoft HoloLens 2, as well as an R package for seamless data analysis. Furthermore, we evaluate the spatial accuracy and precision of the integrated eye tracker for fixation targets with different distances and angles to the user (n=21). 
On average, we found that gaze estimates are reported with an angular accuracy of 0.83 degrees and a precision of 0.27 degrees while the user is resting, which is on par with state-of-the-art mobile eye trackers.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Currently an increasing number of head mounted displays (HMD) for virtual and augmented reality (VR/AR) are equipped with integrated eye trackers. Use cases of these integrated eye trackers include rendering optimization and gaze-based user interaction. In addition, visual attention in VR and AR is interesting for applied research based on eye tracking in cognitive or educational sciences for example. While some research toolkits for VR already exist, only a few target AR scenarios. In this work, we present an open-source eye tracking toolkit for reliable gaze data acquisition in AR based on Unity 3D and the Microsoft HoloLens 2, as well as an R package for seamless data analysis. Furthermore, we evaluate the spatial accuracy and precision of the integrated eye tracker for fixation targets with different distances and angles to the user (n=21). On average, we found that gaze estimates are reported with an angular accuracy of 0.83 degrees and a precision of 0.27 degrees while the user is resting, which is on par with state-of-the-art mobile eye trackers. |
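The accuracy and precision figures reported above follow the standard eye-tracking definitions: accuracy is the mean angular offset of gaze samples from the fixation target, and precision is the RMS of angular distances between successive samples. A minimal sketch of these metrics, assuming gaze samples are given as 3D direction vectors; the function names are illustrative and not part of the ARETT toolkit's API:

```python
import math

def angle_deg(v1, v2):
    """Angular distance in degrees between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    # clamp for numerical safety before acos
    c = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(c))

def accuracy_deg(gaze_dirs, target_dir):
    """Accuracy: mean angular offset of gaze samples from the target."""
    return sum(angle_deg(g, target_dir) for g in gaze_dirs) / len(gaze_dirs)

def precision_rms_deg(gaze_dirs):
    """Precision: RMS of angular distances between successive samples."""
    diffs = [angle_deg(a, b) for a, b in zip(gaze_dirs, gaze_dirs[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

A perfectly steady gaze exactly on the target yields 0° for both metrics; scattered samples raise precision, and a systematic offset raises accuracy.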
Somfai, Ellák; Baffy, Benjámin; Fenech, Kristian; Guo, Changlu; Hosszú, Rita; Korózs, Dorina; Nunnari, Fabrizio; Pólik, Marcell; Sonntag, Daniel; Ulbert, Attila; Lorincz, András Minimizing false negative rate in melanoma detection and providing insight into the causes of classification Journal Article Computing Research Repository eprint Journal, abs/2102.09199 , pp. 1-14, 2021. @article{11613, title = {Minimizing false negative rate in melanoma detection and providing insight into the causes of classification}, author = {Ellák Somfai and Benjámin Baffy and Kristian Fenech and Changlu Guo and Rita Hosszú and Dorina Korózs and Fabrizio Nunnari and Marcell Pólik and Daniel Sonntag and Attila Ulbert and András Lorincz}, url = {https://www.dfki.de/fileadmin/user_upload/import/11613_2021_Minimizing_false_negative_rate_in_melanoma_detection_and_providing_insight_into_the_causes_of_classification.pdf https://arxiv.org/abs/2102.09199}, year = {2021}, date = {2021-01-01}, journal = {Computing Research Repository eprint Journal}, volume = {abs/2102.09199}, pages = {1-14}, publisher = {arXiv}, abstract = {Our goal is to bridge human and machine intelligence in melanoma detection. We develop a classification system exploiting a combination of visual pre-processing, deep learning, and ensembling for providing explanations to experts and to minimize false negative rate while maintaining high accuracy in melanoma detection. Source images are first automatically segmented using a U-net CNN. The result of the segmentation is then used to extract image sub-areas and specific parameters relevant in human evaluation, namely center, border, and asymmetry measures. These data are then processed by tailored neural networks which include structure searching algorithms. Partial results are then ensembled by a committee machine. 
Our evaluation on the largest skin lesion dataset which is publicly available today, ISIC-2019, shows improvement in all evaluated metrics over a baseline using the original images only. We also showed that indicative scores computed by the feature classifiers can provide useful insight into the various features on which the decision can be based.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Our goal is to bridge human and machine intelligence in melanoma detection. We develop a classification system exploiting a combination of visual pre-processing, deep learning, and ensembling for providing explanations to experts and to minimize false negative rate while maintaining high accuracy in melanoma detection. Source images are first automatically segmented using a U-net CNN. The result of the segmentation is then used to extract image sub-areas and specific parameters relevant in human evaluation, namely center, border, and asymmetry measures. These data are then processed by tailored neural networks which include structure searching algorithms. Partial results are then ensembled by a committee machine. Our evaluation on the largest skin lesion dataset which is publicly available today, ISIC-2019, shows improvement in all evaluated metrics over a baseline using the original images only. We also showed that indicative scores computed by the feature classifiers can provide useful insight into the various features on which the decision can be based. |
Barz, Michael; Sonntag, Daniel Automatic Visual Attention Detection for Mobile Eye Tracking Using Pre-Trained Computer Vision Models and Human Gaze Journal Article Sensors - Open Access Journal, 21 , pp. 21, 2021. @article{11668, title = {Automatic Visual Attention Detection for Mobile Eye Tracking Using Pre-Trained Computer Vision Models and Human Gaze}, author = {Michael Barz and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11668_sensors-21-04143-v2.pdf https://www.mdpi.com/1424-8220/21/12/4143}, year = {2021}, date = {2021-01-01}, journal = {Sensors - Open Access Journal}, volume = {21}, pages = {21}, publisher = {MDPI}, abstract = {Processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. These stimuli, which are prevalent subjects of diagnostic eye tracking studies, are commonly encoded as rectangular areas of interest (AOIs) per frame. Because it is a tedious manual annotation task, the automatic detection and annotation of visual attention to AOIs can accelerate and objectify eye tracking research, in particular for mobile eye tracking with egocentric video feeds. In this work, we implement two methods to automatically detect visual attention to AOIs using pre-trained deep learning models for image classification and object detection. Furthermore, we develop an evaluation framework based on the VISUS dataset and well-known performance metrics from the field of activity recognition. We systematically evaluate our methods within this framework, discuss potentials and limitations, and propose ways to improve the performance of future automatic visual attention detection methods.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. 
These stimuli, which are prevalent subjects of diagnostic eye tracking studies, are commonly encoded as rectangular areas of interest (AOIs) per frame. Because it is a tedious manual annotation task, the automatic detection and annotation of visual attention to AOIs can accelerate and objectify eye tracking research, in particular for mobile eye tracking with egocentric video feeds. In this work, we implement two methods to automatically detect visual attention to AOIs using pre-trained deep learning models for image classification and object detection. Furthermore, we develop an evaluation framework based on the VISUS dataset and well-known performance metrics from the field of activity recognition. We systematically evaluate our methods within this framework, discuss potentials and limitations, and propose ways to improve the performance of future automatic visual attention detection methods. |
Lauer, Luisa; Altmeyer, Kristin; Malone, Sarah; Barz, Michael; Brünken, Roland; Sonntag, Daniel; Peschel, Markus Investigating the Usability of a Head-Mounted Display Augmented Reality Device in Elementary School Children Journal Article Sensors - Open Access Journal, 21 , pp. 20, 2021. @article{11866, title = {Investigating the Usability of a Head-Mounted Display Augmented Reality Device in Elementary School Children}, author = {Luisa Lauer and Kristin Altmeyer and Sarah Malone and Michael Barz and Roland Brünken and Daniel Sonntag and Markus Peschel}, url = {https://www.dfki.de/fileadmin/user_upload/import/11866_sensors-21-06623.pdf https://www.mdpi.com/1424-8220/21/19/6623}, year = {2021}, date = {2021-01-01}, journal = {Sensors - Open Access Journal}, volume = {21}, pages = {20}, publisher = {MDPI}, abstract = {Augmenting reality via head-mounted displays (HMD-AR) is an emerging technology in education. The interactivity provided by HMD-AR devices is particularly promising for learning, but presents a challenge to human activity recognition, especially with children. Recent technological advances regarding speech and gesture recognition concerning Microsoft’s HoloLens 2 may address this prevailing issue. In a within-subjects study with 47 elementary school children (2nd to 6th grade), we examined the usability of the HoloLens 2 using a standardized tutorial on multimodal interaction in AR. The overall system usability was rated “good”. However, several behavioral metrics indicated that specific interaction modes differed in their efficiency. The results are of major importance for the development of learning applications in HMD-AR as they partially deviate from previous findings. In particular, the well-functioning recognition of children’s voice commands that we observed represents a novelty. Furthermore, we found different interaction preferences in HMD-AR among the children. 
We also found the use of HMD-AR to have a positive effect on children’s activity-related achievement emotions. Overall, our findings can serve as a basis for determining general requirements, possibilities, and limitations of the implementation of educational HMD-AR environments in elementary school classrooms.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Augmenting reality via head-mounted displays (HMD-AR) is an emerging technology in education. The interactivity provided by HMD-AR devices is particularly promising for learning, but presents a challenge to human activity recognition, especially with children. Recent technological advances regarding speech and gesture recognition concerning Microsoft’s HoloLens 2 may address this prevailing issue. In a within-subjects study with 47 elementary school children (2nd to 6th grade), we examined the usability of the HoloLens 2 using a standardized tutorial on multimodal interaction in AR. The overall system usability was rated “good”. However, several behavioral metrics indicated that specific interaction modes differed in their efficiency. The results are of major importance for the development of learning applications in HMD-AR as they partially deviate from previous findings. In particular, the well-functioning recognition of children’s voice commands that we observed represents a novelty. Furthermore, we found different interaction preferences in HMD-AR among the children. We also found the use of HMD-AR to have a positive effect on children’s activity-related achievement emotions. Overall, our findings can serve as a basis for determining general requirements, possibilities, and limitations of the implementation of educational HMD-AR environments in elementary school classrooms. |
Incollections |
Barz, Michael; Sonntag, Daniel Incremental Improvement of a Question Answering System by Re-ranking Answer Candidates Using Machine Learning Incollection Marchi, Erik; Siniscalchi, Sabato Marco; Cumani, Sandro; Salerno, Valerio Mario; Li, Haizhou (Ed.): Increasing Naturalness and Flexibility in Spoken Dialogue Interaction: 10th International Workshop on Spoken Dialogue Systems, pp. 367-379, Springer, 2021. @incollection{11522, title = {Incremental Improvement of a Question Answering System by Re-ranking Answer Candidates Using Machine Learning}, author = {Michael Barz and Daniel Sonntag}, editor = {Erik Marchi and Sabato Marco Siniscalchi and Sandro Cumani and Valerio Mario Salerno and Haizhou Li}, url = {https://www.dfki.de/fileadmin/user_upload/import/11522_2019_Incremental_Improvement_of_a_Question_Answering_System_by_Re-ranking_Answer_Candidates_using_Machine_Learning.pdf}, doi = {https://doi.org/10.1007/978-981-15-9323-9_34}, year = {2021}, date = {2021-01-01}, booktitle = {Increasing Naturalness and Flexibility in Spoken Dialogue Interaction: 10th International Workshop on Spoken Dialogue Systems}, pages = {367-379}, publisher = {Springer}, abstract = {We implement a method for re-ranking top-10 results of a state-of-the-art question answering (QA) system. The goal of our re-ranking approach is to improve the answer selection given the user question and the top-10 candidates. We focus on improving deployed QA systems that do not allow re-training or when re-training comes at a high cost. Our re-ranking approach learns a similarity function using n-gram based features using the query, the answer and the initial system confidence as input. 
Our contributions are: (1) we generate a QA training corpus starting from 877 answers from the customer care domain of T-Mobile Austria, (2) we implement a state-of-the-art QA pipeline using neural sentence embeddings that encode queries in the same space as the answer index, and (3) we evaluate the QA pipeline and our re-ranking approach using a separately provided test set. The test set can be considered to be available after deployment of the system, e.g., based on feedback of users. Our results show that the system performance, in terms of top-n accuracy and the mean reciprocal rank, benefits from re-ranking using gradient boosted regression trees. On average, the mean reciprocal rank improves by 9.15%.}, keywords = {}, pubstate = {published}, tppubtype = {incollection} } We implement a method for re-ranking top-10 results of a state-of-the-art question answering (QA) system. The goal of our re-ranking approach is to improve the answer selection given the user question and the top-10 candidates. We focus on improving deployed QA systems that do not allow re-training or when re-training comes at a high cost. Our re-ranking approach learns a similarity function using n-gram based features using the query, the answer and the initial system confidence as input. Our contributions are: (1) we generate a QA training corpus starting from 877 answers from the customer care domain of T-Mobile Austria, (2) we implement a state-of-the-art QA pipeline using neural sentence embeddings that encode queries in the same space as the answer index, and (3) we evaluate the QA pipeline and our re-ranking approach using a separately provided test set. The test set can be considered to be available after deployment of the system, e.g., based on feedback of users. Our results show that the system performance, in terms of top-n accuracy and the mean reciprocal rank, benefits from re-ranking using gradient boosted regression trees. 
On average, the mean reciprocal rank improves by 9.15%. |
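The mean reciprocal rank (MRR) reported above can be made concrete with a small sketch: the metric averages, over queries, the reciprocal of the rank at which the correct answer appears, so moving gold answers up the candidate list (as the re-ranker does) raises it. The candidate lists and gold answers below are toy data, not from the paper.

```python
def mean_reciprocal_rank(ranked_lists, gold):
    """MRR over queries; each ranked list holds candidates in system order."""
    total = 0.0
    for candidates, correct in zip(ranked_lists, gold):
        rr = 0.0
        for rank, cand in enumerate(candidates, start=1):
            if cand == correct:
                rr = 1.0 / rank  # reciprocal of the first correct position
                break
        total += rr  # queries whose gold answer is absent contribute 0
    return total / len(ranked_lists)

# Re-ranking that promotes the correct answer improves MRR:
before = [["a", "b", "gold"], ["x", "gold", "y"]]
after = [["gold", "b", "a"], ["gold", "x", "y"]]
print(mean_reciprocal_rank(before, ["gold", "gold"]))  # (1/3 + 1/2) / 2 ≈ 0.417
print(mean_reciprocal_rank(after, ["gold", "gold"]))   # (1 + 1) / 2 = 1.0
```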
Inproceedings |
Biswas, Rajarshi; Barz, Michael; Hartmann, Mareike; Sonntag, Daniel Improving German Image Captions using Machine Translation and Transfer Learning Inproceedings Espinosa-Anke, Luis; Martin-Vide, Carlos; Spasic, Irena (Ed.): Statistical Language and Speech Processing SLSP 2021, Springer, Council Chamber Glamorgan Building King Edward VII Ave Cathays Park Cardiff CF10 3WT, 2021. @inproceedings{11805, title = {Improving German Image Captions using Machine Translation and Transfer Learning}, author = {Rajarshi Biswas and Michael Barz and Mareike Hartmann and Daniel Sonntag}, editor = {Luis Espinosa-Anke and Carlos Martin-Vide and Irena Spasic}, url = {https://www.dfki.de/fileadmin/user_upload/import/11805_SLSP2021Paper.pdf}, year = {2021}, date = {2021-11-01}, booktitle = {Statistical Language and Speech Processing SLSP 2021}, publisher = {Springer}, address = {Council Chamber Glamorgan Building King Edward VII Ave Cathays Park Cardiff CF10 3WT}, abstract = {Image captioning is a complex artificial intelligence task that involves many fundamental questions of data representation, learning, and natural language processing. In addition, most of the work in this domain addresses the English language because of the high availability of annotated training data compared to other languages. Therefore, we investigate methods for image captioning in German that transfer knowledge from English training data. We explore four different methods for generating image captions in German, two baseline methods and two more advanced ones based on transfer learning. The baseline methods are based on a state-of-the-art model which we train using a translated version of the English MS COCO dataset and the smaller German Multi30K dataset, respectively. Both advanced methods are pre-trained using the translated MS COCO dataset and fine-tuned for German on the Multi30K dataset. 
One of these methods uses an alternative attention mechanism from the literature that showed a good performance in English image captioning. We compare the performance of all methods for the Multi30K test set in German using common automatic evaluation metrics. We show that our advanced method with the alternative attention mechanism presents a new baseline for German BLEU, ROUGE, CIDEr, and SPICE scores, and achieves a relative improvement of 21.2 % in BLEU-4 score compared to the current state-of-the-art in German image captioning.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Image captioning is a complex artificial intelligence task that involves many fundamental questions of data representation, learning, and natural language processing. In addition, most of the work in this domain addresses the English language because of the high availability of annotated training data compared to other languages. Therefore, we investigate methods for image captioning in German that transfer knowledge from English training data. We explore four different methods for generating image captions in German, two baseline methods and two more advanced ones based on transfer learning. The baseline methods are based on a state-of-the-art model which we train using a translated version of the English MS COCO dataset and the smaller German Multi30K dataset, respectively. Both advanced methods are pre-trained using the translated MS COCO dataset and fine-tuned for German on the Multi30K dataset. One of these methods uses an alternative attention mechanism from the literature that showed a good performance in English image captioning. We compare the performance of all methods for the Multi30K test set in German using common automatic evaluation metrics. 
We show that our advanced method with the alternative attention mechanism presents a new baseline for German BLEU, ROUGE, CIDEr, and SPICE scores, and achieves a relative improvement of 21.2 % in BLEU-4 score compared to the current state-of-the-art in German image captioning. |
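The BLEU-4 comparison underlying the reported 21.2% relative improvement can be illustrated with a minimal single-reference BLEU implementation. This is a simplified sketch: it omits the smoothing, multi-reference handling, and tokenization choices of standard toolkits, and the example sentences are invented.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Plain single-reference BLEU (no smoothing); inputs are token lists."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # Clipped n-gram overlap: each candidate n-gram counts at most
        # as often as it occurs in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0  # any empty n-gram precision zeroes the geometric mean
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(
        1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

toks = "ein kleines haus am see".split()
print(bleu(toks, toks))  # 1.0 (perfect match)
```

A relative BLEU-4 improvement, as quoted in the abstract, is then simply `(new - old) / old`.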
Hartmann, Mareike; de Lhoneux, Miryam; Hershcovich, Daniel; Kementchedjhieva, Yova; Nielsen, Lukas; Qiu, Chen; Søgaard, Anders A Multilingual Benchmark for Probing Negation-Awareness with Minimal Pairs Inproceedings Proceedings of the 25th Conference on Computational Natural Language Learning (CoNLL), pp. 224-257, Association for Computational Linguistics, 2021. @inproceedings{11846, title = {A Multilingual Benchmark for Probing Negation-Awareness with Minimal Pairs}, author = {Mareike Hartmann and Miryam de Lhoneux and Daniel Hershcovich and Yova Kementchedjhieva and Lukas Nielsen and Chen Qiu and Anders Søgaard}, url = {https://www.dfki.de/fileadmin/user_upload/import/11846_2021.conll-1.19.pdf https://aclanthology.org/2021.conll-1.19/}, year = {2021}, date = {2021-11-01}, booktitle = {Proceedings of the 25th Conference on Computational Natural Language Learning (CoNLL)}, pages = {224-257}, publisher = {Association for Computational Linguistics}, abstract = {Negation is one of the most fundamental concepts in human cognition and language, and several natural language inference (NLI) probes have been designed to investigate pretrained language models' ability to detect and reason with negation. However, the existing probing datasets are limited to English only, and do not enable controlled probing of performance in the absence or presence of negation. In response, we present a multilingual (English, Bulgarian, German, French and Chinese) benchmark collection of NLI examples that are grammatical and correctly labeled, as a result of manual inspection and editing. 
We use the benchmark to probe the negation-awareness of multilingual language models and find that models that correctly predict examples with negation cues often fail to correctly predict their counter-examples without negation cues, even when the cues are irrelevant for semantic inference.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Negation is one of the most fundamental concepts in human cognition and language, and several natural language inference (NLI) probes have been designed to investigate pretrained language models' ability to detect and reason with negation. However, the existing probing datasets are limited to English only, and do not enable controlled probing of performance in the absence or presence of negation. In response, we present a multilingual (English, Bulgarian, German, French and Chinese) benchmark collection of NLI examples that are grammatical and correctly labeled, as a result of manual inspection and editing. We use the benchmark to probe the negation-awareness of multilingual language models and find that models that correctly predict examples with negation cues often fail to correctly predict their counter-examples without negation cues, even when the cues are irrelevant for semantic inference. |
Jørgensen, Rasmus Kær; Hartmann, Mareike; Dai, Xiang; Elliott, Desmond mDAPT: Multilingual Domain Adaptive Pretraining in a Single Model Inproceedings Findings of the Association for Computational Linguistics - EMNLP 2021, pp. 3404-3018, Association for Computational Linguistics, 2021. @inproceedings{11845, title = {mDAPT: Multilingual Domain Adaptive Pretraining in a Single Model}, author = {Rasmus Kær Jørgensen and Mareike Hartmann and Xiang Dai and Desmond Elliott}, url = {https://www.dfki.de/fileadmin/user_upload/import/11845_2021.findings-emnlp.290.pdf}, year = {2021}, date = {2021-11-01}, booktitle = {Findings of the Association for Computational Linguistics - EMNLP 2021}, journal = {Findings of the Association for Computational Linguistics: EMNLP 2021}, volume = {1}, pages = {3404-3018}, publisher = {Association for Computational Linguistics}, abstract = {Domain adaptive pretraining, i.e. the continued unsupervised pretraining of a language model on domain-specific text, improves the modelling of text for downstream tasks within the domain. Numerous real-world applications are based on domain-specific text, e.g. working with financial or biomedical documents, and these applications often need to support multiple languages. However, large-scale domain-specific multilingual pretraining data for such scenarios can be difficult to obtain, due to regulations, legislation, or simply a lack of language- and domain-specific text. One solution is to train a single multilingual model, taking advantage of the data available in as many languages as possible. In this work, we explore the benefits of domain adaptive pretraining with a focus on adapting to multiple languages within a specific domain. We propose different techniques to compose pretraining corpora that enable a language model to both become domain-specific and multilingual. 
Evaluation on nine domain-specific datasets---for biomedical named entity recognition and financial sentence classification---covering seven different languages shows that a single multilingual domain-specific model can outperform the general multilingual model, and performs close to its monolingual counterpart. This finding holds across two different pretraining methods, adapter-based pretraining and full model pretraining.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Domain adaptive pretraining, i.e. the continued unsupervised pretraining of a language model on domain-specific text, improves the modelling of text for downstream tasks within the domain. Numerous real-world applications are based on domain-specific text, e.g. working with financial or biomedical documents, and these applications often need to support multiple languages. However, large-scale domain-specific multilingual pretraining data for such scenarios can be difficult to obtain, due to regulations, legislation, or simply a lack of language- and domain-specific text. One solution is to train a single multilingual model, taking advantage of the data available in as many languages as possible. In this work, we explore the benefits of domain adaptive pretraining with a focus on adapting to multiple languages within a specific domain. We propose different techniques to compose pretraining corpora that enable a language model to both become domain-specific and multilingual. Evaluation on nine domain-specific datasets---for biomedical named entity recognition and financial sentence classification---covering seven different languages shows that a single multilingual domain-specific model can outperform the general multilingual model, and performs close to its monolingual counterpart. This finding holds across two different pretraining methods, adapter-based pretraining and full model pretraining. |
Erlemeyer, Fabian; Rehtanz, Christian; Hermanns, Annegret; Lüers, Bengt; Nebel-Wenner, Marvin; Eilers, Reef Janes Live Testing of Flexibilities on Distribution Grid Level – Simulation Setup and Lessons Learned Inproceedings IEEE Electric Power and Energy Conference, IEEE Xplore, IEEE Operations Center 445 Hoes Lane Piscataway, NJ 08854-4141 USA Phone: +1 732 981 0060, 2021. @inproceedings{11927, title = {Live Testing of Flexibilities on Distribution Grid Level – Simulation Setup and Lessons Learned}, author = {Fabian Erlemeyer and Christian Rehtanz and Annegret Hermanns and Bengt Lüers and Marvin Nebel-Wenner and Reef Janes Eilers}, url = {https://www.dfki.de/fileadmin/user_upload/import/11927_2021199998.pdf}, year = {2021}, date = {2021-10-01}, booktitle = {IEEE Electric Power and Energy Conference}, publisher = {IEEE Xplore}, address = {IEEE Operations Center 445 Hoes Lane Piscataway, NJ 08854-4141 USA Phone: +1 732 981 0060}, abstract = {In the DESIGNETZ project real flexibility units were connected to a distribution grid simulation to investigate the integration of decentralized flexibilities for different use-cases. The simulation determines the demand for unit flexibility and communicates the demand to the flexibilities. In return, the response of the flexibilities is integrated back into the simulation to consider not-simulated effects, too. This paper presents the simulation setup and discusses lessons learnt from deploying the simulation into operation.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In the DESIGNETZ project real flexibility units were connected to a distribution grid simulation to investigate the integration of decentralized flexibilities for different use-cases. The simulation determines the demand for unit flexibility and communicates the demand to the flexibilities. In return, the response of the flexibilities is integrated back into the simulation to consider not-simulated effects, too. 
This paper presents the simulation setup and discusses lessons learned from deploying the simulation into operation. |
Barz, Michael; Kapp, Sebastian; Kuhn, Jochen; Sonntag, Daniel Automatic Recognition and Augmentation of Attended Objects in Real-Time Using Eye Tracking and a Head-Mounted Display Inproceedings ACM Symposium on Eye Tracking Research and Applications, pp. 4, Association for Computing Machinery, 2021. @inproceedings{11614, title = {Automatic Recognition and Augmentation of Attended Objects in Real-Time Using Eye Tracking and a Head-Mounted Display}, author = {Michael Barz and Sebastian Kapp and Jochen Kuhn and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11614_etra_ar_video.pdf}, doi = {https://doi.org/10.1145/3450341.3458766}, year = {2021}, date = {2021-05-01}, booktitle = {ACM Symposium on Eye Tracking Research and Applications}, pages = {4}, publisher = {Association for Computing Machinery}, abstract = {Scanning and processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. Adding the ability to observe the scanning behavior and scene processing to intelligent mobile user interfaces can facilitate a new class of cognition-aware user interfaces. As a first step in this direction, we implement an augmented reality (AR) system that classifies objects at the user’s point of regard, detects visual attention to them, and augments the real objects with virtual labels that stick to the objects in real-time. We use a head-mounted AR device (Microsoft HoloLens 2) with integrated eye tracking capabilities and a front-facing camera for implementing our prototype.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Scanning and processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. Adding the ability to observe the scanning behavior and scene processing to intelligent mobile user interfaces can facilitate a new class of cognition-aware user interfaces. 
As a first step in this direction, we implement an augmented reality (AR) system that classifies objects at the user’s point of regard, detects visual attention to them, and augments the real objects with virtual labels that stick to the objects in real-time. We use a head-mounted AR device (Microsoft HoloLens 2) with integrated eye tracking capabilities and a front-facing camera for implementing our prototype. |
Nguyen, Ho Minh Duy; Nguyen, Duy M; Vu, Huong; Nguyen, Binh T; Nunnari, Fabrizio; Sonntag, Daniel An Attention Mechanism using Multiple Knowledge Sources for COVID-19 Detection from CT Images Inproceedings The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), AAAI, 2021. @inproceedings{11369, title = {An Attention Mechanism using Multiple Knowledge Sources for COVID-19 Detection from CT Images}, author = {Ho Minh Duy Nguyen and Duy M Nguyen and Huong Vu and Binh T Nguyen and Fabrizio Nunnari and Daniel Sonntag}, url = {https://www.dfki.de/fileadmin/user_upload/import/11369_AAAI_Workshop_TrustworthyHealthcare_v3.pdf}, year = {2021}, date = {2021-01-01}, booktitle = {The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)}, publisher = {AAAI}, abstract = {Besides principal polymerase chain reaction (PCR) tests, automatically identifying positive samples based on computed tomography (CT) scans can present a promising option in the early diagnosis of COVID-19. Recently, there have been increasing efforts to utilize deep networks for COVID-19 diagnosis based on CT scans. While these approaches mostly focus on introducing novel architectures, transfer learning techniques or construction of large scale data, we propose a novel strategy to improve several performance baselines by leveraging multiple useful information sources relevant to doctors' judgments. Specifically, infected regions and heat-map features extracted from learned networks are integrated with the global image via an attention mechanism during the learning process. This procedure makes our system more robust to noise and guides the network focusing on local lesion areas. Extensive experiments illustrate the superior performance of our approach compared to recent baselines. 
Furthermore, our learned network guidance presents an explainable feature to doctors to understand the connection between input and output in a grey-box model.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Besides principal polymerase chain reaction (PCR) tests, automatically identifying positive samples based on computed tomography (CT) scans can present a promising option in the early diagnosis of COVID-19. Recently, there have been increasing efforts to utilize deep networks for COVID-19 diagnosis based on CT scans. While these approaches mostly focus on introducing novel architectures, transfer learning techniques or construction of large scale data, we propose a novel strategy to improve several performance baselines by leveraging multiple useful information sources relevant to doctors' judgments. Specifically, infected regions and heat-map features extracted from learned networks are integrated with the global image via an attention mechanism during the learning process. This procedure makes our system more robust to noise and guides the network focusing on local lesion areas. Extensive experiments illustrate the superior performance of our approach compared to recent baselines. Furthermore, our learned network guidance presents an explainable feature to doctors to understand the connection between input and output in a grey-box model. |
Prange, Alexander; Barz, Michael; Heimann-Steinert, Anika; Sonntag, Daniel Explainable Automatic Evaluation of the Trail Making Test for Dementia Screening Inproceedings Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, 2021. @inproceedings{11432, title = {Explainable Automatic Evaluation of the Trail Making Test for Dementia Screening}, author = {Alexander Prange and Michael Barz and Anika Heimann-Steinert and Daniel Sonntag}, doi = {https://doi.org/10.1145/3411764.3445046}, year = {2021}, date = {2021-01-01}, booktitle = {Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems}, publisher = {Association for Computing Machinery}, abstract = {The Trail Making Test (TMT) is a frequently used neuropsychological test for assessing cognitive performance. The subject connects a sequence of numbered nodes by using a pen on normal paper. We present an automatic cognitive assessment tool that analyzes samples of the TMT which we record using a digital pen. This enables us to analyze digital pen features that are difficult or impossible to evaluate manually. Our system automatically measures several pen features, including the completion time which is the main performance indicator used by clinicians to score the TMT in practice. In addition, our system provides a structured report of the analysis of the test, for example indicating missed or erroneously connected nodes, thereby offering more objective, transparent and explainable results to the clinician. We evaluate our system with 40 elderly subjects from a geriatrics daycare clinic of a large hospital.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The Trail Making Test (TMT) is a frequently used neuropsychological test for assessing cognitive performance. The subject connects a sequence of numbered nodes by using a pen on normal paper. 
We present an automatic cognitive assessment tool that analyzes samples of the TMT which we record using a digital pen. This enables us to analyze digital pen features that are difficult or impossible to evaluate manually. Our system automatically measures several pen features, including the completion time which is the main performance indicator used by clinicians to score the TMT in practice. In addition, our system provides a structured report of the analysis of the test, for example indicating missed or erroneously connected nodes, thereby offering more objective, transparent and explainable results to the clinician. We evaluate our system with 40 elderly subjects from a geriatrics daycare clinic of a large hospital. |
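Among the pen features mentioned above, completion time is the main clinical performance indicator for the TMT. A minimal sketch of how it could be derived from timestamped digital-pen samples follows; the stroke representation here (lists of `(x, y, t)` tuples with millisecond timestamps) is a hypothetical data model for illustration, not the paper's actual format.

```python
def completion_time_ms(strokes):
    """Completion time: span from the first to the last pen sample, in ms.

    `strokes` is a list of strokes; each stroke is a list of (x, y, t) tuples.
    """
    timestamps = [t for stroke in strokes for (_, _, t) in stroke]
    if not timestamps:
        raise ValueError("no pen samples recorded")
    return max(timestamps) - min(timestamps)

# Two strokes: node 1 -> 2, then (after a pen lift) node 2 -> 3.
strokes = [[(10, 12, 0), (15, 18, 120)], [(40, 42, 480), (55, 60, 900)]]
print(completion_time_ms(strokes))  # 900
```

Other features the system measures (e.g., missed or wrongly connected nodes) would similarly be computed per stroke and reported in the structured result.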