Visual Search Target Inference for Pro-active User Support

Visual Search Target Inference in Natural Interaction Settings with Machine Learning

Visual search is a perceptual task in which humans aim to identify a search target object, such as a traffic sign, among other objects. Search target inference subsumes computational methods that predict this target by tracking and analyzing the searcher's overt behavioral cues, e.g., their gaze and the fixated visual stimuli. In [Barz et al., 2020], we present a generic approach to inferring search targets in natural scenes by predicting the class of the image segment that surrounds the target. Our method encodes visual search sequences as histograms of fixated segment classes determined by SegNet, a deep learning model for semantic segmentation of natural scenes. We compare our sequence encoding and model training (an SVM) against a recent baseline from the literature for predicting the target segment. In addition, we use a new search target inference dataset.
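
The histogram encoding of a search sequence can be illustrated with a minimal Python sketch. It assumes that a segmentation map (produced by SegNet in the paper) is already available as a grid of per-pixel class indices, that each fixation is looked up at its single (x, y) pixel, and that the histogram is normalized to relative frequencies. These details, the toy data, and the function name `encode_fixation_histogram` are illustrative assumptions, not the published implementation.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# seg_map[row][col] holds the segment class predicted for that pixel
# (SegNet output in the paper; a plain nested list here).
SegMap = Sequence[Sequence[int]]
Fixation = Tuple[int, int]  # (x, y) pixel coordinates of one fixation


def encode_fixation_histogram(
    fixations: Sequence[Fixation], seg_map: SegMap, num_classes: int
) -> List[float]:
    """Encode a search sequence as a normalized histogram of fixated segment classes."""
    height, width = len(seg_map), len(seg_map[0])
    counts: Counter = Counter()
    for x, y in fixations:
        # Ignore fixations that fall outside the image (e.g. tracking noise).
        if 0 <= x < width and 0 <= y < height:
            counts[seg_map[y][x]] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_classes
    # Assumed normalization: relative frequencies, so sequences of different lengths are comparable.
    return [counts[c] / total for c in range(num_classes)]


# Toy 3x4 segmentation map with four classes (0-3).
seg_map = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [3, 3, 2, 1],
]
fixations = [(0, 0), (1, 1), (2, 1), (3, 2), (2, 2)]
print(encode_fixation_histogram(fixations, seg_map, num_classes=4))  # [0.2, 0.2, 0.6, 0.0]
```

Vectors like this could then be passed to a classifier, for example scikit-learn's `sklearn.svm.SVC`, to predict the class of the target segment; the SVM configuration used in the paper is not reproduced here.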

Encoding Methods for Visual Search Fixations

Visual Search Target Inference Using Bag of Deep Visual Words

In [Stauden et al., 2018], we implement a new feature encoding for search target inference, the Bag of Deep Visual Words, which uses a pre-trained convolutional neural network (CNN). Our work builds on a recent approach from the literature that uses a Bag of Visual Words encoding, a representation common in computer vision applications. We evaluate our method on a gold standard dataset. The results show that our new feature encoding outperforms the baseline from the literature, in particular when fixations on the target are excluded. We presented this work at the 41st German Conference on Artificial Intelligence (KI 2018).
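
The core of a bag-of-words encoding, quantizing each fixation's descriptor to its nearest visual word and counting the assignments, can be sketched as follows. This sketch assumes that one descriptor per fixation has already been extracted (from a pre-trained CNN in the Bag of Deep Visual Words approach) and that a codebook of visual words was learned beforehand, for example with k-means. The toy 2-D vectors, the Euclidean nearest-neighbor assignment, the L1 normalization, and the helper names are hypothetical stand-ins for the paper's actual pipeline.

```python
import math
from typing import List, Sequence

# Hypothetical type: one feature vector per fixation
# (e.g. CNN activations for an image patch around the fixation).
Vector = Sequence[float]


def nearest_codeword(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the visual word closest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(descriptor, codebook[i]))


def bag_of_visual_words(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Quantize each descriptor to its nearest visual word and return an L1-normalized histogram."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_codeword(desc, codebook)] += 1.0
    total = sum(hist)
    # Leave the all-zero histogram unchanged if there were no descriptors.
    return [h / total for h in hist] if total else hist


# Toy example: a 3-word codebook in a 2-D feature space (real CNN features are much larger).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.1], [0.2, 0.9], [1.0, 0.8]]
print(bag_of_visual_words(descriptors, codebook))  # [0.25, 0.5, 0.25]
```

Replacing hand-crafted patch descriptors with CNN features changes only how the descriptors are computed; the quantization and histogram steps stay the same.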

References

Sven Stauden, Michael Barz, Daniel Sonntag: Visual Search Target Inference Using Bag of Deep Visual Words. In: Trollmann, Frank; Turhan, Anni-Yasmin (Eds.): KI 2018: Advances in Artificial Intelligence - 41st German Conference on AI, Springer, 2018.
Michael Barz, Sven Stauden, Daniel Sonntag: Visual Search Target Inference in Natural Interaction Settings with Machine Learning. In: Bulling, Andreas; Huckauf, Anke; Jain, Eakta; Radach, Ralph; Weiskopf, Daniel (Eds.): ACM Symposium on Eye Tracking Research and Applications, Association for Computing Machinery, 2020.

Dataset

To improve the reproducibility of our results, we provide our extensions of the datasets we used. The documentation, including the annotations (without images), is available on GitHub: https://github.com/DFKI-Interactive-Machine-Learning/STI-Dataset

You can also download the complete dataset, including images:

Download STI-Dataset

Contact

Michael Barz and Sven Stauden

Published by Michael Barz on May 17, 2020