Finding a Thesis Topic

Students who are interested in writing a bachelor’s or master’s thesis should begin thinking about possible topics (cf. hot topics for thesis projects on this page) or propose their own (cf. introduction to IML). Good research questions often originate in scientific papers around the research topics of the IML lab. Be on the lookout for new data sources that might provide new insights into a specific IML research topic.

Your Advisor and Your Committee

In order to write a bachelor’s or master’s thesis, you must find a member of the IML lab who is willing to be your thesis advisor. Together with your advisor, you propose your thesis topic to Prof. Sonntag, who acts as the first reviewer on your committee.

How Long Should it Be? How Long Does it Take?

A bachelor’s thesis is generally 30-60 pages, not including the bibliography; a master’s thesis is generally 60-80 pages, not including the bibliography. However, the length will vary according to the topic and the method of analysis, so the appropriate length will be determined by you, your advisor, and your committee. Students generally write a master’s thesis over two semesters and a bachelor’s thesis over one semester.

Procedure and Formal Requirements

You must maintain continuous enrollment at Oldenburg University or at Saarland University while working on the bachelor’s or master’s thesis. If you are planning to conduct interviews, surveys, or other research involving human subjects, you must obtain prior approval from DFKI.

Here you can find some thesis examples.

Here you can find project group examples.

Hot Topics for Thesis Projects

Explainable Medical Decision

You will implement modern approaches in computer vision such as transfer learning, graph neural networks, or semi-supervised learning to solve important medical decision problems such as breast cancer detection, chest X-ray/CT abnormality diagnosis, or related medical tasks. The goal is to achieve state-of-the-art performance with a method that is explainable to end users, improving the system’s reliability.
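
To make the transfer learning component concrete, here is a minimal sketch of fine-tuning an ImageNet-pretrained CNN for a binary medical classification task; the backbone, class count, and hyperparameters are illustrative assumptions, not part of the project specification.

```python
# Hedged sketch: fine-tuning a pretrained ResNet-50 for binary medical image
# classification (e.g., abnormal vs. normal). Dataset and class names are
# placeholders; any (N, 3, 224, 224) normalized image batch works.
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone and replace the classification head.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False               # freeze the pretrained features
model.fc = nn.Linear(model.fc.in_features, 2)  # new trainable head: 2 classes

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One optimization step on a batch of (N, 3, 224, 224) images."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Freezing the backbone and training only the new head is a common first baseline; unfreezing deeper layers afterwards usually improves adaptation to the medical domain.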

Nguyen, Duy MH, et al. “An Attention Mechanism using Multiple Knowledge Sources for COVID-19 Detection from CT Images.” AAAI 2021 Workshop: Trustworthy AI for Healthcare.

Soberanis-Mukul, Roger D., Nassir Navab, and Shadi Albarqouni. “An Uncertainty-Driven GCN Refinement Strategy for Organ Segmentation.” arXiv preprint arXiv:2012.03352 (2020).

Contact: Duy Nguyen

Theoretical Machine Learning for Medical Applications

In this topic, we will investigate important theoretical machine learning problems that have high impact on several medical applications. These include, but are not limited to: optimization formulations that efficiently incorporate user feedback to boost the performance of trained models beyond the available training data (active learning), investigating the benefits of transfer learning strategies when dealing with scarce data in medical problems, and training algorithms that adapt to highly imbalanced data distributions.
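
For the active learning direction, the following is a minimal sketch of pool-based uncertainty sampling, assuming a classifier that outputs softmax probabilities; the model, data loader, and annotation budget are placeholders.

```python
# Hedged sketch: pool-based active learning with entropy sampling. Any
# classifier returning class logits fits this interface.
import torch

@torch.no_grad()
def select_for_annotation(model, unlabeled_loader, budget=16):
    """Return indices of the `budget` most uncertain pool samples."""
    model.eval()
    scores, indices = [], []
    offset = 0
    for x, _ in unlabeled_loader:  # labels are unknown/unused in the pool
        probs = torch.softmax(model(x), dim=1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
        scores.append(entropy)
        indices.append(torch.arange(offset, offset + x.size(0)))
        offset += x.size(0)
    scores = torch.cat(scores)
    indices = torch.cat(indices)
    top = scores.topk(budget).indices
    return indices[top].tolist()  # send these samples to the human annotator
```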

Wilder, Bryan, Eric Horvitz, and Ece Kamar. “Learning to complement humans.” arXiv preprint arXiv:2005.00582 (2020).

De, Abir, et al. “Classification Under Human Assistance.” AAAI (2021).

Yao, Huaxiu, et al. “Hierarchically structured meta-learning.” International Conference on Machine Learning. PMLR, 2019.

Contact: Duy Nguyen

Creating a dataset of natural explanatory conversations (about cooking*) [Master]

Requirements: Programming in Python, ideally experience with processing video and audio data

Project description: The aim is to create an annotated dataset of human-to-human dialogue in YouTube cooking videos* that can serve as a resource for training ML models to generate conversational explanations of the cooking process. This involves the identification of videos with multiple speakers, speaker diarization (partitioning audio and/or transcript according to speaker identity), identification of conversational interaction between the speakers, and investigating whether these interactions qualify as ‘conversational explanations’ of the video content.
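
As an assumed starting point for the diarization step, a pretrained pyannote.audio pipeline can label speaker turns; the pipeline name, access token, and file path below are placeholders to be checked against the current pyannote documentation.

```python
# Hedged sketch: speaker diarization on a video's extracted audio track with
# pyannote.audio, to spot videos with two or more speakers.
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",  # placeholder Hugging Face access token
)

# "cooking_video.wav" is a placeholder: audio extracted from a YouTube video.
diarization = pipeline("cooking_video.wav")

# Emit (start, end, speaker) segments; videos whose audio yields >= 2 distinct
# speaker labels are candidates for conversational interaction.
speakers = set()
for turn, _, speaker in diarization.itertracks(yield_label=True):
    speakers.add(speaker)
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
print(f"distinct speakers: {len(speakers)}")
```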

Contact: Mareike Hartmann

Relevant literature:

Speaker diarization: https://arxiv.org/pdf/2101.09624.pdf
Potential videos: http://youcook2.eecs.umich.edu/explore
Background on ‘conversational explanations’ from an XAI perspective: https://arxiv.org/pdf/1706.07269.pdf (Sec. 5). Note that in this project, we focus on ‘explaining’ the video content rather than model predictions.

*We focus on the process of cooking as there is some related ongoing work at DFKI, but other instructional scenarios are possible.

Feedback Systems for Image Captioning

This thesis aims to develop multi-modal feedback systems to improve the accuracy and reliability of automated image captioning models, with potential extensions to large multimodal models (LMMs). The target system should include mechanisms to calculate confidence during generation and request further input or feedback when uncertainty arises, and methods to refine the captions based on this feedback. The final step involves evaluating the system’s improvement in caption accuracy and user satisfaction.
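
One simple way to realize the confidence mechanism is to aggregate token-level log-probabilities during generation and defer to the user below a threshold. The sketch below uses a BLIP captioning model purely as an example; the model choice and threshold value are assumptions.

```python
# Hedged sketch: estimating caption confidence from token log-probabilities
# and requesting feedback when confidence is low.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption_with_confidence(image_path, threshold=-1.0):
    inputs = processor(Image.open(image_path).convert("RGB"), return_tensors="pt")
    out = model.generate(**inputs, output_scores=True, return_dict_in_generate=True)
    caption = processor.decode(out.sequences[0], skip_special_tokens=True)
    # Mean per-token log-probability as a simple sequence-level confidence.
    transition_scores = model.compute_transition_scores(
        out.sequences, out.scores, normalize_logits=True
    )
    confidence = transition_scores.mean().item()
    if confidence < threshold:
        # Low confidence: ask the user for feedback instead of committing.
        return caption, confidence, "request_feedback"
    return caption, confidence, "accept"
```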

This research will enhance human-AI interaction by creating more interpretable and user-driven image captioning systems. It is ideal for students with interests in machine learning, computer vision, natural language processing, and human-computer interaction.

Contact: Aliki Anagnostopoulou

Relevant publication:

Anagnostopoulou, A., Gouvêa, T. S., & Sonntag, D. (2024). Enhancing Journalism with AI: A Study of Contextualized Image Captioning for News Articles using LLMs and LMMs. Trustworthy Interactive Decision Making with Foundation Models workshop, 33rd International Joint Conference on Artificial Intelligence. https://doi.org/10.48550/arXiv.2408.04331

Multi-modal Textual Entailment in Medical Diagnoses Using NLP and Imaging Data [Master]

Requirements: Programming in Python, PyTorch (or TensorFlow), skills in NLP and deep learning

Project description: Develop a multi-modal entailment reasoning system that provides context-aware diagnostic suggestions by combining textual entailment in NLP with image analysis from medical imaging (e.g., X-rays, MRI) to strengthen diagnostic reasoning. This project could integrate textual findings with visual cues to improve model entailment for diagnosis support.
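
One possible architecture is a late-fusion classifier that combines a sentence embedding of the textual finding with CNN features of the corresponding image; the encoders, dimensions, and three-way label set below are illustrative assumptions.

```python
# Hedged sketch: a late-fusion entailment classifier combining a text
# embedding of a written finding with a CNN encoding of the matching image.
import torch
import torch.nn as nn
from torchvision import models

class MultimodalEntailment(nn.Module):
    def __init__(self, text_dim=768, num_labels=3):
        super().__init__()
        cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        cnn.fc = nn.Identity()            # expose the 512-d image features
        self.image_encoder = cnn
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + 512, 256),
            nn.ReLU(),
            nn.Linear(256, num_labels),   # entail / neutral / contradict
        )

    def forward(self, text_emb, image):
        # text_emb: (N, text_dim) sentence embedding, e.g. a BERT [CLS] vector;
        # image: (N, 3, 224, 224) X-ray or MRI slice.
        img_emb = self.image_encoder(image)
        return self.classifier(torch.cat([text_emb, img_emb], dim=1))
```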

Contact: Siting Liang


Explainable Medical Decision: Investigating a Model-Invariant Algorithm for Neural Network Attribution Maps [Master]

A central finding of preliminary research is that different neural network architectures, when trained on the same data distribution, generate diverse attribution maps for local explanations, supporting the assertion that attribution maps are model-dependent [2]. However, it is also understood that these attribution maps, despite their varying origins, can embody certain common characteristics [1].

Given this premise, the proposal for future research is to develop a novel algorithm that creates attribution maps accepted by all models that, despite possessing diverse architectures, are trained on the same data distribution. This line of enquiry will pave the way towards explanations that are free of model dependency and model bias, thereby privileging model invariance.

This research aims to bridge the gap between differing neural network architectures, fostering improved communication, data interpretation, and usability. Ultimately, advancements in this field have the potential to significantly propel the evolution of explainable Artificial Intelligence (AI).
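
As a hypothetical baseline for quantifying such (dis)agreement, the sketch below computes saliency maps for two architectures with Captum and correlates them; the models and similarity metric are illustrative choices, not the proposed algorithm itself.

```python
# Hedged sketch: comparing attribution maps across two architectures as a
# baseline for measuring agreement. Pearson correlation of the flattened
# maps is one simple choice of similarity metric.
import torch
from captum.attr import Saliency
from torchvision import models

model_a = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
model_b = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT).eval()

def attribution_agreement(image, target):
    """image: (1, 3, 224, 224) input; target: class index under both models."""
    maps = []
    for model in (model_a, model_b):
        attr = Saliency(model).attribute(image, target=target)
        maps.append(attr.abs().sum(dim=1).flatten())  # aggregate over channels
    # Pearson correlation between the two flattened attribution maps.
    stacked = torch.stack(maps)
    return torch.corrcoef(stacked)[0, 1].item()
```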

Contact: Md Abdul Kadir

Relevant literature:

[1] Kadir, M. A., Addluri, G. K., & Sonntag, D. (2023). Harmonizing Feature Attributions Across Deep Learning Architectures: Enhancing Interpretability and Consistency. arXiv preprint arXiv:2307.02150.

[2] Gupta, A., Saunshi, N., Yu, D., Lyu, K., & Arora, S. (2022). New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound. Advances in Neural Information Processing Systems, 35, 33120-33133.

Interactive Medical Image Analysis: Application of Deep Learning in Colposcopic Image Analysis for Cancerous Region Detection: A Diagnostic Revolution [Master]

Colposcopy, a vital method for the diagnosis of cervical pathology, hinges primarily on visual cues to detect abnormalities and designate regions for biopsies. The conventional method often includes the use of acetic acid (5%) to highlight cell nuclei and hence reveal abnormal or pre-cancerous cells, while green filters aid in visualizing the blood vessels supplying these regions. However, vast variations in individual practitioners’ experience and expertise may lead to inconsistent assessments.

This research proposal aims to bridge this gap by introducing deep learning algorithms, which have shown unprecedented success in image recognition and classification tasks, into colposcopic examinations [1]. These machine learning methodologies could allow automatic detection of cancerous or precancerous regions in colposcopic images or videos, automating and standardizing the evaluation process while offering real-time feedback and suggestions during the examination.
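
A possible entry point is a standard semantic segmentation model adapted to flag suspicious regions frame by frame; the two-class setup and pretrained backbone below are assumptions for illustration only, not a validated clinical pipeline.

```python
# Hedged sketch: adapting a pretrained segmentation model to highlight
# suspicious regions in colposcopic frames (background vs. suspicious tissue).
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Pretrained backbone; the classification head outputs 2 classes.
model = deeplabv3_resnet50(weights_backbone="DEFAULT", num_classes=2)

@torch.no_grad()
def suspicious_mask(frame):
    """frame: (1, 3, H, W) normalized RGB frame; returns a binary mask."""
    model.eval()
    logits = model(frame)["out"]   # (1, 2, H, W) per-pixel class logits
    return logits.argmax(dim=1)    # 1 = predicted suspicious region
```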

Contact: Md Abdul Kadir

Relevant literature:

[1] Chandran V, Sumithra MG, Karthick A, George T, Deivakani M, Elakkiya B, Subramaniam U, Manoharan S. Diagnosis of Cervical Cancer based on Ensemble Deep Learning Network using Colposcopy Images. Biomed Res Int. 2021 May 4;2021:5584004. doi: 10.1155/2021/5584004. PMID: 33997017; PMCID: PMC8112909.

Transfer Learning for Bioacoustics

This thesis aims to investigate the effect of selecting different layers of various embedding models for transfer learning in passive acoustic monitoring. Specifically, it will explore the correlation between the performance of the embeddings and the proximity of the selected layer to the output layer, as well as the relatedness between the model’s domain and the target domain of passive acoustic monitoring.
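
The layer-selection experiment can be prototyped with forward hooks that tap embeddings at several depths and train a linear probe per layer. The sketch below uses an ImageNet CNN on spectrogram images purely as a stand-in; a bioacoustic embedding model would replace it in the actual thesis.

```python
# Hedged sketch: extracting embeddings from different depths of a pretrained
# model via forward hooks, for layer-wise transfer learning comparisons.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
taps = {"layer2": None, "layer3": None, "layer4": None}  # depths to compare

def make_hook(name):
    def hook(module, inputs, output):
        # Global-average-pool the feature map into a fixed-size embedding.
        taps[name] = output.mean(dim=(2, 3)).detach()
    return hook

for name in taps:
    getattr(backbone, name).register_forward_hook(make_hook(name))

@torch.no_grad()
def embed(spectrogram_batch):
    """spectrogram_batch: (N, 3, 224, 224) log-mel spectrograms as images."""
    backbone(spectrogram_batch)
    return dict(taps)

# A linear probe per tapped layer then measures how performance varies with
# the layer's proximity to the output, e.g.:
# probe = nn.Linear(embed(x)["layer3"].shape[1], num_species)
```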

Contact: Hannes Kath

Relevant Literature:

Leveraging transfer learning and active learning for data annotation in passive acoustic monitoring of wildlife (https://www.sciencedirect.com/science/article/pii/S1574954124002528?via%3Dihub)

Global birdsong embeddings enable superior transfer learning for bioacoustic classification (https://www.nature.com/articles/s41598-023-49989-z)

Active Learning for Bioacoustic Datasets [Master]

The aim of this master thesis project is to propose an Active Learning strategy for annotating multilabel bioacoustic data recorded using passive acoustic monitoring techniques. After a thorough review of the literature, you will implement both basic and state-of-the-art active learning strategies, adapt them to a multilabel scenario if necessary, and test them on passive acoustic monitoring datasets in terms of performance and usability.
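
As one hypothetical starting point, a minimal multilabel adaptation of uncertainty sampling, assuming a model that outputs one sigmoid logit per species, could score clips by their mean per-label binary entropy:

```python
# Hedged sketch: multilabel uncertainty scoring for active learning on
# passive acoustic monitoring clips.
import torch

@torch.no_grad()
def multilabel_uncertainty(model, clips):
    """clips: a batch of spectrograms; returns one uncertainty score each."""
    probs = torch.sigmoid(model(clips)).clamp(1e-6, 1 - 1e-6)
    # Binary entropy per label, averaged over labels: highest when every
    # label's probability sits near 0.5.
    ent = -(probs * probs.log() + (1 - probs) * (1 - probs).log())
    return ent.mean(dim=1)  # (N,) -- annotate the highest-scoring clips first
```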

Contact: Hannes Kath

Relevant Literature:

Leveraging transfer learning and active learning for data annotation in passive acoustic monitoring of wildlife (https://www.sciencedirect.com/science/article/pii/S1574954124002528?via%3Dihub)

A Survey on Active Learning: State-of-the-Art, Practical Challenges and Research Directions (https://www.mdpi.com/2227-7390/11/4/820)

Interactive Analysis of Long Term Acoustic Monitoring of Wildlife with Advanced Machine Learning [Master]

Environmental conservation efforts rely heavily on monitoring biodiversity and ecosystem health. The emergence of cheap and reliable data loggers has enabled large-scale, longitudinal environmental monitoring programs, creating a bottleneck at the point of data management and analysis. This project aims to enhance the analysis of ecological soundscape data by improving how spatio-temporal variations within soundscape recordings are understood and interpreted by ecologists.

The thesis will focus on developing and refining machine learning models, specifically self-supervised learned representations such as those obtained with Variational Autoencoders (VAEs), to capture spatio-temporal features more effectively in acoustic data. The project will explore innovative techniques such as dynamic segmentation, temporal convolutional networks, and feature-level fusion to create more nuanced and context-aware analyses of environmental sounds.
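
For orientation, here is a minimal convolutional VAE over fixed-size spectrogram patches, as one concrete instance of such representations; the patch size (1, 64, 64), latent dimension, and architecture are assumptions.

```python
# Hedged sketch: a minimal convolutional VAE for spectrogram patches scaled
# to [0, 1], trained with the standard reconstruction + KL objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectrogramVAE(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64 * 16 * 16, latent_dim)
        self.fc_logvar = nn.Linear(64 * 16 * 16, latent_dim)
        self.dec_fc = nn.Linear(latent_dim, 64 * 16 * 16)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = self.dec(self.dec_fc(z).view(-1, 64, 16, 16))
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term + KL divergence to the standard normal prior.
    rec = F.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```

The latent vectors produced by the encoder are what would feed the interactive visualisation and spatio-temporal analyses described above.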

Your Role:

  • Implement and compare different class-agnostic representation learning methods, evaluating their sensitivity to temporal features of longitudinal acoustic environmental monitoring datasets.
  • Implement and compare different network architectures, audio segmentation and feature fusion techniques.
  • Collaborate with a team of domain experts to analyse and interpret model outputs to assess impacts on biodiversity monitoring.
  • Develop a demonstrator of an interactive data visualisation application for summarising [large/longitudinal/long-term] acoustic environmental monitoring datasets.
  • Prepare research findings for publication in high-impact journals and presentations at international conferences.

Ideal Candidate:

  • Currently enrolled in a master’s program in Computer Science, Data Science or similar.
  • Strong programming skills in Python and experience with machine learning frameworks such as TensorFlow or PyTorch.
  • Prior experience or interest in working with audio signals.
  • Analytical skills and creativity in solving complex problems.
  • Excellent communication skills for presenting research findings and collaborating across disciplines.

Contact: Rida Saghir

Relevant literature:

Kath, H., Gouvêa, T. S., & Sonntag, D. (2023). A deep generative model for interactive data annotation through direct manipulation in latent space. arXiv preprint arXiv:2305.15337.

Kingma, D. P., & Welling, M. (2013). Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114.

Bengio, Y., Courville, A. C., & Vincent, P. (2012). Unsupervised feature learning and deep learning: A review and new perspectives. CoRR, abs/1206.5538, 1(2665), 2012.

Best, P., Paris, S., Glotin, H., & Marxer, R. (2023). Deep audio embeddings for vocalisation clustering. PLoS ONE, 18(7), e0283396.

Liu, S., Mallol-Ragolta, A., Parada-Cabeleiro, E., Qian, K., Jing, X., Kathan, A., Hu, B., & Schuller, B. (2022). Audio self-supervised learning: A survey. Patterns, 3.

X. Liu et al., “Self-Supervised Learning: Generative or Contrastive,” in IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 1, pp. 857-876, 1 Jan. 2023, doi: 10.1109/TKDE.2021.3090866.

Eye Movement Event Detection for Head-Mounted Eye Trackers  

Eye movement event detection is crucial for understanding human visual behaviour and can be used in multiple use cases, e.g., visual attention monitoring [1]. This thesis will focus on implementing and comparing multiple eye movement event detection algorithms, specifically targeting head-mounted eye trackers. The algorithms include well-established baselines such as I-VT (Velocity-Threshold Identification) [4] and I-DT (Dispersion-Threshold Identification) [4], as well as more advanced approaches such as those proposed by Drews & Dierkes (2024) [2], Nejad et al. (2024) [3], and Steil et al. (2018) [6].

The thesis can also explore novel machine learning approaches, e.g., [5], to differentiate between the various events. By offering multiple detection algorithms and robust pre-processing techniques, the goal is to provide a comprehensive solution for real-time eye movement event detection.
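
To illustrate the simplest baseline, here is a sketch of I-VT [4] that labels each gaze sample by its point-to-point angular velocity; the 30 deg/s threshold and the input format are common assumptions rather than fixed standards.

```python
# Hedged sketch: the I-VT baseline on gaze data given in degrees of visual
# angle, sampled at a fixed rate.
import numpy as np

def ivt(gaze_deg, sample_rate_hz, velocity_threshold=30.0):
    """Label each sample 'fixation' or 'saccade'.

    gaze_deg: (N, 2) array of gaze positions in degrees of visual angle.
    """
    # Point-to-point angular velocity in deg/s.
    deltas = np.diff(gaze_deg, axis=0)
    velocity = np.linalg.norm(deltas, axis=1) * sample_rate_hz
    velocity = np.concatenate([[0.0], velocity])  # pad the first sample
    return np.where(velocity < velocity_threshold, "fixation", "saccade")

# Example: 100 Hz recording, stationary gaze followed by a fast shift.
gaze = np.array([[0.0, 0.0], [0.01, 0.0], [0.02, 0.0], [2.0, 1.0], [4.0, 2.0]])
print(ivt(gaze, sample_rate_hz=100))  # fixations first, then saccades
```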

Skills Required: This project requires machine learning and software development skills. The topic can be adjusted to suit either a bachelor’s or a master’s thesis.

Contact: Abdulrahman Mohamed, Omair Shahzad Bhatti

References & Relevant Literature

  1. Barz, M., Kapp, S., Kuhn, J., & Sonntag, D. (2021). Automatic Recognition and Augmentation of Attended Objects in Real-time using Eye Tracking and a Head-mounted Display. In ACM Symposium on Eye Tracking Research and Applications (ETRA ’21 Adjunct). Association for Computing Machinery, New York, NY, USA, Article 3, 1–4. https://doi.org/10.1145/3450341.3458766
  2. Drews, M., & Dierkes, K. (2024). Strategies for enhancing automatic fixation detection in head-mounted eye tracking. Behavior Research Methods, April 2024. https://doi.org/10.3758/s13428-024-02360-0
  3. Nejad, A., de Haan, G. A., Heutink, J., & Cornelissen, F. W. (2024). ACE-DNV: Automatic classification of gaze events in dynamic natural viewing. Behavior Research Methods, March 2024. https://doi.org/10.3758/s13428-024-02358-8
  4. Salvucci, D. D., & Goldberg, J. H. (2000). Identifying fixations and saccades in eye-tracking protocols. In Proceedings of the 2000 symposium on Eye tracking research & applications (ETRA ’00). Association for Computing Machinery, New York, NY, USA, 71–78. https://doi.org/10.1145/355017.355028
  5. Startsev, M., Agtzidis, I., & Dorr, M. (2019). 1D CNN with BLSTM for automated classification of fixations, saccades, and smooth pursuits. Behavior Research Methods, 51(2), 556–572. https://doi.org/10.3758/s13428-018-1144-2
  6. Steil, J., Huang, M. X., & Bulling, A. (2018). Fixation detection for head-mounted eye tracking based on visual similarity of gaze targets. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications (ETRA ’18). Association for Computing Machinery, New York, NY, USA, 1–9. https://doi.org/10.1145/3204493.3204538