Thesis (Selection of subject)


Relational reasoning in vision-language models

Thesis title in Czech:	Relační reasoning ve vision-language modelech
Thesis title in English:	Relational reasoning in vision-language models
Key words:	relační sítě|relační uvažování|hluboké učení
English key words:	Relation Networks|relational reasoning|deep learning
Academic year of topic announcement:	2023/2024
Thesis type:	Bachelor's thesis
Thesis language:	English
Department:	Department of Theoretical Computer Science and Mathematical Logic (32-KTIML)
Supervisor:	RNDr. Jakub Bulín, Ph.D.
Author:	hidden - assigned and confirmed by the Study Dept.
Date of registration:	07.03.2024
Date of assignment:	12.03.2024
Confirmed by Study dept. on:	12.03.2024
Date and time of defence:	20.06.2025 09:00
Date of electronic submission:	04.05.2025
Date of submission of printed version:	04.05.2025
Date of proceeded defence:	20.06.2025
Opponents:	Mgr. Jindřich Libovický, Ph.D.

Guidelines

Vision-language models have made significant strides in interpreting and generating descriptions from visual data. However, their ability to perform complex relational reasoning remains a challenge. Relational reasoning involves understanding the relationships between different entities within a context, which is crucial for tasks such as visual question answering and scene understanding. Santoro et al. proposed a simple neural network module designed specifically to enhance relational reasoning in neural networks [1]. This thesis aims to explore the effectiveness of such modules when integrated into vision-language models.

The objective of this bachelor's thesis is to implement a vision-language model enhanced with a relational reasoning module as described by Santoro et al. in "A simple neural network module for relational reasoning" (2017) [1]. The student will benchmark this enhanced model against a standard vision-language model, focusing on performance in relational reasoning tasks. This comparative analysis will help to understand the impact of integrating relational reasoning capabilities into vision-language models.
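
For orientation, the module in [1] computes RN(O) = f(sum over all pairs (i, j) of g(o_i, o_j, q)), where the o_i are object features and q is a question embedding. The following is a minimal pure-Python sketch of that formula only, not the implementation expected in the thesis. The helper names (make_linear, linear, relu, relation_network), the layer sizes, and the random inputs are all illustrative assumptions; in a real model the objects would come from a vision encoder and the question from a language encoder.

```python
import random


def make_linear(n_in, n_out, rng):
    """Hypothetical helper: a dense layer as (weights, bias) with random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(layer, x):
    """Apply a dense layer to the vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def relation_network(objects, question, g_layers, f_layers):
    """Compute f(sum_{i,j} g(o_i, o_j, q)) for a list of object vectors."""
    pooled = [0.0] * len(g_layers[-1][1])
    for o_i in objects:
        for o_j in objects:
            # g sees one ordered pair of objects together with the question.
            h = o_i + o_j + question
            for layer in g_layers:
                h = relu(linear(layer, h))
            # Summing over pairs makes the result independent of object order.
            pooled = [p + v for p, v in zip(pooled, h)]
    out = pooled
    for layer in f_layers[:-1]:
        out = relu(linear(layer, out))
    return linear(f_layers[-1], out)  # unnormalised answer scores


# Illustrative sizes and random inputs (assumptions, not values from [1]).
rng = random.Random(0)
obj_dim, q_dim, hidden, n_answers = 4, 3, 8, 5
g_layers = [make_linear(2 * obj_dim + q_dim, hidden, rng),
            make_linear(hidden, hidden, rng)]
f_layers = [make_linear(hidden, hidden, rng),
            make_linear(hidden, n_answers, rng)]
objects = [[rng.uniform(-1, 1) for _ in range(obj_dim)] for _ in range(6)]
question = [rng.uniform(-1, 1) for _ in range(q_dim)]
logits = relation_network(objects, question, g_layers, f_layers)
```

Because the pairwise outputs are summed before f is applied, reordering the objects leaves the result unchanged up to floating-point rounding. This is the property that lets such a module be attached to an unordered set of visual features.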

References

[1] Santoro, Adam et al. “A simple neural network module for relational reasoning.” Neural Information Processing Systems (2017).
[2] McCallum, Andrew et al. “Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks.” Conference of the European Chapter of the Association for Computational Linguistics (2016).
[3] Hu, Ronghang et al. “Learning to Reason: End-to-End Module Networks for Visual Question Answering.” 2017 IEEE International Conference on Computer Vision (ICCV) (2017): 804-813.

Preliminary scope of work in English

The goal of the thesis is to implement and benchmark a vision-language model with a relational module as presented in an earlier work by Santoro et al. and compare its performance to a pure vision-language model on a relational reasoning dataset.