Thesis (Selection of subject) (version: 368)
Thesis details
Grounding Natural Language Inference on Images
Thesis title in Czech: Vyvozování v přirozeném jazyce s využitím obrazových dat
Thesis title in English: Grounding Natural Language Inference on Images
Key words: vyvozování v přirozeném jazyce
English key words: Grounding Natural Language Inference on Images
Academic year of topic announcement: 2016/2017
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: doc. RNDr. Pavel Pecina, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 12.06.2017
Date of assignment: 13.06.2017
Confirmed by Study dept. on: 07.08.2018
Date and time of defence: 11.09.2018 09:00
Date of electronic submission: 21.07.2018
Date of submission of printed version: 20.07.2018
Date of proceeded defence: 11.09.2018
Opponents: Mgr. Jindřich Libovický, Ph.D.
 
 
 
Guidelines
Natural Language Inference (NLI), which involves understanding entailment and contradiction, is a basic step towards developing semantic representations of text. Given a premise, the task of NLI is to determine whether a hypothesis can be inferred from it (entailment), is not true given it (contradiction), or whether there is not sufficient information to decide (neutral). In recent years, NLI has become a very important testing ground for word, phrase and sentence representations. The availability of data for this task was limited until the recent release of the SNLI dataset, which is two orders of magnitude larger than all other resources of its type and thus enables applying various techniques, including deep learning.
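For illustration, the task can be cast as three-way classification over premise–hypothesis pairs; the following minimal Python sketch uses invented example pairs (not taken from SNLI):

# Invented premise–hypothesis pairs illustrating the three NLI labels.
NLI_LABELS = ("entailment", "contradiction", "neutral")

premise = "Two children are playing football in a park."
hypotheses = {
    "entailment":    "Children are playing outside.",       # follows from the premise
    "contradiction": "The children are sleeping indoors.",  # cannot be true given the premise
    "neutral":       "The children are brothers.",          # the premise gives no evidence either way
}
for label, hypothesis in hypotheses.items():
    print(f"{label:13s}  P: {premise}  H: {hypothesis}")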

The aim of the thesis is to examine whether visual representations can drive the process of learning semantic representations. The thesis will investigate methodologies used in NLI and image understanding, including learning effective sentence and image representations and embedding them in a shared space; an illustrative sketch of such a joint model is given below.
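As an illustration only (not the model developed in the thesis), the following PyTorch sketch shows one simple way to embed a premise, a hypothesis and pre-extracted CNN image features in a shared space and classify the pair into the three NLI labels; the LSTM encoder and all layer sizes are assumptions made for the example:

import torch
import torch.nn as nn

class MultimodalNLISketch(nn.Module):
    """Illustrative joint encoder: sentences and image features are projected into a
    shared space, then a three-way classifier predicts entailment/contradiction/neutral.
    Architecture and dimensions are assumptions, not the thesis model."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=256, image_feat_dim=2048):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.image_proj = nn.Linear(image_feat_dim, hidden_dim)  # map CNN features into the shared space
        self.classifier = nn.Sequential(
            nn.Linear(3 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 3),                             # one score per NLI label
        )

    def encode_sentence(self, token_ids):
        embedded = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.encoder(embedded)   # hidden: (1, batch, hidden_dim)
        return hidden[-1]                         # sentence vector, (batch, hidden_dim)

    def forward(self, premise_ids, hypothesis_ids, image_features):
        p = self.encode_sentence(premise_ids)
        h = self.encode_sentence(hypothesis_ids)
        v = self.image_proj(image_features)
        return self.classifier(torch.cat([p, h, v], dim=-1))  # logits over the three labels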
References
S. Bowman, G. Angeli, C. Potts, and C. Manning (2015). A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Lisbon, Portugal.

B. MacCartney and C. D. Manning (2008). Modeling semantic containment and exclusion in natural language inference. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING), Volume 1, pp. 521–528. Manchester, UK.

Z. Wang, W. Hamza, and R. Florian (2017). Bilateral multi-perspective matching for natural language sentences. arXiv preprint arXiv:1702.03814.
Preliminary scope of work
Despite the surge of research interest in problems involving linguistic and visual information, multimodal data remain largely unexplored for Natural Language Inference. Natural Language Inference, regarded as a basic step towards Natural Language Understanding, is extremely challenging due to the inherent complexity of human languages. However, we believe this difficulty can be alleviated by using multimodal data. Given an image and its description, the proposed task is to determine whether a natural language hypothesis contradicts, is entailed by, or is neutral with respect to the image and its description. To address this problem, we develop a multimodal model based on the Bilateral Multi-perspective Matching framework. The data are collected by mapping the SNLI dataset to the Flickr30k image dataset; a sketch of this mapping is given below. The resulting dataset, made publicly available, has more than 565k instances. Experiments on this dataset show that the multimodal model outperforms the state-of-the-art textual model.
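Since SNLI premises are Flickr30k captions, the captionID field of an SNLI record identifies the source image; a minimal Python sketch of the mapping, assuming the standard SNLI JSONL field names (sentence1, sentence2, gold_label, captionID), could look as follows:

import json

def map_snli_to_flickr30k(snli_jsonl_path):
    """Yield (image file, premise, hypothesis, label) tuples by linking each SNLI
    pair back to its Flickr30k image via the captionID prefix '<image_id>.jpg#<n>'."""
    with open(snli_jsonl_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["gold_label"] == "-":                  # skip pairs without an annotator consensus
                continue
            image_file = record["captionID"].split("#")[0]   # e.g. '3416050648.jpg'
            yield image_file, record["sentence1"], record["sentence2"], record["gold_label"]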
 