Thesis details
Grounding Natural Language Inference on Images
Thesis title in Czech: Vyvozování v přirozeném jazyce s využitím obrazových dat
Title in English: Grounding Natural Language Inference on Images
Keywords: natural language inference
Keywords in English: Grounding Natural Language Inference on Images
Academic year of announcement: 2016/2017
Thesis type: master's thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: doc. RNDr. Pavel Pecina, Ph.D.
Author: hidden - assigned and confirmed by the Study Department
Date of registration: 12.06.2017
Date of assignment: 13.06.2017
Confirmed by Study Department on: 07.08.2018
Date and time of defence: 11.09.2018 09:00
Date of electronic submission: 21.07.2018
Date of submission of printed version: 20.07.2018
Date of defence: 11.09.2018
Opponents: Mgr. Jindřich Libovický, Ph.D.
Guidelines
Natural Language Inference (NLI), which involves understanding entailment and contradiction, is a basic step towards developing semantic representations of text. Given a premise, the task of NLI is to determine whether a hypothesis can be inferred from it (entailment), whether the hypothesis is not true (contradiction), or whether there is not sufficient information to decide (neutral). In recent years, NLI has become a very important testing ground for word, phrase and sentence representations. The availability of data for this task was limited until the recent release of the SNLI dataset, which is two orders of magnitude larger than all other resources of its type and enables applying various techniques, including deep learning.
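For illustration, a minimal Python sketch of the three-way NLI setup over the public SNLI release follows; the field names (sentence1, sentence2, gold_label) and the file name are taken from that release and are not part of the assignment.

import json
from collections import Counter

VALID_LABELS = {"entailment", "contradiction", "neutral"}

def read_snli(path):
    # Yield (premise, hypothesis, label) triples from an SNLI .jsonl file.
    with open(path, encoding="utf-8") as f:
        for line in f:
            example = json.loads(line)
            label = example["gold_label"]
            if label in VALID_LABELS:  # "-" marks pairs without annotator consensus
                yield example["sentence1"], example["sentence2"], label

# Label distribution of the training split (file name as in the public SNLI release)
print(Counter(label for _, _, label in read_snli("snli_1.0_train.jsonl")))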

The aim of the thesis is to examine whether visual representations can drive the process of learning semantic representations. The thesis will investigate different methodologies used in NLI and image understanding, including learning effective sentence and image representations and embedding them in the same space.
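As a rough illustration of embedding both modalities in a shared space (a sketch, not the architecture prescribed by the thesis), a sentence vector and an image feature vector can each be passed through a learned projection into a joint space; the dimensions below are hypothetical.

import torch
import torch.nn as nn

class JointSpace(nn.Module):
    # Projects a sentence vector and an image feature vector into one shared space.
    # All dimensions are illustrative defaults, not values used in the thesis.
    def __init__(self, sent_dim=300, img_dim=2048, joint_dim=512):
        super().__init__()
        self.sent_proj = nn.Linear(sent_dim, joint_dim)
        self.img_proj = nn.Linear(img_dim, joint_dim)

    def forward(self, sent_vec, img_vec):
        s = torch.tanh(self.sent_proj(sent_vec))
        v = torch.tanh(self.img_proj(img_vec))
        return s, v

model = JointSpace()
s, v = model(torch.randn(1, 300), torch.randn(1, 2048))
similarity = torch.cosine_similarity(s, v)  # agreement of the two modalities in the joint space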
References
S. Bowman, G. Angeli, C. Potts, and C. Manning (2015). A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Lisbon, Portugal.

B. MacCartney and C. D. Manning (2008). Modeling semantic containment and exclusion in natural language inference. In Proceedings of the 22nd International Conference on Computational Linguistics, Volume 1 (pp. 521–528). Manchester, UK.

Z. Wang, W. Hamza, and R. Florian (2017). Bilateral multi-perspective matching for natural language sentences. arXiv preprint arXiv:1702.03814.
Preliminary scope of work
Despite the surge of research interest in problems involving linguistic and visual information, multimodal data remains unexplored for Natural Language Inference. Natural Language Inference, regarded as a basic step towards Natural Language Understanding, is extremely challenging due to the natural complexity of human languages. However, we believe this issue can be alleviated by using multimodal data. Given an image and its description, our proposed task is to determine whether a natural language hypothesis contradicts, entails, or is neutral with regard to the image and its description. To address this problem, we develop a multimodal framework based on the Bilateral Multi-perspective Matching framework. The data are collected by mapping the SNLI dataset onto the Flickr30k image dataset. The resulting dataset, made publicly available, has more than 565k instances. Experiments on this dataset show that the multimodal model outperforms the state-of-the-art textual model.
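A minimal sketch of how such a mapping between SNLI and Flickr30k could be constructed, assuming SNLI's captionID field names the Flickr30k image a premise was written for; this is an illustration under that assumption, not the exact procedure used in the thesis.

import json

def snli_with_images(snli_path, image_dir):
    # Pair each SNLI premise/hypothesis with its Flickr30k source image.
    # Assumption: SNLI's captionID field has the form '<image-id>.jpg#<caption-no>',
    # where <image-id>.jpg is the file name of the corresponding Flickr30k image.
    with open(snli_path, encoding="utf-8") as f:
        for line in f:
            example = json.loads(line)
            if example["gold_label"] == "-":  # no annotator consensus
                continue
            image_file = example["captionID"].split("#")[0]
            yield {
                "premise": example["sentence1"],      # the image description
                "hypothesis": example["sentence2"],
                "label": example["gold_label"],
                "image": f"{image_dir}/{image_file}",
            }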