Thesis topics (Thesis selection)

Multimodality in natural language processing

Thesis title in Czech:	Multimodalita ve zpracování přirozeného jazyka
Thesis title in English:	Multimodality in natural language processing
Keywords (Czech):	multimodalita, zpracování přirozeného jazyka, hluboké učení, počítačové vidění
Keywords (English):	multimodality, natural language processing, deep learning, computer vision
Academic year of topic announcement:	2023/2024
Thesis type:	dissertation
Thesis language:
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	doc. RNDr. Pavel Pecina, Ph.D.
Author:

Guidelines

Integrating multimodality in natural language processing (NLP) is referred to as multimodal natural language processing. Research in this novel direction primarily aims at processing textual content using visual information (e.g., images and possibly video) to support various tasks (e.g., machine translation). Its motivation stems mainly from two linguistic challenges: lexical ambiguity and out-of-vocabulary words. Current studies show that visual information is indeed useful for translation, yielding modest but encouraging improvements in translation quality (Elliott et al. 2017, Calixto et al. 2017, Caglayan et al. 2018). Recent work also provides evidence that visual information helps in interpreting language when the language is implicit (Collell et al. 2018). The aim of this work is to investigate the effect of multimodal data processing in various NLP tasks.

References

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Cambridge, MA, USA: MIT Press.

Firat, O., Cho, K., & Bengio, Y. (2016). Multi-way, multilingual neural machine translation with a shared attention mechanism. Proc. of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 866–875).

Elliott, D., et al. (2016). Multi30K: Multilingual English-German image descriptions. Proc. of the 5th Workshop on Vision and Language (pp. 70-74).

Caglayan, O., et al. (2018). LIUM-CVC submissions for WMT18 multimodal translation task. Proc. WMT.

Calixto, I., et al. (2017). Doubly-attentive decoder for multi-modal neural machine translation. Proc. ACL.

Libovický, J. & Helcl, J. (2017). Attention strategies for multi-source sequence-to-sequence learning. Proc. ACL.

Collell, G., et al. (2018). Acquiring common sense spatial knowledge through implicit spatial templates. Proc. AAAI.