Understanding cross-lingual abilities in large multilingual language models
Thesis title in Czech: | Porozumění mezijazykovým vlastnostem ve velkých vícejazyčných jazykových modelech |
Title in English: | Understanding cross-lingual abilities in large multilingual language models |
Keywords: | transfer learning|cross-lingual learning|low-resource|language models |
Keywords in English: | transfer learning|cross-lingual learning|low-resource|language models |
Academic year of assignment: | 2022/2023 |
Thesis type: | Master's thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | Mgr. Jindřich Libovický, Ph.D. |
Author: | hidden - assigned and confirmed by the Student Affairs Department |
Date of registration: | 14.03.2023 |
Date of assignment: | 14.03.2023 |
Confirmed by Student Affairs Department: | 05.04.2023 |
Date and time of defence: | 06.09.2023 09:00 |
Date of electronic submission: | 20.07.2023 |
Date of submission of printed version: | 23.07.2023 |
Date of defence: | 06.09.2023 |
Opponents: | Ing. Tomasz Limisiewicz |
Guidelines |
Over the past few years, large multilingual language models have shown clear cross-lingual abilities. However, it remains unclear why and under what circumstances these abilities work. This work aims to better understand these aspects in a subset of multilingual models. The core idea is to evaluate how parameters change while remaining compatible with data from other languages, i.e., how shared representations arise. Furthermore, we explore the robustness of current approaches in different fine-tuning scenarios.
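One common way to probe how shared representations arise is to remove language-specific components from sentence embeddings, e.g., by subtracting each language's mean vector, as in the language-neutrality analysis of Libovický et al. (2020). The following is a minimal, self-contained NumPy sketch of this centering idea using synthetic stand-in embeddings (the `meaning`, `offset_en`, and `offset_cs` variables are illustrative assumptions, not the thesis's actual data or method):

```python
import numpy as np

def center_per_language(embeddings_by_lang):
    """Subtract each language's mean vector (its "language centroid")
    from that language's sentence embeddings, leaving representations
    that are closer to language-neutral."""
    return {
        lang: emb - emb.mean(axis=0, keepdims=True)
        for lang, emb in embeddings_by_lang.items()
    }

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-in for real model embeddings: a shared "meaning" component,
# a language-specific offset (which centering removes), and small noise.
rng = np.random.default_rng(0)
meaning = rng.normal(size=(5, 8))            # 5 parallel sentences, dim 8
offset_en = rng.normal(size=8)
offset_cs = rng.normal(size=8)
emb = {
    "en": meaning + offset_en + 0.1 * rng.normal(size=(5, 8)),
    "cs": meaning + offset_cs + 0.1 * rng.normal(size=(5, 8)),
}

centered = center_per_language(emb)
# After centering, parallel sentence pairs become more similar across languages.
before = np.mean([cosine(emb["en"][i], emb["cs"][i]) for i in range(5)])
after = np.mean([cosine(centered["en"][i], centered["cs"][i]) for i in range(5)])
print(f"mean cross-lingual cosine before: {before:.3f}, after: {after:.3f}")
```

In a real experiment, the synthetic arrays would be replaced by sentence embeddings extracted from a multilingual model on parallel text, and the before/after similarity gap indicates how much of the representation space is language-specific.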
The thesis is part of the double-degree European Masters Program in Language and Communication Technologies (LCT). The main supervisors of the thesis are Mareike Hartmann and Marius Mosbach from Saarland University in Saarbrücken, Germany. |
References |
Jindřich Libovický, Rudolf Rosa, and Alexander Fraser. 2020. On the Language Neutrality of Pre-trained Multilingual Representations. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1663–1674, Online. Association for Computational Linguistics.
Dan Malkin, Tomasz Limisiewicz, and Gabriel Stanovsky. 2022. A Balanced Data Approach for Evaluating Cross-Lingual Transfer: Mapping the Linguistic Blood Bank. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4903–4915, Seattle, United States. Association for Computational Linguistics.
Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, and Mikel Artetxe. 2022. Lifting the Curse of Multilinguality by Pre-training Modular Transformers. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3479–3495, Seattle, United States. Association for Computational Linguistics.
Philipp Dufter and Hinrich Schütze. 2020. Identifying Elements Essential for BERT's Multilinguality. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4423–4437, Online. Association for Computational Linguistics. |