Thesis (Selection of subject) (version: 368)
Thesis details
Adapting Pretrained Models for Machine Translation
Thesis title in Czech: Adaptace předtrénovaných modelů pro strojový překlad
Thesis title in English: Adapting Pretrained Models for Machine Translation
Key words: adapters|machine translation|bert|transformer|transfer learning
English key words: adapters|machine translation|bert|transformer|transfer learning
Academic year of topic announcement: 2021/2022
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: doc. RNDr. Ondřej Bojar, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 07.02.2022
Date of assignment: 07.02.2022
Confirmed by Study dept. on: 11.02.2022
Date and time of defence: 02.09.2022 09:00
Date of electronic submission: 21.07.2022
Date of submission of printed version: 24.07.2022
Date of proceeded defence: 02.09.2022
Opponents: Mgr. Dušan Variš, Ph.D.
Guidelines
Pre-trained models are being made readily available by companies and research institutes so that the research community can reuse and repurpose them for other tasks. There are three common techniques for using these pretrained models: (1) continue training on the desired task, i.e. treat the pre-trained model as nothing more than a clever weight initialization, (2) train the last layer of the pre-trained model and keep the preceding parts fixed, (3) insert small sets of trainable parameters, so-called 'adapters' [1,2,3], throughout the network and train only them, keeping the rest of the network fixed.
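For illustration, technique (3) could be realized as a small bottleneck module inserted after each Transformer sub-layer, in the style of [1]. The following PyTorch sketch is only indicative; the module name and sizes are assumptions, not part of the assignment.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter as in Houlsby et al. [1]:
    down-projection, non-linearity, up-projection, residual."""
    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection preserves the pretrained representation;
        # only self.down and self.up are trained, the host network stays fixed.
        return x + self.up(self.act(self.down(x)))
```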

This thesis focuses on the task of machine translation and tries to benefit from models pre-trained monolingually on a language modeling task. This difference in the task inevitably requires some adaptation of the model. The question is how large this adaptation should be and which parts of the model it should concern [4].

Minimally, the thesis will connect two language models, one serving as the encoder in the source language and the other as the decoder in the target language. The cross-attention connecting the encoder and decoder has to be initialized randomly, and the decoder has to be adapted to produce the target sentence left to right, but the rest of the network can remain fixed, reusing the weights of the two language models.
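A hedged sketch of this minimal setup, using the Hugging Face Transformers library; the two checkpoint names are illustrative choices for a German-to-English system, not prescribed by the assignment:

```python
from transformers import EncoderDecoderModel

# Join two monolingual BERT models into a single encoder-decoder.
# The cross-attention weights (and the decoder's causal, left-to-right
# masking) are newly added; all other weights come from the checkpoints.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-german-cased",  # source-language (German) encoder
    "bert-base-uncased",       # target-language (English) decoder
)
```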

Further experiments in the thesis will gradually allow more and more of the network to be fine-tuned for the translation task in question, with the goal of examining which of the approaches delivers the best translation quality while also considering the required training time. The experiments will further consider the domain of the text, that is, the training data and the final test set will come from slightly different areas. All the experiments can be limited to a single language pair, e.g. German-to-English translation.
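The gradual unfreezing could be controlled as in the following sketch; the parameter-name filter is an assumption that matches BERT-based models in Transformers and would have to be adjusted for other architectures:

```python
# Freeze the whole network first, then selectively re-enable gradients.
for param in model.parameters():
    param.requires_grad = False

for name, param in model.named_parameters():
    # Initially train only the randomly initialized cross-attention;
    # later experiments widen this filter (e.g. to adapters or full layers).
    if "crossattention" in name:
        param.requires_grad = True
```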

Most of the evaluations will rely on automatic measures of translation quality such as BLEU and chrF, but also on more recent metrics like COMET. A small manual evaluation of the best setups at the end is desirable.
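BLEU and chrF can be computed, for instance, with the sacrebleu package, as sketched below; the sentences are placeholders. COMET additionally requires the source sentences and a trained scoring model (e.g. from the unbabel-comet package).

```python
import sacrebleu

hypotheses = ["The cat sat on the mat."]           # system outputs
references = [["The cat is sitting on the mat."]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}  chrF = {chrf.score:.2f}")
```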
References
[1] Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., Laroussilhe, Q. D., Gesmundo, A., Attariyan, M., & Gelly, S. (2019). Parameter-Efficient Transfer Learning for NLP. ICML 2019.

[2] Pfeiffer, J., Vulić, I., Gurevych, I., & Ruder, S. (2020). MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer. Proceedings of EMNLP 2020, 7654–7673. doi:10.18653/v1/2020.emnlp-main.617

[3] Bapna, A., Arivazhagan, N., & Firat, O. (2019). Simple, Scalable Adaptation for Neural Machine Translation. EMNLP 2019.

[4] Winata, G. I., Wang, G., Xiong, C., & Hoi, S. C. (2021). Adapt-and-Adjust: Overcoming the Long-Tail Problem of Multilingual Speech Recognition. arXiv, abs/2012.01687.