Adapting Pretrained Models for Machine Translation
Thesis title in Czech: | Adaptace předtrénovaných modelů pro strojový překlad |
---|---|
Thesis title in English: | Adapting Pretrained Models for Machine Translation |
Key words: | adapters, machine translation, bert, transformer, transfer learning |
English key words: | adapters, machine translation, bert, transformer, transfer learning |
Academic year of topic announcement: | 2021/2022 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. RNDr. Ondřej Bojar, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 07.02.2022 |
Date of assignment: | 07.02.2022 |
Confirmed by Study dept. on: | 11.02.2022 |
Date and time of defence: | 02.09.2022 09:00 |
Date of electronic submission: | 21.07.2022 |
Date of submission of printed version: | 24.07.2022 |
Date of proceeded defence: | 02.09.2022 |
Opponents: | Mgr. Dušan Variš, Ph.D. |
Guidelines
Pre-trained models are being made readily available by companies and research institutes so that the research community can reuse and repurpose them for other tasks. There are three common techniques for using these pre-trained models: (1) continue training on the desired task, i.e. treat the pre-trained model as nothing more than a clever weight initialization; (2) train only the last layer of the pre-trained model and keep the preceding parts fixed; (3) add small sets of trainable parameters, so-called 'adapters' [1,2,3], throughout the network and train them while keeping the rest of the network fixed.
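To make technique (3) concrete, the following is a minimal PyTorch sketch of a bottleneck adapter in the spirit of Houlsby et al. [1]. The class name, the bottleneck size of 64, and the freezing helper are illustrative choices for this sketch, not prescribed by the assignment.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: project the hidden state down, apply a
    non-linearity, project back up, and add a residual connection."""

    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


def freeze_except_adapters(model: nn.Module) -> None:
    """Technique (3): freeze the whole network, then re-enable gradients
    only for the adapter parameters."""
    for p in model.parameters():
        p.requires_grad = False
    for module in model.modules():
        if isinstance(module, Adapter):
            for p in module.parameters():
                p.requires_grad = True
```

Because only the small down/up projections receive gradient updates, the memory and compute cost of adaptation stays low while the pre-trained weights remain untouched.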
This thesis focuses on the task of machine translation and tries to benefit from models pre-trained monolingually on a language modeling task. This difference in task inevitably requires some adaptation of the model. The question is how large this adaptation should be and which parts of the model it should concern [4]. Minimally, the thesis will connect two language models, one serving as the encoder in the source language and the other as the decoder in the target language. The attention component connecting the encoder and decoder has to be initialized randomly, and the decoder has to be adapted to produce the target sentence left to right, but the rest of the network can remain fixed, reusing the weights of the two language models. Further experiments in the thesis will gradually allow more and more of the network to be fine-tuned for the translation task, with the goal of examining which of the approaches delivers the best translation quality while also considering the required training time. The experiments will also consider the domain of the text, i.e. the training data and the final test set will come from slightly different areas. All experiments can be limited to a single language pair, e.g. German-to-English translation. Most of the evaluations will rely on automatic measures of translation quality such as BLEU and chrF, but also on more recent metrics like COMET. A very small manual evaluation of the best setups at the end is desirable.
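The minimal setup described above can be sketched with the Hugging Face transformers library, which can glue two pre-trained language models together as an encoder-decoder pair. The checkpoint names below and the parameter-name substring used to select the cross-attention weights are assumptions for illustration (they hold for BERT-style decoders), not part of the assignment.

```python
from transformers import EncoderDecoderModel

# Reuse two monolingually pre-trained language models: one as the
# source-language encoder, one as the target-language decoder.
# The checkpoint names are illustrative (German-to-English example).
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-german-cased",  # source-language encoder
    "bert-base-uncased",       # target-language decoder
)

# The decoder's cross-attention over the encoder states is not part of the
# original language model, so it is initialized randomly by the call above.
# As a minimal first adaptation, train only these new parameters and keep
# the reused language-model weights fixed. The "crossattention" substring
# matches BERT-style decoders; other architectures may name it differently.
for name, param in model.named_parameters():
    param.requires_grad = "crossattention" in name
```

Later experiments would simply enlarge the set of parameters with `requires_grad = True` (e.g. the decoder, then the whole network) to compare translation quality against training cost.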
References
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., Laroussilhe, Q.D., Gesmundo, A., Attariyan, M., & Gelly, S. (2019). Parameter-Efficient Transfer Learning for NLP. ICML 2019.
Pfeiffer, J., Vulić, I., Gurevych, I., & Ruder, S. (2020). MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer. EMNLP 2020, 7654–7673. doi:10.18653/v1/2020.emnlp-main.617
Bapna, A., Arivazhagan, N., & Firat, O. (2019). Simple, Scalable Adaptation for Neural Machine Translation. EMNLP 2019.
Winata, G.I., Wang, G., Xiong, C., & Hoi, S.C. (2021). Adapt-and-Adjust: Overcoming the Long-Tail Problem of Multilingual Speech Recognition. arXiv:2012.01687.