Learning capabilities in Transformer Neural Networks
| Thesis title in Czech: | Schopnosti učení v transformerových neuronových sítích |
| --- | --- |
| Thesis title in English: | Learning capabilities in Transformer Neural Networks |
| Key words: | neuronový strojový překlad, katastrofické zapomínání, modulární neuronové sítě, navazující učení, generalizace |
| English key words: | neural machine translation, catastrophic forgetting, modular neural networks, incremental learning, generalization |
| Academic year of topic announcement: | 2015/2016 |
| Thesis type: | dissertation |
| Thesis language: | English |
| Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
| Supervisor: | doc. RNDr. Ondřej Bojar, Ph.D. |
| Author: | hidden - assigned and confirmed by the Study Dept. |
| Date of registration: | 03.10.2016 |
| Date of assignment: | 03.10.2016 |
| Confirmed by Study dept. on: | 03.10.2016 |
| Date and time of defence: | 24.03.2023 10:40 |
| Date of electronic submission: | 05.12.2022 |
| Date of submission of printed version: | 02.01.2023 |
| Date of proceeded defence: | 24.03.2023 |
| Opponents: | Rico Sennrich, Dr.; Mgr. et Mgr. Ondřej Dušek, Ph.D. |
Guidelines
In recent years, Transformer-based neural networks have become a dominant approach to solving many NLP problems, reaching or surpassing human-level performance on several tasks. Although Transformers, like other deep learning approaches, were inspired by the neural interactions inside the biological brain, their learning process is very different from that of a human.
The aim of this thesis is to investigate selected aspects of Transformer networks with regard to their training process and to discuss where the current training process underperforms compared to learning in humans (e.g., the need for huge numbers of training examples, or the inability to extract meaningful priors for later few-shot learning). We plan to study the learning process in the context of sequence-to-sequence tasks, ranging from simple string manipulation to the challenging task of machine translation. The main focus will be on continual learning and on knowledge composition (reusing knowledge about solving simple problems to tackle more complex tasks). A part of the work will also investigate whether the generalization ability of contemporary Transformers is overestimated.
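To make the planned setting concrete, below is a minimal, purely illustrative sketch of toy string-manipulation sequence-to-sequence tasks arranged into a sequential curriculum, the kind of setup in which catastrophic forgetting can be measured. The task names (copy, reverse), the data format, and the placeholder training and evaluation calls are assumptions made for illustration, not the thesis's actual experimental design.

```python
# Illustrative sketch only: toy string-manipulation seq2seq tasks and a
# sequential (continual-learning) curriculum. Task names and the placeholder
# model calls are assumptions, not the thesis's actual setup.
import random
import string


def make_pair(task, length=6):
    """Generate one (source, target) example for a toy seq2seq task."""
    src = "".join(random.choices(string.ascii_lowercase, k=length))
    if task == "copy":
        return src, src
    if task == "reverse":
        return src, src[::-1]
    raise ValueError(f"unknown task: {task}")


def make_dataset(task, n=1000):
    return [make_pair(task) for _ in range(n)]


# Train on task A, then task B; re-evaluating on A afterwards reveals how much
# of the earlier skill was lost (catastrophic forgetting).
curriculum = ["copy", "reverse"]
train_sets = {task: make_dataset(task) for task in curriculum}
test_sets = {task: make_dataset(task, n=200) for task in curriculum}

for task in curriculum:
    # model.train_on(train_sets[task])  # placeholder for any seq2seq model
    for probe in curriculum:
        pass  # accuracy = model.evaluate(test_sets[probe]); record(task, probe, accuracy)
```

In such a setup, a large drop in copy accuracy after training on reverse would be the forgetting effect that methods such as those cited below aim to mitigate.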
References
Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J. Gershman. 2017. Building machines that learn and think like people. Behavioral and Brain Sciences, 40:e253.
James Kirkpatrick, Razvan Pascanu, Neil C. Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences of the United States of America, 114(13):3521–3526.
Zhizhong Li and Derek Hoiem. 2016. Learning without forgetting. In European Conference on Computer Vision, pages 614–629. Springer.
Ekaterina Garmash and Christof Monz. 2016. Ensemble learning for multi-source neural machine translation. In COLING.
R. Aljundi, P. Chakravarty, and T. Tuytelaars. 2017. Expert gate: Lifelong learning with a network of experts. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7120–7129.