Learning capabilities in Transformer Neural Networks
| Thesis title in Czech: | Schopnosti učení v transformerových neuronových sítích |
| --- | --- |
| Thesis title in English: | Learning capabilities in Transformer Neural Networks |
| Key words: | neuronový strojový překlad, katastrofické zapomínání, modulární neuronové sítě, navazující učení, generalizace |
| English key words: | neural machine translation, catastrophic forgetting, modular neural networks, incremental learning, generalization |
| Academic year of topic announcement: | 2015/2016 |
| Thesis type: | dissertation |
| Thesis language: | English |
| Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
| Supervisor: | doc. RNDr. Ondřej Bojar, Ph.D. |
| Author: | hidden - assigned and confirmed by the Study Dept. |
| Date of registration: | 03.10.2016 |
| Date of assignment: | 03.10.2016 |
| Confirmed by Study dept. on: | 03.10.2016 |
| Date and time of defence: | 24.03.2023 10:40 |
| Date of electronic submission: | 05.12.2022 |
| Date of submission of printed version: | 02.01.2023 |
| Date of proceeded defence: | 24.03.2023 |
| Opponents: | Rico Sennrich, Dr.; Mgr. et Mgr. Ondřej Dušek, Ph.D. |
Guidelines
In recent years, Transformer-based neural networks have become a dominant approach to solving many NLP problems, reaching or surpassing human-level performance on several tasks. Although Transformers, like other deep learning approaches, were inspired by the neural interactions inside the biological brain, their learning process is very different from that of a human.
The aim of this thesis is to investigate selected aspects of Transformer networks with regard to their training process and to discuss where the current training process underperforms compared to learning in humans (e.g., the need for huge numbers of training examples, or the inability to extract meaningful priors for later few-shot learning). We plan to study the learning process in the context of sequence-to-sequence tasks, ranging from simple string manipulation to the challenging task of machine translation. The main focus will be on continual learning and on knowledge composition (reusing knowledge about solving simple problems to tackle more complex tasks). A part of the work will also investigate whether the generalization ability of contemporary Transformers is overestimated.
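To make the planned setting concrete, below is a minimal, purely illustrative sketch of toy string-manipulation sequence-to-sequence tasks arranged into a sequential curriculum, the kind of setup in which catastrophic forgetting can be measured. The task names (copy, reverse), the data format, and the placeholder training and evaluation calls are assumptions made for illustration, not the thesis's actual experimental design.

```python
# Illustrative sketch only: toy string-manipulation seq2seq tasks and a
# sequential (continual-learning) curriculum. Task names and the placeholder
# model calls are assumptions, not the thesis's actual setup.
import random
import string


def make_pair(task, length=6):
    """Generate one (source, target) example for a toy seq2seq task."""
    src = "".join(random.choices(string.ascii_lowercase, k=length))
    if task == "copy":
        return src, src
    if task == "reverse":
        return src, src[::-1]
    raise ValueError(f"unknown task: {task}")


def make_dataset(task, n=1000):
    return [make_pair(task) for _ in range(n)]


# Train on task A, then task B; re-evaluating on A afterwards reveals how much
# of the earlier skill was lost (catastrophic forgetting).
curriculum = ["copy", "reverse"]
train_sets = {task: make_dataset(task) for task in curriculum}
test_sets = {task: make_dataset(task, n=200) for task in curriculum}

for task in curriculum:
    # model.train_on(train_sets[task])  # placeholder for any seq2seq model
    for probe in curriculum:
        pass  # accuracy = model.evaluate(test_sets[probe]); record(task, probe, accuracy)
```

In such a setup, a large drop in copy accuracy after training on reverse would be the forgetting effect that methods such as those cited below aim to mitigate.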
References
Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J. Gershman. 2017. Building machines that learn and think like people. Behavioral and Brain Sciences, 40:e253.
James Kirkpatrick, Razvan Pascanu, Neil C. Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences of the United States of America, 114(13):3521–3526.
Zhizhong Li and Derek Hoiem. 2016. Learning without forgetting. In European Conference on Computer Vision, pages 614–629. Springer.
Ekaterina Garmash and Christof Monz. 2016. Ensemble learning for multi-source neural machine translation. In COLING.
R. Aljundi, P. Chakravarty, and T. Tuytelaars. 2017. Expert gate: Lifelong learning with a network of experts. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7120–7129.