Learning capabilities in Transformer Neural Networks
| Thesis title (Czech): | Schopnosti učení v transformerových neuronových sítích |
|---|---|
| Title (English): | Learning capabilities in Transformer Neural Networks |
| Keywords (Czech): | neuronový strojový překlad, katastrofické zapomínání, modulární neuronové sítě, navazující učení, generalizace |
| Keywords (English): | neural machine translation, catastrophic forgetting, modular neural networks, incremental learning, generalization |
| Academic year of announcement: | 2015/2016 |
| Thesis type: | dissertation |
| Thesis language: | English |
| Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
| Supervisor: | doc. RNDr. Ondřej Bojar, Ph.D. |
| Author: | hidden (assigned and confirmed by the Student Affairs Office) |
| Date of registration: | 03.10.2016 |
| Date of assignment: | 03.10.2016 |
| Confirmed by Student Affairs Office: | 03.10.2016 |
| Date and time of defence: | 24.03.2023 10:40 |
| Date of electronic submission: | 05.12.2022 |
| Date of printed submission: | 02.01.2023 |
| Date of defence: | 24.03.2023 |
| Opponents: | Rico Sennrich, Dr. |
| | Mgr. et Mgr. Ondřej Dušek, Ph.D. |
Guidelines for elaboration

In recent years, Transformer-based neural networks have become a dominant approach to solving many NLP problems, reaching or surpassing human-level performance on several tasks. Although Transformers, like other deep learning approaches, were inspired by the neural interactions inside the biological brain, their learning process is very different from that of a human.

The aim of this thesis is to investigate selected aspects of Transformer networks with regard to their training process and to discuss where the current training process underperforms compared to learning in humans (e.g., the need for huge amounts of training examples, or the inability to extract meaningful priors for future few-shot learning). We plan to study the learning process in the context of sequence-to-sequence tasks, ranging from simple string manipulation to the challenging task of machine translation. The main focus will be on the problems of continual learning and of knowledge composition (using knowledge about solving simple problems to tackle more complex tasks). A part of the work will also investigate the possible overestimation of the generalization ability of contemporary Transformers.
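To make the continual-learning problem concrete, below is a minimal PyTorch sketch of the Elastic Weight Consolidation (EWC) penalty from Kirkpatrick et al. (2017), cited in the literature list that follows. It is not part of the thesis record; the function names, the `lam` strength, and the averaging over batches are illustrative assumptions about one common way to implement the method.

```python
# Minimal sketch of the EWC penalty (Kirkpatrick et al., 2017).
# Assumes a trained `model`, a DataLoader over the OLD task's data,
# and a suitable `loss_fn`; all hyperparameters are illustrative.
import torch


def fisher_diagonal(model, data_loader, loss_fn):
    """Approximate the diagonal of the Fisher information matrix
    as the average squared gradient of the loss on the old task."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for inputs, targets in data_loader:
        model.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}


def ewc_penalty(model, fisher, old_params, lam=1000.0):
    """Quadratic penalty anchoring parameters that were important for
    the old task (large Fisher values) near their old values."""
    penalty = torch.zeros(())
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * penalty


# When training on a new task, the total loss would then be:
#   loss = task_loss + ewc_penalty(model, fisher, old_params)
```

The design intuition: plain sequential fine-tuning overwrites weights needed for the old task (catastrophic forgetting), while the Fisher-weighted penalty lets unimportant weights move freely and protects important ones.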
Recommended literature
Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J. Gershman. 2017. Building machines that learn and think like people. Behavioral and Brain Sciences, 40:e253.

James Kirkpatrick, Razvan Pascanu, Neil C. Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences of the United States of America, 114(13):3521–3526.

Zhizhong Li and Derek Hoiem. 2016. Learning without forgetting. In European Conference on Computer Vision, pages 614–629. Springer.

Ekaterina Garmash and Christof Monz. 2016. Ensemble learning for multi-source neural machine translation. In COLING.

Rahaf Aljundi, Punarjay Chakravarty, and Tinne Tuytelaars. 2017. Expert gate: Lifelong learning with a network of experts. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7120–7129.