Textual Ciphers as a Tool for Better Understanding the Transformers
| Thesis title in Czech: | Textové šifry jako nástroj pro lepší pochopení modelů Transformer |
| --- | --- |
| Thesis title in English: | Textual Ciphers as a Tool for Better Understanding the Transformers |
| Key words: | Transformer, interpretovatelnost, NLP, deep learning, šifry |
| English key words: | Transformer, interpretability, NLP, deep learning, ciphers |
| Academic year of topic announcement: | 2023/2024 |
| Thesis type: | Bachelor's thesis |
| Thesis language: | English |
| Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
| Supervisor: | Mgr. Jindřich Libovický, Ph.D. |
| Author: | hidden |
| Date of registration: | 29.09.2023 |
| Date of assignment: | 29.09.2023 |
| Confirmed by Study dept. on: | 13.10.2023 |
| Date and time of defence: | 28.06.2024 09:00 |
| Date of electronic submission: | 09.05.2024 |
| Date of submission of printed version: | 09.05.2024 |
| Date of proceeded defence: | 28.06.2024 |
| Opponents: | Ing. Zdeněk Kasner, Ph.D. |
Guidelines
Basic textual ciphers (substitution, Caesar cipher, Vigenère cipher) transform meaningful texts into strings that are incomprehensible at first glance, and deciphering them without knowing the cipher key requires considerable effort. Transformer models, which are used intensively in Natural Language Processing (NLP), including the currently very popular language modeling, can be trained to decipher such texts even with relatively few parameters. Doing so requires reverse-engineering the cipher algorithm and some knowledge of the language that allows the model to guess the key internally. Unlike standard NLP problems such as machine translation, question answering, or sentiment analysis, there is very little interference from cultural aspects of meaning, and the task consists purely of language and computation. This makes it an ideal task for studying which language phenomena are the easiest for Transformers, i.e., which ones the models can rely on in this noisy setup.
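For illustration, below is a minimal sketch of the three ciphers named above, assuming a lowercase English alphabet and leaving non-alphabetic characters unchanged; the function names and the example key are hypothetical and are not taken from the thesis itself.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def caesar(text: str, shift: int) -> str:
    """Shift every letter by a fixed offset (Caesar cipher)."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + shift) % 26] if c in ALPHABET else c
        for c in text.lower()
    )


def substitution(text: str, key: dict[str, str]) -> str:
    """Replace each letter according to a fixed permutation of the alphabet."""
    return "".join(key.get(c, c) for c in text.lower())


def vigenere(text: str, key: str) -> str:
    """Shift each letter by an amount given by the repeating key word."""
    out, i = [], 0
    for c in text.lower():
        if c in ALPHABET:
            shift = ALPHABET.index(key[i % len(key)])
            out.append(ALPHABET[(ALPHABET.index(c) + shift) % 26])
            i += 1
        else:
            out.append(c)
    return "".join(out)


if __name__ == "__main__":
    plain = "attention is all you need"
    perm = dict(zip(ALPHABET, random.sample(ALPHABET, 26)))
    print(caesar(plain, 3))           # "dwwhqwlrq lv doo brx qhhg"
    print(substitution(plain, perm))  # depends on the random permutation
    print(vigenere(plain, "key"))     # "kxrorrssl sw yvp wyy loib"
```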
By experimenting with various test sets focusing on different types of language features (for instance, statistical and information-theoretic properties such as character distribution, word frequencies, n-gram perplexity, or length; and linguistic features such as dependency tree complexity, part-of-speech statistics, or the presence of named entities), the student will estimate which language phenomena make deciphering easy and which make it difficult. This analysis will serve as a proxy for understanding the training dynamics of Transformer models in the early stages of training.
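As a rough sketch of the statistical features mentioned above, the snippet below computes character-distribution entropy, a smoothed character-bigram log-probability, and length for a test sentence; the helper names, the toy corpus, and the add-one smoothing are illustrative assumptions, and linguistic features such as dependency complexity or named entities would additionally require a parser or tagger.

```python
from collections import Counter
import math


def char_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of the character distribution."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def avg_bigram_logprob(text: str, bigrams: Counter, unigrams: Counter) -> float:
    """Average log2-probability of character bigrams with add-one smoothing;
    a rough stand-in for n-gram perplexity on the plaintext side."""
    vocab = len(unigrams) or 1
    pairs = list(zip(text, text[1:]))
    if not pairs:
        return 0.0
    logp = sum(
        math.log2((bigrams[p] + 1) / (unigrams[p[0]] + vocab)) for p in pairs
    )
    return logp / len(pairs)


def describe(sentence: str, bigrams: Counter, unigrams: Counter) -> dict:
    """Collect a few simple features by which test sentences could be bucketed."""
    return {
        "length": len(sentence),
        "num_words": len(sentence.split()),
        "char_entropy": char_entropy(sentence),
        "avg_bigram_logprob": avg_bigram_logprob(sentence, bigrams, unigrams),
    }


if __name__ == "__main__":
    # A toy corpus used only to estimate character statistics for the example.
    corpus = "a toy corpus used only to estimate character statistics"
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    print(describe("attention is all you need", bigrams, unigrams))
```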
References
Bengio, Y., Goodfellow, I., & Courville, A. (2017). Deep learning (Vol. 1). Cambridge, MA, USA: MIT Press.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.
Greydanus, S. (2017). Learning the Enigma with recurrent neural networks. arXiv preprint arXiv:1708.07576.
Aldarrab, N., & May, J. (2020). Can Sequence-to-Sequence Models Crack Substitution Ciphers? arXiv preprint arXiv:2012.15229.