Thesis (Selection of subject) (version: 368)
Thesis details
Non-Autoregressive Neural Machine Translation
Thesis title in Czech: Neautoregresivní neuronový strojový překlad
Thesis title in English: Non-Autoregressive Neural Machine Translation
Key words: strojový překlad|hluboké učení|zpracování přirozených jazyků
English key words: machine translation|deep learning|natural language processing
Academic year of topic announcement: 2014/2015
Thesis type: dissertation
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: prof. RNDr. Jan Hajič, Dr.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 06.10.2014
Date of assignment: 06.10.2014
Confirmed by Study dept. on: 06.10.2014
Date and time of defence: 09.02.2022 14:00
Date of electronic submission: 15.11.2021
Date of submission of printed version: 16.11.2021
Date of proceeded defence: 09.02.2022
Opponents: Kevin Duh
  Mgr. Martin Popel, Ph.D.
 
 
References
Bahdanau, D. – Cho, K. – Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. CoRR. 2014, abs/1409.0473. ISSN 2331-8422.

Vaswani, A. – Shazeer, N. – Parmar, N. – Uszkoreit, J. – Jones, L. – Gomez, A. N. – Kaiser, Ł. – Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems 30, p. 6000–6010, Long Beach, CA, USA, December 2017. Curran Associates, Inc.

Gu, J. – Bradbury, J. – Xiong, C. – Li, V. O. K. – Socher, R. Non-Autoregressive Neural Machine Translation. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 2018. Available at: https://openreview.net/forum?id=B1l8BtlCb

Lee, J. – Mansimov, E. – Cho, K. Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, p. 1173–1182, Brussels, Belgium, November 2018. Association for Computational Linguistics. Available at: http://www.aclweb.org/anthology/D18-1149

Ghazvininejad, M. – Levy, O. – Liu, Y. – Zettlemoyer, L. Mask-Predict: Parallel Decoding of Conditional Masked Language Models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), p. 6111–6120, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1633. Available at: https://www.aclweb.org/anthology/D19-1633

Kaiser, L. – Bengio, S. – Roy, A. – Vaswani, A. – Parmar, N. – Uszkoreit, J. – Shazeer, N. Fast Decoding in Sequence Models Using Discrete Latent Variables. In Dy, J. – Krause, A. (Ed.) Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, p. 2390–2399, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR. Available at: http://proceedings.mlr.press/v80/kaiser18a.html

Saharia, C. – Chan, W. – Saxena, S. – Norouzi, M. Non-Autoregressive Machine Translation with Latent Alignments. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), p. 1098–1108, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.83. Available at: https://www.aclweb.org/anthology/2020.emnlp-main.83
Preliminary scope of work
In recent years, neural machine translation has become the de facto standard approach to machine translation. Using a neural network, the source sentence is processed into a hidden intermediate representation in continuous vector space, from which the target sentence is generated word by word.
The neural network translation model is autoregressive, which means that the output word probability distributions are conditioned on the previously generated words. This property constrains the otherwise highly parallelizable computation to be sequential.
Non-autoregressive translation models the output distributions as conditionally independent. This assumption allows for parallelization of the sentence generation algorithm, which brings significant speed-ups of the decoding process. However, the translation quality of these models is lower due to higher modeling error.
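The contrast can be sketched in a few lines of Python. This is a toy illustration under assumed names, not code from the thesis: toy_token_distribution and VOCAB are hypothetical stand-ins for a trained translation model and its vocabulary, and numpy is assumed to be available. The point is only that the autoregressive loop must run sequentially because each step reads the previously generated prefix, while the non-autoregressive positions depend on nothing but the source and can be filled in parallel.

    import numpy as np

    VOCAB = ["the", "cat", "sat", "on", "mat", "</s>"]  # hypothetical toy vocabulary
    rng = np.random.default_rng(0)

    def toy_token_distribution(source, prefix, position):
        # Hypothetical stand-in for a trained model: returns p(y_t | x, y_<t)
        # in the autoregressive case, or p(y_t | x, t) when `prefix` is ignored.
        logits = rng.normal(size=len(VOCAB))
        return np.exp(logits) / np.exp(logits).sum()

    def autoregressive_decode(source, max_len=6):
        # Each step conditions on the previously generated words,
        # so the loop iterations cannot run in parallel.
        prefix = []
        for t in range(max_len):
            probs = toy_token_distribution(source, prefix, t)
            token = VOCAB[int(np.argmax(probs))]
            if token == "</s>":
                break
            prefix.append(token)
        return prefix

    def non_autoregressive_decode(source, target_len=6):
        # Output positions are modeled as conditionally independent given the
        # source; the loop here stands in for a single parallel forward pass.
        return [VOCAB[int(np.argmax(toy_token_distribution(source, None, t)))]
                for t in range(target_len)]

    print(autoregressive_decode("source sentence"))
    print(non_autoregressive_decode("source sentence"))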
In this thesis, we bring together a number of techniques for improving the translation quality of non-autoregressive translation models, with the goal of preserving their high decoding speed. To provide a fair comparison, we evaluate, in the context of non-autoregressive translation, optimization methods that were invented and previously used only for autoregressive translation.
 