Non-Autoregressive Neural Machine Translation
| Title (Czech) | Neautoregresivní neuronový strojový překlad |
|---|---|
| Title (English) | Non-Autoregressive Neural Machine Translation |
| Keywords (Czech) | strojový překlad, hluboké učení, zpracování přirozených jazyků |
| Keywords (English) | machine translation, deep learning, natural language processing |
| Academic year of announcement | 2014/2015 |
| Thesis type | doctoral dissertation |
| Language | English |
| Department | Institute of Formal and Applied Linguistics (32-UFAL) |
| Supervisor | prof. RNDr. Jan Hajič, Dr. |
| Author | hidden - assigned and confirmed by the Student Office |
| Date of registration | 06.10.2014 |
| Date of assignment | 06.10.2014 |
| Date of confirmation by the Student Office | 06.10.2014 |
| Date and time of defense | 09.02.2022 14:00 |
| Date of electronic submission | 15.11.2021 |
| Date of printed submission | 16.11.2021 |
| Date of defense | 09.02.2022 |
| Opponents | Kevin Duh, Mgr. Martin Popel, Ph.D. |
References
- Bahdanau, D. – Cho, K. – Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. CoRR. 2014, abs/1409.0473. ISSN 2331-8422.
- Vaswani, A. – Shazeer, N. – Parmar, N. – Uszkoreit, J. – Jones, L. – Gomez, A. N. – Kaiser, Ł. – Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems 30, p. 6000–6010, Long Beach, CA, USA, December 2017. Curran Associates, Inc.
- Gu, J. – Bradbury, J. – Xiong, C. – Li, V. O. K. – Socher, R. Non-Autoregressive Neural Machine Translation. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 2018. Available at: https://openreview.net/forum?id=B1l8BtlCb
- Lee, J. – Mansimov, E. – Cho, K. Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, p. 1173–1182, Brussels, Belgium, November 2018. Association for Computational Linguistics. Available at: http://www.aclweb.org/anthology/D18-1149
- Ghazvininejad, M. – Levy, O. – Liu, Y. – Zettlemoyer, L. Mask-Predict: Parallel Decoding of Conditional Masked Language Models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), p. 6111–6120, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1633. Available at: https://www.aclweb.org/anthology/D19-1633
- Kaiser, L. – Bengio, S. – Roy, A. – Vaswani, A. – Parmar, N. – Uszkoreit, J. – Shazeer, N. Fast Decoding in Sequence Models Using Discrete Latent Variables. In Dy, J. – Krause, A. (Ed.) Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, p. 2390–2399, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR. Available at: http://proceedings.mlr.press/v80/kaiser18a.html
- Saharia, C. – Chan, W. – Saxena, S. – Norouzi, M. Non-Autoregressive Machine Translation with Latent Alignments. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), p. 1098–1108, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.83. Available at: https://www.aclweb.org/anthology/2020.emnlp-main.83
Preliminary scope of work
In recent years, neural machine translation has become the de facto standard approach to machine translation. Using a neural network, the source sentence is processed into a hidden intermediate representation in a continuous vector space, from which the target sentence is generated word by word.
The neural network translation model is autoregressive, which means that the output word probability distributions are conditioned on the previously generated words. This property constrains the otherwise highly parallelizable computation to be sequential. Non-autoregressive translation models the output distributions as conditionally independent. This assumption allows the sentence generation algorithm to be parallelized, which brings significant speed-ups in decoding. However, the translation quality of these models is lower due to a higher modeling error. In this thesis, we bring together a number of techniques for improving the translation quality of non-autoregressive translation models while preserving their high decoding speed. To provide a fair comparison, we evaluate optimization methods that were invented for, and previously used only with, autoregressive translation in the context of non-autoregressive translation.
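The contrast between the two factorizations is the technical core of the topic: an autoregressive model decodes according to p(y|x) = ∏_t p(y_t | y_<t, x), while a non-autoregressive model assumes p(y|x) = ∏_t p(y_t | x). The following Python sketch illustrates what this means for the decoding loop. It is illustrative only and not code from the thesis: the `toy_logits` function, the `VOCAB` list, and the fixed-length interface of the non-autoregressive decoder are hypothetical stand-ins for a trained decoder.

```python
# Minimal sketch contrasting autoregressive and non-autoregressive
# decoding. "toy_logits" is a hypothetical stand-in for a trained
# decoder; it just derives arbitrary scores from a hash of its inputs.

import numpy as np

VOCAB = ["<eos>", "the", "cat", "sat", "down"]


def toy_logits(source, prefix, position):
    """Stand-in for one decoder step: a score for every vocabulary item."""
    seed = hash((tuple(source), tuple(prefix), position)) % (2**32)
    return np.random.default_rng(seed).normal(size=len(VOCAB))


def autoregressive_decode(source, max_len=10):
    # p(y|x) = prod_t p(y_t | y_<t, x): each step conditions on the
    # previous outputs, so the loop is inherently sequential.
    prefix = []
    for t in range(max_len):
        word = VOCAB[int(np.argmax(toy_logits(source, prefix, t)))]
        if word == "<eos>":  # the model itself decides when to stop
            break
        prefix.append(word)
    return prefix


def non_autoregressive_decode(source, length):
    # p(y|x) = prod_t p(y_t | x): positions are conditionally
    # independent, so all steps could run in parallel; the target
    # length must be supplied up front instead of emerging from <eos>.
    logits = [toy_logits(source, [], t) for t in range(length)]
    return [VOCAB[int(np.argmax(step))] for step in logits]


if __name__ == "__main__":
    src = ["die", "katze", "sass"]
    print(autoregressive_decode(src))         # words produced one by one
    print(non_autoregressive_decode(src, 4))  # all four positions at once
```

Because no position can signal the end of the sentence to the others, the non-autoregressive decoder needs the target length as an explicit input; this is one reason why target length prediction and iterative refinement (as in Lee et al., 2018, and Ghazvininejad et al., 2019, listed above) play a central role in this line of work.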