Spoken Language Translation via Phoneme Representation of the Source Language
Thesis title in Czech: | Strojový překlad mluvené řeči přes fonetickou reprezentaci zdrojové řeči |
---|---|
Thesis title in English: | Spoken Language Translation via Phoneme Representation of the Source Language |
English key words: | spoken language translation, automatic speech recognition, transfer learning, non-native speech translation |
Academic year of topic announcement: | 2019/2020 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. RNDr. Ondřej Bojar, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 16.01.2020 |
Date of assignment: | 16.01.2020 |
Confirmed by Study dept. on: | 21.01.2020 |
Date and time of defence: | 08.07.2020 09:00 |
Date of electronic submission: | 28.05.2020 |
Date of submission of printed version: | 28.05.2020 |
Date of proceeded defence: | 08.07.2020 |
Opponents: | Mgr. Nino Peterek, Ph.D. |
Guidelines |
Spoken language translation (SLT) is an area of natural language processing with many practical applications, such as business meetings, conferences, and multinational organizations (e.g. the EU). Currently, human interpreters are employed, but deep learning opens new possibilities for automating the task.
A promising idea is to rework the traditional two-step approach, in which automatic speech recognition (ASR) of the source language is followed by machine translation (MT) of the text into the target language. Fully end-to-end neural solutions are possible, but a simpler change could be sufficient: shorten the ASR step so that it ends with a string of phonemes, and translate directly from source-side phonemes to target text. The goal of the thesis is to explore SLT with this change: speech recognition into phonemes, followed by translation from source phonemes to target words. An added benefit is the possibility of live adaptation to the current speaker through a customized phoneme mapping, with the aim of recovering from the mismatch between the speaker's personal accent and the standard training data. The thesis will build an SLT framework with an intermediate phoneme-level transcription step for translation between Czech and English in both directions. One of the challenges will be to overcome the scarcity of Czech training data, but the main focus will be the comparison of phoneme-level SLT with the standard approach and the exploration of on-the-fly speaker adaptation. |
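The pipeline outlined in the guidelines can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the `asr` and `mt` callables, and the toy phoneme symbols are all assumptions made for illustration, and the speaker adaptation is reduced to a context-free symbol-for-symbol lookup.

```python
from typing import Callable, Dict, List


def adapt_phonemes(phonemes: List[str], mapping: Dict[str, str]) -> List[str]:
    """Replace each phoneme that has an entry in the speaker-specific
    mapping; phonemes without an entry pass through unchanged."""
    return [mapping.get(p, p) for p in phonemes]


def translate_via_phonemes(
    audio: bytes,
    asr: Callable[[bytes], List[str]],   # hypothetical: audio -> source phonemes
    mt: Callable[[str], str],            # hypothetical: phoneme string -> target text
    mapping: Dict[str, str],
) -> str:
    """ASR stops at phonemes, the mapping normalizes the speaker's accent,
    and MT translates the phoneme string directly into target words."""
    phonemes = asr(audio)
    adapted = adapt_phonemes(phonemes, mapping)
    return mt(" ".join(adapted))


# Toy stand-ins for the real models (assumptions, for demonstration only).
def toy_asr(audio: bytes) -> List[str]:
    return ["d", "ɪ", "s"]


def toy_mt(phoneme_string: str) -> str:
    return f"<translation of: {phoneme_string}>"


speaker_mapping = {"d": "ð"}  # hypothetical accent correction
result = translate_via_phonemes(b"", toy_asr, toy_mt, speaker_mapping)
```

Because the mapping is a separate step between ASR and MT, it could in principle be swapped per speaker at run time without retraining either model, which is the property the guidelines refer to as adaptation on the fly.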
References |
Elizabeth Salesky, Matthias Sperber, and Alan W Black. Exploring phoneme-level speech representations for end-to-end speech translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1835–1841, Florence, Italy, July 2019. Association for Computational Linguistics.
Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M Cohen, Huyen Nguyen, and Ravi Teja Gadde. Jasper: An end-to-end convolutional neural acoustic model. arXiv preprint arXiv:1904.03288, 2019.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is All you Need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 6000–6010. Curran Associates, Inc., 2017.
Martin Popel and Ondřej Bojar. Training Tips for the Transformer Model. The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 110, pages 43–70, 2018. |