Pivoting Machine Translation for Vietnamese
Thesis title in Czech: | Strojový překlad pro vietnamštinu s pivotním jazykem |
---|---|
Thesis title in English: | Pivoting Machine Translation for Vietnamese |
Key words: | statistický strojový překlad, metody překladu přes pivotní jazyk, kaskády systémů, triangulace frázové tabulky |
English key words: | Statistical Machine Translation, pivoting methods, system cascades, phrase table triangulation |
Academic year of topic announcement: | 2014/2015 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. RNDr. Ondřej Bojar, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 10.03.2015 |
Date of assignment: | 10.03.2015 |
Confirmed by Study dept. on: | 16.03.2015 |
Date and time of defence: | 09.09.2015 00:00 |
Date of electronic submission: | 30.07.2015 |
Date of submission of printed version: | 30.07.2015 |
Date of proceeded defence: | 09.09.2015 |
Opponents: | Mgr. Michal Novák, Ph.D. |
Guidelines |
The goal of the thesis is to create machine translation (MT) systems that translate between Vietnamese and Czech in both directions. The MT systems should be based on existing open-source toolkits for statistical machine translation.
Statistical MT systems rely on large collections of translated texts, so a necessary part of the work is to collect and clean these so-called parallel corpora. The main scientific focus of the thesis is on techniques of 'pivoting', i.e. translating from the source language to the target language with the help of resources from a third language. In particular, the thesis should examine the possibilities of reusing CzEng, a large Czech-English corpus, in translation between Vietnamese and Czech. An inherent part of the thesis is a careful empirical evaluation of the proposed methods against a direct Vietnamese-Czech baseline. The systems should be evaluated both by automatic evaluation methods (given a reference translation) and by manual evaluation methods. |
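As an illustration of the phrase table triangulation named in the key words, the following is a minimal sketch, not the method prescribed by the assignment. It assumes each phrase table is a plain dictionary of conditional translation probabilities and combines a Vietnamese-English table with an English-Czech table by summing over shared English pivot phrases. The function name `triangulate` and the toy entries are hypothetical; real toolkits store phrase tables in their own file formats with several scores per phrase pair.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Combine two phrase tables through shared pivot phrases.

    Each table maps a phrase to {translation: probability}.
    The result estimates p(tgt | src) as the sum over pivot phrases of
    p(pivot | src) * p(tgt | pivot).
    """
    src_to_tgt = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_to_pivot.items():
        for pivot, p_pivot_given_src in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt_given_pivot in pivot_to_tgt.get(pivot, {}).items():
                src_to_tgt[src][tgt] += p_pivot_given_src * p_tgt_given_pivot
    return {src: dict(tgts) for src, tgts in src_to_tgt.items()}


# Toy, made-up entries used purely for illustration.
vi_en = {"con mèo": {"the cat": 0.75, "a cat": 0.25}}
en_cs = {"the cat": {"kočka": 1.0}, "a cat": {"kočka": 0.5, "kocour": 0.5}}

vi_cs = triangulate(vi_en, en_cs)
print(vi_cs)
```

A system cascade, by contrast, would run a complete Vietnamese-English system and feed its output into a complete English-Czech system, without merging the tables.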
References |
Ondřej Bojar. Čeština a strojový překlad. ÚFAL, Praha, Czechia, ISBN 978-80-904571-4-0, 168 pp. 2012.
Philipp Koehn. Statistical Machine Translation. Cambridge University Press, ISBN-10: 0521874157, ISBN-13: 978-0521874151. 2009. http://www.statmt.org/book/
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, Evan Herbst. Moses: Open Source Toolkit for Statistical Machine Translation. Annual Meeting of the Association for Computational Linguistics (ACL), demonstration session, Prague, Czech Republic, June 2007. http://www.statmt.org/moses/
Ondřej Bojar, Zdeněk Žabokrtský, et al. The Joy of Parallelism with CzEng 1.0. Proceedings of LREC2012. ELRA. Istanbul, Turkey. 2012. http://ufal.mff.cuni.cz/czeng |