Thesis (Selection of subject) (version: 381)
Thesis details
Pivoting Machine Translation for Vietnamese
Thesis title in Czech: Strojový překlad pro vietnamštinu s pivotním jazykem
Thesis title in English: Pivoting Machine Translation for Vietnamese
Key words: statistický strojový překlad, metody překladu přes pivotní jazyk, kaskády systémů, triangulace frázové tabulky
English key words: Statistical Machine Translation, pivoting methods, system cascades, phrase table triangulation
Academic year of topic announcement: 2014/2015
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: doc. RNDr. Ondřej Bojar, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 10.03.2015
Date of assignment: 10.03.2015
Confirmed by Study dept. on: 16.03.2015
Date and time of defence: 09.09.2015 00:00
Date of electronic submission: 30.07.2015
Date of submission of printed version: 30.07.2015
Date of defence held: 09.09.2015
Opponent (reviewer): Mgr. Michal Novák, Ph.D.
Guidelines
The goal of the thesis is to build machine translation systems between Vietnamese and Czech in both directions. The MT systems should be based on existing open-source toolkits for statistical machine translation.

Statistical MT systems rely on large collections of translated texts, so-called parallel corpora; collecting and cleaning such corpora is therefore a necessary part of the work.
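As an illustration of what "cleaning" can involve, the following is a minimal sketch of a common heuristic filter for sentence-aligned data: drop pairs with an empty side, overly long sentences, or an implausible length ratio (often a sign of misalignment). The function name `clean_parallel` and the thresholds `max_len=80` and `max_ratio=3.0` are hypothetical choices for this example, not values prescribed by the assignment.

```python
def clean_parallel(pairs, max_len=80, max_ratio=3.0):
    """Filter (source, target) sentence pairs with simple heuristics.

    Thresholds are hypothetical illustration values. Tokens are whitespace-
    separated; note that in Vietnamese, spaces separate syllables, not words.
    """
    kept = []
    for src, tgt in pairs:
        s_tok, t_tok = src.split(), tgt.split()
        if not s_tok or not t_tok:
            continue  # drop pairs with an empty side
        if len(s_tok) > max_len or len(t_tok) > max_len:
            continue  # drop overly long sentences
        ratio = max(len(s_tok), len(t_tok)) / min(len(s_tok), len(t_tok))
        if ratio > max_ratio:
            continue  # implausible length ratio, likely a misaligned pair
        kept.append((src.strip(), tgt.strip()))
    return kept


# Hypothetical toy Vietnamese-Czech pairs.
pairs = [
    ("xin chào", "ahoj"),
    ("", "prázdný"),
    ("một hai ba bốn năm sáu bảy", "jedna"),
]
kept = clean_parallel(pairs)
print(kept)
```

Here only the first pair survives: the second has an empty source side and the third has a 7:1 token ratio. Because Vietnamese token counts are really syllable counts, a ratio threshold tuned for European language pairs may need adjusting.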

The main scientific focus of the thesis is on 'pivoting' techniques, i.e. translating from the source language into the target language with the help of resources for a third (pivot) language. In particular, the thesis should examine the possibilities of reusing CzEng, a large Czech-English parallel corpus, for translation between Vietnamese and Czech.
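One pivoting method named in the key words is phrase table triangulation. The sketch below assumes a common formulation in which a Vietnamese-English table and an English-Czech table are joined on shared English phrases and the probabilities are combined by marginalizing over the pivot: p(c | v) = Σ_e p(c | e) · p(e | v). The function `triangulate` and the toy tables are hypothetical, and real phrase tables carry several scores per entry (both translation directions, lexical weights), whereas this sketch uses a single probability.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt):
    """Build a source-target phrase table from source-pivot and pivot-target tables.

    Each table maps a phrase to {translation: probability}. Scores are combined by
    marginalizing over shared pivot phrases: p(t | s) = sum_p p(t | p) * p(p | s).
    Simplification (assumption): only one translation probability per entry is used.
    """
    src_tgt = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_pivot.items():
        for piv, p_piv_given_src in pivots.items():
            # Pivot phrases unknown to the second table contribute nothing.
            for tgt, p_tgt_given_piv in pivot_tgt.get(piv, {}).items():
                src_tgt[src][tgt] += p_tgt_given_piv * p_piv_given_src
    return {src: dict(tgts) for src, tgts in src_tgt.items()}


# Hypothetical toy tables: Vietnamese -> English and English -> Czech.
vi_en = {"xin chào": {"hello": 0.7, "hi": 0.3}}
en_cs = {"hello": {"ahoj": 0.6, "dobrý den": 0.4}, "hi": {"ahoj": 1.0}}

vi_cs = triangulate(vi_en, en_cs)
print(vi_cs)
```

With these toy numbers, p(ahoj | xin chào) ≈ 0.7 · 0.6 + 0.3 · 1.0 = 0.72 and p(dobrý den | xin chào) ≈ 0.28. The other pivoting method in the key words, a system cascade, instead translates Vietnamese into English with one full system and passes the output to a separate English-Czech system; triangulation merges the models before decoding rather than chaining two decoders.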

An inherent part of the thesis is a careful empirical evaluation of the proposed methods against a direct Vietnamese-Czech baseline. The systems should be evaluated using both automatic methods (which compare the system output with a reference translation) and manual evaluation.
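The guidelines do not name a specific automatic metric; BLEU is one widely used metric of this kind. The following is a simplified, unsmoothed sketch of corpus-level BLEU with one reference per sentence: clipped n-gram precisions for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. The function names are hypothetical, tokenization is plain whitespace splitting, and an actual evaluation would rely on an established implementation (for example SacreBLEU or the Moses scoring scripts) rather than this code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU, one reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngr, r_ngr = ngrams(h, n), ngrams(r, n)
            # Clipping: an n-gram counts at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_ngr[g]) for g, c in h_ngr.items())
            totals[n - 1] += sum(h_ngr.values())
    if hyp_len == 0 or 0 in matches:
        return 0.0  # unsmoothed BLEU is zero if any n-gram precision is zero
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)


hyps = ["to je velmi dobrý překlad"]
refs = ["to je velmi dobrý překlad"]
print(corpus_bleu(hyps, refs))
```

An output identical to its reference scores 1.0, and an output sharing no words with the reference scores 0.0. Because the counts are pooled over the whole test set before the precisions are computed, the score is meaningful at corpus level rather than for individual sentences, which is one reason manual evaluation is requested alongside it.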
References
Ondřej Bojar. 2012. Čeština a strojový překlad [Czech and Machine Translation]. ÚFAL, Prague, Czechia. ISBN 978-80-904571-4-0, 168 pp.

Philipp Koehn. 2009. Statistical Machine Translation. Cambridge University Press. ISBN-10: 0521874157, ISBN-13: 978-0521874151.
http://www.statmt.org/book/

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, Evan Herbst. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. In Annual Meeting of the Association for Computational Linguistics (ACL), demonstration session. Prague, Czech Republic, June 2007.
http://www.statmt.org/moses/

Ondřej Bojar, Zdeněk Žabokrtský, et al. 2012. The Joy of Parallelism with CzEng 1.0. In Proceedings of LREC 2012. ELRA, Istanbul, Turkey.
http://ufal.mff.cuni.cz/czeng
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html