Thesis (Selection of subject) (version: 381)
Thesis details
Automatic Error Correction of Machine Translation Output
Thesis title in Czech: Automatická korektura chyb ve výstupu strojového překladu
Thesis title in English: Automatic Error Correction of Machine Translation Output
Key words: automatická post-editace, strojový překlad, strojové učení s dohledem, zpracování přirozeného jazyka, Treex
English key words: automatic post-editing, machine translation, supervised machine learning, natural language processing, Treex
Academic year of topic announcement: 2015/2016
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: doc. RNDr. Ondřej Bojar, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 15.07.2015
Date of assignment: 08.04.2016
Confirmed by Study dept. on: 21.04.2016
Date and time of defence: 08.09.2016 09:00
Date of electronic submission: 28.07.2016
Date of submission of printed version: 28.07.2016
Date of completed defence: 08.09.2016
Opponents: RNDr. David Mareček, Ph.D.
 
Guidelines
The aim of the thesis is to propose, implement and evaluate methods for correcting frequent errors in machine translation output.

The designed methods should be as language-independent as possible. They will therefore rely primarily on machine learning and on the availability of standard parallel corpora (source sentences and reference translations), monolingual data (target-side texts), post-editing logs (source, MT output and the editing operations of a human translator), and possibly other sources.
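As an illustration of the post-editing data mentioned above, the following minimal Python sketch derives token-level editing operations from an MT output and its human post-edit. It is only an assumption about how such logs might be turned into training examples, not the method of the thesis: the function name `edit_operations`, the whitespace tokenization and the example sentences are all hypothetical, and the alignment comes from the standard-library `difflib.SequenceMatcher` rather than from any tool named in these guidelines.

```python
from difflib import SequenceMatcher


def edit_operations(mt_output, post_edited):
    """Return the token-level edits that turn mt_output into post_edited.

    Hypothetical helper: tokenization is a plain whitespace split, which
    is a simplifying assumption and not a real tokenizer.
    """
    mt_tokens = mt_output.split()
    pe_tokens = post_edited.split()
    # autojunk=False disables difflib's heuristic that treats very
    # frequent items as junk in long sequences.
    matcher = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is one of "replace", "delete" or "insert"
            ops.append((tag, mt_tokens[i1:i2], pe_tokens[j1:j2]))
    return ops


if __name__ == "__main__":
    # Invented example pair, for illustration only.
    for op in edit_operations("the cat sit on mat", "the cat sits on the mat"):
        print(op)
```

Each returned tuple holds the operation type, the affected MT tokens and their post-edited counterpart; pairs of this kind could serve as supervised training examples for a correction model.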

The methods will be evaluated on standard datasets (e.g. WMT test sets) for translation from English into Czech and at least one other language (e.g. German, Polish, Romanian). The English-to-Czech direction allows for a direct comparison with Depfix, an existing tool where the corrections were manually encoded as rules.
References
Rosa Rudolf: Depfix, a Tool for Automatic Rule-based Post-editing of SMT. In: The Prague Bulletin of Mathematical Linguistics, Vol. 102, Copyright © Univerzita Karlova v Praze, ISSN 0032-6585, pp. 47-56, Oct 2014.

Bojar Ondřej, Buck Christian, Callison-Burch Chris, Federmann Christian, Haddow Barry, Koehn Philipp, Monz Christof, Post Matt, Soricut Radu, Specia Lucia: Findings of the 2013 Workshop on Statistical Machine Translation. In: Proceedings of the Eighth Workshop on Statistical Machine Translation, Copyright © Association for Computational Linguistics, Sofia, Bulgaria, ISBN 978-1-937284-57-2, pp. 1-44, 2013.

Autodesk Post-Editing Data. http://www.islrn.org/resources/290-859-676-529-5/

M. Simard, C. Goutte and P. Isabelle (2007) Statistical phrase-based post-editing. In: Proceedings of NAACL HLT 2007, Rochester, New York, pp. 508–515.

H. Béchara, Y. Ma and J. van Genabith (2011) Statistical post-editing for a statistical MT system. MT Summit XIII, pp. 308–315.
 