Automatic Error Correction of Machine Translation Output
Thesis title in Czech: | Automatická korektura chyb ve výstupu strojového překladu |
---|---|
Thesis title in English: | Automatic Error Correction of Machine Translation Output |
Key words: | automatická post-editace, strojový překlad, strojové učení s dohledem, zpracování přirozeného jazyka, Treex |
English key words: | automatic post-editing, machine translation, supervised machine learning, natural language processing, Treex |
Academic year of topic announcement: | 2015/2016 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. RNDr. Ondřej Bojar, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 15.07.2015 |
Date of assignment: | 08.04.2016 |
Confirmed by Study dept. on: | 21.04.2016 |
Date and time of defence: | 08.09.2016 09:00 |
Date of electronic submission: | 28.07.2016 |
Date of submission of printed version: | 28.07.2016 |
Date the defence took place: | 08.09.2016 |
Opponents: | RNDr. David Mareček, Ph.D. |
Guidelines |
The aim of the thesis is to propose, implement and evaluate methods for correcting frequent errors in machine translation output.
The designed methods need to be as language-independent as possible. They will therefore rely primarily on machine learning and on the availability of standard parallel corpora (source sentences and reference translations), monolingual data (target-side texts), and post-editing logs (source, MT output and the editing operations of a human translator) or other possible sources. The methods will be evaluated on standard datasets (e.g. WMT test sets) for translation from English into Czech and into at least one other language (e.g. German, Polish, Romanian). The English-to-Czech direction allows a direct comparison with Depfix, an existing tool in which the corrections were manually encoded as rules. |
References |
Rosa Rudolf: Depfix, a Tool for Automatic Rule-based Post-editing of SMT. In: The Prague Bulletin of Mathematical Linguistics, Vol. 102, Copyright © Univerzita Karlova v Praze, ISSN 0032-6585, pp. 47-56, Oct 2014.
Bojar Ondřej, Buck Christian, Callison-Burch Chris, Federmann Christian, Haddow Barry, Koehn Philipp, Monz Christof, Post Matt, Soricut Radu, Specia Lucia: Findings of the 2013 Workshop on Statistical Machine Translation. In: Proceedings of the Eighth Workshop on Statistical Machine Translation, Copyright © Association for Computational Linguistics, Sofia, Bulgaria, ISBN 978-1-937284-57-2, pp. 1-44, 2013.
Autodesk Post-Editing Data. http://www.islrn.org/resources/290-859-676-529-5/
M. Simard, C. Goutte and P. Isabelle (2007) Statistical phrase-based post-editing. Rochester, New York, pp. 508–515.
H. Béchara, Y. Ma and J. van Genabith (2011) Statistical post-editing for a statistical MT system. MT Summit XIII, pp. 308–315. |