Analýza chyb a možností zlepšení frázového strojového překladu z angličtiny do urdštiny
Thesis title in Czech: | Analýza chyb a možností zlepšení frázového strojového překladu z angličtiny do urdštiny |
---|---|
Thesis title in English: | Analyzing Errors and Chances of Improving English to Urdu Phrase-Based Translation |
Key words: | frázový překlad, jazyky s volným slovosledem, typy chyb v překladu |
English key words: | Phrase-based translation, Free-word order languages, error scheme |
Academic year of topic announcement: | 2009/2010 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. RNDr. Ondřej Bojar, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 09.11.2009 |
Date of assignment: | 09.11.2009 |
Confirmed by Study dept. on: | 29.04.2013 |
Date and time of defence: | 06.09.2010 00:00 |
Date of electronic submission: | 06.09.2010 |
Date of proceeded defence: | 06.09.2010 |
Opponents: | RNDr. Daniel Zeman, Ph.D. |
Guidelines |
The aim of the thesis is to analyze errors in English to Urdu phrase-based or hierarchical phrase-based machine translation, and to propose and evaluate a few possible improvements in translation quality.
The first step consists of setting up and running a suitable MT system, e.g. Moses or Joshua, including the necessary collection of a small parallel corpus for training and evaluation. A thorough manual analysis of the system output on the given test corpus should indicate the most severe translation quality problems. The thesis should then attempt to tackle the identified issues by, e.g.: (1) pre-processing the English input, such as word reordering, (2) pre-processing the training corpus in order to reduce unnecessary lexical ambiguity, (3) using additional factors (in Moses factored translation) to better model target-side morphological coherence. For any of the options, either rule-based or statistical approaches may be applied. The utility of the proposed modifications to the translation pipeline has to be evaluated by both automatic MT metrics and human judgments on a small subset of the test corpus. |
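As an illustration of option (1), the sketch below shows one very simple rule-based reordering of English input toward the verb-final order typical of Urdu. It is not the method used in the thesis: the function name `move_verbs_to_end`, the hand-assigned coarse POS tags, and the single-clause assumption are all hypothetical choices made for this example. A realistic system would operate on parser output, for instance dependency trees, rather than on a flat tag sequence.

```python
from typing import List, Tuple

# (word, coarse POS tag); the tags are hand-assigned for this example,
# a real pipeline would obtain them from a tagger or parser.
Token = Tuple[str, str]

# Assumption: these coarse tags mark the verbal tokens to be moved.
VERBAL_TAGS = {"VERB", "AUX"}


def move_verbs_to_end(tagged: List[Token]) -> List[str]:
    """Toy SVO -> SOV reordering for a single clause.

    Verbal tokens are moved after all other tokens, keeping their
    relative order; trailing punctuation stays at the end.
    """
    body = list(tagged)
    trailing_punct: List[str] = []
    # Peel off sentence-final punctuation so it is not reordered.
    while body and body[-1][1] == "PUNCT":
        trailing_punct.insert(0, body.pop()[0])

    verbs = [word for word, tag in body if tag in VERBAL_TAGS]
    others = [word for word, tag in body if tag not in VERBAL_TAGS]
    return others + verbs + trailing_punct


sentence = [
    ("the", "DET"), ("boy", "NOUN"), ("reads", "VERB"),
    ("a", "DET"), ("book", "NOUN"), (".", "PUNCT"),
]
reordered = move_verbs_to_end(sentence)
print(" ".join(reordered))  # the boy a book reads .
```

Applying such a rule to the English side of both the training and the test data lets a phrase-based system see source sentences whose word order is closer to the target, which is the motivation for pre-ordering approaches in general.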
References |
Philipp Koehn and Hieu Hoang: Factored Translation Models. Proc. of EMNLP. 2007.
Ondřej Bojar: English-to-Czech Factored Machine Translation. Proceedings of the Second Workshop on Statistical Machine Translation, ACL. 2007.
Alexandra Birch, Miles Osborne and Philipp Koehn: CCG Supertags in Factored Statistical Machine Translation. Proceedings of the Second Workshop on Statistical Machine Translation, ACL. 2007.
Ondřej Bojar, Pavel Straňák and Daniel Zeman: English-Hindi Translation in 21 Days. Proc. of the 6th International Conference On Natural Language Processing (ICON-2008) NLP Tools Contest, International Institute of Information Technologies, Hyderabad, Pune, India. 2008.
Peng Xu, Jaeho Kang, Michael Ringgaard and Franz Och: Using a Dependency Parser to Improve SMT for Subject-Object-Verb Languages. Proc. of HLT/NAACL. 2009. |