Statistical Machine Translation between Languages with Significant Word Order Differences
Thesis title in Czech: | |
---|---|
Thesis title in English: | Statistical Machine Translation between Languages with Significant Word Order Differences |
Academic year of topic announcement: | 2009/2010 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | RNDr. Daniel Zeman, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Department |
Date of registration: | 10.11.2009 |
Date of assignment: | 10.11.2009 |
Date and time of defence: | 06.09.2010 00:00 |
Date of electronic submission: | 06.09.2010 |
Date of completed defence: | 06.09.2010 |
Opponents: | doc. RNDr. Ondřej Bojar, Ph.D. |
Guidelines |
One of the difficulties that statistical machine translation (SMT) systems face is differences in word order. When translating from a language with a rather fixed SVO word order, such as English, to a language whose preferred word order is dramatically different (such as the SOV order of Urdu, Hindi, Korean, ...), the system has to learn long-distance reordering of words. A higher degree of word order freedom in the target language is usually accompanied by richer morphology, i.e. word affixes have to be generated based on the fixed word order of the source sentence.
The goal of the thesis is to explore the two classes of problems mentioned above (and possibly other related ones) in practice, and to implement and evaluate techniques expected to help the SMT system solve them. This includes:
1. Selecting a language pair with word order differences and collecting parallel data for the pair.
2. Training an existing SMT system on the data.
3. Evaluating the performance of the system and analyzing the errors it makes. Estimating how much translation accuracy is affected by the problems mentioned above, and possibly which other types of errors dominate the output.
4. Implementing preprocessing and/or other techniques aimed at minimizing the identified classes of errors. Evaluating their impact. |
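As an illustration of the kind of source-side preprocessing that step 4 refers to, the following is a minimal toy sketch that moves the first verb group of an English clause to the clause end, approximating SOV order before the sentence is passed to an SMT system. It is not the method prescribed by this assignment. The function name `preorder_sov`, the `(word, tag)` token format, and the use of coarse part-of-speech tags such as `VERB`, `AUX` and `PUNCT` are assumptions made for the example; a realistic system would rely on a parser and would also handle subordinate clauses and the internal order of the verb group.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, coarse POS tag); the format is an assumption

VERB_TAGS = {"VERB", "AUX"}  # tags treated as part of a verb group


def preorder_sov(tokens: List[Token]) -> List[Token]:
    """Move the first contiguous verb group to the end of the clause.

    Trailing punctuation stays in final position. Sentences without
    a verb are returned unchanged.
    """
    # Separate trailing punctuation from the clause body.
    end = len(tokens)
    while end > 0 and tokens[end - 1][1] == "PUNCT":
        end -= 1
    body, tail = tokens[:end], tokens[end:]

    # Locate the first token tagged as a verb or auxiliary.
    start: Optional[int] = next(
        (i for i, (_, tag) in enumerate(body) if tag in VERB_TAGS), None
    )
    if start is None:
        return list(tokens)

    # Extend the group over adjacent verb/auxiliary tokens.
    stop = start
    while stop < len(body) and body[stop][1] in VERB_TAGS:
        stop += 1

    verbs = body[start:stop]
    return body[:start] + body[stop:] + verbs + tail


sentence = [
    ("John", "PROPN"),
    ("reads", "VERB"),
    ("the", "DET"),
    ("book", "NOUN"),
    (".", "PUNCT"),
]
print(" ".join(word for word, _ in preorder_sov(sentence)))
# prints: John the book reads .
```

Applying such a rule to the source side of both the training and the test data lets the SMT system learn from sentence pairs whose word order is already closer, so it has less long-distance reordering to discover on its own.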
References |
Zhifei Li, Chris Callison-Burch, Sanjeev Khudanpur, Wren Thornton: Decoding in Joshua: Open-Source, Parsing-Based Machine Translation. In: The Prague Bulletin of Mathematical Linguistics, 91:47-56, 2009
Peng Xu, Jaeho Kang, Michael Ringgaard, Franz Och: Using a Dependency Parser to Improve SMT for Subject-Object-Verb Languages. In: Proceedings of HLT-NAACL, Boulder, Colorado, 2009
Ananthakrishnan Ramanathan, Pushpak Bhattacharyya, Jayprasad Hegde, Ritesh M. Shah, Sasikumar M: Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation. In: Proceedings of IJCNLP, Hyderabad, India, 2008 |