Thesis Topics (Thesis Selection)

Statistical Machine Translation between Languages with Significant Word Order Differences

Thesis title in Czech:
Thesis title in English:	Statistical Machine Translation between Languages with Significant Word Order Differences
Academic year of topic announcement:	2009/2010
Thesis type:	diploma thesis
Thesis language:	English
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	RNDr. Daniel Zeman, Ph.D.
Author:	hidden - assigned and confirmed by the Student Affairs Department
Date of registration:	10.11.2009
Date of assignment:	10.11.2009
Date and time of defence:	06.09.2010 00:00
Date of electronic submission:	06.09.2010
Date of completed defence:	06.09.2010
Opponents:	doc. RNDr. Ondřej Bojar, Ph.D.

Guidelines

One of the difficulties that statistical machine translation (SMT) systems face is differences in word order. When translating from a language with a rather fixed SVO word order, such as English, to a language whose preferred word order is dramatically different (such as the SOV order of Urdu, Hindi, Korean, ...), the system has to learn long-distance reordering of words. A higher degree of word order freedom in the target language is usually accompanied by richer morphology, i.e. word affixes have to be generated based on the fixed word order of the source sentence.
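To make the reordering problem concrete, the following is a minimal sketch of source-side pre-reordering for a single clause. It is only an illustration under strong assumptions, not the method prescribed by this assignment: the role labels "S", "V", "O" and the function name `reorder_svo_to_sov` are hypothetical, and a real system would derive such information from a parser rather than from hand-written labels.

```python
def reorder_svo_to_sov(tagged):
    """Move all verb tokens to the end of a single clause: S V O -> S O V.

    `tagged` is a list of (token, role) pairs. The roles "S", "V" and "O"
    are hypothetical labels used only for this illustration; any other
    role is treated like a non-verb and keeps its relative position.
    """
    verbs = [token for token, role in tagged if role == "V"]
    others = [token for token, role in tagged if role != "V"]
    return others + verbs


# Toy English clause with hand-assigned roles (an assumption, not parser output).
sentence = [
    ("the", "S"), ("boy", "S"),
    ("eats", "V"),
    ("an", "O"), ("apple", "O"),
]
print(" ".join(reorder_svo_to_sov(sentence)))
```

The reordered English string follows the verb-final order of the target language, so the translation model would have less long-distance reordering to learn. The sketch ignores subordinate clauses, auxiliaries, and punctuation.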

The goal of the thesis is to explore the two mentioned (and possibly other related) classes of problems in practice, and to implement and evaluate techniques expected to help the SMT system to solve them. This includes:

1. Selecting a language pair with word order differences and collecting parallel data for the pair.
2. Training an existing SMT system on the data.
3. Evaluating the performance of the system and analyzing the errors it makes. Estimating how much translation accuracy is affected by the problems mentioned above, and identifying which other types of errors, if any, dominate the output.
4. Implementing preprocessing and/or other techniques aimed at reducing the identified classes of errors. Evaluating their impact.
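As one possible aid for the analysis in steps 3 and 4, the amount of reordering a sentence pair requires can be quantified from its word alignment. The sketch below counts swapped word pairs (a normalized inversion count, in the spirit of Kendall's tau distance). It assumes a one-to-one alignment given as a permutation; the function name `reordering_distance` and this input format are assumptions made for illustration, not part of the assignment.

```python
def reordering_distance(alignment):
    """Return the fraction of source-word pairs whose order is swapped.

    `alignment[i]` is the target position of source word i. The list is
    assumed to be a permutation (one-to-one alignment), which is a
    simplification of real word alignments.
    0.0 means monotone order, 1.0 means fully reversed order.
    """
    n = len(alignment)
    if n < 2:
        return 0.0
    inversions = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if alignment[i] > alignment[j]
    )
    return inversions / (n * (n - 1) / 2)


# "the boy eats an apple" aligned to a verb-final target order:
# the->0, boy->1, eats->4, an->2, apple->3
print(reordering_distance([0, 1, 4, 2, 3]))
```

Comparing this value before and after a preprocessing step gives a rough indication of whether the step brings the source order closer to the target order. It says nothing about translation quality itself, which still has to be evaluated separately.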

References

Zhifei Li, Chris Callison-Burch, Sanjeev Khudanpur, Wren Thornton: Decoding in Joshua: Open-Source, Parsing-Based Machine Translation. In: The Prague Bulletin of Mathematical Linguistics, 91:47-56, 2009

Peng Xu, Jaeho Kang, Michael Ringgaard, Franz Och: Using a Dependency Parser to Improve SMT for Subject-Object-Verb Languages. In: Proceedings of HLT-NAACL, Boulder, Colorado, 2009

Ananthakrishnan Ramanathan, Pushpak Bhattacharyya, Jayprasad Hegde, Ritesh M. Shah, Sasikumar M: Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation. In: Proceedings of IJCNLP, Hyderabad, India, 2008