

Analyzing Errors and Chances of Improving English to Urdu Phrase-Based Translation

Title in Czech:	Analýza chyb a možností zlepšení frázového strojového překladu z angličtiny do urdštiny
Title in English:	Analyzing Errors and Chances of Improving English to Urdu Phrase-Based Translation
Keywords (translated from Czech):	phrase-based translation, languages with free word order, types of translation errors
Keywords in English:	Phrase-based translation, Free-word order languages, error scheme
Academic year of announcement:	2009/2010
Type of thesis:	Master's thesis
Language of thesis:	English
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	doc. RNDr. Ondřej Bojar, Ph.D.
Student:	hidden (assigned and confirmed by the Study Department)
Date of registration:	09.11.2009
Date of assignment:	09.11.2009
Date of confirmation by the Study Department:	29.04.2013
Date and time of defense:	06.09.2010 00:00
Date of electronic submission:	06.09.2010
Date of completed defense:	06.09.2010
Reviewers:	doc. RNDr. Daniel Zeman, Ph.D.

Guidelines

The aim of the thesis is to analyze errors in English to Urdu phrase-based or hierarchical phrase-based machine translation, and to propose and evaluate a few possible improvements in translation quality.

The first step consists of setting up and running a suitable MT system, e.g. Moses or Joshua, including collecting the necessary small parallel corpus for training and evaluation. A thorough manual analysis of the system's output on the given test corpus should identify the most severe translation-quality problems. The thesis should then attempt to tackle the identified issues, e.g. by: (1) pre-processing the English input, for instance by reordering words, (2) pre-processing the training corpus to reduce unnecessary lexical ambiguity, or (3) using additional factors (in Moses factored translation) to better model target-side morphological coherence. For any of these options, either rule-based or statistical approaches may be applied. The utility of the proposed modifications to the translation pipeline has to be evaluated both by automatic MT metrics and by human judgments on a small subset of the test corpus.
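Option (1), reordering the English input, can be illustrated with a minimal sketch. It assumes a dependency parse of the English sentence is already available (here written by hand) and applies one hypothetical, deliberately simplified rule: verbs and prepositions are moved after all of their dependents, which yields verb-final clauses and turns prepositions into postpositions, as Urdu word order requires. This is only in the spirit of dependency-based pre-reordering for SOV languages (cf. Xu et al. 2009); the names `Node`, `linearize` and `HEAD_FINAL_POS` are illustrative and belong neither to Moses, Joshua nor the thesis itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy rule set: heads tagged VERB or ADP are placed after all
# of their dependents (verb-final clauses, prepositions become
# postpositions); every other head keeps its original English position.
HEAD_FINAL_POS = {"VERB", "ADP"}


@dataclass
class Node:
    word: str
    pos: str    # coarse part-of-speech tag, e.g. "VERB", "NOUN", "ADP"
    index: int  # position of the word in the original English sentence
    children: list[Node] = field(default_factory=list)


def linearize(node: Node) -> list[str]:
    """Return the words of the subtree rooted at `node` in reordered form."""
    deps = sorted(node.children, key=lambda d: d.index)  # keep English order
    if node.pos in HEAD_FINAL_POS:
        order = deps + [node]  # head goes last
    else:
        left = [d for d in deps if d.index < node.index]
        right = [d for d in deps if d.index > node.index]
        order = left + [node] + right
    words: list[str] = []
    for item in order:
        if item is node:
            words.append(node.word)
        else:
            words.extend(linearize(item))  # recurse into the dependent subtree
    return words


# "John ate an apple in the garden", parsed by hand (prepositions head
# their objects, as in older Stanford-style dependencies).
root = Node("ate", "VERB", 1, [
    Node("John", "NOUN", 0),
    Node("apple", "NOUN", 3, [Node("an", "DET", 2)]),
    Node("in", "ADP", 4, [
        Node("garden", "NOUN", 6, [Node("the", "DET", 5)]),
    ]),
])

print(" ".join(linearize(root)))  # John an apple the garden in ate
```

A real pre-reordering step would run a parser over every training and test sentence and would need many more rules (auxiliaries, negation, relative clauses); since Urdu word order is relatively free, the exact rule set would itself be a subject for the error analysis.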

Recommended Literature

Philipp Koehn and Hieu Hoang: Factored Translation Models. Proc. of EMNLP. 2007.

Ondřej Bojar: English-to-Czech Factored Machine Translation. Proceedings of the Second Workshop on Statistical Machine Translation, ACL. 2007.

Alexandra Birch, Miles Osborne and Philipp Koehn: CCG Supertags in Factored Statistical Machine Translation. Proceedings of the Second Workshop on Statistical Machine Translation, ACL. 2007.

Ondřej Bojar, Pavel Straňák, Daniel Zeman: English-Hindi Translation in 21 Days. Proc. of the 6th International Conference On Natural Language Processing (ICON-2008) NLP Tools Contest, International Institute of Information Technologies, Hyderabad, Pune, India. 2008.

Peng Xu, Jaeho Kang, Michael Ringgaard and Franz Och: Using a Dependency Parser to Improve SMT for Subject-Object-Verb Languages. Proc. of HLT/NAACL. 2009.