Thesis details
Hybrid Machine Translation Approaches for Low-Resource Languages
Thesis title in Czech: Hybrid Machine Translation Approaches for Low-Resource Languages
Thesis title in English: Hybrid Machine Translation Approaches for Low-Resource Languages
Keywords: Hybrid Machine Translation, Low-resource languages, English-to-Urdu
Keywords in English: Hybrid Machine Translation, Low-resource languages, English-to-Urdu
Academic year of topic announcement: 2010/2011
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: Mgr. Martin Popel, Ph.D.
Author: hidden - assigned and confirmed by the Study Department
Date of registration: 07.12.2010
Date of assignment: 07.12.2010
Date and time of defence: 06.09.2011 00:00
Date of electronic submission: 05.08.2011
Date of printed submission: 05.08.2011
Date of defence held: 06.09.2011
Opponents: doc. RNDr. Vladislav Kuboň, Ph.D.
Guidelines
In recent years, corpus-based machine translation systems have produced strong results for a number of language pairs. However, for low-resource languages such as Urdu, purely statistical or purely example-based methods do not perform well. Rule-based approaches, on the other hand, require a great deal of time and resources for rule development, which makes them impractical in most scenarios. Hybrid machine translation systems, which combine the strengths of different approaches, might be one way to overcome these problems and achieve good translation quality.

The goal of the thesis is to explore different combinations of approaches and to evaluate their performance against the standard corpus-based methods currently in use. This includes:
1. Use of syntax-based and dependency-based reordering rules with statistical machine translation.
2. Automatic extraction of lexical and syntactic rules using statistical methods to facilitate transfer-based machine translation.

The novel element of the proposed work is the development of an algorithm that automatically learns reordering rules for English-to-Urdu statistical machine translation. Moreover, this approach can be extended to learn lexical and syntactic rules for building a rule-based machine translation system.
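
To illustrate what a reordering rule for a subject-object-verb target language might look like, the following is a minimal sketch of a single hand-written pre-ordering rule that moves the first verb of a simple English clause to the clause end. It is not the algorithm proposed in the thesis, which aims to learn such rules automatically. The function name `move_verb_to_end`, the toy sentence, and its hand-assigned Penn Treebank style tags are all assumptions made for illustration.

```python
from typing import List, Tuple

# (word, part-of-speech tag); the tags below are hand-assigned for the example
Token = Tuple[str, str]


def move_verb_to_end(tagged: List[Token]) -> List[Token]:
    """Toy pre-ordering rule (hypothetical): move the first verb to the clause end."""
    body = list(tagged)
    tail: List[Token] = []
    # Keep sentence-final punctuation (tag ".") in its original position.
    while body and body[-1][1] == ".":
        tail.insert(0, body.pop())
    # Find the first verb-tagged token and move it behind everything else.
    for i, (_, tag) in enumerate(body):
        if tag.startswith("VB"):
            body.append(body.pop(i))
            break
    return body + tail


sentence = [("Ali", "NNP"), ("reads", "VBZ"), ("the", "DT"), ("book", "NN"), (".", ".")]
reordered = [word for word, _ in move_verb_to_end(sentence)]
print(" ".join(reordered))
```

A rule like this would be applied to the English side before training and decoding, so that the statistical system sees source sentences whose word order is closer to that of the target language. Real sentences need rules conditioned on syntactic or dependency structure rather than on a flat tag sequence.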
References
1. Visweswariah et al. Syntax Based Reordering with Automatically Derived Rules for Improved Statistical Machine Translation. (2010)
2. Xu et al. Using a Dependency Parser to Improve SMT for Subject-Object-Verb Languages. (2009)
3. Eisele et al. Using Moses to Integrate Multiple Rule-Based Machine Translation Engines into a Hybrid System. (2008)
4. Probst. Learning Transfer Rules for Machine Translation with Limited Data. (2005) pp. 1-297
5. Lavie et al. Experiments with a Hindi-to-English Transfer-based MT System under a Miserly Data Scenario. (2003) pp. 1-21
6. Dolan et al. MSR-MT: The Microsoft Research Machine Translation System. (2002)
Charles University | Charles University Information System