Creating a Bilingual Dictionary using Wikipedia
Thesis title in Czech: Creating a Bilingual Dictionary using Wikipedia
Thesis title in English: Creating a Bilingual Dictionary using Wikipedia
Keywords in Czech: dvojjazyčný slovník, extrakce slovníku, wikipedie, interwiki, wikislovník
Keywords in English: bilingual dictionary, dictionary extraction, wikipedia, interwiki, wiktionary
Academic year of topic announcement: 2010/2011
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: RNDr. Daniel Zeman, Ph.D.
Author: hidden - assigned and confirmed by the Study Department
Date of registration: 12.11.2010
Date of assignment: 12.11.2010
Date and time of defence: 06.09.2011 00:00
Date of electronic submission: 05.08.2011
Date of submission of printed version: 05.08.2011
Date of defence held: 06.09.2011
Opponents: Mgr. Pavel Straňák, Ph.D.
 
 
 
Guidelines
Machine-readable bilingual dictionaries are needed in NLP (natural language processing) applications such as cross-language information retrieval or machine translation. They can also serve to enhance existing dictionaries and to support second-language teaching and learning.

Wikipedia is a large on-line encyclopedia, freely available in many languages. Terms in Wikipedia are categorized, and cross-lingual links are usually available. This makes Wikipedia a vast and unique linguistic resource, especially for named entities and special terms that are not normally found in ordinary dictionaries.

The goal of the thesis is to explore methods of automatic acquisition of a bilingual dictionary from Wikipedia. The approach should be studied and tested thoroughly on one language pair (e.g. English-Russian) while its possible application to other languages should be borne in mind.
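
As a minimal sketch of what such acquisition could look like, the following Python snippet builds an English-Russian dictionary from interlanguage-link records. The record layout, the toy data, and the helper names (`strip_disambiguator`, `build_dictionary`) are assumptions made for illustration only; they are not prescribed by the assignment.

```python
from collections import defaultdict

# Hypothetical toy records: (source_title, target_language_code, target_title).
# Real data would come from a Wikipedia dump or API; this layout is assumed.
LANGLINKS_EN = [
    ("Moscow", "ru", "Москва"),
    ("Moscow", "de", "Moskau"),
    ("Machine translation", "ru", "Машинный перевод"),
    ("Mercury (planet)", "ru", "Меркурий"),
]


def strip_disambiguator(title):
    """Drop a trailing parenthesised qualifier such as ' (planet)'."""
    if title.endswith(")") and " (" in title:
        return title[: title.rindex(" (")]
    return title


def build_dictionary(langlinks, target_lang):
    """Map each source title to the set of titles it links to in target_lang."""
    dictionary = defaultdict(set)
    for source, lang, target in langlinks:
        if lang == target_lang:
            dictionary[strip_disambiguator(source)].add(strip_disambiguator(target))
    return dict(dictionary)


en_ru = build_dictionary(LANGLINKS_EN, "ru")
for headword, translations in sorted(en_ru.items()):
    print(headword, "->", ", ".join(sorted(translations)))
```

Stripping the parenthesised qualifier is only one possible normalisation step; whether it helps or hurts accuracy is exactly the kind of question the thesis is meant to examine.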

The related research questions are:

- How accurate are cross-language links? Manual evaluation/analysis of a selected part of the dictionary should be performed.
- How useful are the translation pairs within a broader NLP application such as statistical machine translation?
- Is there a difference in accuracy between RU-EN links and EN-RU links? Does accuracy go up if we only use the intersection of both? Could following third-language links (such as Russian-German + German-English) contribute to the precision or recall?
- Can we use other Wikipedia information, such as redirects and anchor text (as suggested by Erdmann et al.), categories, or disambiguation pages?
- Can we use the dictionary acquisition system for other language pairs?
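
Several of the questions above (link direction, intersection of both directions, third-language links, manual evaluation) can be made concrete with a small Python sketch. All link sets, the annotator verdicts, and the function names below are hypothetical illustrations, not results or methods taken from the thesis.

```python
from collections import defaultdict

# Hypothetical toy link sets, each a set of (source_title, target_title) pairs.
ru_en = {("Москва", "Moscow"), ("Яблоко", "Apple"), ("Париж", "Paris, Texas")}
en_ru = {("Moscow", "Москва"), ("Apple", "Яблоко"), ("Paris", "Париж")}
ru_de = {("Москва", "Moskau"), ("Париж", "Paris")}
de_en = {("Moskau", "Moscow"), ("Paris", "Paris")}


def symmetric_pairs(forward, backward):
    """Keep only forward pairs that are confirmed by a link in the other direction."""
    return {(src, tgt) for src, tgt in forward if (tgt, src) in backward}


def compose(first, second):
    """Follow links through a pivot language: source -> pivot -> target."""
    by_pivot = defaultdict(set)
    for pivot, tgt in second:
        by_pivot[pivot].add(tgt)
    return {(src, tgt) for src, pivot in first for tgt in by_pivot.get(pivot, ())}


def precision(pairs, judgements):
    """Share of manually judged pairs that were marked correct (None if none judged)."""
    judged = [pair for pair in pairs if pair in judgements]
    if not judged:
        return None
    return sum(judgements[pair] for pair in judged) / len(judged)


# Hypothetical annotator verdicts for a manually evaluated sample.
judgements = {
    ("Москва", "Moscow"): True,
    ("Яблоко", "Apple"): True,
    ("Париж", "Paris, Texas"): False,
}

both_ways = symmetric_pairs(ru_en, en_ru)
via_german = compose(ru_de, de_en)

print("precision, RU-EN only:  ", precision(ru_en, judgements))
print("precision, intersection:", precision(both_ways, judgements))
print("pairs found via German: ", sorted(via_german))
```

In this toy data the intersection discards the one wrong link, and the German pivot recovers a pair that the direct links miss. Whether real Wikipedia data behaves the same way is an open question that the thesis is supposed to answer empirically.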
References
Gerard de Melo, Gerhard Weikum: Untangling the cross-lingual link structure of Wikipedia. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 844-853. Uppsala, Sweden, 2010. http://www.aclweb.org/anthology/P/P10/P10-1087.pdf

Maike Erdmann, Kotaro Nakayama, Takahiro Hara, Shojiro Nishio: A Bilingual Dictionary Extracted from the Wikipedia Link Structure. In: Lecture Notes in Computer Science, Volume 4947/2008, pp. 686-689, DOI: 10.1007/978-3-540-78568-2_63. Springer Verlag: Berlin / Heidelberg, Germany, 2008.

Kun Yu, Junichi Tsujii: Bilingual Dictionary Extraction from Wikipedia. In: Proceedings of Machine Translation Summit XII. Ottawa, Canada, 2009. http://www.mt-archive.info/MTS-2009-Yu.pdf

Anna Tordai, Amir Ghazvinian, Jacco van Ossenbruggen, Mark A. Musen, Natalya F. Noy: Lost in Translation? Empirical Analysis of Mapping Compositions for Large Ontologies. In: Proceedings of the Fifth International Workshop on Ontology Matching (OM-2010), Shanghai, China, 2010. http://dit.unitn.it/~p2p/OM-2010/om2010_Tpaper2.pdf

Olena Medelyan, David Milne, Catherine Legg, Ian H. Witten: Mining meaning from Wikipedia. In: Int. J. Hum.-Comput. Stud. 67(9): 716-754 (2009).
 