Creating a Bilingual Dictionary using Wikipedia
Thesis title in Czech: | Creating a Bilingual Dictionary using Wikipedia |
---|---|
Title in English: | Creating a Bilingual Dictionary using Wikipedia |
Keywords (Czech): | dvojjazyčný slovník, extrakce slovníku, wikipedie, interwiki, wikislovník |
Keywords (English): | bilingual dictionary, dictionary extraction, wikipedia, interwiki, wiktionary |
Academic year of topic announcement: | 2010/2011 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | RNDr. Daniel Zeman, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 12.11.2010 |
Date of assignment: | 12.11.2010 |
Date and time of defence: | 06.09.2011 00:00 |
Date of electronic submission: | 05.08.2011 |
Date of printed submission: | 05.08.2011 |
Date of defence held: | 06.09.2011 |
Opponents: | Mgr. Pavel Straňák, Ph.D. |
Guidelines |
Machine-readable bilingual dictionaries are needed in NLP (natural language processing) applications such as cross-language information retrieval or machine translation. They can also be used to enhance existing dictionaries and to support second-language teaching and learning.
Wikipedia is a large on-line encyclopedia, freely available in many languages. Terms in Wikipedia are categorized and cross-lingual links are usually available, which makes Wikipedia a vast and unique linguistic resource, especially for named entities and special terms that are not normally found in ordinary dictionaries.
The goal of the thesis is to explore methods of automatic acquisition of a bilingual dictionary from Wikipedia. The approach should be studied and tested thoroughly on one language pair (e.g. English-Russian), while its possible application to other languages should be borne in mind. The related research questions are:
- How accurate are cross-language links? Manual evaluation/analysis of a selected part of the dictionary should be performed.
- How useful are the translation pairs within a broader NLP application such as statistical machine translation?
- Is there a difference in accuracy between RU-EN links and EN-RU links? Does accuracy go up if we only use the intersection of both? Could following third-language links (such as Russian-German + German-English) contribute to the precision or recall?
- Can we use other Wikipedia information, like redirects and anchor text (as suggested in Erdmann et al.), categories, disambiguation pages?
- Can we use the dictionary acquisition system for other language pairs? |
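The questions about intersecting RU-EN with EN-RU links and about following third-language links can be illustrated with a minimal Python sketch. It is not the method of the thesis: the function names `bidirectional_pairs` and `pivot_pairs` are hypothetical, the link tables are invented toy data rather than real Wikipedia links, and each article is assumed to carry at most one link per target language.

```python
def bidirectional_pairs(src_to_tgt, tgt_to_src):
    """Keep only pairs (s, t) where s links to t AND t links back to s."""
    return {(s, t) for s, t in src_to_tgt.items() if tgt_to_src.get(t) == s}


def pivot_pairs(src_to_pivot, pivot_to_tgt):
    """Compose source->pivot and pivot->target links into source->target pairs."""
    return {
        (s, pivot_to_tgt[p])
        for s, p in src_to_pivot.items()
        if p in pivot_to_tgt
    }


# Invented toy link tables (assumption: one link per article and language).
ru_to_en = {"Москва": "Moscow", "Прага": "Prague", "Лук": "Onion"}
en_to_ru = {"Moscow": "Москва", "Prague": "Прага", "Onion": "Лук репчатый"}
ru_to_de = {"Лук": "Zwiebel", "Вена": "Wien"}
de_to_en = {"Zwiebel": "Onion", "Wien": "Vienna"}

direct = set(ru_to_en.items())                   # all RU-EN links
both = bidirectional_pairs(ru_to_en, en_to_ru)   # intersection of both directions
via_de = pivot_pairs(ru_to_de, de_to_en)         # RU-DE + DE-EN composition
new_via_de = via_de - direct                     # pairs found only through the pivot
```

In this toy setting the intersection drops the pair whose back-link points to a different article, which is the precision-oriented filter the third question asks about, while the pivot step contributes a pair that has no direct link, which is the recall-oriented case.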
References |
Gerard de Melo, Gerhard Weikum: Untangling the cross-lingual link structure of Wikipedia. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 844-853. Uppsala, Sweden, 2010. http://www.aclweb.org/anthology/P/P10/P10-1087.pdf
Maike Erdmann, Kotaro Nakayama, Takahiro Hara, Shojiro Nishio: A Bilingual Dictionary Extracted from the Wikipedia Link Structure. In: Lecture Notes in Computer Science, Volume 4947/2008, pp. 686-689, DOI: 10.1007/978-3-540-78568-2_63. Springer Verlag: Berlin / Heidelberg, Germany, 2008.
Kun Yu, Junichi Tsujii: Bilingual Dictionary Extraction from Wikipedia. In: Proceedings of Machine Translation Summit XII. Ottawa, Canada, 2009. http://www.mt-archive.info/MTS-2009-Yu.pdf
Anna Tordai, Amir Ghazvinian, Jacco van Ossenbruggen, Mark A. Musen, Natalya F. Noy: Lost in Translation? Empirical Analysis of Mapping Compositions for Large Ontologies. In: Proceedings of the Fifth International Workshop on Ontology Matching (OM-2010), Shanghai, China, 2010. http://dit.unitn.it/~p2p/OM-2010/om2010_Tpaper2.pdf
Olena Medelyan, David Milne, Catherine Legg, Ian H. Witten: Mining meaning from Wikipedia. In: Int. J. Hum.-Comput. Stud. 67(9): 716-754 (2009). |