Entity retrieval on Wikipedia in the scope of the gikiCLEF track
Thesis title in Czech: | |
---|---|
Thesis title in English: | Entity retrieval on Wikipedia in the scope of the gikiCLEF track |
Academic year of topic announcement: | 2008/2009 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. RNDr. Pavel Pecina, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Department |
Date of registration: | 04.08.2009 |
Date of assignment: | 04.08.2009 |
Date and time of defence: | 14.09.2009 00:00 |
Date of electronic submission: | 14.09.2009 |
Date of completed defence: | 14.09.2009 |
Opponents: | doc. Ing. Zdeněk Žabokrtský, Ph.D. |
Guidelines |
Query expansion and text categorization are performed in an attempt to increase the number of relevant documents retrieved. Expanding the query is useful because it helps the user formulate a more accurate search query, which makes it easier to match the user's actual needs against the indexed documents.
For this purpose, a thesaurus (used as a map of semantic relations between words and phrases in the corpus) is built from Wikipedia data (or other available data). The categories in the thesaurus are built by analyzing Wikipedia articles, and the semantic relations are extracted by processing the links between them. The way these categories and links are extracted offers an entry point for applying natural language processing techniques. In previous work [1], the thesaurus is derived for each particular document collection: articles serve as its building blocks, and the links between articles determine the relations between concepts. A measure of semantic relatedness between concepts is then calculated from an analysis of these links. In [2], by contrast, concepts and their semantic relations are extracted from the inherent structure of Wikipedia pages instead of the link analysis used in [1]. It would be interesting to compare the ontologies obtained with both methods and to find a way to exploit the advantages of each. Another advanced feature that can be studied is interactive query expansion, which takes user feedback into account. |
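The link-based idea described above can be illustrated with a minimal sketch. It is not the method of [1] or [2]: as an assumption, relatedness is approximated here by the Jaccard overlap of outgoing-link sets, and the link graph `LINKS`, the function names `relatedness` and `expand_query`, and the parameters `k` and `threshold` are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy link graph: article title -> titles it links to.
# Real data would come from a Wikipedia dump; these entries are made up.
LINKS: Dict[str, Set[str]] = {
    "Jaguar": {"Felidae", "South America", "Predator"},
    "Leopard": {"Felidae", "Africa", "Predator"},
    "Jaguar Cars": {"Automobile", "United Kingdom"},
}


def relatedness(a: str, b: str, links: Dict[str, Set[str]]) -> float:
    """Jaccard overlap of the outgoing-link sets of two articles (0.0 to 1.0).

    This is a stand-in measure assumed for illustration only.
    """
    links_a = links.get(a, set())
    links_b = links.get(b, set())
    union = links_a | links_b
    if not union:
        return 0.0
    return len(links_a & links_b) / len(union)


def expand_query(
    term: str,
    links: Dict[str, Set[str]],
    k: int = 2,
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k concepts whose relatedness to `term` exceeds `threshold`."""
    scored = [
        (other, relatedness(term, other, links))
        for other in links
        if other != term
    ]
    scored = [(title, score) for title, score in scored if score > threshold]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


if __name__ == "__main__":
    print(expand_query("Jaguar", LINKS))
```

In this toy graph the ambiguous term "Jaguar" is expanded with "Leopard" but not with "Jaguar Cars", because only the former shares links with it. An interactive variant could show the candidate list to the user and keep only the accepted terms.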
References |
[1] Milne, D., Witten, I. H. and Nichols, D. M. A Knowledge-Based Search Engine Powered by Wikipedia. Department of Computer Science, University of Waikato.
[2] Gabrilovich, E. and Markovitch, S. (2006) Overcoming the brittleness bottleneck using Wikipedia: Enhancing text categorization with encyclopedic knowledge. Proc. American Association for Artificial Intelligence. |