Domain Specific Information Extraction for Semantic Annotation
Thesis title in Czech: | |
---|---|
Thesis title in English: | |
Academic year of topic announcement: | 2008/2009 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | RNDr. Martin Holub, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Department |
Date of registration: | 14.07.2009 |
Date of assignment: | 14.07.2009 |
Date and time of defence: | 01.02.2010 00:00 |
Date of electronic submission: | 01.02.2010 |
Date of completed defence: | 01.02.2010 |
Opponents: | Mgr. Jan Dědek, Ph.D. |
Guidelines |
Semantic annotation of natural language texts provides additional information in the form of domain-specific metadata. The domain used for the experiments will be represented by an ontology describing all relevant concepts and their relationships. The framework of Formal Concept Analysis will be used for this purpose.
Free texts in the given domain should be analyzed automatically using advanced methods of linguistic preprocessing. Automatic information extraction from free sentences is the core method for both ontology building and the automatic semantic annotation procedure. The goal of the thesis is to evaluate and compare the precision of different approaches to information extraction, based either on regular expression matching or on automatic analysis of dependency syntax. |
References |
Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, 2009.
Silvie Cinková, Josef Toman, Jan Hajič, Kristýna Čermáková, Václav Klimeš, Lucie Mladová, Jana Šindlerová, Kristýna Tomšů, Zdeněk Žabokrtský. Tectogrammatical Annotation of the Wall Street Journal. To appear in Prague Bulletin of Mathematical Linguistics.
Thomas R. Gruber. Toward principles for the design of ontologies used for knowledge sharing. In Formal Ontology in Conceptual Analysis and Knowledge Representation, Kluwer, 1993.
Bernhard Ganter and Rudolf Wille. Formal Concept Analysis: Mathematical Foundations. Springer, Berlin, 1999. |
Preliminary scope of work in English |
The goal of the thesis is to evaluate and compare the precision of different approaches to information extraction used for semantic annotation in a specific domain. |