Thesis (Selection of subject) (version: 368)
Thesis details
Domain Specific Information Extraction for Semantic Annotation
Thesis title in Czech:
Thesis title in English:
Academic year of topic announcement: 2008/2009
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: RNDr. Martin Holub, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 14.07.2009
Date of assignment: 14.07.2009
Date and time of defence: 01.02.2010 00:00
Date of electronic submission: 01.02.2010
Date of proceeded defence: 01.02.2010
Opponents: Mgr. Jan Dědek, Ph.D.
 
 
 
Guidelines
Semantic annotation of natural language texts provides additional, domain-specific information in the form of metadata. The particular domain used for experiments will be represented by an ontology describing all relevant concepts and their relationships. For this purpose, the framework of Formal Concept Analysis will be used.

Free texts in a given domain should be automatically analyzed using advanced methods of linguistic preprocessing. Automatic methods for information extraction from free sentences are the core of both the ontology building and the automatic semantic annotation procedure. The goal of the thesis is to evaluate and compare the precision of different approaches to information extraction based either on regular expression matching or on automatic analysis of dependency syntax.
References
Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, 2009.

Cinková, Silvie, Josef Toman, Jan Hajič, Kristýna Čermáková, Václav Klimeš, Lucie Mladová, Jana Šindlerová, Kristýna Tomšů, Zdeněk Žabokrtský. Tectogrammatical Annotation of the Wall Street Journal. To appear in Prague Bulletin of Mathematical Linguistics.

Thomas R. Gruber. Toward principles for the design of ontologies used for knowledge sharing. In Formal Ontology in Conceptual Analysis and Knowledge Representation, Kluwer, 1993.

Bernhard Ganter and Rudolf Wille. Formal Concept Analysis: Mathematical Foundations. Springer, Berlin, 1999.

Preliminary scope of work in English
The goal of the thesis is to evaluate and compare the precision of different approaches to information extraction used for semantic annotation in a specific domain.
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html