Thesis (Selection of subject) (version: 368)
Thesis details
Entity Relationship Extraction
Thesis title in Czech: Extrakce vztahů mezi entitami
Thesis title in English: Entity Relationship Extraction
Key words: entity, pojmenované entity, vztahy mezi entitami, extrakce vztahů mezi entitami, čeština, BERT
English key words: entities, named entities, entity relationship, entity relationship extraction, Czech, BERT
Academic year of topic announcement: 2019/2020
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: RNDr. Milan Straka, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 23.02.2020
Date of assignment: 27.02.2020
Confirmed by Study dept. on: 12.03.2020
Date and time of defence: 14.09.2020 09:00
Date of electronic submission: 30.07.2020
Date of submission of printed version: 30.07.2020
Date of proceeded defence: 14.09.2020
Opponents: Mgr. Pavel Straňák, Ph.D.
 
 
 
Guidelines
Initial text processing usually starts with segmentation and tokenization, followed by morphological and syntactic analysis. Then, named entities are recognized and possibly linked to a knowledge base. Finally, relations between entities are extracted to form a semantic graph representation of the given text. While entity relations can sometimes be found in the knowledge base, relationship extraction should also work when the relationship or the entities are not present in the knowledge base.

The goal of this thesis is to design and implement relationship extraction for the Czech language. Given the lack of supervised data, the first part of the thesis is to create a distantly supervised dataset using existing knowledge bases. The second part is the design and implementation of a relationship extraction model. Apart from being applied to Czech, the model should also be evaluated on a well-known English dataset.
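
The idea behind distant supervision can be illustrated with a minimal Python sketch. It is not the method prescribed by this assignment: the toy knowledge base, the `Triple` type, and the `distant_label` helper are hypothetical names introduced only for illustration, and plain substring matching stands in for proper named entity recognition and linking. A sentence that mentions both entities of a knowledge-base triple is assumed to express that triple's relation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Toy knowledge base (hypothetical triples, for illustration only).
KB = [
    Triple("Prague", "capital_of", "Czech Republic"),
    Triple("Charles University", "located_in", "Prague"),
]


def distant_label(sentences, kb):
    """Label a sentence with every KB relation whose head and tail both occur in it.

    This is the distant supervision assumption; substring matching is a
    simplification of real entity recognition and linking.
    """
    examples = []
    for sentence in sentences:
        for triple in kb:
            if triple.head in sentence and triple.tail in sentence:
                examples.append((sentence, triple.head, triple.tail, triple.relation))
    return examples


sentences = [
    "Prague is the capital of the Czech Republic.",
    "Charles University was founded in Prague in 1348.",
    "Brno is the second largest city.",
]

labeled = distant_label(sentences, KB)
for example in labeled:
    print(example)
```

The second sentence is labeled `located_in` even though it talks about the founding of the university. Labels produced this way are noisy, so a model trained on such data has to tolerate wrongly labeled examples.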
References
- Sebastian Riedel, Limin Yao, and Andrew McCallum: Modeling Relations and Their Mentions without Labeled Text (creation of NYT dataset using distant supervision)

- http://nlpprogress.com/english/relationship_extraction.html (overview of English well-known datasets and current best models)
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html