Entity Relationship Extraction
Thesis title in Czech: | Extrakce vztahů mezi entitami |
---|---|
Thesis title in English: | Entity Relationship Extraction |
Key words: | entity, pojmenované entity, vztahy mezi entitami, extrakce vztahů mezi entitami, čeština, BERT |
English key words: | entities, named entities, entity relationship, entity relationship extraction, Czech, BERT |
Academic year of topic announcement: | 2019/2020 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | RNDr. Milan Straka, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 23.02.2020 |
Date of assignment: | 27.02.2020 |
Confirmed by Study dept. on: | 12.03.2020 |
Date and time of defence: | 14.09.2020 09:00 |
Date of electronic submission: | 30.07.2020 |
Date of submission of printed version: | 30.07.2020 |
Date of proceeded defence: | 14.09.2020 |
Opponents: | Mgr. Pavel Straňák, Ph.D. |
Guidelines |
Initial text processing usually starts with segmentation and tokenization, followed by morphological and syntactic analysis. Then, named entities are recognized and possibly linked to a knowledge base. Finally, relations between entities are extracted to form a semantic graph representation of the given text. While entity relations can sometimes be found in the knowledge base, relationship extraction should also work when the relationship or the entities are not present in the knowledge base.
The goal of this thesis is to design and implement relationship extraction for the Czech language. Given the lack of supervised data, the first part of the thesis is to create a distantly supervised dataset using existing knowledge bases. The second part is the design and implementation of a relationship extraction model. Apart from being applied to Czech, the model should also be evaluated on a well-known English dataset. |
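As an illustration of the first part, the following is a minimal sketch of distant supervision, not the method used in the thesis. It assumes the knowledge base is available as (subject, relation, object) triples and that entity mentions can be found by plain case-insensitive substring matching; the names `Triple`, `Example`, and `distant_label`, as well as the toy data, are hypothetical. A real pipeline would more likely rely on named entity recognition and entity linking.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """A knowledge-base fact (hypothetical format)."""
    subject: str
    relation: str
    obj: str


class Example(NamedTuple):
    """A sentence automatically labeled with a relation."""
    sentence: str
    subject: str
    obj: str
    relation: str


def distant_label(sentences: List[str], triples: List[Triple]) -> List[Example]:
    """Label each sentence that mentions both entities of a triple
    with that triple's relation (the distant supervision assumption)."""
    examples = []
    for sentence in sentences:
        lowered = sentence.lower()
        for t in triples:
            # Assumption: naive substring matching stands in for entity linking.
            if t.subject.lower() in lowered and t.obj.lower() in lowered:
                examples.append(Example(sentence, t.subject, t.obj, t.relation))
    return examples


# Toy data, invented for illustration only.
KB = [Triple("Prague", "capital_of", "Czech Republic")]
SENTENCES = [
    "Prague is the capital of the Czech Republic.",
    "Prague hosted a conference attended by guests from the Czech Republic.",
    "Brno is the second largest city.",
]
EXAMPLES = distant_label(SENTENCES, KB)
```

In this toy run the second sentence also receives the `capital_of` label although it does not express that relation. Such label noise is inherent to distant supervision, so the resulting dataset should be treated as noisy training data.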
References |
- Sebastian Riedel, Limin Yao, and Andrew McCallum: Modeling Relations and Their Mentions without Labeled Text (creation of NYT dataset using distant supervision)
- http://nlpprogress.com/english/relationship_extraction.html (overview of well-known English datasets and current best models) |