Thesis (Selection of subject)

Document-level information extraction

Thesis title in Czech:	Získávání komplexní informace z textových dokumentů
Thesis title in English:	Document-level information extraction
Key words:	extrakce informací|zpracování přirozeného jazyka|hluboké učení
English key words:	information extraction|natural language processing|deep learning
Academic year of topic announcement:	2023/2024
Thesis type:	dissertation
Thesis language:
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	doc. RNDr. Pavel Pecina, Ph.D.
Author:	hidden - assigned and confirmed by the Study Dept.
Date of registration:	19.09.2023
Date of assignment:	19.09.2023
Confirmed by Study dept. on:	21.09.2023

Guidelines

Information extraction is the task of automatically extracting structured information from unstructured data, usually textual documents. The basic sub-tasks (e.g. named entity recognition, extraction of relations between entities, and linking entities to an ontology) are mainly solved at the sentence level. More complex information is extracted at the document level; one example is template filling, which attempts to fill a fixed set of fields from an entire document. The thesis will explore document-level information extraction using deep-learning-based models in multilingual and domain-specific settings.
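
As an illustration of the template-filling idea described above, the following is a minimal Python sketch, not a method prescribed by the thesis. The slot names (`organizer`, `venue`, `date`), the `Template` class, and the `fill_template` helper are all hypothetical, and the toy input stands in for the output of some sentence-level extractor. A deep-learning model would replace the simple aggregation step shown here.

```python
from dataclasses import dataclass, field

# Hypothetical slot names, chosen only for illustration.
SLOTS = ("organizer", "venue", "date")


@dataclass
class Template:
    """A fixed set of fields to be filled from one whole document."""
    slots: dict[str, list[str]] = field(
        default_factory=lambda: {name: [] for name in SLOTS}
    )


def fill_template(sentence_mentions: list[list[tuple[str, str]]]) -> Template:
    """Merge per-sentence (slot, mention) pairs into one document-level template.

    This is a stand-in for a learned model: it only aggregates and
    de-duplicates mentions that a sentence-level extractor already labelled.
    """
    template = Template()
    for mentions in sentence_mentions:
        for slot, text in mentions:
            # Ignore labels outside the fixed slot set; skip repeated mentions.
            if slot in template.slots and text not in template.slots[slot]:
                template.slots[slot].append(text)
    return template


# Toy document: one list of labelled mentions per sentence (made-up data).
document = [
    [("organizer", "the institute")],
    [("venue", "Prague"), ("date", "19 September")],
    [("organizer", "the institute")],  # repeated mention in a later sentence
]

filled = fill_template(document)
print(filled.slots)
```

The point of the sketch is the shape of the output: a single record per document whose fields may draw on mentions scattered across several sentences, in contrast to sentence-level sub-tasks that label each sentence independently.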

References

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. Cambridge, MA, USA: MIT Press, 2016.

Du, Xinya, Alexander M. Rush, and Claire Cardie. "Template filling with generative transformers." Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.