Thesis (Selection of subject)
Thesis details
Document-level information extraction
Thesis title in Czech: Získávání komplexní informace z textových dokumentů
Thesis title in English: Document-level information extraction
Key words: extrakce informací|zpracování přirozeného jazyka|hluboké učení
English key words: information extraction|natural language processing|deep learning
Academic year of topic announcement: 2023/2024
Thesis type: dissertation
Thesis language:
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: doc. RNDr. Pavel Pecina, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 19.09.2023
Date of assignment: 19.09.2023
Confirmed by Study dept. on: 21.09.2023
Guidelines
Information extraction is the task of automatically extracting structured information from unstructured data, usually textual documents. The basic sub-tasks are mostly solved at the sentence level (e.g. named entity recognition, extraction of relations between entities, and linking entities to an ontology). More complex information is extracted at the document level and includes, for instance, template filling, which attempts to fill a fixed set of fields using an entire document. The thesis will explore document-level information extraction using deep-learning-based models in multilingual and domain-specific settings.
References
Goodfellow, I., Y. Bengio, and A. Courville. 2016. Deep Learning. Cambridge, MA, USA: MIT Press.

Du, Xinya, Alexander M. Rush, and Claire Cardie. "Template filling with generative transformers." Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html