Unsupervised Open Information Extraction with Large Language Models
| Thesis title in Czech: | Neomezená extrakce informací bez učitele pomocí velkých jazykových modelů |
| --- | --- |
| Thesis title in English: | Unsupervised Open Information Extraction with Large Language Models |
| Czech key words: | hluboké učení, předtrénované jazykové modely, extrakce informací, strojové učení bez učitele |
| English key words: | deep learning, pretrained language models, unsupervised machine learning, information extraction |
| Academic year of topic announcement: | 2022/2023 |
| Thesis type: | diploma thesis |
| Thesis language: | English |
| Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
| Supervisor: | doc. RNDr. Pavel Pecina, Ph.D. |
| Author: | hidden |
| Date of registration: | 24.03.2023 |
| Date of assignment: | 24.03.2023 |
| Confirmed by Study dept. on: | 31.03.2023 |
| Date and time of defence: | 10.09.2024 09:00 |
| Date of electronic submission: | 18.07.2024 |
| Date of submission of printed version: | 18.07.2024 |
| Date of proceeded defence: | 10.09.2024 |
| Opponents: | RNDr. Martin Holub, Ph.D. |
Guidelines
Open Information Extraction (OIE) is an NLP task that involves extracting relationships between entities in textual corpora. Some OIE methods use linguistic knowledge to extract relations between entities in an unsupervised manner. Recent studies have indicated that pretrained Large Language Models (LLMs) capture linguistic as well as relational information. Recognising this, the IELM benchmark (Wang et al., 2022) seeks to exploit the relational information stored in LLMs to extract entities and their relations, converting an LLM into a zero-shot OIE system. The goal of this thesis is to improve OIE as outlined by IELM and, following that, to investigate how linguistic constraints or knowledge prompting applied to the input controls the behaviour of the information extraction process.
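To make the task format concrete, the following is a minimal, hypothetical sketch of zero-shot OIE triple extraction by prompting a pretrained model. It is not the attention-based extraction procedure of IELM or the method of this thesis; the model choice (`google/flan-t5-base`), the prompt wording, and the helper name `extract_triples` are illustrative assumptions.

```python
# Toy sketch of zero-shot OIE via prompting: sentence in,
# (subject, relation, object) triples out. Assumes the Hugging Face
# `transformers` library; the model and prompt are illustrative choices,
# not the IELM procedure.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-base")

def extract_triples(sentence: str) -> str:
    """Prompt the model to emit (subject, relation, object) triples."""
    prompt = (
        "Extract all (subject, relation, object) triples from the sentence.\n"
        f"Sentence: {sentence}\n"
        "Triples:"
    )
    # The pipeline returns a list of dicts with a "generated_text" field.
    return generator(prompt, max_new_tokens=64)[0]["generated_text"]

if __name__ == "__main__":
    print(extract_triples("Marie Curie was born in Warsaw."))
    # Ideally something like: (Marie Curie, was born in, Warsaw)
```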
References
Wang, Chenguang, Xiao Liu, and Dawn Song. "IELM: An Open Information Extraction Benchmark for Pre-Trained Language Models." arXiv preprint arXiv:2210.14128 (2022).
Wang, Chenguang, et al. "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation." arXiv preprint arXiv:2109.11171 (2021).