Unsupervised Open Information Extraction with Large Language Models
| Thesis title in Czech: | Neomezená extrakce informací bez učitele pomocí velkých jazykových modelů |
| --- | --- |
| Thesis title in English: | Unsupervised Open Information Extraction with Large Language Models |
| Czech key words: | hluboké učení, předtrénované jazykové modely, extrakce informací, strojové učení bez učitele |
| English key words: | deep learning, pretrained language models, unsupervised machine learning, information extraction |
| Academic year of topic announcement: | 2022/2023 |
| Thesis type: | diploma thesis |
| Thesis language: | English |
| Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
| Supervisor: | doc. RNDr. Pavel Pecina, Ph.D. |
| Author: | hidden |
| Date of registration: | 24.03.2023 |
| Date of assignment: | 24.03.2023 |
| Confirmed by Study dept. on: | 31.03.2023 |
| Date and time of defence: | 10.09.2024 09:00 |
| Date of electronic submission: | 18.07.2024 |
| Date of submission of printed version: | 18.07.2024 |
| Date of proceeded defence: | 10.09.2024 |
| Opponents: | RNDr. Martin Holub, Ph.D. |
Guidelines
Open Information Extraction (OIE) is an NLP task that involves extracting relationships between entities in textual corpora. Some OIE methods use linguistic knowledge to extract relations between entities in an unsupervised manner. Recent studies have indicated that pretrained Large Language Models (LLMs) capture linguistic as well as relational information. Recognising this, the IELM benchmark (Wang et al., 2022) seeks to exploit the relational information stored in LLMs to extract entities and their relations, converting an LLM into a zero-shot OIE system. The goal of this thesis is to improve OIE as outlined by IELM and, following that, to investigate how linguistic constraints or knowledge prompting applied to the input controls the behaviour of the information extraction process.
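To make the task format concrete, the following is a minimal, hypothetical sketch of zero-shot OIE triple extraction by prompting a pretrained model. It is not the attention-based extraction procedure of IELM or the method of this thesis; the model choice (`google/flan-t5-base`), the prompt wording, and the helper name `extract_triples` are illustrative assumptions.

```python
# Toy sketch of zero-shot OIE via prompting: sentence in,
# (subject, relation, object) triples out. Assumes the Hugging Face
# `transformers` library; the model and prompt are illustrative choices,
# not the IELM procedure.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-base")

def extract_triples(sentence: str) -> str:
    """Prompt the model to emit (subject, relation, object) triples."""
    prompt = (
        "Extract all (subject, relation, object) triples from the sentence.\n"
        f"Sentence: {sentence}\n"
        "Triples:"
    )
    # The pipeline returns a list of dicts with a "generated_text" field.
    return generator(prompt, max_new_tokens=64)[0]["generated_text"]

if __name__ == "__main__":
    print(extract_triples("Marie Curie was born in Warsaw."))
    # Ideally something like: (Marie Curie, was born in, Warsaw)
```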
References
Wang, Chenguang, Xiao Liu, and Dawn Song. "IELM: An Open Information Extraction Benchmark for Pre-Trained Language Models." arXiv preprint arXiv:2210.14128 (2022).
Wang, Chenguang, et al. "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation." arXiv preprint arXiv:2109.11171 (2021).