Multilingual Entity Linking Using Dense Retrieval
Thesis title in Czech: | Vícejazyčné propojování entit pomocí vektorového vyhledávání |
---|---|
Thesis title in English: | Multilingual Entity Linking Using Dense Retrieval |
Key words: | propojování entit|vektorové vyhledávání|vícejazyčné propojování entit|bi-enkóder |
English key words: | entity linking|dense retrieval|entity disambiguation|multilingual entity linking|bi-encoder |
Academic year of topic announcement: | 2023/2024 |
Thesis type: | Bachelor's thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | RNDr. Milan Straka, Ph.D. |
Author: | Bc. Dominik Farhan - assigned and confirmed by the Study Dept. |
Date of registration: | 24.01.2024 |
Date of assignment: | 24.01.2024 |
Confirmed by Study dept. on: | 26.01.2024 |
Date and time of defence: | 28.06.2024 09:00 |
Date of electronic submission: | 09.05.2024 |
Date of submission of printed version: | 09.05.2024 |
Date of completed defence: | 28.06.2024 |
Opponents: | doc. RNDr. Ondřej Bojar, Ph.D. |
Guidelines |
The goal of the work is to implement and evaluate an entity linking model using a neural-network-based bi-encoder dense retrieval approach. Wikidata should be used as the knowledge base and, in contrast to prior work by large commercial companies, publicly available training data should be used. The evaluation should be performed in several languages, for example on the Mewsli-9 dataset. |
References |
- Learning Dense Representations for Entity Retrieval. Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, Diego Garcia-Olano. https://aclanthology.org/K19-1049/
- Entity Linking in 100 Languages. Jan A. Botha, Zifei Shan, Daniel Gillick. https://aclanthology.org/2020.emnlp-main.630/
- MOLEMAN: Mention-Only Linking of Entities with a Mention Annotation Network. Nicholas FitzGerald, Dan Bikel, Jan Botha, Daniel Gillick, Tom Kwiatkowski, Andrew McCallum. https://aclanthology.org/2021.acl-short.37/
- DaMuEL 1.0: A Large Multilingual Dataset for Entity Linking. LINDAT/CLARIAH-CZ digital library. http://hdl.handle.net/11234/1-5047 |