Thesis (Selection of subject) (version: 372)
Thesis details
Multilingual Entity Linking Using Dense Retrieval
Thesis title in Czech: Vícejazyčné propojování entit pomocí vektorového vyhledávání
Thesis title in English: Multilingual Entity Linking Using Dense Retrieval
Key words: propojování entit|vektorové vyhledávání|vícejazyčné propojování entit|bi-enkóder
English key words: entity linking|dense retrieval|entity disambiguation|multilingual entity linking|bi-encoder
Academic year of topic announcement: 2023/2024
Thesis type: Bachelor's thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: RNDr. Milan Straka, Ph.D.
Author: Bc. Dominik Farhan - assigned and confirmed by the Study Dept.
Date of registration: 24.01.2024
Date of assignment: 24.01.2024
Confirmed by Study dept. on: 26.01.2024
Date and time of defence: 28.06.2024 09:00
Date of electronic submission: 09.05.2024
Date of submission of printed version: 09.05.2024
Date of proceeded defence: 28.06.2024
Opponents: doc. RNDr. Ondřej Bojar, Ph.D.
 
 
 
Guidelines
The goal of this work is to implement and evaluate an entity linking model based on a neural bi-encoder dense retrieval approach. Wikidata should be used as the knowledge base and, in contrast to prior work by large commercial companies, publicly available training data should be used. The evaluation should be performed in several languages, for example on the Mewsli-9 dataset.
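As an illustration of the retrieval step named in the guidelines, the following is a minimal sketch, not the thesis implementation. It assumes a hypothetical `toy_encode` function (hashed character trigrams) standing in for the trained neural encoders, a made-up three-entry knowledge base with placeholder identifiers `E1` to `E3`, and a hypothetical `link` helper. It shows only the bi-encoder pattern: entities are embedded once into an index, a mention is embedded separately, and candidates are ranked by dot product.

```python
import math
from collections import Counter

DIM = 64  # hypothetical embedding size for the toy encoder


def toy_encode(text: str, dim: int = DIM) -> list[float]:
    """Stand-in for a neural encoder: hashed character trigrams, L2-normalised."""
    padded = f"  {text.lower()} "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    vec = [0.0] * dim
    for trigram, n in counts.items():
        # Deterministic bucket index (the built-in hash() is salted per process).
        bucket = sum(ord(c) * 31 ** k for k, c in enumerate(trigram)) % dim
        vec[bucket] += n
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def link(mention: str, entity_index: dict[str, list[float]],
         top_k: int = 1) -> list[tuple[str, float]]:
    """Rank entities by dot product between mention and entity embeddings."""
    query = toy_encode(mention)
    scored = [
        (entity_id, sum(q * e for q, e in zip(query, emb)))
        for entity_id, emb in entity_index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Made-up knowledge base; E1..E3 are placeholder identifiers, not real Wikidata IDs.
kb = {
    "E1": "Prague, capital of the Czech Republic",
    "E2": "Paris, capital of France",
    "E3": "Charles University, university in Prague",
}

# Entity embeddings are computed once, independently of any mention.
entity_index = {entity_id: toy_encode(desc) for entity_id, desc in kb.items()}

if __name__ == "__main__":
    for entity_id, score in link("the Czech capital Prague", entity_index, top_k=3):
        print(f"{entity_id}\t{score:.3f}")
```

Because mentions and entities are encoded independently, the entity index can be precomputed and searched with nearest-neighbour methods, which is what makes the approach practical for a knowledge base of Wikidata's size.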
References
- Learning Dense Representations for Entity Retrieval. Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, Diego Garcia-Olano. https://aclanthology.org/K19-1049/
- Entity Linking in 100 Languages. Jan A. Botha, Zifei Shan, Daniel Gillick. https://aclanthology.org/2020.emnlp-main.630/
- MOLEMAN: Mention-Only Linking of Entities with a Mention Annotation Network. Nicholas FitzGerald, Dan Bikel, Jan Botha, Daniel Gillick, Tom Kwiatkowski, Andrew McCallum. https://aclanthology.org/2021.acl-short.37/
- DaMuEL 1.0: A Large Multilingual Dataset for Entity Linking. LINDAT/CLARIAH-CZ digital library. http://hdl.handle.net/11234/1-5047
 