Thesis (Selection of subject) (version: 393)
Thesis details
Neural models for multilingual natural language understanding and generation
Thesis title in Czech: Vícejazyčné neuronové modely pro porozumění jazyku a generování textu
Thesis title in English: Neural models for multilingual natural language understanding and generation
Key words: porozumění přirozenému jazyku|sémantický parsing|generování přirozeného jazyka|zpracování přirozeného jazyka
English key words: natural language understanding|semantic parsing|natural language generation|natural language processing
Academic year of topic announcement: 2021/2022
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: Mgr. et Mgr. Ondřej Dušek, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 19.09.2022
Date of assignment: 19.09.2022
Confirmed by Study dept. on: 29.09.2022
Date and time of defence: 10.09.2024 09:00
Date of electronic submission: 19.07.2024
Date of submission of printed version: 19.07.2024
Date of proceeded defence: 10.09.2024
Opponents: Mgr. Dušan Variš, Ph.D.
 
Guidelines
Recent work on natural language understanding and natural language generation has made substantial progress, driven by pretrained language models (Lewis et al., 2020). In addition, it has been shown that these tasks often benefit from a joint solution (Tseng et al., 2020; Schmitt et al., 2020). The 2020 WebNLG shared task (Castro Ferreira et al., 2020), which provides a dataset of RDF triples and corresponding natural language descriptions, has been an important benchmark for these kinds of experiments. State-of-the-art approaches still have significant problems, especially regarding the accuracy of both understanding and generation (Ji et al., 2022a; Ji et al., 2022b).

The aim of this thesis is to explore joint approaches to language understanding (i.e. RDF triple parsing) and language generation (verbalizing RDF triples as text), specifically focusing on improving their accuracy. The thesis will use additional training tasks, such as named entity recognition, to regularize the trained models. An important aspect will be to explore lower-resource settings and multilingual models, in order to ensure that the chosen methods generalize well beyond the WebNLG benchmark.
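The two directions described above (parsing text into RDF triples and verbalizing triples as text) are commonly realized by serializing the triples into a flat string that a sequence-to-sequence model can consume or emit. A minimal sketch of such a serialization is given below; the `<S>`/`<P>`/`<O>` marker tokens and the function names are illustrative assumptions, not the scheme prescribed by the thesis.

```python
def linearize_triples(triples):
    """Serialize (subject, predicate, object) RDF triples into one flat
    string, marking each slot with an assumed special token."""
    return " | ".join(f"<S> {s} <P> {p} <O> {o}" for s, p, o in triples)


def parse_triples(text):
    """Invert linearize_triples: recover the list of triples from the
    marker-delimited string (the 'understanding' direction)."""
    triples = []
    for chunk in text.split(" | "):
        _, rest = chunk.split("<S> ", 1)
        subj, rest = rest.split(" <P> ", 1)
        pred, obj = rest.split(" <O> ", 1)
        triples.append((subj, pred, obj))
    return triples


# Example WebNLG-style input (entity names are illustrative):
triples = [("Alan_Bean", "nationality", "United_States"),
           ("Alan_Bean", "occupation", "Test_pilot")]
linearized = linearize_triples(triples)
print(linearized)
# The serialization is lossless, so both task directions share one format:
assert parse_triples(linearized) == triples
```

In a joint setup, the same model can then be trained on both mappings (linearized triples → text and text → linearized triples), which is one way the understanding and generation tasks can regularize each other.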
References
M. Lewis et al., “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, Jul. 2020, pp. 7871–7880. doi: 10.18653/v1/2020.acl-main.703.
B.-H. Tseng, J. Cheng, Y. Fang, and D. Vandyke, “A Generative Model for Joint Natural Language Understanding and Generation,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, Jul. 2020, pp. 1795–1807. doi: 10.18653/v1/2020.acl-main.163.
M. Schmitt, S. Sharifzadeh, V. Tresp, and H. Schütze, “An Unsupervised Joint System for Text Generation from Knowledge Graphs and Semantic Parsing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, Nov. 2020, pp. 7117–7130. doi: 10.18653/v1/2020.emnlp-main.577.
T. Castro Ferreira et al., “The 2020 Bilingual, Bi-Directional WebNLG+ Shared Task: Overview and Evaluation Results (WebNLG+ 2020),” in Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+), Dublin, Ireland (Virtual), Dec. 2020, pp. 55–76. Accessed: Apr. 06, 2022. [Online]. Available: https://aclanthology.org/2020.webnlg-1.7
Z. Ji et al., “Survey of Hallucination in Natural Language Generation,” arXiv:2202.03629 [cs], Feb. 2022a, Accessed: Feb. 15, 2022. [Online]. Available: http://arxiv.org/abs/2202.03629
S. Ji, S. Pan, E. Cambria, P. Marttinen, and P. S. Yu, “A Survey on Knowledge Graphs: Representation, Acquisition, and Applications,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 2, pp. 494–514, Feb. 2022b, doi: 10.1109/TNNLS.2021.3070843.
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html