Neural models for multilingual natural language understanding and generation
| Thesis title in Czech: | Vícejazyčné neuronové modely pro porozumění jazyku a generování textu |
|---|---|
| Thesis title in English: | Neural models for multilingual natural language understanding and generation |
| Key words: | porozumění přirozenému jazyku|sémantický parsing|generování přirozeného jazyka|zpracování přirozeného jazyka |
| English key words: | natural language understanding|semantic parsing|natural language generation|natural language processing |
| Academic year of topic announcement: | 2021/2022 |
| Thesis type: | diploma thesis |
| Thesis language: | English |
| Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
| Supervisor: | Mgr. et Mgr. Ondřej Dušek, Ph.D. |
| Author: | hidden - assigned and confirmed by the Study Dept. |
| Date of registration: | 19.09.2022 |
| Date of assignment: | 19.09.2022 |
| Confirmed by Study dept. on: | 29.09.2022 |
| Date and time of defence: | 10.09.2024 09:00 |
| Date of electronic submission: | 19.07.2024 |
| Date of submission of printed version: | 19.07.2024 |
| Date of proceeded defence: | 10.09.2024 |
| Opponents: | Mgr. Dušan Variš, Ph.D. |
| Guidelines |
| Recent work on natural language understanding and natural language generation has shown substantial progress, building on pretrained language models (Lewis et al., 2020). In addition, it has been shown that these tasks often benefit from a joint solution (Tseng et al., 2020; Schmitt et al., 2020). The 2020 WebNLG shared task (Castro Ferreira et al., 2020), which provides a dataset of RDF triples and corresponding natural language descriptions, has been an important benchmark for these kinds of experiments. The state-of-the-art approaches still have significant problems, especially regarding the accuracy of both understanding and generation (Ji et al., 2022a; Ji et al., 2022b).
The aim of this thesis is to explore joint approaches to language understanding (i.e. RDF triple parsing) and language generation (verbalizing RDF triples as text), specifically focusing on improving their accuracy. The thesis will use additional training tasks, such as named entity recognition, to regularize the trained models. An important aspect will be to explore lower-resource settings and multilingual models, in order to ensure that the chosen methods generalize well beyond the WebNLG benchmark. |
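The two directions described in the guidelines can be illustrated with a minimal sketch: RDF triples are linearized into a marked-up string for a sequence-to-sequence model (generation input), and the inverse operation recovers triples from such a string (a stand-in for parsing output). The `<S>`/`<P>`/`<O>` markers and helper names are illustrative assumptions, not the actual WebNLG data format.

```python
# Toy sketch of triple linearization for a seq2seq model and its inverse.
# The <S>/<P>/<O> markers are hypothetical; WebNLG systems use various schemes.

def linearize(triples):
    """Flatten (subject, predicate, object) triples into one model input string."""
    return " ".join(f"<S> {s} <P> {p} <O> {o}" for s, p, o in triples)

def parse(linearized):
    """Invert linearize(): recover the triples from the marked-up string."""
    triples = []
    for chunk in linearized.split("<S>")[1:]:
        subj, rest = chunk.split("<P>")
        pred, obj = rest.split("<O>")
        triples.append((subj.strip(), pred.strip(), obj.strip()))
    return triples

triples = [("Alan_Turing", "birthPlace", "Maida_Vale")]
encoded = linearize(triples)
print(encoded)  # <S> Alan_Turing <P> birthPlace <O> Maida_Vale
assert parse(encoded) == triples
```

In a real system the string output of `parse` would come from a trained model rather than a deterministic inversion, which is where the accuracy problems cited above (hallucinated or dropped triples) arise.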
| References |
| M. Lewis et al., “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, Jul. 2020, pp. 7871–7880. doi: 10.18653/v1/2020.acl-main.703.
B.-H. Tseng, J. Cheng, Y. Fang, and D. Vandyke, “A Generative Model for Joint Natural Language Understanding and Generation,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, Jul. 2020, pp. 1795–1807. doi: 10.18653/v1/2020.acl-main.163.
M. Schmitt, S. Sharifzadeh, V. Tresp, and H. Schütze, “An Unsupervised Joint System for Text Generation from Knowledge Graphs and Semantic Parsing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, Nov. 2020, pp. 7117–7130. doi: 10.18653/v1/2020.emnlp-main.577.
T. Castro Ferreira et al., “The 2020 Bilingual, Bi-Directional WebNLG+ Shared Task: Overview and Evaluation Results (WebNLG+ 2020),” in Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+), Dublin, Ireland (Virtual), Dec. 2020, pp. 55–76. Accessed: Apr. 06, 2022. [Online]. Available: https://aclanthology.org/2020.webnlg-1.7
Z. Ji et al., “Survey of Hallucination in Natural Language Generation,” arXiv:2202.03629 [cs], Feb. 2022a. Accessed: Feb. 15, 2022. [Online]. Available: http://arxiv.org/abs/2202.03629
S. Ji, S. Pan, E. Cambria, P. Marttinen, and P. S. Yu, “A Survey on Knowledge Graphs: Representation, Acquisition, and Applications,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 2, pp. 494–514, Feb. 2022b. doi: 10.1109/TNNLS.2021.3070843. |