Neural models for multilingual natural language understanding and generation
Thesis title in Czech: Vícejazyčné neuronové modely pro porozumění jazyku a generování textu
Thesis title in English: Neural models for multilingual natural language understanding and generation
Keywords (Czech): porozumění přirozenému jazyku|sémantický parsing|generování přirozeného jazyka|zpracování přirozeného jazyka
Keywords (English): natural language understanding|semantic parsing|natural language generation|natural language processing
Academic year of topic announcement: 2021/2022
Thesis type: Master's thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: Mgr. et Mgr. Ondřej Dušek, Ph.D.
Author: hidden (assigned and confirmed by the Study Department)
Date of registration: 19.09.2022
Date of assignment: 19.09.2022
Date of confirmation by the Study Department: 29.09.2022
Date and time of defence: 10.09.2024 09:00
Date of electronic submission: 19.07.2024
Date of printed submission: 19.07.2024
Date of defence: 10.09.2024
Reviewers: Mgr. Dušan Variš, Ph.D.
Guidelines
Recent work on natural language understanding and natural language generation has made substantial progress, largely building on pretrained language models (Lewis et al., 2020). It has also been shown that these two tasks often benefit from being solved jointly (Tseng et al., 2020; Schmitt et al., 2020). The 2020 WebNLG+ shared task (Castro Ferreira et al., 2020), which provides a dataset of RDF triples paired with natural language descriptions in English and Russian and evaluates both directions (RDF-to-text generation and text-to-RDF semantic parsing), has been an important benchmark for such experiments. Nevertheless, state-of-the-art approaches still have significant problems, especially with the accuracy of both understanding and generation (Ji et al., 2022a; Ji et al., 2022b).

The aim of this thesis is to explore joint approaches to language understanding (i.e., RDF triple parsing) and language generation (verbalizing RDF triples as text), with a specific focus on improving their accuracy. The thesis will use additional training tasks, such as named entity recognition, to regularize the trained models. An important aspect will be exploring lower-resource settings and multilingual models, to ensure that the chosen methods generalize well beyond the WebNLG benchmark.
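To make the two directions concrete, here is a minimal sketch of one common setup (an assumption, not something the assignment prescribes): both parsing and generation are cast as text-to-text tasks for a single sequence-to-sequence model such as BART, with the RDF triples linearized into a flat string. Only the (subject, predicate, object) triple structure comes from WebNLG; the marker tokens `<S>`, `<P>`, `<O>`, the task prefixes `rdf2text:` / `text2rdf:`, the helper names, and the example sentence are hypothetical illustrations.

```python
from typing import List, Tuple

# An RDF triple as (subject, predicate, object); WebNLG-style entity names
# use underscores instead of spaces.
Triple = Tuple[str, str, str]

# Hypothetical marker tokens and task prefixes (illustrative, not from the thesis).
SUBJ, PRED, OBJ = "<S>", "<P>", "<O>"
TO_TEXT, TO_RDF = "rdf2text: ", "text2rdf: "


def linearize(triples: List[Triple]) -> str:
    """Flatten triples into one string that a seq2seq model can read or emit."""
    return " ".join(f"{SUBJ} {s} {PRED} {p} {OBJ} {o}" for s, p, o in triples)


def delinearize(text: str) -> List[Triple]:
    """Parse a linearized string back into triples, skipping malformed chunks."""
    triples = []
    for chunk in text.split(SUBJ)[1:]:
        subj, sep_p, rest = chunk.partition(PRED)
        pred, sep_o, obj = rest.partition(OBJ)
        if not sep_p or not sep_o:
            continue  # e.g. a truncated or ill-formed model output
        triples.append((subj.strip(), pred.strip(), obj.strip()))
    return triples


def make_examples(triples: List[Triple], text: str) -> List[Tuple[str, str]]:
    """Turn one (triples, text) pair into (input, target) pairs for both directions."""
    rdf = linearize(triples)
    return [
        (TO_TEXT + rdf, text),  # generation: RDF -> text
        (TO_RDF + text, rdf),   # parsing (understanding): text -> RDF
    ]


if __name__ == "__main__":
    triples = [("Alan_Bean", "birthPlace", "Wheeler,_Texas")]
    text = "Alan Bean was born in Wheeler, Texas."
    for src, tgt in make_examples(triples, text):
        print(src, "=>", tgt)
    # Parsing the model's linearized output should recover the input triples.
    assert delinearize(linearize(triples)) == triples
```

Because both directions share one input/output format, a single model can be trained on a mix of both kinds of pairs; an auxiliary task such as named entity recognition could plausibly be added in the same way, as another prefixed input/target pair.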
References
M. Lewis et al., “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, Jul. 2020, pp. 7871–7880. doi: 10.18653/v1/2020.acl-main.703.
B.-H. Tseng, J. Cheng, Y. Fang, and D. Vandyke, “A Generative Model for Joint Natural Language Understanding and Generation,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, Jul. 2020, pp. 1795–1807. doi: 10.18653/v1/2020.acl-main.163.
M. Schmitt, S. Sharifzadeh, V. Tresp, and H. Schütze, “An Unsupervised Joint System for Text Generation from Knowledge Graphs and Semantic Parsing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, Nov. 2020, pp. 7117–7130. doi: 10.18653/v1/2020.emnlp-main.577.
T. Castro Ferreira et al., “The 2020 Bilingual, Bi-Directional WebNLG+ Shared Task: Overview and Evaluation Results (WebNLG+ 2020),” in Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+), Dublin, Ireland (Virtual), Dec. 2020, pp. 55–76. Accessed: Apr. 06, 2022. [Online]. Available: https://aclanthology.org/2020.webnlg-1.7
Z. Ji et al., “Survey of Hallucination in Natural Language Generation,” arXiv:2202.03629 [cs], Feb. 2022a, Accessed: Feb. 15, 2022. [Online]. Available: http://arxiv.org/abs/2202.03629
S. Ji, S. Pan, E. Cambria, P. Marttinen, and P. S. Yu, “A Survey on Knowledge Graphs: Representation, Acquisition, and Applications,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 2, pp. 494–514, Feb. 2022b, doi: 10.1109/TNNLS.2021.3070843.
 
Charles University | Charles University Information System