Thesis (Selection of subject) (version: 368)
Thesis details
Thesis title in Czech: Neuronové generování textu z pojmů se znalostními grafy
Thesis title in English: Neural Concept-to-text Generation with Knowledge Graphs
Key words: generování přirozeného jazyka|generování textu z pojmů|znalostní graf|zpracování přirozeného jazyka
English key words: natural language generation|concept to text generation|knowledge graph|natural language processing
Academic year of topic announcement: 2022/2023
Thesis type: diploma thesis
Thesis language: Czech
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: Mgr. et Mgr. Ondřej Dušek, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 16.03.2023
Date of assignment: 16.03.2023
Confirmed by Study dept. on: 22.03.2023
Date and time of defence: 06.09.2023 09:00
Date of electronic submission: 20.07.2023
Date of submission of printed version: 24.07.2023
Date of proceeded defence: 06.09.2023
Opponents: Mgr. Jindřich Libovický, Ph.D.
Guidelines
Commonsense reasoning is an integral part of the task of natural language generation (NLG; Gatt & Krahmer, 2018; Yu et al., 2022). Failure to reproduce commonsense relations often leads to unrealistic outputs in NLG systems. The CommonGen benchmark (Lin et al., 2020) offers a straightforward way of evaluating the commonsense capabilities of various NLG models via the task of concept-to-text generation: small sets of concepts are to be described by sentences based on commonsense relations (e.g., {apple, tree, pick} → “a boy is picking an apple from a tree”). Pretrained language models, which represent the state of the art in current NLG research (e.g., Kale & Rastogi, 2020), score relatively well on this dataset, but often fail to pick up the correct commonsense relations.
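To illustrate the task setup, the usual recipe is to linearize a concept set into a single input string for a pretrained text-to-text model (T5, BART, etc.) and then check whether all concepts surface in the output. The following is a minimal sketch of that idea; the prompt wording and function names are illustrative assumptions, not part of the thesis assignment, and the prefix match is only a crude stand-in for the lemmatized matching used in real CommonGen scoring.

```python
# Illustrative sketch (not the thesis's actual method): linearizing a
# CommonGen-style concept set into a text-to-text prompt and roughly
# checking concept coverage of a generated sentence.

def build_prompt(concepts):
    """Turn a concept set into one input string for a seq2seq LM.
    Sorting makes the linearization deterministic."""
    return "generate a sentence with: " + " ".join(sorted(concepts))

def mentions_all_concepts(sentence, concepts):
    """Crude check that every input concept appears in the output.
    Real CommonGen evaluation lemmatizes ("picking" matches "pick");
    the prefix match here is a simplified stand-in."""
    tokens = sentence.lower().replace(".", "").split()
    return all(any(t.startswith(c) for t in tokens) for c in concepts)

prompt = build_prompt({"apple", "tree", "pick"})
# → "generate a sentence with: apple pick tree"
covered = mentions_all_concepts("A boy is picking an apple from a tree.",
                                {"apple", "tree", "pick"})
```

The actual sentence would of course come from a pretrained model fed with `prompt`; the coverage check is the part a baseline often fails even when the sentence is fluent.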

The aim of this thesis is thus to explore the possibility of enriching pretrained language models with commonsense knowledge. The relevant knowledge may be retrieved from knowledge graphs, such as ATOMIC or ConceptNet (Bauer & Bansal, 2021; Hwang et al., 2021; Speer et al., 2017; Ji et al., 2022). The thesis will experiment with various ways of adding knowledge and/or other approaches to maintaining generation consistency (e.g., Feng et al., 2021; Wang et al., 2021).
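One simple form the enrichment can take is retrieving triples from a ConceptNet-like graph that link the input concepts to each other and appending their verbalization to the model input. The sketch below shows that idea on a toy in-memory triple list; the graph contents, relation names, and prompt format are illustrative assumptions, not the actual ConceptNet data or API, and not the method the thesis is committed to.

```python
# Hedged sketch of knowledge-enriched input construction: select triples
# whose head AND tail both belong to the concept set, verbalize them, and
# append them to the prompt. The toy triples below are made up for
# illustration; real work would query ConceptNet or ATOMIC.

TRIPLES = [
    ("apple", "AtLocation", "tree"),
    ("pick", "HasSubevent", "reach"),
    ("apple", "UsedFor", "eating"),
]

def relevant_triples(concepts, triples=TRIPLES):
    """Keep only relations that connect two input concepts to each other."""
    return [(h, r, t) for (h, r, t) in triples
            if h in concepts and t in concepts]

def enrich_prompt(concepts):
    """Base prompt plus a verbalized-knowledge suffix (if any triples match)."""
    base = "generate a sentence with: " + " ".join(sorted(concepts))
    facts = " ".join(f"{h} {r} {t}." for (h, r, t) in relevant_triples(concepts))
    return f"{base} | knowledge: {facts}" if facts else base

# For {"apple", "tree", "pick"}, only the apple–tree relation survives,
# since "reach" and "eating" are not in the concept set.
```

Filtering to triples internal to the concept set is one of the simplest retrieval heuristics; published systems (e.g., KG-BART) instead integrate graph structure into the encoder, which is one of the design axes the thesis could explore.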

The resulting approach will be evaluated on CommonGen and compared to baseline language model generation, using both automatic metrics and small-scale human evaluation and/or manual error analysis.
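Among the automatic metrics reported on CommonGen (alongside n-gram scores such as BLEU and CIDEr) is concept coverage: the fraction of input concepts that actually appear in the generated sentence. A minimal sketch of computing it is below; as before, the prefix match is a simplified stand-in for the lemma-based matching of the official evaluation, and the function names are my own.

```python
# Sketch of the concept-coverage metric: what fraction of the input
# concepts surface in the output? Real evaluation lemmatizes tokens
# ("picking" → "pick"); a prefix match approximates that here.

def concept_coverage(sentence, concepts):
    """Fraction of concepts matched by some token of the sentence."""
    tokens = sentence.lower().replace(".", "").split()
    covered = sum(any(t.startswith(c) for t in tokens) for c in concepts)
    return covered / len(concepts)

def corpus_coverage(pairs):
    """Average coverage over (sentence, concept_set) pairs."""
    return sum(concept_coverage(s, c) for s, c in pairs) / len(pairs)
```

Coverage complements the n-gram metrics: a fluent sentence that silently drops a concept scores well on BLEU but poorly here, which is exactly the failure mode the knowledge-enriched models are meant to reduce.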
References
L. Bauer and M. Bansal, “Identify, Align, and Integrate: Matching Knowledge Graphs to Commonsense Reasoning Tasks,” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online, Apr. 2021. https://aclanthology.org/2021.eacl-main.192
S. Y. Feng, J. Huynh, C. P. Narisetty, E. Hovy, and V. Gangal, “SAPPHIRE: Approaches for Enhanced Concept-to-Text Generation,” in Proceedings of the 14th International Conference on Natural Language Generation, Aberdeen, Scotland, UK, Aug. 2021. https://aclanthology.org/2021.inlg-1.21
A. Gatt and E. Krahmer, “Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation,” Journal of Artificial Intelligence Research (JAIR), vol. 61, pp. 65–170, Jan. 2018. http://arxiv.org/abs/1703.09902
J. D. Hwang et al., “(Comet-) Atomic 2020: On Symbolic and Neural Commonsense Knowledge Graphs,” in Proceedings of the AAAI Conference on Artificial Intelligence, May 2021, vol. 35, pp. 6384–6392. https://ojs.aaai.org/index.php/AAAI/article/view/16792
S. Ji, S. Pan, E. Cambria, P. Marttinen, and P. S. Yu, “A Survey on Knowledge Graphs: Representation, Acquisition, and Applications,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 2, pp. 494–514, Feb. 2022. https://arxiv.org/abs/2002.00388
M. Kale and A. Rastogi, “Text-to-Text Pre-Training for Data-to-Text Tasks,” in Proceedings of the 13th International Conference on Natural Language Generation, Dublin, Ireland, Dec. 2020, pp. 97–102. https://www.aclweb.org/anthology/2020.inlg-1.14
B. Y. Lin et al., “CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning,” in Findings of the Association for Computational Linguistics: EMNLP 2020, Online, Nov. 2020. https://aclanthology.org/2020.findings-emnlp.165
Y. Liu, Y. Wan, L. He, H. Peng, and P. S. Yu, “KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning,” in Proceedings of the AAAI Conference on Artificial Intelligence, May 2021, vol. 35, pp. 6418–6425. https://ojs.aaai.org/index.php/AAAI/article/view/16796
R. Speer, J. Chin, and C. Havasi, “ConceptNet 5.5: An Open Multilingual Graph of General Knowledge,” in AAAI, San Francisco, CA, USA, Feb. 2017. https://arxiv.org/abs/1612.03975v2
Y. Wang, I. Wood, S. Wan, M. Dras, and M. Johnson, “Mention Flags (MF): Constraining Transformer-based Text Generators,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, Aug. 2021. https://aclanthology.org/2021.acl-long.9
W. Yu et al., “A Survey of Knowledge-enhanced Text Generation,” ACM Comput. Surv., vol. 54, no. 11s, pp. 1–38, Jan. 2022. http://arxiv.org/abs/2010.04389
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html