Thesis (Selection of subject) (version: 368)
Thesis details
Table-to-Text Generation via Logical Forms
Thesis title in Czech: Generování textu z tabulek přes logické formy
Thesis title in English: Table-to-Text Generation via Logical Forms
Key words: generování přirozeného jazyka|generování textu z tabulek|uvažování|plánování obsahu|zpracování přirozeného jazyka
English key words: natural language generation|table to text generation|reasoning|content planning|natural language processing
Academic year of topic announcement: 2022/2023
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: Mgr. et Mgr. Ondřej Dušek, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 12.03.2023
Date of assignment: 12.03.2023
Confirmed by Study dept. on: 20.03.2023
Date and time of defence: 06.02.2024 09:00
Date of electronic submission: 11.01.2024
Date of submission of printed version: 15.01.2024
Date of proceeded defence: 06.02.2024
Opponents: RNDr. Jiří Hana, Ph.D.
 
 
 
Guidelines
Table-to-text generation is a sub-task of Natural Language Generation (NLG) that aims to generate natural-language text from structured tables, summarizing the main or most interesting pieces of information contained therein (Bao et al., 2018; Liu et al., 2018). While multiple works have addressed this task in recent years, mostly using pre-trained language models (Chen et al., 2020a; Liu et al., 2022), it still poses significant problems, mainly (1) pre-selecting the most important content (table cells) to be verbalized, and (2) making language models perform numeric operations over the table consistently, which are often needed for summarizing statements.
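To illustrate problem (2), consider a minimal sketch (the table and statement are invented for illustration, not taken from the thesis datasets): a summarizing claim such as "X scored the most points" cannot be copied from any single cell; it requires a numeric operation (here an argmax) over a whole column, which language models often get wrong when generating end-to-end.

```python
# Toy table: a summarizing statement requires an argmax over a column,
# not a lookup of any individual cell.
table = [
    {"player": "Alice", "points": 21},
    {"player": "Bob", "points": 34},
    {"player": "Carol", "points": 18},
]

# argmax over the "points" column
top = max(table, key=lambda row: row["points"])
statement = f"{top['player']} scored the most points ({top['points']})."
print(statement)  # Bob scored the most points (34).
```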

This thesis will address both problems by generating sentences with logical inference (Saha et al., 2022), given the whole table as input. It will investigate whether a pipeline-based approach using separate planning/reasoning and surface realization steps (Gatt & Krahmer, 2018) is superior to end-to-end language-model-based generation (Kale & Rastogi, 2020) in terms of accuracy/consistency. The pipeline approach will focus mostly on the planning/reasoning step and the possibility of using a formal semantic representation, while the surface realization will make maximum use of existing approaches. The end-to-end baseline will use standard language model fine-tuning.
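The pipeline idea can be sketched as a tiny interpreter over nested logical forms in the spirit of Logic2Text, followed by a trivial template realizer. The operator names and the tuple encoding below are illustrative assumptions, not the datasets' exact inventory, and the real planning and realization steps would be learned components rather than hand-written ones.

```python
# A hypothetical sketch of the planning/reasoning + realization pipeline:
# logical forms are nested (op, *args) tuples evaluated over a toy table.
table = [
    {"nation": "france", "gold": 7},
    {"nation": "italy", "gold": 4},
]

# Illustrative operator inventory (not the Logic2Text function set).
OPS = {
    "count": lambda rows: len(rows),
    "filter_greater": lambda rows, col, v: [r for r in rows if r[col] > v],
    "argmax": lambda rows, col: max(rows, key=lambda r: r[col]),
    "value": lambda row, col: row[col],
}

def execute(form):
    """Recursively evaluate a nested (op, *args) logical form."""
    op, *args = form
    vals = [execute(a) if isinstance(a, tuple) else a for a in args]
    return OPS[op](*vals)

# Reasoning step: "which nation won the most gold medals?"
form = ("value", ("argmax", table, "gold"), "nation")
subject = execute(form)

# Surface realization step (template-based stand-in).
sentence = f"{subject} won the most gold medals."
print(sentence)  # france won the most gold medals.
```

An end-to-end baseline, by contrast, would feed a linearized table directly into a fine-tuned language model and hope it performs the argmax implicitly; the pipeline makes the numeric step explicit and checkable.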

Both approaches will be compared using automatic metrics as well as small-scale human evaluation or manual analysis. The project will work on the Logic2Text (Chen et al., 2020a) and/or LogicNLG (Chen et al., 2020b) datasets.
References
J. Bao et al., “Table-to-Text: Describing Table Region with Natural Language,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, Feb. 2018, pp. 5020–5027.
W. Chen, J. Chen, Y. Su, Z. Chen, and W. Y. Wang, “Logical Natural Language Generation from Open-Domain Tables,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, Jul. 2020a, pp. 7929–7942.
Z. Chen et al., “Logic2Text: High-Fidelity Natural Language Generation from Logical Forms,” in Findings of the Association for Computational Linguistics: EMNLP 2020, Online, Nov. 2020b, pp. 2096–2111.
A. Gatt and E. Krahmer, “Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation,” Journal of Artificial Intelligence Research (JAIR), vol. 61, pp. 65–170, Jan. 2018.
M. Kale and A. Rastogi, “Text-to-Text Pre-Training for Data-to-Text Tasks,” in Proceedings of the 13th International Conference on Natural Language Generation, Dublin, Ireland, Dec. 2020, pp. 97–102.
A. Liu, H. Dong, N. Okazaki, S. Han, and D. Zhang, “PLOG: Table-to-Logic Pretraining for Logical Table-to-Text Generation,” in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, UAE, Dec. 2022.
T. Liu, K. Wang, L. Sha, B. Chang, and Z. Sui, “Table-to-text Generation by Structure-aware Seq2seq Learning,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, Feb. 2018.
S. Saha, X. V. Yu, M. Bansal, R. Pasunuru, and A. Celikyilmaz, “MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation.” arXiv, Dec. 16, 2022.
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html