Practical neural dialogue management using pretrained language models
| Thesis title in Czech: | Praktický neuronový dialogový manažer s použitím předtrénovaných jazykových modelů |
|---|---|
| Thesis title in English: | Practical neural dialogue management using pretrained language models |
| Key words: | dialogové systémy, předtrénované jazykové modely, zpracování přirozeného jazyka, dialogový manažer |
| English key words: | dialogue systems, pretrained language models, natural language processing, dialogue management |
| Academic year of topic announcement: | 2021/2022 |
| Thesis type: | diploma thesis |
| Thesis language: | English |
| Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
| Supervisor: | Mgr. et Mgr. Ondřej Dušek, Ph.D. |
| Author: | hidden - assigned and confirmed by the Study Dept. |
| Date of registration: | 19.09.2022 |
| Date of assignment: | 19.09.2022 |
| Confirmed by Study dept. on: | 27.09.2022 |
| Date and time of defence: | 05.09.2023 09:00 |
| Date of electronic submission: | 20.07.2023 |
| Date of submission of printed version: | 24.07.2023 |
| Date of proceeded defence: | 05.09.2023 |
| Opponents: | doc. RNDr. Ondřej Bojar, Ph.D. |
Guidelines
While much of today's dialogue systems research is dedicated to end-to-end neural models (Lin et al., 2020; Peng et al., 2021), this approach is notorious for requiring large amounts of annotated data, which is costly to obtain, and neural generative models are generally unsafe to use in practical applications due to their tendency to hallucinate, i.e., produce ungrounded outputs (Ji et al., 2022). Dialogue systems for practical applications thus remain composed of multiple separate modules (language understanding, state tracking, dialogue policy, language generation). Hybrid Code Networks (HCN; Williams et al., 2017) are a neural data-driven architecture that combines all modules apart from language generation and allows training on limited data, but they do not take advantage of recent developments in the field, i.e., pretrained language models (Radford et al., 2019; Lewis et al., 2020).
The goal of this thesis is to explore HCN-based or similar neural architectures for practical dialogue modeling while making use of pretrained language models. The implemented architecture will be tested on the language understanding – state tracking – dialogue policy combination, and it will be evaluated in a limited data setting.
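To make the HCN idea concrete, the following is a minimal illustrative sketch of an HCN-style dialogue policy: learned features are combined with hand-coded action masking, and a policy scores a fixed set of action templates. All names here (the toy vocabulary, actions, and the `slots_filled` state flag) are hypothetical; in a real system the bag-of-words featurizer would be replaced by representations from a pretrained language model such as BERT, and the untrained linear layer by a trained recurrent policy.

```python
import numpy as np

# Toy stand-ins (hypothetical, for illustration only).
VOCAB = ["book", "table", "cancel", "bye"]
ACTIONS = ["ask_slot", "confirm_booking", "cancel_booking", "goodbye"]

def featurize(utterance: str, state: dict) -> np.ndarray:
    """Bag-of-words utterance features plus binary dialogue-state features.
    A pretrained LM encoder would replace this in a full implementation."""
    tokens = utterance.lower().split()
    bow = np.array([1.0 if w in tokens else 0.0 for w in VOCAB])
    state_feats = np.array([1.0 if state.get("slots_filled") else 0.0])
    return np.concatenate([bow, state_feats])

def action_mask(state: dict) -> np.ndarray:
    """Hand-coded business rules, the 'code' part of HCN:
    e.g., never confirm a booking before all slots are filled."""
    mask = np.ones(len(ACTIONS))
    if not state.get("slots_filled"):
        mask[ACTIONS.index("confirm_booking")] = 0.0
    return mask

# Untrained toy policy weights; a real HCN learns these from dialogues.
rng = np.random.default_rng(0)
W = rng.normal(size=(len(ACTIONS), len(VOCAB) + 1))

def next_action(utterance: str, state: dict) -> str:
    """Score action templates and apply the HCN-style action mask."""
    logits = W @ featurize(utterance, state)
    logits[action_mask(state) == 0.0] = -np.inf  # masked actions never win
    return ACTIONS[int(np.argmax(logits))]
```

The key design point the sketch shows is that masking makes the neural policy safe for deployment: regardless of what the learned scores say, actions ruled out by the dialogue state cannot be selected.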
References
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), Minneapolis, MN, USA, Jun. 2019. https://www.aclweb.org/anthology/N19-1423
Z. Ji et al., “Survey of Hallucination in Natural Language Generation,” arXiv:2202.03629 [cs], Feb. 2022. http://arxiv.org/abs/2202.03629

M. Lewis et al., “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, Jul. 2020, pp. 7871–7880. doi: 10.18653/v1/2020.acl-main.703.

Z. Lin, A. Madotto, G. I. Winata, and P. Fung, “MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, Nov. 2020, pp. 3391–3405. doi: 10.18653/v1/2020.emnlp-main.273.

B. Peng, C. Li, J. Li, S. Shayandeh, L. Liden, and J. Gao, “Soloist: Building Task Bots at Scale with Transfer Learning and Machine Teaching,” Transactions of the Association for Computational Linguistics, vol. 9, pp. 807–824, Aug. 2021. doi: 10.1162/tacl_a_00399.

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language Models are Unsupervised Multitask Learners,” OpenAI, Feb. 2019. https://openai.com/blog/better-language-models/

J. D. Williams, K. Asadi, and G. Zweig, “Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada, Jul. 2017. https://aclanthology.org/P17-1062/