Data-to-Text Generation with Neural Language Models
Thesis title in Czech: | Generování textu z dat s neuronovými jazykovými modely |
Thesis title in English: | Data-to-Text Generation with Neural Language Models |
Keywords (Czech): | generování textu z dat, generování přirozeného jazyka, zpracování přirozeného jazyka, architektura transformer, předtrénované jazykové modely, velké jazykové modely |
Keywords (English): | data-to-text generation, natural language generation, natural language processing, transformer architecture, pretrained language models, large language models |
Academic year of topic announcement: | 2018/2019 |
Thesis type: | dissertation |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | Mgr. et Mgr. Ondřej Dušek, Ph.D. |
Author: | hidden |
Date of registration: | 29.08.2019 |
Date of assignment: | 29.08.2019 |
Date of confirmation by the Student Affairs Office: | 04.10.2019 |
Date and time of defence: | 05.09.2024 09:30 |
Date of electronic submission: | 16.06.2024 |
Date of printed submission: | 18.06.2024 |
Date of completed defence: | 05.09.2024 |
Opponents: | Dr. Yaji Sripada |
prof. Dr. Emiel Krahmer | |
Guidelines |
Current statistical natural language generation (NLG) systems require significant amounts of in-domain training data. While there are a few solutions for domain adaptation, their scope is limited – they require very similar domains and use either the rather crude technique of delexicalization (Wen et al., 2016; Tran & Nguyen, 2018) or complex and detailed input representations (Dethlefs, 2017). This project will explore using large amounts of unannotated data to improve domain adaptation in NLG systems – selecting matching data based on limited in-domain data and using them to improve model performance. It will test the suitability of general-domain implicit semantic representations (embeddings; e.g. Peters et al., 2018; Devlin et al., 2018) for the task. The project will also explore how a single model can retain previously learned domains while adapting to new ones. |
References |
Dethlefs, Nina. “Domain Transfer for Deep Natural Language Generation from Abstract Meaning Representations.” IEEE Computational Intelligence Magazine, July 2017, 18–28.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” arXiv:1810.04805 [cs], October 10, 2018.
Dušek, Ondřej, Jekaterina Novikova, and Verena Rieser. “Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge.” arXiv:1901.07931 [cs], January 23, 2019.
Freitag, Markus, and Scott Roy. “Unsupervised Natural Language Generation with Denoising Autoencoders.” In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 3922–3929. Brussels, Belgium: Association for Computational Linguistics, 2018.
Gatt, Albert, and Emiel Krahmer. “Survey of the State of the Art in Natural Language Generation: Core Tasks, Applications and Evaluation.” Journal of Artificial Intelligence Research (JAIR) 61 (January 2018): 65–170.
Peters, Matthew E., Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. “Deep Contextualized Word Representations.” In NAACL. New Orleans, LA, USA, 2018.
Tran, Van-Khanh, and Le-Minh Nguyen. “Adversarial Domain Adaptation for Variational Neural Language Generation in Dialogue Systems.” In COLING. Santa Fe, NM, USA, 2018.
Wen, Tsung-Hsien, Milica Gasic, Nikola Mrksic, Lina M. Rojas-Barahona, Pei-Hao Su, David Vandyke, and Steve Young. “Multi-Domain Neural Network Language Generation for Spoken Dialogue Systems.” In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 120–129. San Diego, CA, USA, 2016. |