Generování diskuzních příspěvků
|Thesis title in Czech:||Generování diskuzních příspěvků|
|Thesis title in English:||Generating discussion posts|
|Academic year of topic announcement:||2019/2020|
|Type of assignment:||Bachelor's thesis|
|Department:||Institute of Formal and Applied Linguistics (32-UFAL)|
|Supervisor:||Mgr. Rudolf Rosa, Ph.D.|
|Author:||hidden - assigned and confirmed by the Study Dept.|
|Date of registration:||06.11.2019|
|Date of assignment:||30.01.2020|
|Confirmed by Study dept. on:||23.03.2020|
|The goal of the bachelor thesis is to apply NLP techniques to the problem of generating a discussion post for a newspaper article, given the article's headline as input (potentially with additional context). The generated discussion post should be coherent and relevant to the topic (to the extent that typical discussion posts are, which is often not much).
A baseline variant is to use a recurrent neural network to encode the input string into an intermediate representation, followed by a decoding phase that generates the output sentence token by token. However, due to the limitations of this architecture, much better results are expected from a Transformer model, which should also make training more efficient.
Given the similarity of the problem to machine translation, it is possible to directly apply NMT techniques and tools (such as Marian NMT), as well as many state-of-the-art techniques for improving the model.
A dataset of news articles and discussion posts needs to be acquired from a newspaper agency or collected manually.
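To illustrate the machine-translation framing described above, the following is a minimal sketch of how collected (headline, post) pairs might be turned into line-aligned source and target files, the layout that NMT toolkits typically expect for parallel data. The function names, the `.headline` and `.post` file suffixes, and the sample data are all hypothetical and not part of the assignment.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple


def normalize(text: str) -> str:
    """Collapse all whitespace, including newlines, so one example fits on one line."""
    return " ".join(text.split())


def build_parallel_corpus(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[List[str], List[str]]:
    """Turn (headline, post) pairs into aligned source and target lines.

    Pairs with an empty side after normalization are skipped, so line i of
    the source list always corresponds to line i of the target list.
    """
    sources: List[str] = []
    targets: List[str] = []
    for headline, post in pairs:
        src, tgt = normalize(headline), normalize(post)
        if src and tgt:
            sources.append(src)
            targets.append(tgt)
    return sources, targets


def write_corpus(
    sources: List[str], targets: List[str], out_dir: str, prefix: str = "train"
) -> Tuple[Path, Path]:
    """Write the aligned lines to <prefix>.headline and <prefix>.post (assumed names)."""
    if len(sources) != len(targets):
        raise ValueError("source and target sides must have the same number of lines")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    src_path = out / f"{prefix}.headline"
    tgt_path = out / f"{prefix}.post"
    src_path.write_text("".join(line + "\n" for line in sources), encoding="utf-8")
    tgt_path.write_text("".join(line + "\n" for line in targets), encoding="utf-8")
    return src_path, tgt_path


if __name__ == "__main__":
    # Hypothetical sample data; a real dataset would come from the collected articles.
    sample = [
        ("City opens new bridge", "About time!\nIt only took ten years."),
        ("Headline without any posts", "   "),  # skipped: empty target side
    ]
    src_lines, tgt_lines = build_parallel_corpus(sample)
    with tempfile.TemporaryDirectory() as tmp:
        src_file, tgt_file = write_corpus(src_lines, tgt_lines, tmp)
        print(src_file.read_text(encoding="utf-8"), end="")
        print(tgt_file.read_text(encoding="utf-8"), end="")
```

An article with several posts would simply contribute several pairs that share the same headline. Any tokenization or subword segmentation required by the chosen toolkit would be applied to these files afterwards.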
|JUNCZYS-DOWMUNT, Marcin, et al. Marian: Fast Neural Machine Translation in C++. In: Proceedings of ACL 2018, System Demonstrations. 2018. p. 116-121.
DEVLIN, Jacob, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. p. 4171-4186.
KOCMI, Tom; BOJAR, Ondřej. Trivial Transfer Learning for Low-Resource Neural Machine Translation. In: Proceedings of the Third Conference on Machine Translation: Research Papers. 2018. p. 244-252.
WANAS, Nayer, et al. Automatic scoring of online discussion posts. In: Proceedings of the 2nd ACM workshop on Information credibility on the web. ACM, 2008. p. 19-26.