Thesis (Selection of subject) (version: 368)
Thesis details
Generating discussion posts
Thesis title in Czech: Generování diskuzních příspěvků
Thesis title in English: Generating discussion posts
Academic year of topic announcement: 2019/2020
Thesis type: Bachelor's thesis
Thesis language: Czech
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: Mgr. Rudolf Rosa, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 06.11.2019
Date of assignment: 30.01.2020
Confirmed by Study dept. on: 23.03.2020
Guidelines
The goal of the bachelor thesis is to apply NLP techniques to the problem of generating a discussion post for a newspaper article, given the article's headline as input (possibly with additional context). The generated post should be coherent and relevant to the topic of the article (at least to the extent that typical discussion posts are, which is often not very much).
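One common way to formalize this task, following the usual NMT setup, is as conditional language modeling. In the notation below (introduced here for illustration, not taken from the assignment), x is the headline plus any additional context, y = (y_1, ..., y_|y|) is the discussion post, D is the set of collected (x, y) pairs, and a model with parameters θ is trained to minimize the negative log-likelihood:

```latex
p_\theta(y \mid x) = \prod_{t=1}^{|y|} p_\theta(y_t \mid y_{<t}, x),
\qquad
\mathcal{L}(\theta) = -\sum_{(x, y) \in D} \sum_{t=1}^{|y|} \log p_\theta(y_t \mid y_{<t}, x)
```

Under this formulation the post is generated one token at a time, each token conditioned on the headline and on the tokens produced so far.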

A baseline approach is to use a recurrent neural network to encode the input string into an intermediate representation, followed by a decoding phase that generates the output sentence token by token. However, a recurrent network processes its input strictly one token at a time, which limits parallelization during training and makes long-range dependencies harder to capture. Much better results are therefore expected from a Transformer model, which also makes training more efficient.
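The contrast between the two architectures can be illustrated with a minimal Python sketch using plain lists and toy weights (no ML library; all function names are illustrative and not taken from any toolkit). It places an Elman-style recurrent encoder, where each hidden state depends on the previous one, next to the scaled dot-product attention used in Transformer layers, where each output position is computed independently of the others.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rnn_encode(inputs: List[Vector], w_h: Matrix, w_x: Matrix) -> Vector:
    """Elman-style RNN encoder: h_t = tanh(W_h h_{t-1} + W_x x_t).

    Each step needs the previous hidden state, so the loop over time
    steps is inherently sequential. Returns the final hidden state.
    """
    hidden = [0.0] * len(w_h)
    for x in inputs:
        # The comprehension reads the old `hidden` before it is replaced.
        hidden = [
            math.tanh(
                sum(w_h[i][j] * hidden[j] for j in range(len(hidden)))
                + sum(w_x[i][j] * x[j] for j in range(len(x)))
            )
            for i in range(len(w_h))
        ]
    return hidden


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> List[Vector]:
    """For each query q: softmax(q . k_i / sqrt(d_k)) weighted sum of values.

    Each query is handled independently of the others, which is what
    lets a Transformer process all positions in parallel.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs
```

In self-attention the queries, keys and values all come from the same sequence, e.g. `scaled_dot_product_attention(xs, xs, xs)`. A real Transformer layer additionally uses learned projection matrices, multiple heads, residual connections and layer normalization, all of which this sketch omits.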
Given the similarity of the problem to machine translation, NMT techniques and tools (such as Marian NMT) can be applied directly, as can many state-of-the-art techniques for improving the model.

A dataset of news articles and their discussion posts needs to be obtained from a news publisher or collected manually.
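NMT toolkits such as Marian are typically trained on two line-aligned plain-text files, where line i of the source file corresponds to line i of the target file. A minimal Python sketch of turning collected data into that shape might look as follows. It assumes (hypothetically) that the data have been gathered as (headline, list of posts) pairs, pairs each post with its article's headline, and collapses whitespace so that multi-line posts fit on a single line; the function names are illustrative only.

```python
from pathlib import Path
from typing import Iterable, List, Sequence, Tuple


def normalize(text: str) -> str:
    # Collapse all whitespace (including newlines inside a post) to single spaces.
    return " ".join(text.split())


def build_parallel_corpus(
    articles: Iterable[Tuple[str, Sequence[str]]],
) -> Tuple[List[str], List[str]]:
    """Return aligned (source, target) lines: one headline/post pair per line.

    Articles with an empty headline and posts that are empty after
    normalization are skipped, so both lists always have equal length.
    """
    source: List[str] = []
    target: List[str] = []
    for headline, posts in articles:
        head = normalize(headline)
        if not head:
            continue
        for post in posts:
            text = normalize(post)
            if text:
                source.append(head)
                target.append(text)
    return source, target


def write_lines(lines: List[str], path: Path) -> None:
    # One example per line, UTF-8 encoded; no blank line is written for an empty list.
    with path.open("w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)
```

For example, `src, trg = build_parallel_corpus(collected)` followed by `write_lines(src, Path("train.src"))` and `write_lines(trg, Path("train.trg"))` produces two files of equal length that can serve as a training pair (the file names are placeholders). Because the headline is repeated for every post, an article with many comments contributes many examples, which may be worth capping or deduplicating.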
References
JUNCZYS-DOWMUNT, Marcin, et al. Marian: Fast Neural Machine Translation in C++. In: Proceedings of ACL 2018, System Demonstrations. 2018. p. 116-121.
https://www.aclweb.org/anthology/P18-4020/

DEVLIN, Jacob, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. p. 4171-4186.
https://www.aclweb.org/anthology/N19-1423/

KOCMI, Tom; BOJAR, Ondřej. Trivial Transfer Learning for Low-Resource Neural Machine Translation. In: Proceedings of the Third Conference on Machine Translation: Research Papers. 2018. p. 244-252.
https://www.aclweb.org/anthology/W18-6325/

WANAS, Nayer, et al. Automatic Scoring of Online Discussion Posts. In: Proceedings of the 2nd ACM Workshop on Information Credibility on the Web. ACM, 2008. p. 19-26.
https://dl.acm.org/doi/10.1145/1458527.1458534
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html