Thesis details
Generating discussion posts
Thesis title in Czech: Generování diskuzních příspěvků
Title in English: Generating discussion posts
Academic year of announcement: 2019/2020
Thesis type: bachelor's thesis
Thesis language: Czech
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: Mgr. Rudolf Rosa, Ph.D.
Author: hidden (assigned and confirmed by the Study Department)
Registration date: 06.11.2019
Assignment date: 30.01.2020
Date of confirmation by the Study Department: 23.03.2020
Guidelines
The goal of the bachelor's thesis is to apply NLP techniques to the problem of generating a discussion post for a newspaper article, given the article's headline as input (possibly together with further context). The generated post should be coherent and relevant to the topic, at least to the extent that typical discussion posts are, which is often not very much.
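The assignment leaves open how further context would be combined with the headline. A minimal sketch, assuming a single flat source string: the headline comes first, optionally followed by a hypothetical `<sep>` token and the first few words of additional context (for example the article's lead paragraph). The function name `build_source`, the separator and the 50-word limit are illustrative assumptions, not part of the assignment.

```python
from typing import Optional

SEP = "<sep>"  # hypothetical separator token between headline and context


def build_source(headline: str, context: Optional[str] = None,
                 max_context_words: int = 50) -> str:
    """Build one source line: the headline, optionally followed by truncated context."""
    parts = [" ".join(headline.split())]  # normalize whitespace, drop newlines
    if context:
        words = context.split()[:max_context_words]  # keep the input short
        if words:
            parts += [SEP, " ".join(words)]
    return " ".join(parts)


print(build_source("Government approves budget"))
# Government approves budget
print(build_source("Government approves budget",
                   "The vote took place on Tuesday.", max_context_words=3))
# Government approves budget <sep> The vote took
```

Truncating the context bounds the source length, since the cost of both recurrent and Transformer models grows with input length.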

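A baseline approach is to use a recurrent neural network (RNN) to encode the input string into an intermediate representation, followed by a decoding phase that generates the output sentence token by token. However, because of the limitations of this architecture (an RNN must process a sequence one step at a time and struggles to retain information over long distances), much better results are expected from a Transformer model. The Transformer also makes training more efficient, because during training it processes all positions of a sequence in parallel.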
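To make the contrast between the two architectures concrete, here is a toy, untrained sketch in plain Python with no deep-learning library. `encode` runs a simple Elman RNN over the input tokens and returns its final hidden state as the intermediate representation; `decode_greedy` then generates tokens one at a time, feeding each prediction back in. `self_attention` shows the core Transformer operation, scaled dot-product attention, simplified so that the inputs serve directly as queries, keys and values. The vocabulary, sizes and random weights are arbitrary assumptions, so the generated tokens are meaningless; a real system would learn the weights, for instance with a toolkit such as Marian NMT.

```python
import math
import random

random.seed(0)  # reproducible toy weights

# Toy vocabulary and sizes (arbitrary assumptions for illustration).
VOCAB = ["<s>", "</s>", "the", "government", "budget", "is", "bad", "good"]
V, H = len(VOCAB), 8  # vocabulary size, hidden size


def matrix(rows, cols):
    """Random matrix stored as a list of rows."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


# Untrained parameters; a real model would learn these from the dataset.
EMB = matrix(V, H)                         # token embeddings
ENC_W, ENC_U = matrix(H, H), matrix(H, H)  # encoder input / recurrent weights
DEC_W, DEC_U = matrix(H, H), matrix(H, H)  # decoder input / recurrent weights
OUT = matrix(V, H)                         # hidden state -> vocabulary scores


def rnn_step(w, u, x, h):
    """One Elman RNN step: h_new = tanh(W x + U h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w, x), matvec(u, h))]


def encode(tokens):
    """Read the input token by token; the final state is the intermediate representation."""
    h = [0.0] * H
    for tok in tokens:
        h = rnn_step(ENC_W, ENC_U, EMB[VOCAB.index(tok)], h)
    return h


def decode_greedy(h, max_len=10):
    """Generate tokens sequentially, feeding each prediction back as the next input."""
    out, prev = [], "<s>"
    for _ in range(max_len):
        h = rnn_step(DEC_W, DEC_U, EMB[VOCAB.index(prev)], h)
        scores = matvec(OUT, h)
        prev = VOCAB[max(range(V), key=scores.__getitem__)]  # greedy choice
        if prev == "</s>":
            break
        out.append(prev)
    return out


def self_attention(xs):
    """Scaled dot-product self-attention; queries, keys and values are all xs."""
    d = len(xs[0])
    out = []
    for q in xs:  # rows do not depend on each other, so they could run in parallel
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        out.append([sum(wt * x[i] for wt, x in zip(weights, xs)) / total for i in range(d)])
    return out


state = encode(["the", "government", "budget"])
print(decode_greedy(state))  # arbitrary tokens: the weights are untrained
print(len(self_attention([EMB[VOCAB.index(t)] for t in ["the", "budget"]])))  # 2
```

The encoder loop cannot compute step t before step t-1, whereas each output row of `self_attention` depends only on the inputs, so all rows can be computed at once; this is the property behind the more efficient Transformer training.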
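Since the problem is similar to machine translation (a source sequence is mapped to a target sequence), neural machine translation (NMT) techniques and tools such as Marian NMT can be applied directly, together with many state-of-the-art techniques for improving the model, such as transfer learning or pre-trained models like BERT.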
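Treating the task as translation means building a line-aligned parallel corpus: line i of the source file holds a headline (or headline plus context) and line i of the target file holds one discussion post for that article. NMT toolkits such as Marian read training data in this form (as far as we know, through its `--train-sets` option, which takes the source and target files). A minimal sketch; the `.src`/`.tgt` suffixes and the helper names are hypothetical.

```python
import os
import tempfile
from typing import Iterable, Tuple


def one_line(text: str) -> str:
    """Collapse all whitespace, including newlines, so a segment stays on one line."""
    return " ".join(text.split())


def write_parallel(pairs: Iterable[Tuple[str, str]], prefix: str) -> int:
    """Write (source, post) pairs to PREFIX.src and PREFIX.tgt, one pair per line.

    Returns the number of pairs written; pairs where either side is empty
    after cleaning are skipped.
    """
    written = 0
    with open(prefix + ".src", "w", encoding="utf-8") as src, \
            open(prefix + ".tgt", "w", encoding="utf-8") as tgt:
        for source, post in pairs:
            s, t = one_line(source), one_line(post)
            if not s or not t:
                continue
            src.write(s + "\n")
            tgt.write(t + "\n")
            written += 1
    return written


pairs = [
    ("Government approves budget", "Finally!\nAbout time."),
    ("Government approves budget", "   "),
    ("New bridge opens", "Who is going to pay for it?"),
]
with tempfile.TemporaryDirectory() as tmp:
    print(write_parallel(pairs, os.path.join(tmp, "train")))  # 2
```

Because an article usually has many posts, the same headline is repeated with many different targets. This one-to-many mapping is unlike typical translation data and is worth keeping in mind when interpreting results.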

A dataset of news articles and their discussion posts needs to be obtained from a news publisher or collected manually.
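Once articles and posts have been collected, they have to be divided into training, development and test data. A minimal sketch, assuming each record carries a hypothetical article identifier: the split is decided per article rather than per post, so posts belonging to one article never appear in both training and test data, and hashing the identifier keeps each article's assignment stable as more data are added. The 5% development and test shares are arbitrary.

```python
import hashlib
from collections import Counter


def split_for(article_id: str, dev_pct: int = 5, test_pct: int = 5) -> str:
    """Assign an article, and therefore all of its posts, to train/dev/test.

    Hashing the ID makes the assignment deterministic, so posts of one
    article never end up in different splits.
    """
    digest = hashlib.sha256(article_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + dev_pct:
        return "dev"
    return "train"


# Hypothetical records: (article_id, headline, post).
records = [
    ("a1", "Government approves budget", "Finally!"),
    ("a1", "Government approves budget", "Who will pay for it?"),
    ("a2", "New bridge opens", "Nice."),
]
by_split = Counter(split_for(article_id) for article_id, _, _ in records)
print(by_split)
```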
References
JUNCZYS-DOWMUNT, Marcin, et al. Marian: Fast Neural Machine Translation in C++. In: Proceedings of ACL 2018, System Demonstrations. 2018. p. 116-121.
https://www.aclweb.org/anthology/P18-4020/

DEVLIN, Jacob, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. p. 4171-4186.
https://www.aclweb.org/anthology/N19-1423/

KOCMI, Tom; BOJAR, Ondřej. Trivial Transfer Learning for Low-Resource Neural Machine Translation. In: Proceedings of the Third Conference on Machine Translation: Research Papers. 2018. p. 244-252.
https://www.aclweb.org/anthology/W18-6325/

WANAS, Nayer, et al. Automatic scoring of online discussion posts. In: Proceedings of the 2nd ACM workshop on Information credibility on the web. ACM, 2008. p. 19-26.
https://dl.acm.org/doi/10.1145/1458527.1458534
 
Charles University | Charles University Information System