Genres classification by means of machine learning
Thesis title in Czech: | Klasifikace žánrů pomocí strojového učení |
---|---|
Thesis title in English: | Genres classification by means of machine learning |
Key words: | Strojové učení, zpracování přirozeného jazyka, klasifikace žánrů, vnoření slov, paragraph vector |
English key words: | Machine learning, natural language processing, genre classification, word embeddings, paragraph vector |
Academic year of topic announcement: | 2017/2018 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Department of Theoretical Computer Science and Mathematical Logic (32-KTIML) |
Supervisor: | Mgr. Roman Neruda, CSc. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 10.05.2018 |
Date of assignment: | 04.06.2018 |
Confirmed by Study dept. on: | 18.07.2018 |
Date and time of defence: | 13.09.2018 00:00 |
Date of electronic submission: | 20.07.2018 |
Date of submission of printed version: | 20.07.2018 |
Date the defence took place: | 13.09.2018 |
Opponents: | Mgr. Marta Vomlelová, Ph.D. |
Guidelines |
The goal of the thesis is to compare several approaches to text processing and classification and to apply them to the task of literary genre classification. The student will propose and design a machine learning model that can predict genres given a short excerpt from an English text. A corpus of selected texts from Project Gutenberg will be used to train and test the model. As part of the thesis, the dataset will be explored, and interesting textual and linguistic properties, as well as structures typical of individual genres, will be identified. A practical implementation of the proposed algorithms in a suitable environment (such as Python, scikit-learn, and TensorFlow) is expected. |
References |
Ian Goodfellow, Yoshua Bengio, Aaron Courville: Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org
Peter Flach: Machine learning. Cambridge University Press, 2012.
Quoc Le, Tomáš Mikolov: Distributed Representations of Sentences and Documents. CoRR journal, 2014. http://arxiv.org/abs/1405.4053v2
Tomáš Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean: Efficient Estimation of Word Representations in Vector Space. CoRR journal, 2013. http://arxiv.org/abs/1301.3781v3
Yoon Kim: Convolutional Neural Networks for Sentence Classification. CoRR journal, 2014. http://arxiv.org/abs/1408.5882v2 |
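
The guidelines describe the task (predicting a genre from a short English excerpt) without fixing a model. As an illustration of the task only, the following is a minimal sketch of one possible baseline, a bag-of-words multinomial naive Bayes classifier written in plain Python. It is not the method of the thesis, which names word embeddings and paragraph vectors among its key words. The class name `NaiveBayesGenreClassifier`, the genre labels, and the training excerpts are all hypothetical; the excerpts are invented toy sentences, not Project Gutenberg data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesGenreClassifier:
    """Hypothetical baseline: multinomial naive Bayes with add-one smoothing."""

    def fit(self, excerpts, genres):
        self.word_counts = defaultdict(Counter)  # genre -> word -> count
        self.doc_counts = Counter(genres)        # genre -> number of excerpts
        for text, genre in zip(excerpts, genres):
            self.word_counts[genre].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        self.total_docs = len(genres)
        return self

    def predict(self, excerpt):
        # Words never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(excerpt) if t in self.vocabulary]
        best_genre, best_score = None, -math.inf
        for genre, counts in self.word_counts.items():
            # Log prior of the genre plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[genre] / self.total_docs)
            denom = sum(counts.values()) + len(self.vocabulary)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_genre, best_score = genre, score
        return best_genre


# Invented toy excerpts (NOT taken from Project Gutenberg), for illustration only.
TRAIN_EXCERPTS = [
    "The detective examined the body and searched the room for clues",
    "The inspector questioned the suspect about the murder weapon",
    "The starship left orbit and the robot crew scanned the alien planet",
    "The engineers repaired the rocket before the voyage to the distant planet",
]
TRAIN_GENRES = ["detective", "detective", "science fiction", "science fiction"]

model = NaiveBayesGenreClassifier().fit(TRAIN_EXCERPTS, TRAIN_GENRES)
prediction = model.predict("The inspector found new clues about the murder")
print(prediction)  # prints "detective"
```

A count-based baseline of this kind is mainly useful as a point of comparison: an embedding-based or convolutional model of the sort covered in the references would be expected to justify its extra complexity against it on held-out excerpts.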