Comparison of approaches to text classification
Thesis title in Czech: | Porovnání přístupů ke klasifikaci textu |
---|---|
Thesis title in English: | Comparison of approaches to text classification |
Key words: | NLP, klasifikace textu, strojové učení, klasifikace recenzí |
English key words: | NLP, text classification, machine learning, review classification |
Academic year of topic announcement: | 2018/2019 |
Thesis type: | Bachelor's thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | RNDr. Jiří Hana, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 01.11.2018 |
Date of assignment: | 01.11.2018 |
Confirmed by Study dept. on: | 27.03.2019 |
Date and time of defence: | 05.09.2019 09:00 |
Date of electronic submission: | 18.07.2019 |
Date of submission of printed version: | 19.07.2019 |
Date of completed defence: | 05.09.2019 |
Opponents: | doc. Mgr. Barbora Vidová Hladká, Ph.D. |
Guidelines |
Compare approaches to text classification based on machine learning. Special attention should be paid to an evaluation of the usefulness of various features, ranging from simple (length of text, bag-of-words) to more complicated ones derived from syntax, detected entities, etc.
For training and testing, use the current Yelp challenge dataset of reviews. The data contain several candidate target variables (usefulness of review, rating); select one or more of them. The comparison should include:
- Comparison of basic algorithms (their results, speed, ...)
- Evaluation of the impact of training data size
- Evaluation of various text features
- Comparison of text features with metadata features |
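The simplest features named in the guidelines (length of text, bag-of-words) can be illustrated with a minimal sketch. This is only an illustration under stated assumptions, not the method used in the thesis: the toy reviews and labels are invented, the length threshold is arbitrary, Naive Bayes stands in for "a basic algorithm", and all function and class names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def extract_features(text):
    """Return bag-of-words counts plus one bucketed text-length feature."""
    tokens = tokenize(text)
    features = Counter(tokens)
    # Hypothetical length feature; the threshold of 5 tokens is an arbitrary assumption.
    features["__LEN_SHORT__" if len(tokens) < 5 else "__LEN_LONG__"] += 1
    return features


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over feature counts."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            feats = extract_features(text)
            self.feature_counts[label].update(feats)
            self.vocabulary.update(feats)
        return self

    def predict(self, text):
        feats = extract_features(text)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)  # log prior
            for feat, n in feats.items():
                if feat in self.vocabulary:  # ignore features never seen in training
                    score += n * math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data standing in for real reviews; not taken from the Yelp dataset.
train_texts = [
    "great food and friendly staff",
    "terrible service",
    "loved the pizza great place",
    "awful food terrible wait",
]
train_labels = ["pos", "neg", "pos", "neg"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("great friendly staff"))
print(model.predict("terrible awful service"))
```

Keeping the length feature as just another key in the feature dictionary lets the same classifier handle both feature types, so their usefulness can be compared by switching one of them off and re-evaluating.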
References |
Jurafsky, Daniel and Martin, James H. 2015. Speech and Language Processing.
Raschka, Sebastian and Mirjalili, Vahid. 2017. Python Machine Learning.
Mai, Jens-Erik. 2011. The modernity of classification. Journal of Documentation 67(4): 710–730.
Sebastiani, Fabrizio. 2002. Machine Learning in Automated Text Categorization. ACM Computing Surveys 34(1): 1–47. |
Preliminary scope of work |
Compare approaches to text classification based on machine learning. Special attention should be paid to an evaluation of the usefulness of various features, ranging from simple (length of text, bag-of-words) to more complicated ones derived from syntax, detected entities, etc. |