Thesis (Selection of subject) (version: 368)
Thesis details
Text classification with limited training data
Thesis title in Czech: Textová klasifikace s limitovanými trénovacími daty
Thesis title in English: Text classification with limited training data
Key words: NLP|klasifikace textu|weakly supervised learning
English key words: NLP|text classification|weakly supervised learning
Academic year of topic announcement: 2019/2020
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: RNDr. Jiří Hana, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 06.05.2020
Date of assignment: 06.05.2020
Confirmed by Study dept. on: 17.12.2020
Date and time of defence: 22.06.2021 09:00
Date of electronic submission: 21.05.2021
Date of submission of printed version: 21.05.2021
Date of proceeded defence: 22.06.2021
Opponents: doc. Mgr. Barbora Vidová Hladká, Ph.D.
 
 
 
Guidelines
Design a system for classifying short texts (e.g. reviews) that minimizes the cost of manual work (in terms of time, expertise, or both) needed to create training data. The system might use a small amount of high-quality annotated data, low-quality crowd-sourced data, noisy data produced by various heuristics, etc.
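As an illustration of the "noisy data produced by various heuristics" option, the following minimal Python sketch combines a few hand-written labeling heuristics by simple majority vote to produce weak labels for short reviews. Everything in it is an assumption made for illustration: the label names, the keyword lists, and the functions lf_positive_words, lf_negative_words, lf_refund and weak_label are hypothetical and are not taken from the thesis. Majority voting is also a simplification; the data programming approach of Ratner et al. instead fits a generative model over the heuristics' outputs.

```python
import re
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical label set for review sentiment; None means "abstain".
ABSTAIN = None
POSITIVE = "positive"
NEGATIVE = "negative"

LabelingFunction = Callable[[str], Optional[str]]


def _words(text: str) -> set:
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def lf_positive_words(text: str) -> Optional[str]:
    """Vote POSITIVE if any (assumed) positive keyword occurs."""
    return POSITIVE if _words(text) & {"great", "excellent", "love"} else ABSTAIN


def lf_negative_words(text: str) -> Optional[str]:
    """Vote NEGATIVE if any (assumed) negative keyword occurs."""
    return NEGATIVE if _words(text) & {"terrible", "awful", "broken"} else ABSTAIN


def lf_refund(text: str) -> Optional[str]:
    """Vote NEGATIVE if the review mentions a refund."""
    return NEGATIVE if "refund" in _words(text) else ABSTAIN


def weak_label(text: str, lfs: List[LabelingFunction]) -> Optional[str]:
    """Combine heuristic votes by majority; abstain on no votes or a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return ranked[0][0]


LFS = [lf_positive_words, lf_negative_words, lf_refund]

if __name__ == "__main__":
    for review in [
        "Great phone, I love it!",
        "Awful quality, I want a refund.",
        "It arrived on Tuesday.",
    ]:
        print(review, "->", weak_label(review, LFS))
```

Reviews on which all heuristics abstain, or on which they disagree evenly, stay unlabeled. In a setup like the one the guidelines describe, such examples could be passed to human annotators or left out of the weakly labeled training set.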
References
Zhou, Z. (2018). A brief introduction to weakly supervised learning.

Ratner, A.J., Sa, C.D., Wu, S., Selsam, D., & Ré, C. (2016). Data Programming: Creating Large Training Sets, Quickly. Advances in neural information processing systems, 29, 3567-3575.

Bach, S.H., Rodriguez, D., Liu, Y., Luo, C., Shao, H., Xia, C., Sen, S., Ratner, A., Hancock, B., Alborzi, H., Kuchhal, R., Ré, C., & Malkin, R. (2018). Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale. Proceedings. ACM-Sigmod International Conference on Management of Data, 2019, 362-375.

Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2017). Bag of Tricks for Efficient Text Classification. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, Association for Computational Linguistics, 427-431.
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html