Text classification with limited training data
Thesis title in Czech: | Textová klasifikace s limitovanými trénovacími daty |
---|---|
Thesis title in English: | Text classification with limited training data |
Key words: | NLP, klasifikace textu, weakly supervised learning |
English key words: | NLP, text classification, weakly supervised learning |
Academic year of topic announcement: | 2019/2020 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | RNDr. Jiří Hana, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 06.05.2020 |
Date of assignment: | 06.05.2020 |
Confirmed by Study dept. on: | 17.12.2020 |
Date and time of defence: | 22.06.2021 09:00 |
Date of electronic submission: | 21.05.2021 |
Date of submission of printed version: | 21.05.2021 |
Date of proceeded defence: | 22.06.2021 |
Opponents: | doc. Mgr. Barbora Vidová Hladká, Ph.D. |
Guidelines |
Design a system for classifying short texts (e.g. reviews) that minimizes the cost of manual work (in terms of time, expertise, or both) needed to create training data. The system might use a small amount of high-quality annotated data, low-quality crowd-sourced data, noisy data produced by various heuristics, etc. |
References |
Zhou, Z. (2018). A brief introduction to weakly supervised learning.
Ratner, A.J., Sa, C.D., Wu, S., Selsam, D., & Ré, C. (2016). Data Programming: Creating Large Training Sets, Quickly. Advances in Neural Information Processing Systems, 29, 3567-3575.
Bach, S.H., Rodriguez, D., Liu, Y., Luo, C., Shao, H., Xia, C., Sen, S., Ratner, A., Hancock, B., Alborzi, H., Kuchhal, R., Ré, C., & Malkin, R. (2018). Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale. Proceedings. ACM-Sigmod International Conference on Management of Data, 2019, 362-375.
Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2017). Bag of Tricks for Efficient Text Classification. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, Association for Computational Linguistics, 427-431. |