Thesis (Selection of subject) (version: 356)
Assignment details
Computational Models of Word Formation
Thesis title in Czech: Výpočetní modely slovotvorby
Thesis title in English: Computational Models of Word Formation
Key words: vektorová reprezentace slov, slovotvorba, morfologie
English key words: vector space models, word formation, morphology
Academic year of topic announcement: 2023/2024
Type of assignment: dissertation
Thesis language:
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: doc. Ing. Zdeněk Žabokrtský, Ph.D.
Word formation data resources harmonized across multiple natural languages were almost non-existent until very recently ([1], [2]), which limited the development of models whose validity could be tested empirically in a multilingual setting. The aim of the thesis is to develop, implement, and evaluate word formation models that make use of modern distributional vector space word representations (word embedding models), with a special focus on derivational morphology ([3]) and on multilingual aspects ([4]). Optionally, the optimization criteria used in the models can be interpreted in terms of information theory and might reflect hierarchical interactions in a language’s vocabulary, biological and cognitive biases relevant to natural languages, and language evolution perspectives.
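As an informal illustration of how word embeddings can be applied to derivational morphology, the sketch below implements a simple additive offset baseline: the average difference between derived and base word vectors is learned from a few pairs and then applied to a new base word. This is not the method prescribed by the assignment, only one simple baseline. The vectors in TOY_VECTORS are invented three-dimensional toy values rather than real embeddings, and the names mean_offset and predict_derivative are hypothetical helpers introduced here.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
# Real experiments would load pretrained word embeddings instead.
TOY_VECTORS = {
    "teach":   [1.0, 0.0, 0.0],
    "teacher": [1.0, 1.0, 0.0],
    "sing":    [0.0, 0.0, 1.0],
    "singer":  [0.0, 1.0, 1.0],
    "bake":    [1.0, 0.0, 1.0],
    "baker":   [1.0, 1.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_offset(pairs, vectors):
    """Average of (derived - base) over the given (base, derived) pairs."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for base, derived in pairs:
        for i in range(dim):
            total[i] += vectors[derived][i] - vectors[base][i]
    return [t / len(pairs) for t in total]


def predict_derivative(base, offset, vectors):
    """Return the vocabulary word closest (by cosine) to base + offset."""
    target = [b + o for b, o in zip(vectors[base], offset)]
    candidates = (w for w in vectors if w != base)
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Learn the agent-noun offset from two pairs, then apply it to a held-out base.
offset = mean_offset([("teach", "teacher"), ("sing", "singer")], TOY_VECTORS)
print(predict_derivative("bake", offset, TOY_VECTORS))
```

With real embeddings the offset would be estimated from base/derived pairs taken from a derivational resource, and the prediction would be evaluated on held-out pairs, possibly separately for each language.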
[1] Batsuren, K., Bella, G., & Giunchiglia, F. (2019, July). CogNet: A Large-Scale Cognate Database. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 3136-3145).
[2] Kyjánek, L., Žabokrtský, Z., Ševčíková, M., & Vidra, J. (2019). Universal Derivations Kickoff: A Collection of Harmonized Derivational Resources for Eleven Languages. In Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology (pp. 101-110).
[3] Bonami, O., & Paperno, D. (2018). Inflection vs. derivation in a distributional vector space. Lingue e linguaggio, 17(2), 173-196.
[4] Ruder, S., Vulić, I., & Søgaard, A. (2019). A survey of cross-lingual word embedding models. Journal of Artificial Intelligence Research, 65, 569-631.