Modelování kompozit pro vícejazyčné zdroje jazykových dat
Thesis title in Czech: | Modelování kompozit pro vícejazyčné zdroje jazykových dat |
---|---|
Thesis title in English: | Modelling compounds for multilingual language data resources |
Key words: | kompozitum, slovotvorba, základové slovo, zdroj jazykových dat, vícejazyčný |
English key words: | compound, word-formation, base word, language data resource, multilingual |
Academic year of topic announcement: | 2020/2021 |
Thesis type: | dissertation |
Thesis language: | Czech |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. Mgr. Magda Ševčíková, Ph.D. |
Author: | hidden |
Date of registration: | 08.09.2020 |
Date of assignment: | 08.09.2020 |
Confirmed by Study dept. on: | 30.09.2020 |
Date and time of defence: | 27.09.2024 10:40 |
Date of electronic submission: | 31.07.2024 |
Date of submission of printed version: | 01.08.2024 |
Date of proceeded defence: | 27.09.2024 |
Opponents: | RNDr. Jiří Hana, Ph.D. |
 | prof. Nabil Hathout |
Guidelines |
Compounds, defined generally as words based on more than one word (e.g., En. sun+flower > sunflower, Cz. ryba ‘fish’+lov ‘hunt’ > rybolov ‘fishery’), are an inherent part of existing language data resources. Their delimitation, though, varies considerably across languages, depending on the grammatical structure of the languages as well as on the particular linguistic tradition (Lieber & Štekauer 2011, Štekauer et al. 2012). The goal of the thesis is to elaborate a workable definition and representation of compound words that is robust and general enough to cover a number of typologically diverse languages, and that is both understandable to humans and implementable in multilingual language data resources (e.g., Kyjánek et al. 2019).
The thesis will deal with the identification of base words for compounds, aiming at delineating the boundaries between compounding and other word-formation processes (in particular, derivation and blending) and between compounding and syntax (cf. Russian город-сад ‘garden-city’ or English examples with multiple spelling variants, such as flowerpot / flower-pot / flower pot). The intra-word analysis will focus on both syntactic and semantic relationships between the compound parts; cf. German [Schule+Jahr]+Ende > Schuljahresende, Cz. modrý ‘blue’+oko ‘eye’ > modrooký ‘blue-eyed’ (Scalise & Vogel 2010, Štichauer 2013). By extending the multilingual resources with a coherent compound annotation and classification, the resulting data will be exploitable in linguistic typological studies as well as in Natural Language Processing tasks, e.g., when dealing with out-of-vocabulary words. |
References |
Kyjánek, L., Žabokrtský, Z., Ševčíková, M. & Vidra, J. (2019). Universal Derivations Kickoff: A Collection of Harmonized Derivational Resources for Eleven Languages. In Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology (DeriMo 2019). Praha: ÚFAL MFF UK, pp. 101–110.
Lieber, R. & Štekauer, P. (2011). The Oxford Handbook of Compounding. Oxford: Oxford University Press.
Scalise, S. & Vogel, I. (eds.; 2010). Cross-Disciplinary Issues in Compounding. Amsterdam: Benjamins.
Štekauer, P., Valera, S. & Körtvélyessy, L. (2012). Word-Formation in the World’s Languages. Cambridge: Cambridge University Press.
Štichauer, P. (2013). Je možná nová klasifikace českých kompozit? Časopis pro moderní filologii, 95(2), 113–128. |