Thesis topics (Thesis selection)

Modelling compounds for multilingual language data resources

Thesis title in Czech:	Modelování kompozit pro vícejazyčné zdroje jazykových dat
Thesis title in English:	Modelling compounds for multilingual language data resources
Keywords in Czech:	kompozitum, slovotvorba, základové slovo, zdroj jazykových dat, vícejazyčný
Keywords in English:	compound, word-formation, base word, language data resource, multilingual
Academic year of topic announcement:	2020/2021
Thesis type:	dissertation
Thesis language:	Czech
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	doc. Mgr. Magda Ševčíková, Ph.D.
Author:	hidden - assigned and confirmed by the Study Department
Date of registration:	08.09.2020
Date of assignment:	08.09.2020
Date of confirmation by the Study Department:	30.09.2020
Date and time of defence:	27.09.2024 10:40
Date of electronic submission:	31.07.2024
Date of printed submission:	01.08.2024
Date of defence held:	27.09.2024
Opponents:	RNDr. Jiří Hana, Ph.D.
	prof. Nabil Hathout

Guidelines

Compounds, defined generally as words based on more than one word (e.g., En. sun+flower > sunflower, Cz. ryba ‘fish’+lov ‘hunt’ > rybolov ‘fishery’), are an inherent part of existing language data resources. Their delimitation, however, differs considerably across languages, depending on the grammatical structure of each language as well as on the particular linguistic tradition (Lieber & Štekauer 2011, Štekauer et al. 2012). The goal of the thesis is to elaborate a workable definition and representation of compound words that is robust and general enough to cover a number of typologically diverse languages, and that is both understandable to humans and implementable in multilingual language data resources (e.g., Kyjánek et al. 2019).
The thesis will deal with the identification of base words for compounds, aiming to delineate the boundaries between compounding and other word-formation processes (in particular, derivation and blending) and between compounding and syntax (cf. Russian город-сад ‘garden-city’ or English examples with multiple spelling variants, such as flowerpot / flower-pot / flower pot). The intra-word analysis will focus on both the syntactic and the semantic relationships between the parts of a compound; cf. German [Schule+Jahr]+Ende > Schuljahresende, Cz. modrý ‘blue’+oko ‘eye’ > modrooký ‘blue-eyed’ (Scalise & Vogel 2010, Štichauer 2013). Extending the multilingual resources with a coherent compound annotation and classification will make the resulting data usable in linguistic typological studies as well as in Natural Language Processing tasks, e.g., when dealing with out-of-vocabulary words.

References

Kyjánek, L., Žabokrtský, Z., Ševčíková, M. & Vidra, J. (2019). Universal Derivations Kickoff: A Collection of Harmonized Derivational Resources for Eleven Languages. In Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology (DeriMo 2019). Praha: ÚFAL MFF UK, pp. 101–110.
Lieber, R. & Štekauer, P. (2011). The Oxford handbook of compounding. Oxford: Oxford University Press.
Scalise, S. & Vogel, I. (eds.; 2010). Cross-disciplinary issues in compounding. Amsterdam: Benjamins.
Štekauer, P., Valera, S. & Körtvélyessy, L. (2012). Word-Formation in the World’s Languages. Cambridge: Cambridge University Press.
Štichauer, P. (2013). Je možná nová klasifikace českých kompozit? Časopis pro moderní filologii, 95(2), 113–128.