Thesis Topics (Thesis Selection)


Parsing of Texts with Code-Switching

Title in Czech:	Syntaktická analýza textů se střídáním kódů
Title in English:	Parsing of Texts with Code-Switching
Keywords (Czech):	syntaktická analýza, závislostní analýza, treebank, universal dependencies, střídání kódů
Keywords (English):	parsing, dependency parsing, treebank, universal dependencies, code switching
Academic year announced:	2017/2018
Thesis type:	Master's thesis
Thesis language:	English
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	RNDr. Daniel Zeman, Ph.D.
Student:	hidden (assigned and confirmed by the Study Department)
Date of registration:	08.03.2018
Date of assignment:	11.03.2018
Date of confirmation by the Study Department:	07.08.2018
Date and time of defence:	11.09.2018 09:00
Date of electronic submission:	20.07.2018
Date of printed submission:	20.07.2018
Date of defence:	11.09.2018
Opponents:	RNDr. David Mareček, Ph.D.

Guidelines

The aim of this thesis is to create and evaluate systems for dependency parsing of code-switched language data (i.e., utterances in which speakers use two languages and switch between them freely). This involves several tasks. Besides selecting and training existing dependency parsers, it will also be necessary to adapt them to the domain of the task, since code-switching is often tied to informal domains such as social media. Some attention should be paid to tokenization and preprocessing so that the parser can operate on raw text. The main task is then model selection (i.e., language identification) and/or training a joint model for the two languages. The parsing system will be evaluated on at least one language pair, depending on data availability. Code-switched corpora are being developed for several language pairs, but their manual syntactic annotation may not be available in time for this thesis. If gold-standard data cannot be obtained from other sources, a small evaluation dataset will be manually annotated as part of this thesis project.
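As an illustration of the model-selection idea, the sketch below chooses a parser for each sentence from token-level language tags. It is not the method of this thesis. It assumes that the tags have already been produced by some language identifier, that "other" marks tokens belonging to neither language (punctuation, names), and that a sentence dominated by one language can be handed to that language's monolingual parser, while more evenly mixed sentences go to a joint model. The function name `select_model`, the model names and the 0.8 threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def select_model(tagged: List[Tuple[str, str]], models: Dict[str, str],
                 joint_model: str, threshold: float = 0.8) -> str:
    """Pick a parser model for one sentence from (token, language-tag) pairs.

    Hypothetical heuristic: if one language covers at least `threshold`
    of the tokens tagged with a known language, return that language's
    monolingual model; otherwise fall back to the joint model.
    """
    # Count only tags for which a monolingual model exists ("other" is ignored).
    counts = Counter(lang for _, lang in tagged if lang in models)
    total = sum(counts.values())
    if total == 0:
        # No language evidence at all: the joint model is the safest choice.
        return joint_model
    lang, n = counts.most_common(1)[0]
    return models[lang] if n / total >= threshold else joint_model


# Hypothetical model names for a Turkish-German setup.
models = {"de": "parser-de", "tr": "parser-tr"}
mixed = [("Ich", "de"), ("bin", "de"), ("heute", "de"),
         ("çok", "tr"), ("yorgun", "tr"), (".", "other")]
print(select_model(mixed, models, "parser-joint"))
```

With three German and two Turkish tokens the dominant language covers only 60% of the tagged tokens, so the mixed sentence is routed to the joint model. A purely German sentence would go to `parser-de`. A sentence-level switch like this is only the simplest option; handling switches within a sentence is exactly what a joint model is meant to address.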

References

* Irshad Bhat, Riyaz Bhat, and Manish Shrivastava. (2018). Universal Dependency Parsing for Hindi-English Code-Switching.

* Özlem Çetinoğlu and Çağrı Çöltekin. (2016). Part of Speech Annotation of a Turkish-German Code-Switching Corpus. In Proceedings of the 10th Linguistic Annotation Workshop (LAW-X), August 2016, Berlin, Germany.