The aim of this thesis is to create and evaluate systems for dependency parsing of code-switched language data (i.e. utterances in which speakers use two languages and switch between them freely). This involves several tasks. Besides selecting and training existing dependency parsers, it will also be necessary to adapt them to the domain of the task (code-switching is often tied to informal domains such as social media). Some attention should be paid to tokenization and preprocessing so that the parser can operate on raw text. The main task is then model selection (i.e. language recognition) and/or training a joint model for the two languages. The parsing system will be evaluated on at least one language pair, depending on data availability. Code-switched corpora are being developed for several language pairs, but their manual syntactic annotation may not be available in time for this thesis. If gold-standard data cannot be obtained from other sources, a small evaluation dataset will be manually annotated as part of this thesis project.
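The model-selection step mentioned above could look roughly like the following sketch. It is only an illustration under strong assumptions: the toy word lists, the function names `tag_languages` and `select_model`, the 0.8 threshold, and the labels `"tr"`, `"de"`, and `"joint"` are all hypothetical and not part of the thesis specification. A real system would replace the word lists with a trained language identifier.

```python
from collections import Counter

# Hypothetical toy lexicons for a Turkish-German pair; a real system
# would use a trained token-level language identifier instead.
LEXICON = {
    "tr": {"ben", "bir", "ve", "çok", "değil", "bugün"},
    "de": {"ich", "und", "nicht", "der", "die", "das", "heute"},
}


def tag_languages(tokens):
    """Label each token with a language code, or "other" if it is unknown."""
    labels = []
    for token in tokens:
        word = token.lower()
        label = next(
            (lang for lang, words in LEXICON.items() if word in words),
            "other",
        )
        labels.append(label)
    return labels


def select_model(tokens, threshold=0.8):
    """Choose a monolingual parser if one language dominates, else a joint one.

    The threshold is an arbitrary illustrative value, not a tuned setting.
    """
    counts = Counter(label for label in tag_languages(tokens) if label != "other")
    total = sum(counts.values())
    if total == 0:
        # No recognizable tokens: fall back to the joint model.
        return "joint"
    lang, n = counts.most_common(1)[0]
    return lang if n / total >= threshold else "joint"
```

For example, `select_model("ich und der".split())` would pick the German model, while a mixed utterance such as `"ben heute çok".split()` falls below the threshold and is routed to the joint model.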
* Özlem Çetinoğlu and Çağrı Çöltekin (2016). Part of Speech Annotation of a Turkish-German Code-Switching Corpus. In Proceedings of the 10th Linguistic Annotation Workshop (LAW-X), August 2016, Berlin, Germany.