Empirical Models for an Indic Language Continuum
| Thesis title in Czech | Empirické modely pro indické jazykové kontinuum |
|---|---|
| Thesis title in English | Empirical Models for an Indic Language Continuum |
| Keywords in Czech | vícejazyčná data; jazykové kontinuum; zpracování přirozeného jazyka |
| Keywords in English | multilingual data; language continuum; Natural Language Processing |
| Academic year of topic announcement | 2021/2022 |
| Thesis type | diploma thesis |
| Thesis language | English |
| Department | Institute of Formal and Applied Linguistics (32-UFAL) |
| Supervisor | doc. Ing. Zdeněk Žabokrtský, Ph.D. |
| Author | hidden (assigned and confirmed by the Study Dept.) |
| Date of registration | 04.03.2022 |
| Date of assignment | 04.03.2022 |
| Confirmed by Study Dept. on | 28.04.2022 |
| Date and time of defence | 02.09.2022 09:00 |
| Date of electronic submission | 20.07.2022 |
| Date of submission of printed version | 25.07.2022 |
| Date of completed defence | 02.09.2022 |
| Opponents | RNDr. Daniel Zeman, Ph.D. |
Guidelines
In some geographical areas, one can observe a continuum of language varieties in which neighboring varieties are mutually intelligible. An example is the Indo-Aryan language family in northern India, with dozens of languages and dialects ranging from Punjabi in the west to Bengali in the east. The goal of the thesis is to study this continuum using computational methods. The student will gather data for languages and dialects of this family, design and evaluate an empirical model for quantifying similarities and differences across the range of language varieties, and propose ways of employing such a model in modern Natural Language Processing applications.
References
Chakravarthi, Bharathi Raja, et al. "Findings of the VarDial Evaluation Campaign 2021." Proceedings of the 8th VarDial Workshop on NLP for Similar Languages, Varieties and Dialects. Association for Computational Linguistics, 2021.
Paltridge, Brian, and Aek Phakiti, eds. Continuum Companion to Research Methods in Applied Linguistics. A&C Black, 2010.
Masica, Colin P. The Indo-Aryan Languages. Cambridge University Press, 1993.
Jha, Saurav, Akhilesh Sudhakar, and Anil Kumar Singh. "Learning Cross-Lingual Phonological and Orthographic Adaptations: A Case Study in Improving Neural Machine Translation between Low-Resource Languages." arXiv preprint arXiv:1811.08816, 2018.