Thesis (Selection of subject)

Empirical Models for an Indic Language Continuum

Thesis title in Czech:	Empirické modely pro indické jazykové kontinuum
Thesis title in English:	Empirical Models for an Indic Language Continuum
Key words:	vícejazyčná data | jazykové kontinuum | zpracování přirozeného jazyka
English key words:	multilingual data | language continuum | Natural Language Processing
Academic year of topic announcement:	2021/2022
Thesis type:	diploma thesis
Thesis language:	English
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	doc. Ing. Zdeněk Žabokrtský, Ph.D.
Author:	hidden - assigned and confirmed by the Study Dept.
Date of registration:	04.03.2022
Date of assignment:	04.03.2022
Confirmed by Study dept. on:	28.04.2022
Date and time of defence:	02.09.2022 09:00
Date of electronic submission:	20.07.2022
Date of submission of printed version:	25.07.2022
Date of proceeded defence:	02.09.2022
Opponents:	RNDr. Daniel Zeman, Ph.D.

Guidelines

In some geographical areas, one can observe a chain of language varieties in which neighboring varieties are mutually intelligible. An example is the Indo-Aryan language family in North India, with tens of languages and dialects ranging from Punjabi in the west to Bengali in the east. The goal of the thesis is to study this continuum using computational methods. The student will gather language data for languages and dialects from this family, design and evaluate an empirical model for quantifying similarities and differences across the range of language varieties, and propose ways of employing such a model in modern Natural Language Processing applications.
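
As a rough illustration of what "quantifying similarities and differences" could mean in practice, the sketch below computes an average normalized edit distance between aligned word lists of several varieties. This is only one possible baseline and is not the model prescribed by the assignment. The variety names and word forms are invented placeholders (not real language data), and the function names `normalized_distance`, `variety_distance`, and `pairwise_distances` are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, so values lie in [0, 1]."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def variety_distance(words_a: list[str], words_b: list[str]) -> float:
    """Mean normalized distance over word lists aligned by concept."""
    pairs = list(zip(words_a, words_b))
    return sum(normalized_distance(x, y) for x, y in pairs) / len(pairs)


def pairwise_distances(wordlists: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Distance for every unordered pair of varieties."""
    return {
        (a, b): variety_distance(wordlists[a], wordlists[b])
        for a, b in combinations(sorted(wordlists), 2)
    }


# Assumption: invented placeholder forms, one list entry per shared concept.
WORDLISTS = {
    "variety_A": ["tamo", "selu", "kira"],
    "variety_B": ["tamu", "selu", "kiro"],
    "variety_C": ["damu", "zelo", "giro"],
}

if __name__ == "__main__":
    for (a, b), d in pairwise_distances(WORDLISTS).items():
        print(f"{a} vs {b}: {d:.3f}")
```

In this toy setup, adjacent varieties come out closer to each other than the two end points do, which is the pattern one would expect along a continuum. A realistic model would need real parallel or cognate data and would have to handle differences in script and orthography.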

References

Chakravarthi, Bharathi Raja, et al. Findings of the VarDial Evaluation Campaign 2021. Proceedings of the 8th VarDial Workshop on NLP for Similar Languages, Varieties and Dialects. The Association for Computational Linguistics, 2021.
Paltridge, Brian, and Aek Phakiti, eds. Continuum companion to research methods in applied linguistics. A&C Black, 2010.
Masica, Colin P. The Indo-Aryan languages. Cambridge University Press, 1993.
Jha, Saurav, Akhilesh Sudhakar, and Anil Kumar Singh. Learning cross-lingual phonological and orthographic adaptations: a case study in improving neural machine translation between low-resource languages. arXiv preprint arXiv:1811.08816 (2018).