Thesis Topics (Thesis Selection)

Native Language Identification of L2 Speakers of Czech

Thesis title in Czech:	Identifikace rodného jazyka cizinců mluvících česky
Thesis title in English:	Native Language Identification of L2 Speakers of Czech
Keywords (Czech):	počítačová lingvistika, NLP, strojové učení, Identifikace rodného jazyka, NLI
Keywords (English):	computational linguistics, NLP, machine learning, Native Language Identification, NLI
Academic year of topic announcement:	2015/2016
Thesis type:	bachelor's thesis
Thesis language:	English
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	RNDr. Jiří Hana, Ph.D.
Author:	hidden - assigned and confirmed by the Study Department
Date of registration:	06.11.2015
Date of assignment:	06.11.2015
Date of confirmation by the Study Department:	08.12.2015
Date and time of defence:	08.09.2016 00:00
Date of electronic submission:	28.07.2016
Date of submission of the printed version:	28.07.2016
Date of completed defence:	08.09.2016
Opponents:	doc. Mgr. Barbora Vidová Hladká, Ph.D.

Guidelines

The thesis will explore the identification of the native language of non-native speakers of Czech.

Native Language Identification (NLI) is the task of identifying an author's native language based only on their productions in a second language. It has been explored by a number of researchers (cf. Koppel et al., 2005; Tetreault et al., 2012; Wong and Dras, 2009, etc.). The vast majority of previous work has focused on English as the second language.

The goal of the thesis is to apply machine learning methods to train a classifier that predicts the native language group of authors of Czech texts, and to analyse and evaluate the role of various linguistic features.
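As an illustration only, the classification setup can be sketched in a few lines of Python. This is a minimal sketch, not the method prescribed for the thesis: it assumes character trigrams as the only feature type and a multinomial Naive Bayes classifier with add-one smoothing, both chosen here for brevity. The names `char_ngrams`, `NaiveBayesNLI`, `TOY_TEXTS` and `TOY_LABELS` are hypothetical, and the toy strings are invented placeholders, not real learner data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesNLI:
    """Multinomial Naive Bayes over character n-grams (illustrative assumption)."""

    def __init__(self, n=3):
        self.n = n
        self.class_counts = Counter()               # training texts per label
        self.feature_counts = defaultdict(Counter)  # n-gram counts per label
        self.vocabulary = set()                     # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.class_counts[label] += 1
            self.feature_counts[label].update(grams)
            self.vocabulary.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            # Log prior plus add-one smoothed log likelihood of each n-gram.
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented placeholder strings standing in for texts by two native language groups.
TOY_TEXTS = ["ba ba ba do", "ba do ba ba", "ki ki lo ki", "lo ki ki ki"]
TOY_LABELS = ["group_a", "group_a", "group_b", "group_b"]

model = NaiveBayesNLI(n=3).fit(TOY_TEXTS, TOY_LABELS)
print(model.predict("ba ba"))
print(model.predict("ki lo"))
```

In a real experiment the per-label n-gram counts in `feature_counts` could be inspected to see which features separate the groups, and other feature types (for example word forms, part-of-speech tags or error annotations) could be substituted for `char_ngrams`.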

References

Koppel, Moshe, Jonathan Schler, and Kfir Zigdon (2005). Determining an author’s native language by mining a text for errors. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD ’05), pages 624–628, Chicago, Illinois, USA.

Wong, Sze-Meng Jojo, and Mark Dras (2011). Exploiting parse structures for native language identification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.

Tetreault et al. (2012). Native Tongues, Lost and Found: Resources and Empirical Evaluations in Native Language Identification. In Proceedings of the International Conference on Computational Linguistics (COLING).

Malmasi, Shervin, Sze-Meng Jojo Wong, and Mark Dras (2013). NLI Shared Task 2013: MQ submission. In Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications.

Tetreault et al. (2013). A report on the first native language identification shared task.