Zpracování češtiny s využitím kontextualizované reprezentace
| Thesis title in Czech: | Zpracování češtiny s využitím kontextualizované reprezentace |
| --- | --- |
| Thesis title in English: | Czech NLP with Contextualized Embeddings |
| Key words: | čeština, zpracování přirozeného jazyka, kontextualizované slovní reprezentace, BERT |
| English key words: | Czech, natural language processing, contextualized word embeddings, BERT |
| Academic year of topic announcement: | 2019/2020 |
| Thesis type: | diploma thesis |
| Thesis language: | Czech |
| Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
| Supervisor: | RNDr. Milan Straka, Ph.D. |
| Author: | hidden |
| Date of registration: | 01.04.2020 |
| Date of assignment: | 09.04.2020 |
| Confirmed by Study dept. on: | 06.05.2020 |
| Date and time of defence: | 02.09.2021 09:00 |
| Date of electronic submission: | 22.07.2021 |
| Date of submission of printed version: | 22.07.2021 |
| Date of proceeded defence: | 02.09.2021 |
| Opponents: | prof. RNDr. Jan Hajič, Dr. |
Guidelines
Recently, several methods for unsupervised pre-training of contextualized word embeddings have been proposed, most importantly the BERT model (Devlin et al., 2018). Such contextualized representations have proven extremely useful as additional features in many NLP tasks, such as morphosyntactic analysis, named entity recognition, or text classification.
Most of the evaluation has been carried out on English. However, several of the released models have been pre-trained on many languages including Czech, such as multilingual BERT or XLM-RoBERTa (Conneau et al., 2019). The goal of this thesis is therefore to perform experiments quantifying the improvements gained by employing pre-trained contextualized representations in Czech natural language processing.
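As an illustration of what "employing pre-trained contextualized representations" means in practice, the sketch below extracts one contextualized vector per subword token for a Czech sentence. It is not part of the official assignment: the use of the Hugging Face transformers library and the `xlm-roberta-base` checkpoint are assumptions for the example; the thesis may use other tools or checkpoints.

```python
# A minimal sketch (not from the thesis assignment) of obtaining
# contextualized embeddings for a Czech sentence from a publicly
# released multilingual model via the transformers library.
import torch
from transformers import AutoModel, AutoTokenizer

# "xlm-roberta-base" is one released multilingual checkpoint covering
# Czech; the choice here is illustrative, not prescribed by the thesis.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")

sentence = "Zpracování přirozeného jazyka je zajímavé."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextualized vector per subword token,
# shape (batch, tokens, hidden size).
embeddings = outputs.last_hidden_state
print(embeddings.shape)  # e.g. torch.Size([1, 14, 768])
```

Vectors obtained this way can then serve as additional input features for the downstream tasks named above, such as morphosyntactic analysis or named entity recognition.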
References
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805
- Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov: Unsupervised Cross-lingual Representation Learning at Scale. https://arxiv.org/abs/1911.02116