Zpracování češtiny s využitím kontextualizované reprezentace
| Thesis title in Czech: | Zpracování češtiny s využitím kontextualizované reprezentace |
| --- | --- |
| Thesis title in English: | Czech NLP with Contextualized Embeddings |
| Key words: | čeština, zpracování přirozeného jazyka, kontextualizované slovní reprezentace, BERT |
| English key words: | Czech, natural language processing, contextualized word embeddings, BERT |
| Academic year of topic announcement: | 2019/2020 |
| Thesis type: | diploma thesis |
| Thesis language: | Czech |
| Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
| Supervisor: | RNDr. Milan Straka, Ph.D. |
| Author: | hidden |
| Date of registration: | 01.04.2020 |
| Date of assignment: | 09.04.2020 |
| Confirmed by Study dept. on: | 06.05.2020 |
| Date and time of defence: | 02.09.2021 09:00 |
| Date of electronic submission: | 22.07.2021 |
| Date of submission of printed version: | 22.07.2021 |
| Date of proceeded defence: | 02.09.2021 |
| Opponents: | prof. RNDr. Jan Hajič, Dr. |
Guidelines
Recently, several methods for unsupervised pre-training of contextualized word embeddings have been proposed, most importantly the BERT model (Devlin et al., 2018). Such contextualized representations have proven extremely useful as additional features in many NLP tasks, such as morphosyntactic analysis, named entity recognition, or text classification.
Most of the evaluation has been carried out on English. However, several of the released models have been pre-trained on many languages including Czech, such as multilingual BERT or XLM-RoBERTa (Conneau et al., 2019). The goal of this thesis is therefore to perform experiments quantifying the improvements gained by employing pre-trained contextualized representations in Czech natural language processing.
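As an illustration of what "employing pre-trained contextualized representations" means in practice, the sketch below extracts one contextualized vector per subword token for a Czech sentence. It is not part of the official assignment: the use of the Hugging Face transformers library and the `xlm-roberta-base` checkpoint are assumptions for the example; the thesis may use other tools or checkpoints.

```python
# A minimal sketch (not from the thesis assignment) of obtaining
# contextualized embeddings for a Czech sentence from a publicly
# released multilingual model via the transformers library.
import torch
from transformers import AutoModel, AutoTokenizer

# "xlm-roberta-base" is one released multilingual checkpoint covering
# Czech; the choice here is illustrative, not prescribed by the thesis.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")

sentence = "Zpracování přirozeného jazyka je zajímavé."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextualized vector per subword token,
# shape (batch, tokens, hidden size).
embeddings = outputs.last_hidden_state
print(embeddings.shape)  # e.g. torch.Size([1, 14, 768])
```

Vectors obtained this way can then serve as additional input features for the downstream tasks named above, such as morphosyntactic analysis or named entity recognition.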
References
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805
- Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov: Unsupervised Cross-lingual Representation Learning at Scale. https://arxiv.org/abs/1911.02116