Thesis Topics (Topic Selection) (version: 368)
Thesis Detail
Multilingual Learning using Syntactic Multi-Task Training
Title in Czech: Vícejazyčné učení pomocí víceúlohového trénování syntaxe
Title in English: Multilingual Learning using Syntactic Multi-Task Training
Academic year of announcement: 2018/2019
Thesis type: Master's thesis
Language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: RNDr. Milan Straka, Ph.D.
Author: hidden - assigned and confirmed by the Student Office
Date of registration: 14.02.2019
Date of assignment: 14.02.2019
Date of confirmation by the Student Office: 25.04.2019
Date and time of defence: 11.06.2019 09:00
Date of electronic submission: 10.05.2019
Date of printed submission: 10.05.2019
Date of defence: 11.06.2019
Opponent: RNDr. David Mareček, Ph.D.
Guidelines
Recent research has shown promising results in learning syntactic representations of text to improve NLP models via transfer learning, e.g., for machine translation and question answering (Nadejde et al. 2017, Currey and Heafield 2018, Zhang et al. 2018, Franco-Salvador et al. 2018). The goal of the thesis is to investigate the use of embeddings pretrained on Universal Dependencies (McDonald et al. 2013) with a multi-task neural network (Straka 2018) for multilingual cross-domain transfer learning. Other approaches obtain improvements by pretraining unsupervised word representations, either static (Pennington et al. 2014) or contextualized via language modeling (Peters et al. 2018). For instance, Google's recent BERT model (Devlin et al. 2018) provides state-of-the-art multilingual representations of text that improve performance on a wide range of NLP tasks; however, it is not particularly suitable for syntactic tasks without proper fine-tuning. The thesis will therefore also investigate possible improvements to BERT by fine-tuning its contextualized representations on Universal Dependencies treebanks, and will evaluate the effects on multiple tasks across several languages.
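The multi-task idea referenced above (Straka 2018) — a shared encoder whose representations are trained jointly by several syntactic prediction heads — can be sketched as a toy PyTorch model. This is an illustrative sketch only, with invented dimensions, random data, and UPOS tagging plus morphological-feature prediction chosen as example tasks; it is not the actual UDPipe 2.0 architecture:

```python
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    """Toy sketch: shared BiLSTM encoder with one prediction head per task."""
    def __init__(self, vocab_size=100, embed_dim=32, hidden_dim=64,
                 n_upos=17, n_feats=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # One linear head per task; both heads read the same encoder states,
        # so gradients from every task update the shared representations.
        self.upos_head = nn.Linear(2 * hidden_dim, n_upos)    # UPOS tags
        self.feats_head = nn.Linear(2 * hidden_dim, n_feats)  # morph. features

    def forward(self, tokens):
        states, _ = self.encoder(self.embed(tokens))
        return self.upos_head(states), self.feats_head(states)

model = MultiTaskTagger()
tokens = torch.randint(0, 100, (8, 12))       # batch of 8 sentences, 12 tokens
upos_gold = torch.randint(0, 17, (8, 12))     # random gold labels (toy data)
feats_gold = torch.randint(0, 50, (8, 12))

upos_logits, feats_logits = model(tokens)
# Sum the per-task losses; cross_entropy expects (batch, classes, length).
loss = (nn.functional.cross_entropy(upos_logits.transpose(1, 2), upos_gold)
        + nn.functional.cross_entropy(feats_logits.transpose(1, 2), feats_gold))
loss.backward()  # one joint gradient step through the shared encoder
```

In the same spirit, the BiLSTM encoder could be replaced by (or stacked on top of) frozen or fine-tuned BERT representations, which is the variant the thesis is asked to explore.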
References
- Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).

- Straka, Milan. "UDPipe 2.0 Prototype at CoNLL 2018 UD Shared Task." Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (2018): 197-207.

- Nadejde, Maria, et al. "Predicting target language CCG supertags improves neural machine translation." arXiv preprint arXiv:1702.01147 (2017).

- Currey, Anna, and Kenneth Heafield. "Multi-Source Syntactic Neural Machine Translation." arXiv preprint arXiv:1808.10267 (2018).

- Zhang, Yuhao, Peng Qi, and Christopher D. Manning. "Graph convolution over pruned dependency trees improves relation extraction." arXiv preprint arXiv:1809.10185 (2018).

- Franco-Salvador, Marc, et al. "UH-PRHLT at SemEval-2016 Task 3: Combining lexical and semantic-based features for community question answering." arXiv preprint arXiv:1807.11584 (2018).

- Peters, Matthew E., et al. "Deep contextualized word representations." arXiv preprint arXiv:1802.05365 (2018).

- Pennington, Jeffrey, Richard Socher, and Christopher Manning. "GloVe: Global vectors for word representation." Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014.

- McDonald, Ryan, et al. "Universal dependency annotation for multilingual parsing." Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Vol. 2. 2013.
 
Charles University | Charles University Information System