Multilingual Learning using Syntactic Multi-Task Training
| Thesis title in Czech: | Vícejazyčné učení pomocí víceúlohového trénování syntaxe |
| --- | --- |
| Title in English: | Multilingual Learning using Syntactic Multi-Task Training |
| Academic year of announcement: | 2018/2019 |
| Thesis type: | master's thesis |
| Thesis language: | English |
| Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
| Supervisor: | RNDr. Milan Straka, Ph.D. |
| Author: | hidden |
| Date of registration: | 14.02.2019 |
| Date of assignment: | 14.02.2019 |
| Date of confirmation by the study department: | 25.04.2019 |
| Date and time of defence: | 11.06.2019 09:00 |
| Date of electronic submission: | 10.05.2019 |
| Date of printed submission: | 10.05.2019 |
| Date of defence: | 11.06.2019 |
| Opponents: | RNDr. David Mareček, Ph.D. |
Guidelines
Recent research has shown promising results in learning syntactic representations of text to improve NLP models via transfer learning, e.g., for Machine Translation and Question Answering (Nadejde et al. 2017, Currey and Heafield 2018, Zhang et al. 2018, Franco-Salvador et al. 2018). The goal of the thesis is to investigate the use of embeddings pretrained on Universal Dependencies (McDonald et al. 2013) with a multi-task neural network (Straka 2018) for multilingual, cross-domain transfer learning. Additionally, other approaches obtain improvements by pretraining unsupervised word representations, e.g., with a language modeling objective (Peters et al. 2018, Pennington et al. 2014). For instance, Google's recent BERT model (Devlin et al. 2018) provides state-of-the-art multilingual representations of text that improve the performance of a wide range of NLP tasks; however, it is not particularly suited to syntactic tasks without proper fine-tuning. The thesis will also investigate possible improvements to BERT obtained by fine-tuning its contextualized representations on Universal Dependencies treebanks, and evaluate the effect on multiple tasks across several languages.
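To illustrate the kind of setup the assignment describes, the following is a minimal sketch (not the thesis implementation) of multi-task fine-tuning of a pretrained multilingual BERT encoder on Universal Dependencies-style labels: a shared encoder with one classification head per syntactic task, trained with a summed loss. It assumes PyTorch and the Hugging Face `transformers` library; the checkpoint name, tag-set sizes, and the toy batch standing in for a CoNLL-U data loader are illustrative placeholders.

```python
# Sketch: multi-task fine-tuning of multilingual BERT on UD-style labels.
# Assumptions: `torch` and `transformers` are installed; label counts and the
# toy batch below are placeholders for a real Universal Dependencies pipeline.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"  # multilingual BERT checkpoint
NUM_UPOS = 17                                # universal POS tags in UD
NUM_DEPREL = 37                              # universal dependency relations in UD


class MultiTaskUDModel(nn.Module):
    """Shared BERT encoder with one linear head per syntactic task."""

    def __init__(self):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(MODEL_NAME)
        hidden = self.encoder.config.hidden_size
        self.upos_head = nn.Linear(hidden, NUM_UPOS)      # POS tagging head
        self.deprel_head = nn.Linear(hidden, NUM_DEPREL)  # dependency-label head

    def forward(self, input_ids, attention_mask):
        # Contextualized subword representations from the shared encoder.
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        return self.upos_head(states), self.deprel_head(states)


def training_step(model, optimizer, batch):
    """One multi-task update: the losses of both heads are simply summed."""
    upos_logits, deprel_logits = model(batch["input_ids"], batch["attention_mask"])
    loss_fn = nn.CrossEntropyLoss()
    loss = (loss_fn(upos_logits.flatten(0, 1), batch["upos"].flatten())
            + loss_fn(deprel_logits.flatten(0, 1), batch["deprel"].flatten()))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = MultiTaskUDModel()
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    # Toy batch: dummy gold labels (all class 0) aligned with the subword sequence.
    enc = tokenizer(["Cats sleep ."], return_tensors="pt")
    batch = {
        "input_ids": enc["input_ids"],
        "attention_mask": enc["attention_mask"],
        "upos": torch.zeros_like(enc["input_ids"]),
        "deprel": torch.zeros_like(enc["input_ids"]),
    }
    print("loss:", training_step(model, optimizer, batch))
```

In a real experiment the dummy labels would come from UD treebank annotations (with subwords that do not start a word masked out of the loss), and the fine-tuned encoder could then be transferred to downstream tasks as the assignment proposes.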
References
- Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
- Straka, Milan. "UDPipe 2.0 Prototype at CoNLL 2018 UD Shared Task." Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (2018): 197-207.
- Nadejde, Maria, et al. "Predicting target language CCG supertags improves neural machine translation." arXiv preprint arXiv:1702.01147 (2017).
- Currey, Anna, and Kenneth Heafield. "Multi-Source Syntactic Neural Machine Translation." arXiv preprint arXiv:1808.10267 (2018).
- Zhang, Yuhao, Peng Qi, and Christopher D. Manning. "Graph convolution over pruned dependency trees improves relation extraction." arXiv preprint arXiv:1809.10185 (2018).
- Franco-Salvador, Marc, et al. "UH-PRHLT at SemEval-2016 Task 3: Combining lexical and semantic-based features for community question answering." arXiv preprint arXiv:1807.11584 (2018).
- Peters, Matthew E., et al. "Deep contextualized word representations." arXiv preprint arXiv:1802.05365 (2018).
- Pennington, Jeffrey, Richard Socher, and Christopher Manning. "GloVe: Global vectors for word representation." Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014.
- McDonald, Ryan, et al. "Universal dependency annotation for multilingual parsing." Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Vol. 2. 2013.