Deep contextualized word embeddings from character language models for neural sequence labeling

Thesis title in Czech:	Použití hlubokých kontextualizovaných slovních reprezentací založených na znacích pro neuronové sekvenční značkování
Thesis title in English:	Deep contextualized word embeddings from character language models for neural sequence labeling
Keywords (Czech):	umělé neuronové sítě, sekvenční značkování, znakové jazykové modely
Keywords (English):	artificial neural networks, sequence labeling, character language models, part-of-speech tagging, named entity recognition, multiword expression, word embedding, deep learning, Portuguese
Academic year of topic announcement:	2018/2019
Thesis type:	diploma thesis
Thesis language:	English
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	doc. RNDr. Pavel Pecina, Ph.D.
Author:	hidden - assigned and confirmed by the Study Department
Date of registration:	05.12.2018
Date of assignment:	09.12.2018
Date of confirmation by the Study Department:	14.12.2018
Date and time of defence:	04.02.2019 09:00
Date of electronic submission:	04.01.2019
Date of submission of printed version:	04.01.2019
Date of completed defence:	04.02.2019
Opponents:	Ing. Tom Kocmi, Ph.D.

Guidelines

Pretrained distributed word embeddings such as word2vec (Mikolov et al., 2013) have proven useful for modeling latent semantic and syntactic similarities between words, but they assign each word a single vector regardless of context and therefore cannot capture polysemy, i.e. the fact that many words have several meanings. Two recent studies (Akbik et al., 2018; Peters et al., 2018) use pretrained contextualized word representations, which model word meaning in context, and have advanced the state of the art on sequence labeling tasks such as part-of-speech (PoS) tagging and named entity recognition (NER). The goal of the thesis is to extend the scope of these studies: to evaluate contextualized embeddings on sequence labeling tasks for a selected language (PoS tagging, NER, and verbal multiword expression identification) and to explore different combinations of embeddings (pretrained character LM, pretrained non-contextual, in-task character, in-task PoS, in-task lemma) in a neural sequence tagging model.
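To make the idea of "combinations of embeddings" concrete, the following is a minimal sketch in plain Python. It assumes that the selected embeddings are combined by concatenating their per-token vectors before they enter the tagger, which is a common choice but is not stated in the guidelines. The identifier names, the helper functions, and the toy vector sizes are all hypothetical and serve illustration only.

```python
from itertools import combinations

# Embedding types named in the guidelines; the keys are illustrative labels.
EMBEDDING_TYPES = [
    "pretrained_char_lm",
    "pretrained_non_contextual",
    "in_task_char",
    "in_task_pos",
    "in_task_lemma",
]


def embedding_combinations(types):
    """Return every non-empty subset of embedding types, smallest first."""
    return [
        combo
        for size in range(1, len(types) + 1)
        for combo in combinations(types, size)
    ]


def concat_token_embeddings(per_type_vectors, combo):
    """Concatenate one token's vectors from the selected embedding types.

    Concatenation is an assumption made for this sketch.
    """
    joined = []
    for name in combo:
        joined.extend(per_type_vectors[name])
    return joined


# Toy vectors for a single token; the dimensions are made up.
token_vectors = {
    "pretrained_char_lm": [0.1, 0.2, 0.3, 0.4],
    "pretrained_non_contextual": [0.5, 0.6, 0.7],
    "in_task_char": [0.8, 0.9],
    "in_task_pos": [1.0],
    "in_task_lemma": [1.1, 1.2],
}

combos = embedding_combinations(EMBEDDING_TYPES)
print(len(combos))  # 31 non-empty subsets of five types

tagger_input = concat_token_embeddings(
    token_vectors, ("pretrained_char_lm", "in_task_pos")
)
print(tagger_input)  # [0.1, 0.2, 0.3, 0.4, 1.0]
```

Not every subset is meaningful for every task: an in-task PoS embedding, for example, would not be a sensible input feature when PoS tagging is itself the target task, so an actual experiment would likely evaluate only a chosen subset of the 31 combinations.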

References

Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. 6th International Conference on Learning Representations.

Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018. Contextual string embeddings for sequence labeling. Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. arXiv:1310.4546.