Universal Morphological Analysis using Reinforcement Learning
Thesis title in Czech: | Univerzální morfologická analýza s využitím reinforcement learning |
---|---|
Thesis title in English: | Universal Morphological Analysis using Reinforcement Learning |
Keywords (Czech): | morfologická analýza, reinforcement learning |
Keywords (English): | morphological analysis, reinforcement learning |
Academic year of topic announcement: | 2018/2019 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | RNDr. Daniel Zeman, Ph.D. |
Author: | Mgr. Ronald Ahmed Cardenas Acosta - assigned and confirmed by the Study Department |
Date of registration: | 25.02.2019 |
Date of assignment: | 07.03.2019 |
Date of confirmation by the Study Department: | 25.04.2019 |
Date and time of defence: | 04.02.2020 09:00 |
Date of electronic submission: | 04.01.2020 |
Date of printed submission: | 06.01.2020 |
Date of completed defence: | 04.02.2020 |
Opponents: | RNDr. David Mareček, Ph.D. |
Guidelines |
In this thesis we take a universal approach to morphological analysis in context. The approach consists of jointly simulating word formation steps and morphological label assignment, one step at a time. This mechanism is modeled as a neural weighted finite-state automaton (WFSA; Schwartz et al., 2018), in an effort to add interpretability to an otherwise ‘black-box’ architecture. The problem is then formulated as a multi-armed bandit problem in which each arm captures a specific kind of word formation process. Each arm can thus learn how word formation processes are carried out in different languages. Moreover, the model has the potential to learn how to combine processes from different arms, i.e. to model how a language can combine different kinds of processes in the same derivation (e.g. German exhibits circumfixation, affixation, and compounding).
Our model leverages paradigm annotations and morphologically labeled sentences in a varied sample of high-resource languages made available by the CoNLL-SIGMORPHON shared tasks. We evaluate the effectiveness of our approach in high- and low-resource scenarios against strong neural baselines for English, Spanish, German, Czech, Turkish, and Shipibo-Konibo. |
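As a rough illustration of the multi-armed bandit formulation mentioned above, the following minimal sketch runs a generic epsilon-greedy bandit whose arms are named after the word formation processes listed in the guidelines. It is not the model proposed in the thesis: the function `run_bandit`, the `TOY_REWARDS` table, the fixed reward values, and the epsilon-greedy strategy itself are all assumptions chosen only to show how per-arm value estimates are updated one step at a time.

```python
import random


def run_bandit(rewards, steps=500, epsilon=0.1, seed=0):
    """Generic epsilon-greedy bandit over named arms (illustrative only).

    `rewards` maps each arm name to a zero-argument callable that returns
    a numeric reward. Returns (estimates, counts), both keyed by arm name.
    """
    rng = random.Random(seed)
    arms = list(rewards)
    estimates = {arm: 0.0 for arm in arms}
    counts = {arm: 0 for arm in arms}

    def pull(arm):
        reward = rewards[arm]()
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    # Pull every arm once so that each estimate starts from real feedback.
    for arm in arms:
        pull(arm)

    for _ in range(steps - len(arms)):
        if rng.random() < epsilon:
            arm = rng.choice(arms)               # explore a random arm
        else:
            arm = max(arms, key=estimates.get)   # exploit the best estimate
        pull(arm)
    return estimates, counts


# Hypothetical fixed rewards per arm. In a real system the reward would come
# from how well the chosen process explains an observed word form.
TOY_REWARDS = {
    "affixation": lambda: 0.5,
    "circumfixation": lambda: 0.2,
    "compounding": lambda: 0.8,
}

estimates, counts = run_bandit(TOY_REWARDS)
best_arm = max(estimates, key=estimates.get)
print(best_arm, counts)
```

With constant toy rewards the estimates settle after the first round of pulls, so the sketch only demonstrates the explore/exploit loop; a realistic setting would use noisy, context-dependent rewards.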
References |
Ramy Eskander, Owen Rambow, and Smaranda Muresan. 2018. Automatically tailoring unsupervised morphological segmentation to the language. In Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 78-83.
Ramy Eskander, Owen Rambow, and Tianchun Yang. 2016. Extending the use of adaptor grammars for unsupervised morphological segmentation of unseen languages. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 900-910.
Mark Johnson. 2008. Unsupervised word segmentation for Sesotho using adaptor grammars. In Proceedings of the Tenth Meeting of ACL Special Interest Group on Computational Morphology and Phonology, pages 20-27. Association for Computational Linguistics.
Hao Peng, Roy Schwartz, Sam Thomson, and Noah A. Smith. 2018. Rational recurrences. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1203-1214.
Roy Schwartz, Sam Thomson, and Noah A. Smith. 2018. SoPa: Bridging CNNs, RNNs, and Weighted Finite-State Machines.
Kairit Sirts and Sharon Goldwater. 2013. Minimally-supervised morphological segmentation using adaptor grammars. Transactions of the Association for Computational Linguistics, 1:255-266. |