Universal Morphological Tagger
Thesis title in Czech: | Univerzální značkování slovních druhů |
---|---|
Thesis title in English: | Universal POS Tagger |
Academic year of topic announcement: | 2012/2013 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. RNDr. Pavel Pecina, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Department |
Date of registration: | 16.07.2013 |
Date of assignment: | 16.07.2013 |
Date of confirmation by the Study Department: | 01.08.2013 |
Date and time of defence: | 02.09.2013 00:00 |
Date of electronic submission: | 04.09.2013 |
Date of submission of printed version: | 31.07.2013 |
Date of completed defence: | 02.09.2013 |
Opponents: | doc. Ing. Zdeněk Žabokrtský, Ph.D. |
Guidelines |
Part-of-speech (POS) tagging is one of the most basic operations in computational linguistics. Because it helps to disambiguate syntactic categories (and possibly senses), POS tags are regularly used in natural language processing (NLP) tasks such as parsing, sentence classification, and word sense disambiguation. The main challenge for POS tagging is obtaining training data. Supervised POS tagging algorithms perform well on resource-rich languages, where manually annotated data is available. Unsupervised POS tagging, on the other hand, does not need any manually annotated data and is therefore particularly suitable for resource-poor languages.
The goals of the project are to study current approaches to POS tagging for both resource-rich and resource-poor languages and to develop taggers for resource-poor languages based on POS knowledge from resource-rich languages. |
References |
(1) Berg-Kirkpatrick, Taylor, Alexandre Bouchard-Côté, John DeNero, and Dan Klein. 2010. Painless unsupervised learning with features. In Proceedings of HLT-NAACL, 582-590. Association for Computational Linguistics.
(2) Brants, Thorsten. 2000. TnT: A statistical part-of-speech tagger. In Proceedings of the Sixth Conference on Applied Natural Language Processing (ANLP '00), 224-231, Seattle, Washington, USA.
(3) Das, Dipanjan, and Slav Petrov. 2011. Unsupervised part-of-speech tagging with bilingual graph-based projections. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT '11, 600-609. Association for Computational Linguistics.
(4) Petrov, Slav, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey. European Language Resources Association (ELRA).
(5) Snyder, Benjamin, Tahira Naseem, Jacob Eisenstein, and Regina Barzilay. 2008. Unsupervised multilingual learning for POS tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08), 1041-1050, Honolulu, Hawaii. |