Univerzalní morfologický značkovač
Thesis title in Czech: | Univerzální značkování slovních druhů |
---|---|
Thesis title in English: | Universal POS Tagger |
Academic year of topic announcement: | 2012/2013 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. RNDr. Pavel Pecina, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 16.07.2013 |
Date of assignment: | 16.07.2013 |
Confirmed by Study dept. on: | 01.08.2013 |
Date and time of defence: | 02.09.2013 00:00 |
Date of electronic submission: | 04.09.2013 |
Date of submission of printed version: | 31.07.2013 |
Date of proceeded defence: | 02.09.2013 |
Opponents: | doc. Ing. Zdeněk Žabokrtský, Ph.D. |
Guidelines |
Part-of-speech (POS) tagging is one of the most basic operations in computational linguistics. Because it helps to disambiguate syntactic categories (and possibly senses), POS tags are regularly used in various natural language processing (NLP) tasks such as parsing, sentence classification, and word sense disambiguation. The main challenge for POS tagging is the training data. Supervised POS tagging algorithms perform well on resource-rich languages, for which manually annotated data is available. Unsupervised POS tagging, on the other hand, does not need any manually annotated data and is therefore particularly suitable for resource-poor languages.
The goals of the project are to study current approaches to POS tagging for both resource-rich and resource-poor languages and to develop taggers for resource-poor languages based on POS knowledge from resource-rich languages. |
References |
(1) Berg-Kirkpatrick, Taylor, Alexandre Bouchard-Côté, John DeNero, and Dan Klein. 2010. Painless unsupervised learning with features. In Proceedings of HLT-NAACL, 582-590. Association for Computational Linguistics.
(2) Brants, Thorsten. 2000. TnT: A statistical part-of-speech tagger. In Proceedings of the Sixth Conference on Applied Natural Language Processing (ANLP '00), 224-231, Seattle, Washington, USA.
(3) Das, Dipanjan, and Slav Petrov. 2011. Unsupervised part-of-speech tagging with bilingual graph-based projections. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT '11, 600-609. Association for Computational Linguistics.
(4) Petrov, Slav, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey. European Language Resources Association (ELRA).
(5) Snyder, Benjamin, Tahira Naseem, Jacob Eisenstein, and Regina Barzilay. 2008. Unsupervised multilingual learning for POS tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08), 1041-1050, Honolulu, Hawaii. |