Thesis (Selection of subject) (version: 368)
Thesis details
Univerzalní morfologický značkovač
Thesis title in Czech: Univerzální značkování slovních druhů
Thesis title in English: Universal POS Tagger
Academic year of topic announcement: 2012/2013
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: doc. RNDr. Pavel Pecina, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 16.07.2013
Date of assignment: 16.07.2013
Confirmed by Study dept. on: 01.08.2013
Date and time of defence: 02.09.2013 00:00
Date of electronic submission: 04.09.2013
Date of submission of printed version: 31.07.2013
Date defence took place: 02.09.2013
Opponents: doc. Ing. Zdeněk Žabokrtský, Ph.D.
 
 
 
Guidelines
Part-of-speech (POS) tagging is one of the most basic operations in computational linguistics. Because it helps to disambiguate syntactic categories (and possibly senses), POS tags are regularly used in various natural language processing (NLP) tasks such as parsing, sentence classification, and word sense disambiguation. The main challenge for POS tagging is obtaining training data. Supervised POS tagging algorithms perform well on resource-rich languages where manually annotated data is available. Unsupervised POS tagging, on the other hand, does not need any manually annotated data and is particularly suitable for resource-poor languages.

The goals of the project are to study current approaches to POS tagging for both resource-rich and resource-poor languages, and to develop taggers for resource-poor languages based on POS knowledge from resource-rich languages.
References
(1) Berg-Kirkpatrick, Taylor, Alexandre Bouchard-Côté, John DeNero, and Dan Klein. 2010. Painless unsupervised learning with features. In Proceedings of HLT-NAACL, 582-590. Association for Computational Linguistics.

(2) Brants, Thorsten. 2000. TnT: A statistical part-of-speech tagger. In Proceedings of the Sixth Conference on Applied Natural Language Processing (ANLP '00), 224-231, Seattle, Washington, USA.

(3) Das, Dipanjan, and Slav Petrov. 2011. Unsupervised part-of-speech tagging with bilingual graph-based projections. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT '11, 600-609. Association for Computational Linguistics.

(4) Petrov, Slav, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey. European Language Resources Association (ELRA).

(5) Snyder, Benjamin, Tahira Naseem, Jacob Eisenstein, and Regina Barzilay. 2008. Unsupervised multilingual learning for POS tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08), 1041-1050, Honolulu, Hawaii.
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html