Course, academic year 2018/2019
Prague Dependency Treebank - NPFL075
Title in Czech: Pražský závislostní korpus
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Valid: from 2018 to 2018
Semester: summer
E-Credits: 6
Hours per week, examination: summer s.:2/2 C+Ex [hours/week]
Capacity: unlimited
Min. number of students: unlimited
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Guarantor: doc. RNDr. Markéta Lopatková, Ph.D.
RNDr. Jiří Mírovský, Ph.D.
Class: Informatics, Master's - Mathematical Linguistics
Classification: Informatics > Computer and Formal Linguistics
Annotation -
Last update: Mgr. Barbora Vidová Hladká, Ph.D. (29.01.2019)
The course familiarizes students with the Prague Dependency Treebank (PDT 2.0) project, from its theoretical foundations through the individual layers of annotation to the way important linguistic phenomena are represented. Emphasis is also placed on annotation schemata and the data format, on useful tools, and on practical work with the treebank.
Course completion requirements -
Last update: doc. RNDr. Markéta Lopatková, Ph.D. (10.06.2019)

The course ends with a written exam whose questions cover topics from the syllabus; a student must obtain at least 50% of the total score. Students pass the practicals by submitting all assignments. Passing the practicals is not a prerequisite for taking the exam.

Literature -
Last update: Mgr. Barbora Vidová Hladká, Ph.D. (29.01.2019)

Hajičová, E., Panevová, J., Sgall, P. (2002) Úvod do teoretické a počítačové lingvistiky, sv. I. Karolinum, Praha

Hajičová, E., Abeillé, A., Hajič, J., Mírovský, J., Urešová, Z. (2010) Treebank Annotation. In Indurkhya, N., Damerau, F. J. (eds.) Handbook of Natural Language Processing, Second Edition. CRC Press, Taylor and Francis Group, Boca Raton, pp. 167-188

Hajič, J. (2014) Disambiguation of Rich Inflection (Computational Morphology of Czech). Karolinum, Charles University Press, Prague, see also

Anotace na analytické rovině. Návod pro anotátory. Technická zpráva ÚFAL TR-1997-03, Universita Karlova, 1997

see also

Mikulová, M., Bémová, A., Hajič, J., Hajičová, E., Havelka, J., Kolářová, V., Kučová, L., Lopatková, M., Pajas, P., Panevová, J., Ševčíková, M., Sgall, P., Štěpánek, J., Urešová, Z., Veselá, K., Žabokrtský, Z. (2007) Annotation on the tectogrammatical level in the Prague Dependency Treebank. Reference Version. Technical report no. 2007/3.1, ÚFAL, Charles University, see also

PDT Guide -

Nivre, J., de Marneffe, M.C., Ginter, F., Goldberg, Y., Hajič, J., Manning, C., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N., Tsarfaty, R., Zeman, D. (2016) Universal Dependencies v1: A Multilingual Treebank Collection. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), ELRA, Paris, pp. 1659-1666

Universal Dependencies v2, see

Syllabus -
Last update: Mgr. Barbora Vidová Hladká, Ph.D. (29.01.2019)

1. Theoretical background - Functional Generative Description (system of layers, relation of composition and relations of form and function, dependency and non-dependency relations).

2. Morphological layer (tokenization, lemma, tag).

3. Analytical layer (dependency tree, analytical function, word order and projectivity).

4. Tectogrammatical layer (structure, functors, tectogrammatical lemma, valency, grammatemes, ellipses, coreference, reflexivity, topic-focus articulation, personal names, direct speech).

5. Universal Dependencies.

6. Annotation schema, data format (XML).

7. Tools (xsh, TrEd, PML-TQ).
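
As an illustration of the projectivity notion listed under the analytical layer (item 3), the following is a minimal sketch in Python and is not taken from the course materials. It assumes a sentence is given as a list of head indices (a hypothetical encoding chosen for this example, not the PDT data format): `heads[i]` is the 1-based position of the parent of word `i + 1`, and `0` marks the root. An edge counts as projective if every word lying between the head and the dependent is a descendant of the head; the tree is projective if all its edges are. The function names `dominates` and `is_projective` are illustrative only.

```python
from typing import List


def dominates(heads: List[int], ancestor: int, node: int) -> bool:
    """Return True if `ancestor` is `node` itself or lies on its path to the root."""
    steps = 0
    # The step bound guards against malformed input containing a cycle.
    while node != 0 and steps <= len(heads):
        if node == ancestor:
            return True
        node = heads[node - 1]  # move one level up the tree
        steps += 1
    return False


def is_projective(heads: List[int]) -> bool:
    """Check projectivity of a dependency tree given as 1-based head indices."""
    for dep in range(1, len(heads) + 1):
        head = heads[dep - 1]
        if head == 0:
            continue  # the root has no incoming edge between two words
        left, right = min(dep, head), max(dep, head)
        # Every word strictly between head and dependent must hang under the head.
        for between in range(left + 1, right):
            if not dominates(heads, head, between):
                return False
    return True


# Word 2 is the root; words 1 and 3 depend on it: no edge spans another word.
print(is_projective([2, 0, 2]))
# Word 3 is the root; the edge 4 -> 2 spans word 3, which is not under word 4.
print(is_projective([3, 4, 0, 3]))
```

The check is quadratic in sentence length in the worst case, which is acceptable for single sentences; it is meant only to make the definition concrete.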
