Course, academic year 2018/2019
Language Data Resources - NPFL070
Title in Czech: Zdroje lingvistických dat
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Valid: from 2014 to 2018
Semester: summer
E-Credits: 5
Hours per week, examination: summer s.:1/2 MC [hours/week]
Capacity: unlimited
Min. number of students: unlimited
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Guarantor: doc. Ing. Zdeněk Žabokrtský, Ph.D.
Mgr. Martin Popel, Ph.D.
Class: Informatika Mgr. - Matematická lingvistika
Classification: Informatics > Computer and Formal Linguistics
Annotation -
Last update: Mgr. Barbora Vidová Hladká, Ph.D. (25.01.2019)
The goal of the seminar is to give students a survey of the field of language resources. Selected types of linguistic annotation are described, with emphasis on annotated textual data (morphological categories, constituency and dependency syntactic trees, anaphora, discourse structure, word-sense disambiguation, parallel-text alignment, etc.) and lexical data (wordnets, translation dictionaries, valency lexicons, etc.). Leading projects for English, Czech, and some other languages are used for illustration.
Literature - Czech
Last update: Mgr. Barbora Vidová Hladká, Ph.D. (25.01.2019)

Selected conference papers (LREC, ACL, etc.) and ÚFAL/CKL technical reports.

Syllabus -
Last update: Mgr. Barbora Vidová Hladká, Ph.D. (25.01.2019)

1. Introduction

  • motivation for building linguistically annotated data
  • annotation principles
  • classification of language resources, application perspective
  • technical support, encoding, data formats

2. Corpora

  • corpus typology, tag sets
  • Brown Corpus, Czech National Corpus
  • searching in corpora

3. Treebanks

  • constituency and dependency frameworks, mutual convertibility
  • Penn Treebank, Prague Dependency Treebank, Negra/Tiger
  • searching in treebanks

4. Computer Lexicography

  • types of lexical information
  • machine-readable and machine-tractable dictionaries
  • wordnets, valency lexicons, translation lexicons
  • Princeton WordNet, EuroWordNet, FrameNet, PropBank, Vallex
  • dictionary production systems

5. Tectogrammatical layer of the Prague Dependency Treebank

  • dependency tree, types of edges, inner structure of nodes
  • coreference
  • grammatemes
  • information structure
