Course, academic year 2023/2024
Language Data Resources II - NPFL076
Title: Zdroje lingvistických dat II
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Actual: from 2010
Semester: summer
E-Credits: 3
Hours per week, examination: summer s.:0/2, MC [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: cancelled
Language: Czech
Teaching methods: full-time
Guarantor: doc. Ing. Zdeněk Žabokrtský, Ph.D.
Class: Informatics Mgr. - Mathematical Linguistics
Classification: Informatics > Computer and Formal Linguistics
Co-requisite: NPFL070
Annotation -
Last update: T_UFAL (10.05.2006)
The seminar gives students practical experience in applying the knowledge acquired in Language Resources I in a Linux/Perl environment.
Literature - Czech
Last update: T_UFAL (10.05.2006)

Vybrané články z konferencí (LREC, ACL atd.), technické zprávy ÚFAL/CKL.

(Selected conference papers (LREC, ACL etc.), UFAL/CKL technical reports.)

Syllabus -
Last update: T_UFAL (10.05.2006)

1) Short introduction to the Perl programming language

  • data structures
  • basic processing of textual data
  • Perl Best Practices
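The course does this in Perl. Purely as an illustration, here is a sketch in Python of a typical first exercise in basic text processing: counting word frequencies in a text. The function name `word_frequencies` and the simple regular-expression tokenizer are assumptions made for this example, not course material.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lower-cased word tokens in `text`.

    Assumption: a token is a maximal run of Unicode word characters
    (r"\\w+"); real corpora usually need a proper tokenizer.
    """
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


if __name__ == "__main__":
    sample = "Pes štěká. Pes spí a kočka spí."
    # Print the three most frequent tokens with their counts.
    for word, count in word_frequencies(sample).most_common(3):
        print(word, count)
```

In Python 3, `\w` matches Czech letters such as `š` or `ě`, so accented words are not split apart. In Perl, the same behaviour requires the input to be decoded into character strings first.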

2) Language resources in XML

  • declaring document structure (DTD, XML schemas)
  • XSL transformations
  • XPath queries
  • DocBook
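XPath queries select nodes of an XML document by path and predicates. The sketch below, written in Python rather than Perl, uses the standard library's `xml.etree.ElementTree`. That module implements only a limited subset of XPath, but the subset includes the `.//w[@pos='N']` pattern used here. Full XPath 1.0 and XSLT are available through libraries such as lxml (`lxml.etree.XPath`, `lxml.etree.XSLT`). The corpus fragment and its `corpus`/`s`/`w`/`pos` names are hypothetical and are not the PDT format.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up corpus fragment; element and attribute names are hypothetical.
SAMPLE = """<corpus>
  <s id="s1">
    <w pos="N">Pes</w>
    <w pos="V">štěká</w>
  </s>
  <s id="s2">
    <w pos="N">Kočka</w>
    <w pos="V">spí</w>
  </s>
</corpus>"""


def words_with_pos(xml_text, pos):
    """Return the text of all <w> elements whose pos attribute equals `pos`.

    The path './/w[@pos=...]' (any descendant <w> with a matching attribute)
    lies within the XPath subset that ElementTree supports.
    """
    root = ET.fromstring(xml_text)
    return [w.text for w in root.findall(f".//w[@pos='{pos}']")]
```

For example, `words_with_pos(SAMPLE, "N")` returns the nouns in document order, `["Pes", "Kočka"]`.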

3) PDT 2.0 data processing

  • data formats used in PDT
  • batch processing of PDT 2.0 data with btred/ntred

4) Processing resources for other languages

  • conversion from other formats, formalisms and languages (including typologically distant languages)
  • conversion between dependency and constituency structures
  • rapid development of syntactically tagged data for languages with scarce resources
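Converting dependency trees to constituency structures can be illustrated with the simplest possible scheme, sketched here in Python. Every word that has dependents projects one unlabeled phrase covering its whole subtree, with the head and its dependents' subtrees ordered by word position. This flat scheme is an assumption chosen for brevity, not the specific conversion taught in the course. It also assumes a projective tree, so that every subtree is a contiguous span. Practical converters additionally assign phrase labels and may build intermediate levels. The function name `dep_to_const` is hypothetical.

```python
def dep_to_const(words, heads):
    """Convert a projective dependency tree into an unlabeled bracketing.

    words: list of word forms.
    heads: 1-based index of each word's head; 0 marks the root.
    Assumption: flat conversion, one phrase per word that has dependents.
    """
    # children[h] lists the 1-based indices of the dependents of node h (0 = root).
    children = {i: [] for i in range(len(words) + 1)}
    for dep, head in enumerate(heads, start=1):
        children[head].append(dep)

    def build(node):
        deps = children[node]
        if not deps:
            return words[node - 1]  # a leaf word projects no phrase
        # Head and dependent subtrees, ordered by position in the sentence
        # (valid for projective trees, where each subtree is contiguous).
        parts = [words[node - 1] if i == node else build(i)
                 for i in sorted(deps + [node])]
        return "(" + " ".join(parts) + ")"

    roots = children[0]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return build(roots[0])
```

For "Starý pes štěká" with heads `[2, 3, 0]`, the adjective attaches to the noun and the noun to the verb. The result is therefore `((Starý pes) štěká)`.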

5) Experiment evaluation

  • precision/recall in morphological and syntactic tagging
  • 10-fold cross-validation, significance tests
  • BLEU score
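Two of the evaluation notions above can be sketched in Python as follows. The first is precision, recall and F1 computed over sets of items: precision = correct / predicted, recall = correct / gold, F1 = their harmonic mean. The second is splitting n items into k folds for cross-validation. Representing tagger output as `(token_index, tag)` pairs or parser output as `(dependent, head)` edges is an assumption made for this example. The function names are hypothetical. Significance tests and BLEU are not covered here.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall and F1 of `predicted` items against `gold` items.

    gold, predicted: sets of hashable items, e.g. (token_index, tag) pairs
    for tagging or (dependent, head) edges for dependency parsing.
    """
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation over n items.

    Every item appears in exactly one test fold; fold sizes differ by at
    most one. Items are not shuffled (assumption: shuffle beforehand if
    the data are ordered, e.g. by document or genre).
    """
    if not 1 < k <= n:
        raise ValueError("need 2 <= k <= n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

If a tagger assigns exactly one tag to every token, the predicted and gold sets have the same size. Precision and recall then coincide with accuracy. They differ only when the system may leave tokens untagged or output several candidates.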
