Course, academic year 2022/2023
Multilingual Natural Language Processing - NPFL120
Czech title: Mnohojazyčné počítačové zpracování jazyka
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Valid: from 2019
Semester: summer
E-Credits: 3
Hours per week, examination: summer s.:1/1, MC [HT]
Capacity: unlimited
Min. number of students: unlimited
Virtual mobility / capacity: no
State of the course: taught
Language: English, Czech
Teaching methods: full-time
Guarantor: RNDr. Daniel Zeman, Ph.D.
Mgr. Rudolf Rosa, Ph.D.
doc. RNDr. Ondřej Bojar, Ph.D.
Annotation
Last update: doc. Mgr. Barbora Vidová Hladká, Ph.D. (03.05.2019)
The course focuses on multilingual aspects of natural language processing. It explains both the challenges and the benefits of doing NLP in a multilingual setting and presents possible approaches. We will cover monolingual methods applied to multiple languages, where the variety among languages has to be handled, as well as truly multilingual and cross-lingual approaches that use resources in several languages at once. We will review and work with a range of freely available multilingual resources, both plain text and annotated. The course takes the form of a practical seminar in a computer lab.
Course completion requirements
Last update: Mgr. Rudolf Rosa, Ph.D. (15.02.2018)

To pass the course, you must participate actively in class and submit all homework assignments. The quality of your homework solutions determines your grade.

Literature
Last update: doc. Mgr. Barbora Vidová Hladká, Ph.D. (29.01.2019)
  • HASPELMATH, Martin. The world atlas of language structures. Oxford University Press, 2005.
  • PETROV, Slav; DAS, Dipanjan; MCDONALD, Ryan. A universal part-of-speech tagset. In: LREC. 2012. p. 2089-2096.
  • ZEMAN, Daniel. Reusable Tagset Conversion Using Tagset Drivers. In: LREC. 2008. p. 213-218.
  • ZEMAN, Daniel, et al. HamleDT: To Parse or Not to Parse?. In: LREC. 2012. p. 2735-2741.
  • MCDONALD, Ryan; LERMAN, Kevin; PEREIRA, Fernando. Multilingual dependency analysis with a two-stage discriminative parser. In: CoNLL. 2006. p. 216-220.
  • NIVRE, Joakim, et al. Universal dependencies v1: A multilingual treebank collection. In: LREC. 2016. p. 1659-1666.
  • DAS, Dipanjan; PETROV, Slav. Unsupervised part-of-speech tagging with bilingual graph-based projections. In: ACL-HLT. 2011. p. 600-609.
  • ZEMAN, Daniel; RESNIK, Philip. Cross-Language Parser Adaptation between Related Languages. In: IJCNLP. 2008. p. 35-42.
  • TIEDEMANN, Jörg. Parallel Data, Tools and Interfaces in OPUS. In: LREC. 2012. p. 2214-2218.
  • AGIĆ, Željko; HOVY, Dirk; SØGAARD, Anders. If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages. In: ACL-IJCNLP. 2015. p. 268-272.
  • AGIĆ, Željko, et al. Multilingual projection for parsing truly low-resource languages. In: TACL. 2016. p. 301-312.
  • SØGAARD, Anders. Data point selection for cross-language adaptation of dependency parsers. In: ACL-HLT. 2011. p. 682-686.
  • TIEDEMANN, Jörg; AGIĆ, Željko; NIVRE, Joakim. Treebank translation for cross-lingual parser induction. In: CoNLL. 2014. p. 130-140.
  • FORCADA, Mikel L., et al. Apertium: a free/open-source platform for rule-based machine translation. In: Machine translation. 2011. p. 127-144.
  • JOHNSON, Melvin, et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. In: arXiv preprint arXiv:1611.04558. 2016.
  • ZEMAN, Daniel. The World of Tokens, Tags and Trees. Studies in Computational and Theoretical Linguistics, vol. 19. Praha: ÚFAL, 2018. ISBN 978-80-88132-09-7.

Syllabus
Last update: RNDr. Daniel Zeman, Ph.D. (05.05.2022)
  • Introduction to multilinguality (what it is, why it is hard to deal with, what it is good for, WALS)
  • Plain text (alphabets, transliteration, tokenization, language identification, language similarity)
  • Machine translation for multilingual processing (Apertium, OPUS, Bible, Watchtower, alignment algorithms, multilingual machine translation)
  • Morphology (morphological variability of languages, morphological annotation, Universal POS tags, Universal features, tagset conversions, cross-lingual tagging)
  • Syntax (syntactic variability of languages, harmonization of treebank annotations, Universal Dependencies; multilingual parsing, cross-lingual parsing)
  • Word embeddings, multilingual embeddings, contextual vector representations
