Course, academic year 2025/2026
Statistical Methods in Natural Language Processing - NPFL147
Czech title: Statistické metody zpracování přirozených jazyků
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Valid: from 2025
Semester: winter
E-Credits: 6
Hours per week, examination: winter s.:2/2, C+Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Additional information: https://ufal.mff.cuni.cz/courses/npfl147
Guarantor: doc. RNDr. Pavel Pecina, Ph.D.
Teacher(s): Mgr. Jindřich Helcl, Ph.D.
doc. RNDr. Pavel Pecina, Ph.D.
Class: DS, Mathematical Linguistics
Computer Science (Mgr.) - Mathematical Linguistics
Classification: Informatics > Computer and Formal Linguistics
Is interchangeable with: NPFL068, NPFL067
Annotation -
The course aims to familiarize students with the basic concepts of computational linguistics and with the fundamentals of probabilistic and statistical methods for language modeling.
Last update: Mírovský Jiří, RNDr., Ph.D. (23.05.2025)
Course completion requirements -

To pass the course, students must obtain the course credit and pass the exam.

The course credit will be awarded upon completion of the homework assignments.

The final grade will be based on the results of the exam and the homework assignments.

The open-book exam is in written form. Students are allowed to use a textbook, lecture slide printouts, or the internet. The exam carries the same weight in the final grade as one homework assignment.

Last update: Mírovský Jiří, RNDr., Ph.D. (23.05.2025)
Literature -

Jurafsky, D. and J. Martin: Speech and Language Processing. Prentice Hall. 3rd edition, 2025.

Cover, T. M. and J. A. Thomas: Elements of Information Theory. Wiley. 1991. ISBN 0-471-06259-6.

Manning, C. D. and H. Schütze: Foundations of Statistical Natural Language Processing. The MIT Press. 1999. ISBN 0-262-13360-1.

Last update: Mírovský Jiří, RNDr., Ph.D. (23.05.2025)
Syllabus -

1. Introduction, Probability, Essential Information Theory

2. Statistical language modeling (n-gram models)

3. Statistical properties of words

4. Word embeddings

5. Hidden Markov models, Tagging

Last update: Mírovský Jiří, RNDr., Ph.D. (23.05.2025)