

Statistical Methods in Natural Language Processing I - NPFL067

Title:	Statistické metody zpracování přirozených jazyků I
Guaranteed by:	Institute of Formal and Applied Linguistics (32-UFAL)
Faculty:	Faculty of Mathematics and Physics
Valid:	from 2025
Semester:	winter
E-Credits:	5
Hours per week, examination:	winter s.:2/2, C+Ex [HT]
Capacity:	unlimited
Min. number of students:	unlimited
4EU+:	no
Virtual mobility / capacity:	no
State of the course:	not taught
Language:	Czech, English
Teaching methods:	full-time
Additional information:	https://ufal.mff.cuni.cz/courses/npfl067

Guarantor:	prof. RNDr. Jan Hajič, Dr.
Class:	DS, Mathematical Linguistics; Computer Science Mgr. - Mathematical Linguistics
Classification:	Informatics > Computer and Formal Linguistics
Interchangeability:	NPFL147
Is co-requisite for:	NPFX068, NPFL068
Is incompatible with:	NPFX067
Is interchangeable with:	NPFX067


Annotation

Since the 2025/26 academic year, this course has been replaced by NPFL147. Introduction to formal linguistics and the fundamentals of statistical natural language processing, including the basics of Information Theory, Language Modeling and Markov Models. Continued in Statistical Methods in Natural Language Processing II.

Last update: Mírovský Jiří, RNDr., Ph.D. (02.09.2025)

Course completion requirements

The final grade combines both homework assignments (66.7 %) and a written exam (33.3 %). Credit ("zápočet") is not a prerequisite for taking the exam. To get credit, the homework total must be at least 80 points (out of 200). Each homework can be turned in at most three times, no later than the date announced on the course webpage. Every late day subtracts 5 points; turning in a homework more than 10 days after the deadline carries a flat penalty of 50 points.
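The grading rules above can be sketched as a few Python helpers. This is an illustrative reading only, not the official calculation: it assumes that each of the two homework assignments is scored out of 100, that late penalties are subtracted from that assignment's score (floored at 0) and counted in whole days, and that the final percentage normalizes homework to 200 points and the exam to 100 points before applying the 2/3 and 1/3 weights. All function and constant names are hypothetical.

```python
# Hypothetical helpers illustrating the stated grading rules.
# Assumptions: two homeworks of 100 points each, penalties applied per
# assignment, late days counted as whole numbers, scores floored at 0.
HOMEWORK_MAX_TOTAL = 200  # both homeworks together
CREDIT_THRESHOLD = 80     # minimum homework total for "zápočet"


def late_penalty(days_late: int) -> int:
    """Points subtracted for a submission `days_late` whole days after the deadline."""
    if days_late <= 0:
        return 0
    if days_late > 10:
        return 50            # flat penalty beyond 10 days
    return 5 * days_late     # 5 points per late day


def homework_score(raw_points: int, days_late: int) -> int:
    """Score of one homework after the late penalty, floored at 0 (assumption)."""
    return max(0, raw_points - late_penalty(days_late))


def has_credit(homework_total: int) -> bool:
    """Credit requires at least 80 of the 200 homework points."""
    return homework_total >= CREDIT_THRESHOLD


def final_percentage(homework_total: int, exam_points: int) -> float:
    """Homework weighted 2/3 (66.7 %), exam weighted 1/3 (33.3 %)."""
    homework_part = homework_total / HOMEWORK_MAX_TOTAL
    exam_part = exam_points / 100
    return 100 * (2 * homework_part + exam_part) / 3


hw_total = homework_score(90, 3) + homework_score(70, 12)  # 75 + 20
print(hw_total, has_credit(hw_total), round(final_percentage(hw_total, 60), 1))
# prints: 95 True 51.7
```

In this reading, a homework submitted exactly 10 days late loses 50 points either way, so the flat penalty simply caps the per-day deduction.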

Last update: Hajič Jan, prof. RNDr., Dr. (28.09.2020)

Literature

Manning, C. D. and H. Schütze: Foundations of Statistical Natural Language Processing. The MIT Press. 1999. ISBN 0-262-13360-1.

Jurafsky, D. and J. Martin: Speech and Language Processing. Prentice Hall. Any edition (1st: 2000).

Cover, T. M. and J. A. Thomas: Elements of Information Theory. Wiley. 1991. ISBN 0-471-06259-6.

Last update: Hajič Jan, prof. RNDr., Dr. (28.09.2020)

Exam requirements

There is one written exam with 4 to 5 questions, each with sub-questions. Its scope corresponds to the syllabus and to the material presented in the lectures and exercises. The net time allowed for the exam is 60 minutes, and it is an open-book exam; calculators are allowed. The exam is graded on a scale of 0 to 100 points, which counts for 33.3 % of the final grade. The exam may be administered online.

Last update: Hajič Jan, prof. RNDr., Dr. (28.09.2020)

Syllabus

Introduction. Course Overview: Intro to NLP. Main Issues.

The Very Basics on Probability Theory. Elements of Information Theory I. Elements of Information Theory II.
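As a reminder of the information-theoretic quantities involved, the following sketch estimates the entropy H(X) of a word distribution and the conditional entropy H(J|I) of a word given the previous word, using relative-frequency (maximum-likelihood) estimates from a token list; perplexity is then 2 raised to the entropy in bits. The function names and the example sentence are illustrative, not taken from course materials.

```python
import math
from collections import Counter


def entropy(counts: Counter) -> float:
    """H(X) = -sum_x p(x) log2 p(x), with p estimated by relative frequency."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def conditional_entropy(tokens: list) -> float:
    """H(J|I) = -sum_{i,j} p(i,j) log2 p(j|i) over adjacent word pairs (i, j)."""
    pairs = Counter(zip(tokens, tokens[1:]))
    history = Counter()
    for (i, _), c in pairs.items():
        history[i] += c  # how often i occurs as the first word of a pair
    n = sum(pairs.values())
    return -sum((c / n) * math.log2(c / history[i]) for (i, _), c in pairs.items())


def perplexity(h: float) -> float:
    """Perplexity corresponding to an entropy of h bits."""
    return 2 ** h


tokens = "the cat sat on the mat".split()
h = entropy(Counter(tokens))
print(h, perplexity(h), conditional_entropy(tokens))
```

Only "the" has more than one possible successor here, which is why the conditional entropy is much lower than the unigram entropy.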

Language Modeling in General and the Noisy Channel Model. Smoothing and the EM algorithm.
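One standard way to combine smoothing with the EM algorithm is linear interpolation: the smoothed bigram probability is λ0/|V| + λ1·p1(w) + λ2·p2(w|h), and the weights λ are re-estimated by EM on held-out data. The sketch below assumes this particular scheme (uniform, unigram and bigram components), a vocabulary taken from the union of training and held-out tokens, a fixed number of iterations, and a uniform fallback for unseen histories; the course may use a different set of components or stopping criterion. The function name is hypothetical.

```python
from collections import Counter


def interpolation_em(train, heldout, iterations=50):
    """Estimate weights (l0, l1, l2) for
    p(w|h) = l0 / |V| + l1 * p1(w) + l2 * p2(w|h) by EM on held-out bigrams."""
    unigrams = Counter(train)
    bigrams = Counter(zip(train, train[1:]))
    history = Counter()
    for (h, _), c in bigrams.items():
        history[h] += c
    vocab_size = len(set(train) | set(heldout))  # assumption: union vocabulary
    n = len(train)

    def components(h, w):
        p0 = 1 / vocab_size                       # uniform
        p1 = unigrams[w] / n                      # ML unigram
        # ML bigram; an unseen history falls back to uniform (assumption)
        p2 = bigrams[(h, w)] / history[h] if history[h] else p0
        return (p0, p1, p2)

    lambdas = [1 / 3, 1 / 3, 1 / 3]
    pairs = list(zip(heldout, heldout[1:]))
    for _ in range(iterations):
        # E-step: expected share of each component in every held-out event
        expected = [0.0, 0.0, 0.0]
        for h, w in pairs:
            ps = components(h, w)
            mixture = sum(l * p for l, p in zip(lambdas, ps))
            for k in range(3):
                expected[k] += lambdas[k] * ps[k] / mixture
        # M-step: renormalize expected counts into new weights
        total = sum(expected)
        lambdas = [e / total for e in expected]
    return lambdas


train = "the cat sat on the mat the dog sat on the log".split()
heldout = "the cat sat on the log".split()
print(interpolation_em(train, heldout))
```

Because the uniform component is always positive, the mixture never assigns zero probability to a held-out word, which is the point of smoothing.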

Word Classes and Lexicography. Mutual Information (the "pointwise" version). The t-score. The Chi-square test. Word Classes for NLP tasks. Parameter Estimation. The Partitioning Algorithm. Complexity Issues of Word Classes. Programming Tricks & Tips.
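Pointwise mutual information and the t-score can both be computed directly from bigram counts. The sketch assumes maximum-likelihood estimates from adjacent word pairs, with N the number of bigrams, and uses the common approximation s² ≈ p(x,y) for the variance in the t-score; the chi-square test is not shown. The function name and example sentence are illustrative.

```python
import math
from collections import Counter


def collocation_scores(tokens):
    """Return {(x, y): (pmi, t_score)} for every adjacent word pair."""
    pairs = Counter(zip(tokens, tokens[1:]))
    left, right = Counter(), Counter()
    for (x, y), c in pairs.items():
        left[x] += c   # c(x) as the first word of a pair
        right[y] += c  # c(y) as the second word of a pair
    n = sum(pairs.values())  # N = number of bigrams
    scores = {}
    for (x, y), c in pairs.items():
        expected = left[x] * right[y] / n        # N p(x) p(y): count under independence
        pmi = math.log2(c / expected)            # log2 p(x,y) / (p(x) p(y))
        t_score = (c - expected) / math.sqrt(c)  # uses s^2 ≈ p(x,y)
        scores[(x, y)] = (pmi, t_score)
    return scores


tokens = "new york is big and new york is busy".split()
ranked = sorted(collocation_scores(tokens).items(), key=lambda kv: -kv[1][0])
for pair, (pmi, t) in ranked:
    print(pair, round(pmi, 2), round(t, 2))
```

PMI favors rare pairs that always co-occur, while the t-score also grows with the raw count, which is one reason both measures are usually compared.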

Markov models, Hidden Markov Models (HMMs). The Trellis & the Viterbi Algorithms. Estimating the Parameters of HMMs. The Forward-Backward Algorithm. Implementation Issues.
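The Viterbi algorithm fills a trellis with the best path score for each state at each time step and then follows back-pointers from the best final state. The sketch below works in log space to avoid numerical underflow on long sequences; the two-state model and its probabilities are made up purely for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best state sequence, its log-probability) for an HMM."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # trellis[t][s]: best log-probability of any path ending in state s at time t
    trellis = [{s: log(start_p[s]) + log(emit_p[s].get(obs[0], 0.0)) for s in states}]
    backpointers = [{}]
    for t in range(1, len(obs)):
        column, pointers = {}, {}
        for s in states:
            # best predecessor r of state s at time t - 1
            best_prev = max(states, key=lambda r: trellis[t - 1][r] + log(trans_p[r][s]))
            column[s] = (trellis[t - 1][best_prev] + log(trans_p[best_prev][s])
                         + log(emit_p[s].get(obs[t], 0.0)))
            pointers[s] = best_prev
        trellis.append(column)
        backpointers.append(pointers)

    # follow back-pointers from the best final state
    last = max(states, key=lambda s: trellis[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(backpointers[t][path[-1]])
    return path[::-1], trellis[-1][last]


states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
path, logp = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path, math.exp(logp))  # ['Healthy', 'Healthy', 'Fever'] with probability ≈ 0.01512
```

Replacing `max` with a sum over predecessors (in probability space) turns the same trellis into the forward pass used by the Forward-Backward algorithm.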

Last update: Hajič Jan, prof. RNDr., Dr. (28.09.2020)