Course, academic year 2020/2021
Statistical Methods in Natural Language Processing II - NPFL068
Title: Statistické metody zpracování přirozených jazyků II
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Actual: from 2020
Semester: summer
E-Credits: 5
Hours per week, examination: summer semester: 2/2, C+Ex [hours/week]
Capacity: unlimited
Min. number of students: unlimited
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Guarantor: prof. RNDr. Jan Hajič, Dr.
doc. RNDr. Pavel Pecina, Ph.D.
Class: DS, Mathematical Linguistics
Computer Science, Master's (Mgr.) - Mathematical Linguistics
Classification: Informatics > Computer and Formal Linguistics
Co-requisite: NPFL067
Annotation -
Last update: T_UFAL (13.05.2014)
Continuation of Statistical Methods in Natural Language Processing I. Introduces the notion of a linguistic experiment and its evaluation, and the role of corpora in statistical NLP. Standard NLP tasks (tagging, phrase-structure and dependency parsing) are explained, and methods for solving them, including generative and discriminative models, are presented.
Course completion requirements -
Last update: doc. RNDr. Pavel Pecina, Ph.D. (10.06.2019)

Requirements: two homework assignments (33% each) and a written exam (34%). The course credit ("zápočet") is not a prerequisite for taking the exam. To obtain the credit, the homework grade must be at least 40 points (out of 100). Each homework can be submitted at most three times; the deadline is announced on the course webpage. Each day of delay subtracts 5 points, and homework submitted more than 10 days after the deadline carries a flat penalty of 50 points.
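The grading arithmetic can be sketched as follows. This is a minimal illustration, not an official grading script: the helper names are hypothetical, all scores are assumed to be on a 0-100 scale, a penalized homework score is assumed not to drop below 0, and the 40-point credit threshold is assumed to apply to each homework separately, which the requirements do not state explicitly.

```python
# Hypothetical helpers illustrating the grading rules; not an official script.

def late_penalty(days_late: int) -> int:
    """Points subtracted for a homework submitted `days_late` days after the deadline."""
    if days_late <= 0:
        return 0
    if days_late > 10:
        return 50  # flat penalty for more than 10 days late
    return 5 * days_late  # 5 points per late day


def homework_points(raw: float, days_late: int) -> float:
    """Homework score (0-100) after the late penalty; flooring at 0 is an assumption."""
    return max(0.0, raw - late_penalty(days_late))


def has_credit(hw1: float, hw2: float, threshold: float = 40.0) -> bool:
    """Assumes the 40-point threshold applies to each homework separately."""
    return hw1 >= threshold and hw2 >= threshold


def final_score(hw1: float, hw2: float, exam: float) -> float:
    """Weighted total: 33% per homework, 34% written exam, all on a 0-100 scale."""
    return 0.33 * hw1 + 0.33 * hw2 + 0.34 * exam


hw1 = homework_points(90, days_late=3)  # 90 - 15 = 75
hw2 = homework_points(40, days_late=0)  # submitted on time
print(has_credit(hw1, hw2), round(final_score(hw1, hw2, exam=80), 2))
```

Note that the two penalty rules meet at exactly 10 days (5 × 10 = 50), so the penalty never decreases with further delay.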

Literature -
Last update: prof. RNDr. Jan Hajič, Dr. (28.10.2019)

Manning, C. D. and H. Schütze: Foundations of Statistical Natural Language Processing. The MIT Press. 1999. ISBN 0-262-13360-1.

Wall, L., Christiansen, T. and R. L. Schwartz: Programming Perl. O'Reilly. 1996. ISBN 1-56592-149-6.

Charniak, E.: Statistical Language Learning. The MIT Press. 1996. ISBN 0-262-53141-0.

Jelinek, F.: Statistical Methods for Speech Recognition. The MIT Press. 1998. ISBN 0-262-10066-5.

Proceedings of the major international conferences: ACL (incl. EMNLP/CoNLL), COLING.

Syllabus -
Last update: T_UFAL (20.05.2004)

Introduction. Course Overview.

Evaluation methodology (examples from tagging). Precision, Recall, Accuracy, F-measure. NL Corpora.
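As a minimal illustration of these measures on a tagging example, the sketch below computes token-level accuracy, and precision, recall and F-measure for a single tag treated as the "positive" class. The function names and toy tag sequences are hypothetical; the F-measure is the standard weighted harmonic mean F_β = (1 + β²)PR / (β²P + R), with β = 1 giving F1.

```python
from typing import Sequence, Tuple


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def precision_recall_f(gold: Sequence[str], pred: Sequence[str], tag: str,
                       beta: float = 1.0) -> Tuple[float, float, float]:
    """Precision, recall and F_beta for one tag, treating it as the positive class."""
    tp = sum(g == tag and p == tag for g, p in zip(gold, pred))  # correctly assigned
    fp = sum(g != tag and p == tag for g, p in zip(gold, pred))  # wrongly assigned
    fn = sum(g == tag and p != tag for g, p in zip(gold, pred))  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    f = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
    return precision, recall, f


gold = ["DT", "NN", "VBZ", "DT", "NN"]
pred = ["DT", "NN", "NN", "DT", "VBZ"]
print(accuracy(gold, pred), precision_recall_f(gold, pred, "NN"))
```

With exactly one tag per token, precision and recall micro-averaged over all tags both reduce to accuracy, which is why per-tag figures are the more informative ones in tagging.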

The task of Tagging. Tagsets, Morphology, Lemmatization. Morphological Analysis and Generation. Tagging Methods. Manually Designed Rules and Grammars. Statistical Methods (overview). HMM Tagging (Supervised, Unsupervised). Statistical Transformation-Based (Rule-Based) Tagging.
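A bigram HMM tagger in the supervised setting can be sketched as relative-frequency estimation from a tagged corpus followed by Viterbi decoding. This is an illustrative sketch, not the course's reference implementation: the toy corpus and function names are hypothetical, there is no end-of-sentence transition, and unseen events get a fixed tiny probability (`FLOOR`) instead of proper smoothing. The unsupervised case (Baum-Welch re-estimation) is not shown.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # sentence-initial pseudo-tag
FLOOR = 1e-12  # assumed probability for unseen events (stand-in for real smoothing)


def train_hmm(tagged_sentences):
    """Relative-frequency (MLE) estimates of p(tag | previous tag) and p(word | tag)."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)   # emit[tag][word] = count
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag

    def normalize(table):
        probs = {}
        for context, counts in table.items():
            total = sum(counts.values())
            probs[context] = {x: c / total for x, c in counts.items()}
        return probs

    return normalize(trans), normalize(emit)


def logp(table, context, x):
    return math.log(table.get(context, {}).get(x, FLOOR))


def viterbi(words, trans, emit):
    """Most probable tag sequence under a bigram HMM, computed in log space."""
    tags = list(emit)
    # delta[t] = best log-probability of tagging the words so far, ending in tag t
    delta = {t: logp(trans, START, t) + logp(emit, t, words[0]) for t in tags}
    backpointers = []
    for word in words[1:]:
        new_delta, bp = {}, {}
        for t in tags:
            prev = max(tags, key=lambda s: delta[s] + logp(trans, s, t))
            new_delta[t] = delta[prev] + logp(trans, prev, t) + logp(emit, t, word)
            bp[t] = prev
        delta = new_delta
        backpointers.append(bp)
    # Follow backpointers from the best final tag.
    path = [max(tags, key=lambda t: delta[t])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return path[::-1]


corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("barks", "VBZ")],
]
trans, emit = train_hmm(corpus)
print(viterbi(["a", "dog", "sleeps"], trans, emit))
```

The test sentence never occurs in the toy corpus as a whole, but every word and tag bigram does, so the tagger can still recover the sequence DT NN VBZ.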

Introduction to Parsing. Generative Grammars. Properties of Regular and Context-free Grammars. Non-statistical Parsing Algorithms (An Overview). Simple top-down parser with backtracking. Shift-reduce parser. Treebanks and Treebanking. Evaluation of Parsers.
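A simple top-down parser with backtracking can be sketched with Python generators: each rule is tried in order, and when a later constituent fails, the generator resumes with the next alternative of an earlier one. The toy grammar and function names are hypothetical; like any naive top-down parser, this sketch would loop forever on left-recursive rules such as `NP -> NP PP`, which is why the grammar avoids them.

```python
# Toy grammar (hypothetical): nonterminal -> list of alternative right-hand sides.
# Any symbol that is not a key is treated as a terminal (a word).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("DT", "N"), ("DT", "N", "PP")],
    "VP": [("V", "NP"), ("V", "NP", "PP")],
    "PP": [("P", "NP")],
    "DT": [("the",), ("a",)],
    "N": [("man",), ("dog",), ("telescope",)],
    "V": [("saw",)],
    "P": [("with",)],
}


def expand(grammar, symbol, words, pos):
    """Yield (tree, end) for every way `symbol` can derive words[pos:end]."""
    if symbol not in grammar:  # terminal: must match the next input word
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in grammar[symbol]:  # try alternatives in order
        for children, end in expand_sequence(grammar, rhs, words, pos):
            yield (symbol, children), end


def expand_sequence(grammar, symbols, words, pos):
    """Yield (subtrees, end) for a sequence of symbols; resuming a generator backtracks."""
    if not symbols:
        yield [], pos
        return
    for first, middle in expand(grammar, symbols[0], words, pos):
        for rest, end in expand_sequence(grammar, symbols[1:], words, middle):
            yield [first] + rest, end


def parse(grammar, words, start="S"):
    """Return the first parse that covers the whole input, or None."""
    for tree, end in expand(grammar, start, words, 0):
        if end == len(words):
            return tree
    return None


print(parse(GRAMMAR, "the man saw a dog with a telescope".split()))
```

Because the parser returns the first complete parse, for this ambiguous sentence it finds the NP attachment ("a dog with a telescope"): `VP -> V NP` is tried before `VP -> V NP PP`, and after the short `NP -> DT N` leaves input unconsumed, backtracking reaches `NP -> DT N PP`.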

Probabilistic Parsing. Introduction. PCFG Parameter Estimation. PCFG: Best parse. Probability of a string. Lexicalized PCFG.
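For a PCFG in Chomsky normal form, the probability of a string (the inside algorithm) and the best parse (Viterbi CKY) can be computed over the same chart: the inside pass sums over all derivations of a span, while the Viterbi pass keeps only the most probable one together with a backpointer. A minimal sketch; the toy grammar is modelled on the "astronomers saw stars with ears" PP-attachment example in Manning and Schütze, and the function names are hypothetical.

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form (rule probabilities per left-hand side sum to 1).
BINARY = {  # (A, B, C) -> P(A -> B C)
    ("S", "NP", "VP"): 1.0,
    ("PP", "P", "NP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.4,
}
LEXICAL = {  # (A, word) -> P(A -> word)
    ("P", "with"): 1.0,
    ("V", "saw"): 1.0,
    ("NP", "astronomers"): 0.1,
    ("NP", "ears"): 0.18,
    ("NP", "saw"): 0.04,
    ("NP", "stars"): 0.18,
    ("NP", "telescopes"): 0.1,
}


def build_tree(best, i, j, label):
    """Rebuild the best parse of words[i:j] rooted in `label` from backpointers."""
    _, back = best[(i, j, label)]
    if isinstance(back, str):  # preterminal over a single word
        return (label, back)
    k, left, right = back
    return (label, build_tree(best, i, k, left), build_tree(best, k, j, right))


def cky(words, binary, lexical, start="S"):
    """Return (P(words), P(best parse), best parse tree or None)."""
    n = len(words)
    inside = defaultdict(float)  # (i, j, A) -> total probability of A =>* words[i:j]
    best = {}                    # (i, j, A) -> (best derivation probability, backpointer)
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w and p > 0:
                inside[(i, i + 1, a)] += p
                best[(i, i + 1, a)] = (p, w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for (a, b, c), p in binary.items():
                    if (i, k, b) not in best or (k, j, c) not in best:
                        continue
                    # Inside: sum over rules and split points.
                    inside[(i, j, a)] += p * inside[(i, k, b)] * inside[(k, j, c)]
                    # Viterbi: keep the single most probable derivation.
                    score = p * best[(i, k, b)][0] * best[(k, j, c)][0]
                    if score > best.get((i, j, a), (0.0, None))[0]:
                        best[(i, j, a)] = (score, (k, b, c))
    if (0, n, start) not in best:
        return 0.0, 0.0, None
    return inside[(0, n, start)], best[(0, n, start)][0], build_tree(best, 0, n, start)


p_sentence, p_best, tree = cky("astronomers saw stars with ears".split(), BINARY, LEXICAL)
print(p_sentence, p_best)
print(tree)
```

Both attachments of the PP contribute to the string probability (0.0009072 + 0.0006804), while the Viterbi parse keeps only the more probable NP attachment ("stars with ears").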

Statistical Machine Translation (MT). Alignment and Parameter Estimation for MT.
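Word alignment and parameter estimation can be illustrated with IBM Model 1 trained by EM on a toy parallel corpus. The syllabus does not say which alignment model the course uses; Model 1 is chosen here only as the simplest one, the NULL word of the original model is omitted for brevity, and the function names and data are hypothetical.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """EM training of IBM Model 1 translation probabilities t(f | e), without the NULL word."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in pairs:
            for f in fs:
                # E-step: posterior probability that f is aligned to each e in the sentence.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    posterior = t[(f, e)] / z
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: renormalise expected counts into t(f | e).
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def align(fs, es, t):
    """Viterbi alignment: index of the most probable English word for each foreign word."""
    return [max(range(len(es)), key=lambda j: t[(f, es[j])]) for f in fs]


pairs = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = ibm_model1(pairs)
print(align("das Haus".split(), "the house".split(), t))
```

Since Model 1 treats the alignment of each foreign word independently, the most probable alignment is simply the per-word argmax; "das" is pulled towards "the" because the two co-occur in two sentence pairs, which in turn frees "Haus" to align with "house".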

Charles University | Information system of Charles University