Course, academic year 2023/2024
Statistical Methods in Natural Language Processing II - NPFX068
Title (Czech): Statistické metody zpracování přirozených jazyků II
Guaranteed by: Student Affairs Department (32-STUD)
Faculty: Faculty of Mathematics and Physics
Valid from: 2019
Semester: summer
E-Credits: 6
Hours per week, examination: summer semester: 2/2 (lectures/practicals), C+Ex (course credit and exam) [hours per week]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech
Teaching methods: full-time
Is provided by: NPFL068
Additional information:
Guarantor: prof. RNDr. Jan Hajič, Dr.
doc. RNDr. Pavel Pecina, Ph.D.
Class: DS, Mathematical Linguistics
Informatics Mgr. - Mathematical Linguistics
Classification: Informatics > Computer and Formal Linguistics
Pre-requisite: {NXXX011, NXXX012, NXXX013, NXXX038, NXXX039, NXXX040, NXXX067, NXXX069, NXXX070, NXXX071}
Co-requisite: NPFL067
Incompatibility: NPFL068
Interchangeability: NPFL068
Annotation -
Last update: T_UFAL (13.05.2014)
Continuation of Statistical Methods in Natural Language Processing I. Introduces the notion of a linguistic experiment and its evaluation, and the role of corpora in statistical NLP. Standard NLP tasks (tagging, phrase-structure parsing and dependency parsing) are explained, and methods for solving them, including generative and discriminative models, are presented.
Course completion requirements -
Last update: prof. RNDr. Jan Hajič, Dr. (02.03.2021)

The grade consists of one homework assignment (50%) and a written exam (50%). Obtaining the course credit ("zápočet") is not a prerequisite for taking the exam. To obtain the credit, the homework grade must be at least 1 point (out of 100). The homework can be turned in at most three times, no later than the date announced on the course webpage. Each day of delay subtracts 5 points; turning in the homework more than 10 days after the deadline carries a flat penalty of 50 points.
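
The late-submission rule can be read as a piecewise function of the delay. The sketch below is one possible reading, not an official grading script: it assumes delays are counted in whole days, that the penalty is subtracted from the homework points and clipped at zero, and that the exam is also scored out of 100. The function names `late_penalty`, `homework_score` and `final_percentage` are hypothetical.

```python
def late_penalty(days_late: int) -> int:
    """Points deducted for a homework handed in `days_late` whole days after the deadline."""
    if days_late <= 0:
        return 0  # on time
    if days_late <= 10:
        return 5 * days_late  # 5 points per late day
    return 50  # more than 10 days late: flat 50-point penalty


def homework_score(raw_points: int, days_late: int) -> int:
    """Homework points after the late penalty (assumption: clipped at 0)."""
    return max(0, raw_points - late_penalty(days_late))


def final_percentage(homework_points: float, exam_points: float) -> float:
    """Equal 50/50 weighting (assumption: both parts are scored out of 100)."""
    return 0.5 * homework_points + 0.5 * exam_points


print(late_penalty(3), late_penalty(10), late_penalty(15))
print(homework_score(80, 2), final_percentage(70, 90))
```

Under this reading the two rules agree at the boundary: a 10-day delay already costs 5 × 10 = 50 points, so the flat penalty only caps the deduction for longer delays.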

Literature -
Last update: prof. RNDr. Jan Hajič, Dr. (02.03.2021)

Manning, C. D. and H. Schütze: Foundations of Statistical Natural Language Processing. The MIT Press. 1999. ISBN 0-262-13360-1.

Wall, L., Christiansen, T. and R. L. Schwartz: Programming Perl. O'Reilly. 1996. ISBN 1-56592-149-6.

Charniak, E.: Statistical Language Learning. The MIT Press. 1996. ISBN 0-262-53141-0.

Jelinek, F.: Statistical Methods for Speech Recognition. The MIT Press. 1998. ISBN 0-262-10066-5.

McDonald, R. et al.: Non-projective dependency parsing using spanning tree algorithms. Proceedings of the EMNLP conference. 2005. pp. 523-530.

Proceedings of the major international conferences: ACL (incl. EMNLP/CoNLL), COLING.

Syllabus -
Last update: prof. RNDr. Jan Hajič, Dr. (02.03.2021)

Introduction. Course Overview.

Evaluation methodology (examples from tagging). Precision, Recall, Accuracy, F-measure. NL Corpora.
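
The evaluation measures listed above can be computed as in the following sketch (the function names are illustrative). Accuracy fits a tagger that assigns exactly one tag per token; precision and recall are useful when a system may output several tags, or none, for a token, so here gold and predicted annotations are treated as sets of (position, tag) pairs. The F-measure is the weighted harmonic mean F_β = (1 + β²)PR / (β²P + R), where β = 1 gives the usual F1.

```python
def precision_recall_f(gold: set, predicted: set, beta: float = 1.0):
    """Precision, recall and F-measure of a predicted set of items against a gold set."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


def accuracy(gold_tags: list, predicted_tags: list) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("tag sequences must have the same length")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


gold = {(0, "DT"), (1, "NN"), (2, "VBZ")}
predicted = {(0, "DT"), (1, "NN"), (1, "VB"), (2, "NNS")}  # two tags proposed for token 1
print(precision_recall_f(gold, predicted))
print(accuracy(["DT", "NN", "VBZ"], ["DT", "NN", "NNS"]))
```

In the example, two of the four predicted pairs are correct, so P = 0.5, R = 2/3 and F1 = 4/7 ≈ 0.571.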

The task of Tagging. Tagsets, Morphology, Lemmatization. Morphological Analysis and Generation. Tagging methods. Manually designed Rules and Grammars. Statistical Methods (overview). HMM Tagging (Supervised, Unsupervised). Statistical Transformation Rule-Based Tagging.
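
Once its parameters are known, HMM tagging reduces to finding the most probable tag sequence with the Viterbi algorithm. The sketch below assumes a bigram HMM whose start, transition and emission probabilities are already given as nested dictionaries; in the supervised setting they would be relative-frequency estimates from a tagged corpus, in the unsupervised setting they would come from Baum-Welch re-estimation (neither is shown). The tag set and all probabilities are hypothetical toy values, and unseen words simply get zero probability, where a real tagger would need smoothing or an unknown-word model.

```python
import math

# Hypothetical toy bigram HMM (tags and probabilities made up for illustration).
TAGS = ["DT", "NN", "VB"]
START = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
TRANS = {"DT": {"NN": 0.9, "VB": 0.1},
         "NN": {"DT": 0.1, "NN": 0.2, "VB": 0.7},
         "VB": {"DT": 0.6, "NN": 0.4}}
EMIT = {"DT": {"the": 1.0},
        "NN": {"dog": 0.5, "cat": 0.4, "barks": 0.1},
        "VB": {"barks": 0.8, "dog": 0.2}}


def logp(p):
    """Log probability, with log(0) represented as -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words, tags=TAGS, start=START, trans=TRANS, emit=EMIT):
    """Most probable tag sequence for `words` under a bigram HMM (log-space Viterbi)."""
    # delta[i][t]: best log-probability of a tag path for words[:i+1] ending in tag t
    delta = [{t: logp(start.get(t, 0)) + logp(emit[t].get(words[0], 0)) for t in tags}]
    back = [{}]  # back[i][t]: best previous tag for tag t at position i
    for i in range(1, len(words)):
        delta.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda s: delta[i - 1][s] + logp(trans[s].get(t, 0)))
            delta[i][t] = (delta[i - 1][prev] + logp(trans[prev].get(t, 0))
                           + logp(emit[t].get(words[i], 0)))
            back[i][t] = prev
    # Trace back from the best final tag
    best = max(tags, key=lambda t: delta[-1][t])
    path = [best]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


print(viterbi("the dog barks".split()))
```

With these toy numbers the best path for "the dog barks" is DT NN VB, with probability 0.6 · 1.0 · 0.9 · 0.5 · 0.7 · 0.8 = 0.1512; ending in NN instead (reading "barks" as a noun) scores only 0.0054. Working in log space avoids numerical underflow on long sentences.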

Introduction to Parsing. Generative Grammars. Properties of Regular and Context-free Grammars. Non-statistical Parsing Algorithms (An Overview). Simple top-down parser with backtracking. Shift-reduce parser. Treebanks and Treebanking. Evaluation of Parsers.
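
A simple top-down parser with backtracking can be sketched with Python generators: each nonterminal tries its rules in order, and when a later symbol fails to match, the suspended generators resume and try the next alternative for an earlier symbol. The grammar and all function names below are hypothetical. Like any naive top-down parser, this one would loop forever on a left-recursive rule such as NP -> NP PP, so the toy grammar avoids them.

```python
# Hypothetical toy CFG: nonterminal -> list of alternative right-hand sides.
# Symbols that are not keys of GRAMMAR are treated as terminals (words).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"], ["dogs"]],
    "V": [["sees"], ["barks"]],
}


def expand(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` can derive words[pos:next_pos]."""
    if symbol not in GRAMMAR:  # terminal: must match the next input word
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # try alternatives in order (backtracking point)
        for children, end in expand_sequence(rhs, words, pos):
            yield (symbol, children), end


def expand_sequence(symbols, words, pos):
    """Yield (list_of_trees, next_pos) for a sequence of grammar symbols."""
    if not symbols:
        yield [], pos
        return
    for first, mid in expand(symbols[0], words, pos):
        for rest, end in expand_sequence(symbols[1:], words, mid):
            yield [first] + rest, end


def parses(words, start="S"):
    """Yield all complete parse trees of `words` (those that consume every word)."""
    for tree, end in expand(start, words, 0):
        if end == len(words):
            yield tree


for tree in parses("the dog sees a cat".split()):
    print(tree)
```

The parser enumerates partial analyses depth-first; for "the dog sees a cat" it also finds an S covering only "the dog sees" (using VP -> V), which `parses` rejects because it does not consume the whole input. This redundant work is exponential in the worst case, which is one motivation for the chart-based and probabilistic parsers covered later.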

Probabilistic Parsing. Introduction. PCFG Parameter Estimation. PCFG: Best parse. Probability of a string. Lexicalized PCFG. Dependency parsing.
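
For a PCFG in Chomsky normal form, the best parse can be found with the probabilistic (Viterbi) variant of the CKY algorithm, which keeps, for every span and nonterminal, only the most probable subtree. The sketch below is one such formulation, not necessarily the one presented in the lectures. The grammar is a hypothetical toy in the spirit of the classic "astronomers saw stars with ears" example, unary nonterminal rules are not supported, and plain probabilities are multiplied (a practical parser would use log probabilities). Summing instead of maximising over rules and split points would give the probability of the string (the inside algorithm).

```python
# Hypothetical toy PCFG in Chomsky normal form (probabilities set by hand;
# the rules for each left-hand side sum to 1).
BINARY = {  # (A, B, C) -> P(A -> B C)
    ("S", "NP", "VP"): 1.0,
    ("PP", "P", "NP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.4,
}
LEXICAL = {  # (A, word) -> P(A -> word)
    ("NP", "astronomers"): 0.1, ("NP", "ears"): 0.18, ("NP", "saw"): 0.04,
    ("NP", "stars"): 0.18, ("NP", "telescopes"): 0.1,
    ("V", "saw"): 1.0, ("P", "with"): 1.0,
}


def pcky(words, binary=BINARY, lexical=LEXICAL, start="S"):
    """Return (probability, tree) of the most probable parse rooted in `start`, or None."""
    n = len(words)
    # best[i][j][A] = (prob, tree) of the best A spanning words[i:j]
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):  # spans of length 1: lexical rules
        for (a, word), p in lexical.items():
            if word == w and p > best[i][i + 1].get(a, (0.0, None))[0]:
                best[i][i + 1][a] = (p, (a, w))
    for span in range(2, n + 1):  # longer spans, shortest first
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for (a, b, c), p in binary.items():
                    if b in best[i][k] and c in best[k][j]:
                        pb, tb = best[i][k][b]
                        pc, tc = best[k][j][c]
                        prob = p * pb * pc
                        if prob > best[i][j].get(a, (0.0, None))[0]:
                            best[i][j][a] = (prob, (a, tb, tc))
    return best[0][n].get(start)


print(pcky("astronomers saw stars with ears".split()))
```

The sentence has two parses, attaching "with ears" either to "stars" or to the verb phrase; with these numbers the noun attachment wins (probability 0.0009072 versus 0.0006804 for the verb attachment).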

Statistical Machine Translation (MT). Alignment and Parameter Estimation for MT.
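
Word alignment and parameter estimation for statistical MT are commonly introduced with IBM Model 1, whose word-translation probabilities t(f | e) are estimated by expectation maximisation (EM) over a sentence-aligned corpus. The sketch below is a minimal version of that EM loop, assuming tokenised sentence pairs and a special NULL token (here simply the string "NULL") on the English side so that foreign words may stay unaligned. The syllabus does not say which alignment models the course uses; the function name and the three-sentence corpus are hypothetical.

```python
from collections import defaultdict


def ibm_model1(bitext, iterations=10):
    """EM estimation of IBM Model 1 translation probabilities t(f | e).

    bitext: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict mapping (f, e) -> t(f | e).
    """
    f_vocab = {f for fs, _ in bitext for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation of t(f | e)
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in bitext:
            es = ["NULL"] + list(es)  # foreign words may align to NULL
            for f in fs:
                # E-step: posterior probability that f is aligned to each e
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: relative frequencies of the expected counts
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Hypothetical toy corpus of (German, English) sentence pairs.
BITEXT = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = ibm_model1(BITEXT, iterations=20)
for pair in [("das", "the"), ("Haus", "house"), ("Buch", "book"), ("ein", "a")]:
    print(pair, round(t[pair], 3))
```

Even on this tiny corpus, EM pulls t(das | the) upwards, because "das" is the only foreign word that occurs in both sentences containing "the"; once "das" is explained by "the", "Haus" becomes the more probable translation of "house". The most probable alignment of each foreign word can then be read off as the English word with the highest t(f | e).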

Charles University | Information system of Charles University