Course, academic year 2024/2025
Statistical Methods in Natural Language Processing II - NPFL068
Title: Statistické metody zpracování přirozených jazyků II
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Actual: from 2023
Semester: summer
E-Credits: 5
Hours per week, examination: summer s.:2/2, C+Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Additional information: https://ufal.mff.cuni.cz/courses/npfl068
Guarantor: prof. RNDr. Jan Hajič, Dr.
Class: DS, Mathematical Linguistics
Informatics Mgr. - Mathematical Linguistics
Classification: Informatics > Computer and Formal Linguistics
Co-requisite: NPFL067
Is incompatible with: NPFX068
Is interchangeable with: NPFX068
Annotation -
Continuation of Statistical Methods in Natural Language Processing I. Introduces the notion of linguistic experiment and its evaluation. The role of corpora in statistical NLP. Standard NLP tasks (tagging, phrase-structure and dependency parsing, generative and discriminative models) are explained and methods presented.
Last update: T_UFAL (13.05.2014)
Course completion requirements -

Completion requires turning in one homework assignment (50% of the grade) and passing a written exam (50%). The "zápočet" (course credit) is not a prerequisite for taking the exam. To obtain the "zápočet", the homework grade must be at least 1 point (out of 100). The homework can be turned in at most three times, no later than the date announced on the course webpage. Each late day subtracts 5 points. Turning in the homework more than 10 days after the deadline carries a constant penalty of 50 points.
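
The late-submission arithmetic can be sketched as follows. This is only one reading of the rules above, not an official grading script: the function names are hypothetical, and clamping the score at zero is an assumption that the requirements do not state.

```python
def late_penalty(days_late):
    """Points subtracted for a homework turned in `days_late` days after the deadline."""
    if days_late <= 0:
        return 0
    if days_late <= 10:
        return 5 * days_late  # 5 points per late day
    return 50                 # constant penalty beyond 10 days


def homework_score(raw_points, days_late):
    """Homework points (out of 100) after the penalty; clamping at 0 is an assumption."""
    return max(0, raw_points - late_penalty(days_late))


def earns_zapocet(raw_points, days_late):
    """The course credit requires at least 1 homework point."""
    return homework_score(raw_points, days_late) >= 1
```

Under this reading, a homework worth 80 points turned in 3 days late would count as 65 points.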

Last update: Hajič Jan, prof. RNDr., Dr. (02.03.2021)
Literature -

Manning, C. D. and H. Schütze: Foundations of Statistical Natural Language Processing. The MIT Press. 1999. ISBN 0-262-13360-1.

Wall, L., Christiansen, T. and R. L. Schwartz: Programming Perl. O'Reilly. 1996. ISBN 1-56592-149-6.

Charniak, E.: Statistical Language Learning. The MIT Press. 1996. ISBN 0-262-53141-0.

Jelinek, F.: Statistical Methods for Speech Recognition. The MIT Press. 1998. ISBN 0-262-10066-5.

McDonald, R. et al.: Non-projective dependency parsing using spanning tree algorithms. 2005. EMNLP conference proceedings, pp. 523-530.

Proceedings of the major international conferences: ACL (incl. EMNLP/CoNLL), COLING.

Last update: Hajič Jan, prof. RNDr., Dr. (02.03.2021)
Syllabus -

Introduction. Course Overview.

Evaluation methodology (examples from tagging). Precision, Recall, Accuracy, F-measure. NL Corpora.
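
As a minimal sketch of the measures named above, the following computes precision, recall, F-measure (here the balanced F1) and accuracy. The function names and the representation of annotations as sets of (position, tag) pairs are illustrative assumptions, not part of the course materials.

```python
def precision_recall_f1(gold, predicted):
    """Compare two sets of items, e.g. (position, tag) pairs."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


def accuracy(gold_tags, predicted_tags):
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags) or not gold_tags:
        raise ValueError("tag sequences must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)
```

Accuracy suits tagging, where every token receives exactly one tag; precision and recall are more informative when the system may propose more or fewer items than the gold standard contains.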

The task of Tagging. Tagsets, Morphology, Lemmatization. Morphological Analysis and Generation. Tagging methods. Manually designed Rules and Grammars. Statistical Methods (overview). HMM Tagging (Supervised, Unsupervised). Statistical Transformation Rule-Based Tagging.
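
Supervised HMM tagging is commonly decoded with the Viterbi algorithm. The sketch below assumes a bigram HMM whose parameters are given as plain dictionaries; the parameter names (`start_p`, `trans_p`, `emit_p`) and the absence of smoothing are simplifying assumptions made for illustration.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence and its joint probability."""
    if not words:
        return [], 1.0
    # best[i][t]: probability of the best tag sequence for words[:i+1] ending in t
    best = [{t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best sequence
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] * trans_p[p][t]) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score * emit_p[t].get(words[i], 0.0)
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[-1][last]
```

With toy parameters such as `start_p = {"N": 0.8, "V": 0.2}` and suitable transition and emission tables, `viterbi(["dogs", "bark"], ["N", "V"], ...)` returns one tag per word. A practical tagger would work in log space and smooth unseen words.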

Introduction to Parsing. Generative Grammars. Properties of Regular and Context-free Grammars. Non-statistical Parsing Algorithms (An Overview). Simple top-down parser with backtracking. Shift-reduce parser. Treebanks and Treebanking. Evaluation of Parsers.

Probabilistic Parsing. Introduction. PCFG Parameter Estimation. PCFG: Best parse. Probability of a string. Lexicalized PCFG. Dependency parsing.
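
The difference between "best parse" and "probability of a string" can be illustrated with a CKY-style chart over a PCFG in Chomsky normal form: the same recursion yields the string probability when alternatives are summed and the best-parse probability when they are maximised. The function name, the rule encoding, and the toy grammar mentioned afterwards are assumptions made for this sketch.

```python
from collections import defaultdict


def cky_score(words, lexical, binary, start, combine):
    """Score `words` under a CNF PCFG.

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    combine: sum for the string probability, max for the best-parse probability
    """
    n = len(words)
    chart = defaultdict(dict)  # chart[(i, j)][A]: score of A spanning words[i:j]
    for i, word in enumerate(words):
        for (a, w), p in lexical.items():
            if w == word:
                chart[(i, i + 1)][a] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            candidates = defaultdict(list)
            for k in range(i + 1, j):  # split point
                for (a, b, c), p in binary.items():
                    left = chart[(i, k)].get(b, 0.0)
                    right = chart[(k, j)].get(c, 0.0)
                    if left and right:
                        candidates[a].append(p * left * right)
            for a, scores in candidates.items():
                chart[(i, j)][a] = combine(scores)
    return chart[(0, n)].get(start, 0.0)
```

For the ambiguous toy grammar `N -> N N` (0.2), `N -> fish` (0.8) and the string "fish fish fish", the two bracketings have equal probability, so the value obtained with `sum` is twice the value obtained with `max`.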

Statistical Machine Translation (MT). Alignment and Parameter Estimation for MT.
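
The syllabus does not name a specific alignment model. As one widely used example, the sketch below estimates word-translation probabilities with EM in the style of IBM Model 1. The function name, the data layout, and the omission of the NULL word are simplifying assumptions.

```python
from collections import defaultdict


def train_ibm_model1(bitext, iterations=10):
    """bitext: list of (e_words, f_words) sentence pairs.

    Returns t with t[(f, e)] approximating P(f | e) for co-occurring word pairs.
    """
    f_vocab = {f for _, f_words in bitext for f in f_words}
    uniform = 1.0 / len(f_vocab) if f_vocab else 0.0
    t = defaultdict(lambda: uniform)  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned at all
        for e_words, f_words in bitext:
            if not e_words:
                continue
            for f in f_words:
                # E-step: distribute one unit of f over the candidate e words.
                norm = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalise the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)
```

On a tiny corpus such as "the house / das Haus", "the book / das Buch", "a book / ein Buch", the probability mass for "the" shifts towards "das" as the iterations proceed, because "das" is the only word that co-occurs with "the" in both of its sentences.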

Last update: Hajič Jan, prof. RNDr., Dr. (02.03.2021)
 