Course, academic year 2016/2017
Statistical Methods in Natural Language Processing I - NPFL067
Title: Statistické metody zpracování přirozených jazyků I
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Actual: from 2016 to 2016
Semester: winter
E-Credits: 6
Hours per week, examination: winter s.:2/2, C+Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Additional information: https://ufal.mff.cuni.cz/courses/npfl067
Guarantor: prof. RNDr. Jan Hajič, Dr.
Class: DS, mathematical linguistics
Informatics Mgr. - Mathematical Linguistics
Classification: Informatics > Computer and Formal Linguistics
Is co-requisite for: NPFL068
Annotation -
Last update: T_UFAL (20.05.2004)
Introduction to formal linguistics and the fundamentals of statistical natural language processing, including the basics of information theory, language modeling, and Markov models. Continues as Statistical Methods in Natural Language Processing II.
Literature - Czech
Last update: prof. RNDr. Jan Hajič, Dr. (02.10.2017)

Manning, C. D. and H. Schütze: Foundations of Statistical Natural Language Processing. The MIT Press. 1999. ISBN 0-262-13360-1.

Allen, J.: Natural Language Understanding. The Benjamins/Cummings Publishing Company Inc. 1994. ISBN 0-8053-0334-0.

Wall, L., Christiansen, T. and R. L. Schwartz: Programming Perl. O'Reilly. 1996. ISBN 1-56592-149-6.

Cover, T. M. and J. A. Thomas: Elements of Information Theory. Wiley. 1991. ISBN 0-471-06259-6.

Syllabus -
Last update: prof. RNDr. Jan Hajič, Dr. (02.10.2017)

Introduction. Course Overview: Intro to NLP. Main Issues.

The Very Basics on Probability Theory. Elements of Information Theory I. Elements of Information Theory II.
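The information-theory lectures build on quantities such as Shannon entropy. A minimal Python sketch of the definition (illustrative only, not taken from the course materials):

```python
import math

def entropy(probs):
    """Shannon entropy H(p) = -sum_x p(x) * log2 p(x), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries exactly 1 bit of uncertainty; a biased coin carries less.
print(entropy([0.5, 0.5]))  # 1.0
print(entropy([0.9, 0.1]))  # ~0.469
```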

Language Modeling in General and the Noisy Channel Model. Smoothing and the EM algorithm.
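One of the simplest smoothing methods for an n-gram language model is add-alpha (Laplace) smoothing. A hedged Python sketch for bigrams (the corpus and function names are illustrative, not from the course):

```python
from collections import Counter

def bigram_model(tokens, alpha=1.0):
    """Bigram probabilities with add-alpha (Laplace) smoothing:
    P(w2 | w1) = (c(w1, w2) + alpha) / (c(w1) + alpha * |V|)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)
    def p(w2, w1):
        return (bigrams[(w1, w2)] + alpha) / (unigrams[w1] + alpha * vocab_size)
    return p

p = bigram_model("the cat sat on the mat".split())
print(p("cat", "the"))  # (1 + 1) / (2 + 5) ~ 0.286
print(p("dog", "the"))  # unseen bigram still gets nonzero mass: 1/7
```

Smoothing reserves probability mass for unseen events, which is exactly what the EM-based interpolation methods covered later refine.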

Linguistics: Phonology and Morphology. Syntax (Phrase Structure vs. Dependency).

Word Classes and Lexicography. Mutual Information (the "pointwise" version). The t-score. The Chi-square test. Word Classes for NLP tasks. Parameter Estimation. The Partitioning Algorithm. Complexity Issues of Word Classes. Programming Tricks & Tips.
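The "pointwise" mutual information used for collocation discovery can be computed directly from corpus counts. A small sketch with made-up counts (illustrative, not course code):

```python
import math

def pmi(c_xy, c_x, c_y, n):
    """Pointwise mutual information I(x, y) = log2( P(x,y) / (P(x) * P(y)) ),
    with probabilities estimated from counts in a corpus of n tokens."""
    return math.log2((c_xy / n) / ((c_x / n) * (c_y / n)))

# Made-up counts: the word pair occurs 30 times, the words 1000 and 500 times,
# in a corpus of one million tokens.
print(pmi(30, 1000, 500, 1_000_000))  # ~5.9 bits: a strongly associated pair
```

A PMI near zero means the words co-occur about as often as chance predicts; large positive values flag collocation candidates, which the t-score and chi-square tests then assess for significance.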

Markov models, Hidden Markov Models (HMMs). The Trellis & the Viterbi Algorithms. Estimating the Parameters of HMMs. The Forward-Backward Algorithm. Implementation Issues.
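The Viterbi algorithm finds the most probable hidden-state sequence by dynamic programming over the trellis. A self-contained Python sketch with a made-up two-tag model (all probabilities are illustrative):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for obs."""
    # trellis[t][s] = (best probability of reaching state s at time t, predecessor)
    trellis = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        trellis.append({})
        for s in states:
            prob, prev = max(
                (trellis[t - 1][r][0] * trans_p[r][s] * emit_p[s][obs[t]], r)
                for r in states)
            trellis[t][s] = (prob, prev)
    # Backtrace from the best final state.
    state = max(states, key=lambda s: trellis[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = trellis[t][state][1]
        path.append(state)
    return list(reversed(path))

# Toy two-tag example; every number below is made up for illustration.
states = ("N", "V")
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"time": 0.6, "flies": 0.4}, "V": {"time": 0.2, "flies": 0.8}}
print(viterbi(["time", "flies"], states, start_p, trans_p, emit_p))  # ['N', 'V']
```

In practice the products of probabilities are replaced by sums of log-probabilities to avoid underflow, one of the implementation issues the lecture addresses.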

Maximum Entropy. Maximum Entropy Tagging. Feature Based Tagging. Results on Tagging Various Natural Languages.
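A maximum entropy tagger models p(tag | context) as a normalized exponential of weighted binary features. A minimal sketch, assuming a single hypothetical (tag, word) feature and made-up weights (not the course's feature set):

```python
import math

def maxent_prob(tag, context, tags, weights, features):
    """p(tag | context) = exp(sum_i w_i * f_i(tag, context)) / Z(context)."""
    def score(t):
        return math.exp(sum(weights.get(f, 0.0) for f in features(t, context)))
    return score(tag) / sum(score(t) for t in tags)

# Hypothetical single feature: the (tag, current word) pair.
features = lambda t, ctx: [(t, ctx["word"])]
weights = {("N", "flies"): 1.0, ("V", "flies"): 2.0}  # made-up weights
p_v = maxent_prob("V", {"word": "flies"}, ["N", "V"], weights, features)
print(round(p_v, 3))  # 0.731
```

The normalization over all tags is what makes the distribution sum to one; training (e.g. by iterative scaling or gradient methods) sets the weights so that expected feature counts match the training data.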

 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html