Course, academic year 2024/2025
Unsupervised Machine Learning in NLP - NPFL097
Czech title: Neřízené strojové učení v NLP
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Valid: from 2020
Semester: winter
E-Credits: 3
Hours per week, examination: winter s.:1/1, C [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Additional information: http://ufal.mff.cuni.cz/courses/npfl097
Guarantor: RNDr. David Mareček, Ph.D.
Teacher(s): RNDr. David Mareček, Ph.D.
Class: Informatics (Master's) - elective
Classification: Informatics > Computer and Formal Linguistics
Annotation
The course introduces basic methods of unsupervised machine learning and their applications in natural language processing. We will discuss Bayesian inference, Expectation-Maximization, cluster analysis, methods based on neural networks, and other methods in current use. Selected applications will be discussed in detail and implemented in the lab sessions.
Last update: Vidová Hladká Barbora, doc. Mgr., Ph.D. (25.04.2019)
Course completion requirements

To get the credit, students must implement the programming assignments (usually three) and submit them on time. Missing points can be made up in the final test.

Last update: Mareček David, RNDr., Ph.D. (05.05.2022)
Literature

Christopher Bishop: Pattern Recognition and Machine Learning, Springer-Verlag New York, 2006

Kevin P. Murphy: Machine Learning: A Probabilistic Perspective, The MIT Press, Cambridge, Massachusetts, 2012

Kar Wai Lim, Wray Buntine, Changyou Chen, Lan Du: Nonparametric Bayesian topic modelling with the hierarchical Pitman-Yor processes, International Journal of Approximate Reasoning 78, Elsevier, 2016

Kevin Knight: Bayesian Inference with Tears, 2009, http://www.isi.edu/natural-language/people/bayes-with-tears.pdf

Last update: Mareček David, RNDr., Ph.D. (24.04.2019)
Syllabus

1. Introduction

2. Beta-Bernoulli and Dirichlet-Categorical models

3. Modeling document collections, Categorical Mixture models, Expectation-Maximization

4. Gibbs Sampling, Latent Dirichlet allocation

5. Unsupervised Text Segmentation

6. Unsupervised tagging, Word alignment, Unsupervised parsing

7. K-means, Mixture of Gaussians, Hierarchical clustering, evaluation

8. t-SNE, Principal Component Analysis, Independent Component Analysis

9. Linguistic Interpretation of Neural Networks

Last update: Mareček David, RNDr., Ph.D. (05.05.2022)
 