Course, academic year 2022/2023
Introduction to Machine Learning with Python - NPFL129
Title: Úvod do strojového učení v Pythonu (Introduction to Machine Learning in Python)
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Actual: from 2020
Semester: winter
E-Credits: 5
Hours per week, examination: winter s.:2/2, C+Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Additional information: http://ufal.mff.cuni.cz/courses/npfl129
Guarantor: RNDr. Milan Straka, Ph.D.
Incompatibility : NPFL054
Interchangeability : NPFL054
Is incompatible with: NPFL054
Is interchangeable with: NPFL054
Annotation -
Last update: doc. Mgr. Barbora Vidová Hladká, Ph.D. (15.05.2019)
Machine learning is achieving notable success in solving complex tasks in many fields. This course serves as an introduction to basic machine learning concepts and techniques, focusing both on the theoretical foundations and on the implementation and use of machine learning algorithms in the Python programming language. Considerable attention is paid to the ability to apply machine learning techniques to practical tasks, in which the students try to devise a solution with the highest possible performance.
Aim of the course -
Last update: doc. Mgr. Barbora Vidová Hladká, Ph.D. (15.05.2019)

The goal of the course is to introduce basic concepts and methods of machine learning. The course focuses both on the theory and on the implementation of machine learning algorithms, as well as on the ability to apply machine learning techniques to practical tasks using the Python programming language.

Course completion requirements -
Last update: RNDr. Milan Straka, Ph.D. (10.05.2020)

Students pass the practicals by submitting a sufficient number of assignments. The assignments are announced regularly throughout the semester (usually two per lecture) and are due within several weeks. Given the rules for completing the practicals, it is not possible to retry passing them. Passing the practicals is not a requirement for taking the exam.

Literature -
Last update: RNDr. Milan Straka, Ph.D. (10.05.2020)
  • Christopher M. Bishop: Pattern Recognition and Machine Learning. Springer Verlag. 2006.
  • John Platt: Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. 1998.
  • Tianqi Chen, Carlos Guestrin: XGBoost: A Scalable Tree Boosting System. 2016.
  • https://scikit-learn.org/
Requirements to the exam -
Last update: RNDr. Milan Straka, Ph.D. (15.06.2020)

The exam is written and consists of questions randomly chosen from a publicly known list. The requirements of the exam correspond to the course syllabus, at the level of detail presented in the lectures.

Syllabus -
Last update: RNDr. Milan Straka, Ph.D. (04.01.2021)

Basic machine learning concepts

  • supervised learning, unsupervised learning, reinforcement learning
  • fitting, generalization, overfitting, regularization
  • data generating distribution, train/development/test set

Linear regression

  • analytical solution
  • solution based on stochastic gradient descent (SGD)
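
A minimal NumPy sketch of both approaches, using illustrative synthetic data and hyperparameters (not taken from the course materials):

  import numpy as np

  rng = np.random.default_rng(42)
  X = rng.normal(size=(100, 3))                     # 100 examples, 3 features
  true_w = np.array([2.0, -1.0, 0.5])
  y = X @ true_w + rng.normal(scale=0.1, size=100)  # noisy targets

  # Analytical least-squares solution: w = (X^T X)^{-1} X^T y
  w_analytic = np.linalg.solve(X.T @ X, X.T @ y)

  # Stochastic gradient descent on the same squared-error loss
  w_sgd, lr = np.zeros(3), 0.01
  for epoch in range(100):
      for i in rng.permutation(len(X)):
          grad = (X[i] @ w_sgd - y[i]) * X[i]       # gradient of 1/2 (x·w - y)^2
          w_sgd -= lr * grad

  print(w_analytic, w_sgd)                          # both should approach true_w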

Classification

  • binary classification via perceptron
  • binary classification using logistic regression
  • multiclass classification using logistic regression
  • deriving sigmoid and softmax functions from the maximum entropy principle
  • classification with a multilayer perceptron (MLP)
  • naive Bayes classifier
  • maximum margin binary classifiers
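
A minimal sketch of binary classification with logistic regression trained by SGD; the synthetic data, learning rate, and number of epochs are illustrative assumptions:

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.normal(size=(200, 2))
  y = (X[:, 0] + X[:, 1] > 0).astype(float)   # linearly separable labels

  def sigmoid(z):
      return 1 / (1 + np.exp(-z))

  w, b, lr = np.zeros(2), 0.0, 0.1
  for epoch in range(50):
      for i in rng.permutation(len(X)):
          p = sigmoid(X[i] @ w + b)           # predicted probability of class 1
          w -= lr * (p - y[i]) * X[i]         # gradient of the negative log-likelihood
          b -= lr * (p - y[i])

  accuracy = ((sigmoid(X @ w + b) > 0.5) == y).mean()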

Kernel methods

  • kernelized linear regression
  • support vector machines (SVM) and their training with the sequential minimal optimization (SMO) algorithm
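
A minimal sketch of kernelized (ridge-regularized) linear regression with an RBF kernel in its dual formulation; the data, kernel width, and regularization strength are illustrative assumptions:

  import numpy as np

  rng = np.random.default_rng(1)
  X = rng.uniform(-3, 3, size=(50, 1))
  y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=50)

  def rbf_kernel(A, B, gamma=0.5):
      sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
      return np.exp(-gamma * sq_dists)

  lam = 0.1                                             # regularization strength
  K = rbf_kernel(X, X)
  alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # dual coefficients

  X_test = np.linspace(-3, 3, 100)[:, None]
  y_pred = rbf_kernel(X_test, X) @ alpha                # predictions via the kernel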

Decision trees

  • classification and regression trees (CART)
  • random forests
  • gradient boosting decision trees (GBDT)
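
A minimal scikit-learn sketch comparing the tree-based models listed above on a toy dataset; the dataset and default hyperparameters are illustrative assumptions:

  from sklearn.datasets import load_breast_cancer
  from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_breast_cancer(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  for model in (DecisionTreeClassifier(random_state=0),
                RandomForestClassifier(random_state=0),
                GradientBoostingClassifier(random_state=0)):
      model.fit(X_train, y_train)
      print(type(model).__name__, model.score(X_test, y_test))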

Clustering

  • K-Means algorithm
  • Gaussian mixture model
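
A minimal scikit-learn sketch of K-Means and a Gaussian mixture model on synthetic blobs; the number of clusters is an illustrative assumption:

  from sklearn.cluster import KMeans
  from sklearn.datasets import make_blobs
  from sklearn.mixture import GaussianMixture

  X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

  kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
  gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)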

Dimensionality reduction

  • principal component analysis (PCA)
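
A minimal NumPy sketch of PCA via the singular value decomposition; the random data and the choice of two components are illustrative:

  import numpy as np

  rng = np.random.default_rng(2)
  X = rng.normal(size=(100, 5))

  X_centered = X - X.mean(axis=0)
  U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
  X_reduced = X_centered @ Vt[:2].T           # projection onto the top 2 components
  explained_variance = S ** 2 / (len(X) - 1)  # variance captured by each component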

Training

  • dataset preparation, design of classification features
  • constructing loss functions according to maximum likelihood estimation principle
  • first-order gradient methods (SGD) and second-order methods
  • regularization
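
A minimal worked example of the maximum-likelihood construction, assuming i.i.d. Bernoulli-distributed binary labels with predicted probabilities p_i:

  \arg\max_w \prod_i p_i^{y_i} (1 - p_i)^{1 - y_i}
    = \arg\min_w -\sum_i \bigl[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \bigr],

so maximizing the likelihood is equivalent to minimizing the cross-entropy (logistic) loss; adding an L2 penalty corresponds to maximum a posteriori estimation under a Gaussian prior on the weights.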

Statistical testing

  • Student's t-test
  • Chi-squared test
  • correlation coefficients
  • paired bootstrap test
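
A minimal NumPy sketch of the paired bootstrap test for comparing the accuracies of two classifiers; the per-example correctness arrays and the number of resamples are illustrative assumptions:

  import numpy as np

  rng = np.random.default_rng(3)
  correct_a = rng.random(1000) < 0.82   # per-example correctness of model A
  correct_b = rng.random(1000) < 0.80   # per-example correctness of model B

  n_resamples, worse_or_equal = 10_000, 0
  for _ in range(n_resamples):
      idx = rng.integers(0, len(correct_a), size=len(correct_a))  # resample test examples with replacement
      if correct_a[idx].mean() <= correct_b[idx].mean():
          worse_or_equal += 1
  p_value = worse_or_equal / n_resamples  # how often A fails to beat B on the resampled test sets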

Used Python libraries

  • numpy (representation and manipulation of n-dimensional arrays)
  • scikit-learn (construction of machine learning pipelines)
  • matplotlib (visualization)
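
A minimal sketch combining the three libraries: a scikit-learn pipeline trained on a built-in dataset, with a matplotlib visualization of one prediction; the dataset and model choices are illustrative assumptions:

  import matplotlib.pyplot as plt
  from sklearn.datasets import load_digits
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import train_test_split
  from sklearn.pipeline import Pipeline
  from sklearn.preprocessing import StandardScaler

  X, y = load_digits(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  pipeline = Pipeline([("scaler", StandardScaler()),
                       ("classifier", LogisticRegression(max_iter=1000))])
  pipeline.fit(X_train, y_train)
  print("test accuracy:", pipeline.score(X_test, y_test))

  plt.imshow(X_test[0].reshape(8, 8), cmap="gray")            # visualize one test digit
  plt.title(f"predicted: {pipeline.predict(X_test[:1])[0]}")
  plt.show()
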
Entry requirements -
Last update: RNDr. Milan Straka, Ph.D. (08.10.2021)

Basic programming skills in Python and basic knowledge of differential calculus and linear algebra (working with vectors and matrices) are required; knowledge of probability and statistics is recommended.

 