Course, academic year 2022/2023
Introduction to Machine Learning with Python - NPFL129
Title: Úvod do strojového učení v Pythonu (Introduction to Machine Learning in Python)
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Actual: from 2020
Semester: winter
E-Credits: 5
Hours per week, examination: winter s.:2/2, C+Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Additional information: http://ufal.mff.cuni.cz/courses/npfl129
Guarantor: RNDr. Milan Straka, Ph.D.
Incompatibility : NPFL054
Interchangeability : NPFL054
Is incompatible with: NPFL054
Is interchangeable with: NPFL054
Annotation -
Last update: doc. Mgr. Barbora Vidová Hladká, Ph.D. (15.05.2019)
Machine learning is achieving notable success in solving complex tasks in many fields. This course serves as an introduction to basic machine learning concepts and techniques, focusing both on the theoretical foundations and on the implementation and use of machine learning algorithms in the Python programming language. Considerable attention is paid to the ability to apply machine learning techniques to practical tasks, in which the students try to devise a solution with the highest possible performance.
Aim of the course -
Last update: doc. Mgr. Barbora Vidová Hladká, Ph.D. (15.05.2019)

The goal of the course is to introduce basic concepts and methods of machine learning. The course focuses both on the theory and on the implementation of machine learning algorithms, as well as on the ability to apply machine learning techniques to practical tasks using the Python programming language.

Course completion requirements -
Last update: RNDr. Milan Straka, Ph.D. (10.05.2020)

Students pass the practicals by submitting a sufficient number of assignments. The assignments are announced regularly throughout the semester (usually two per lecture) and are due within several weeks. Given the rules for completing the practicals, it is not possible to retry passing them. Passing the practicals is not a requirement for taking the exam.

Literature -
Last update: RNDr. Milan Straka, Ph.D. (10.05.2020)
  • Christopher M. Bishop: Pattern Recognition and Machine Learning. Springer Verlag. 2006.
  • John Platt: Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. 1998.
  • Tianqi Chen, Carlos Guestrin: XGBoost: A Scalable Tree Boosting System. 2016.
  • https://scikit-learn.org/
Requirements to the exam -
Last update: RNDr. Milan Straka, Ph.D. (15.06.2020)

The exam is written and consists of questions randomly chosen from a publicly known list. The requirements of the exam correspond to the course syllabus, at the level of detail presented in the lectures.

Syllabus -
Last update: RNDr. Milan Straka, Ph.D. (04.01.2021)

Basic machine learning concepts

  • supervised learning, unsupervised learning, reinforcement learning
  • fitting, generalization, overfitting, regularization
  • data generating distribution, train/development/test set

Linear regression

  • analytical solution
  • solution based on stochastic gradient descent (SGD)
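
A minimal NumPy sketch of both approaches, using illustrative synthetic data and hyperparameters (not taken from the course materials):

  import numpy as np

  rng = np.random.default_rng(42)
  X = rng.normal(size=(100, 3))                     # 100 examples, 3 features
  true_w = np.array([2.0, -1.0, 0.5])
  y = X @ true_w + rng.normal(scale=0.1, size=100)  # noisy targets

  # Analytical least-squares solution: w = (X^T X)^{-1} X^T y
  w_analytic = np.linalg.solve(X.T @ X, X.T @ y)

  # Stochastic gradient descent on the same squared-error loss
  w_sgd, lr = np.zeros(3), 0.01
  for epoch in range(100):
      for i in rng.permutation(len(X)):
          grad = (X[i] @ w_sgd - y[i]) * X[i]       # gradient of 1/2 (x·w - y)^2
          w_sgd -= lr * grad

  print(w_analytic, w_sgd)                          # both should approach true_w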

Classification

  • binary classification via perceptron
  • binary classification using logistic regression
  • multiclass classification using logistic regression
  • deriving sigmoid and softmax functions from the maximum entropy principle
  • classification with a multilayer perceptron (MLP)
  • naive Bayes classifier
  • maximum margin binary classifiers
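
A minimal sketch of binary classification with logistic regression trained by SGD; the synthetic data, learning rate, and number of epochs are illustrative assumptions:

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.normal(size=(200, 2))
  y = (X[:, 0] + X[:, 1] > 0).astype(float)   # linearly separable labels

  def sigmoid(z):
      return 1 / (1 + np.exp(-z))

  w, b, lr = np.zeros(2), 0.0, 0.1
  for epoch in range(50):
      for i in rng.permutation(len(X)):
          p = sigmoid(X[i] @ w + b)           # predicted probability of class 1
          w -= lr * (p - y[i]) * X[i]         # gradient of the negative log-likelihood
          b -= lr * (p - y[i])

  accuracy = ((sigmoid(X @ w + b) > 0.5) == y).mean()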

Kernel methods

  • kernelized linear regression
  • support vector machines (SVM) and their training with the sequential minimal optimization (SMO) algorithm
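
A minimal sketch of kernelized (ridge-regularized) linear regression with an RBF kernel in its dual formulation; the data, kernel width, and regularization strength are illustrative assumptions:

  import numpy as np

  rng = np.random.default_rng(1)
  X = rng.uniform(-3, 3, size=(50, 1))
  y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=50)

  def rbf_kernel(A, B, gamma=0.5):
      sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
      return np.exp(-gamma * sq_dists)

  lam = 0.1                                             # regularization strength
  K = rbf_kernel(X, X)
  alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # dual coefficients

  X_test = np.linspace(-3, 3, 100)[:, None]
  y_pred = rbf_kernel(X_test, X) @ alpha                # predictions via the kernel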

Decision trees

  • classification and regression trees (CART)
  • random forests
  • gradient boosting decision trees (GBDT)
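
A minimal scikit-learn sketch comparing the tree-based models listed above on a toy dataset; the dataset and default hyperparameters are illustrative assumptions:

  from sklearn.datasets import load_breast_cancer
  from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_breast_cancer(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  for model in (DecisionTreeClassifier(random_state=0),
                RandomForestClassifier(random_state=0),
                GradientBoostingClassifier(random_state=0)):
      model.fit(X_train, y_train)
      print(type(model).__name__, model.score(X_test, y_test))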

Clustering

  • K-Means algorithm
  • Gaussian mixture model
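
A minimal scikit-learn sketch of K-Means and a Gaussian mixture model on synthetic blobs; the number of clusters is an illustrative assumption:

  from sklearn.cluster import KMeans
  from sklearn.datasets import make_blobs
  from sklearn.mixture import GaussianMixture

  X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

  kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
  gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)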

Dimensionality reduction

  • principal component analysis (PCA)
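
A minimal NumPy sketch of PCA via the singular value decomposition; the random data and the choice of two components are illustrative:

  import numpy as np

  rng = np.random.default_rng(2)
  X = rng.normal(size=(100, 5))

  X_centered = X - X.mean(axis=0)
  U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
  X_reduced = X_centered @ Vt[:2].T           # projection onto the top 2 components
  explained_variance = S ** 2 / (len(X) - 1)  # variance captured by each component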

Training

  • dataset preparation, design of classification features
  • constructing loss functions according to maximum likelihood estimation principle
  • first-order gradient methods (SGD) and second-order methods
  • regularization
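
A minimal worked example of the maximum-likelihood construction, assuming i.i.d. Bernoulli-distributed binary labels with predicted probabilities p_i:

  \arg\max_w \prod_i p_i^{y_i} (1 - p_i)^{1 - y_i}
    = \arg\min_w -\sum_i \bigl[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \bigr],

so maximizing the likelihood is equivalent to minimizing the cross-entropy (logistic) loss; adding an L2 penalty corresponds to maximum a posteriori estimation under a Gaussian prior on the weights.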

Statistical testing

  • Student's t-test
  • Chi-squared test
  • correlation coefficients
  • paired bootstrap test
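
A minimal NumPy sketch of the paired bootstrap test for comparing the accuracies of two classifiers; the per-example correctness arrays and the number of resamples are illustrative assumptions:

  import numpy as np

  rng = np.random.default_rng(3)
  correct_a = rng.random(1000) < 0.82   # per-example correctness of model A
  correct_b = rng.random(1000) < 0.80   # per-example correctness of model B

  n_resamples, worse_or_equal = 10_000, 0
  for _ in range(n_resamples):
      idx = rng.integers(0, len(correct_a), size=len(correct_a))  # resample test examples with replacement
      if correct_a[idx].mean() <= correct_b[idx].mean():
          worse_or_equal += 1
  p_value = worse_or_equal / n_resamples  # how often A fails to beat B on the resampled test sets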

Used Python libraries

  • numpy (representation and manipulation of n-dimensional arrays)
  • scikit-learn (construction of machine learning pipelines)
  • matplotlib (visualization)
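
A minimal sketch combining the three libraries: a scikit-learn pipeline trained on a built-in dataset, with a matplotlib visualization of one prediction; the dataset and model choices are illustrative assumptions:

  import matplotlib.pyplot as plt
  from sklearn.datasets import load_digits
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import train_test_split
  from sklearn.pipeline import Pipeline
  from sklearn.preprocessing import StandardScaler

  X, y = load_digits(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  pipeline = Pipeline([("scaler", StandardScaler()),
                       ("classifier", LogisticRegression(max_iter=1000))])
  pipeline.fit(X_train, y_train)
  print("test accuracy:", pipeline.score(X_test, y_test))

  plt.imshow(X_test[0].reshape(8, 8), cmap="gray")            # visualize one test digit
  plt.title(f"predicted: {pipeline.predict(X_test[:1])[0]}")
  plt.show()
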
Entry requirements -
Last update: RNDr. Milan Straka, Ph.D. (08.10.2021)

Basic programming skills in Python and basic knowledge of differential calculus and linear algebra (working with vectors and matrices) are required; knowledge of probability and statistics is recommended.

 