Course, academic year 2024/2025
Introduction to Machine Learning with Python - NPFL129
Title: Úvod do strojového učení v Pythonu
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Actual: from 2023
Semester: winter
E-Credits: 5
Hours per week, examination: winter s.:2/2, C+Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Additional information: http://ufal.mff.cuni.cz/courses/npfl129
Guarantor: Mgr. Jindřich Libovický, Ph.D.
Teacher(s): Ing. Zdeněk Kasner, Ph.D.
Mgr. Jindřich Libovický, Ph.D.
Mgr. Tomáš Musil
Incompatibility: NPFL054
Interchangeability: NPFL054
Annotation -
Machine learning has achieved notable success in solving complex tasks in many fields. This course serves as an introduction to basic machine learning concepts and techniques, focusing both on the theoretical foundations and on the implementation and use of machine learning algorithms in the Python programming language. Particular attention is paid to applying machine learning techniques to practical tasks, in which students try to devise the best-performing solution.
Last update: Vidová Hladká Barbora, doc. Mgr., Ph.D. (15.05.2019)
Aim of the course -

After this course, students should…

  • Be able to reason about tasks/problems suitable for ML
  • Know when to use classification, regression, and clustering
  • Be able to choose among the following methods: linear and logistic regression, multilayer perceptron, nearest neighbors, naive Bayes, gradient-boosted decision trees, and k-means clustering
  • Think about learning as (mostly probabilistic) optimization on training data
  • Know how the ML methods learn, including theoretical explanation
  • Know how to properly evaluate ML models
  • Think about generalization (and avoiding overfitting)
  • Be able to choose a suitable evaluation metric
  • Responsibly decide which model is better
  • Be able to implement ML algorithms on a conceptual level
  • Be able to use Scikit-learn to solve ML problems in Python
Last update: Libovický Jindřich, Mgr., Ph.D. (12.03.2024)
Course completion requirements -

Students pass the practicals by submitting a sufficient number of assignments. Assignments are announced regularly throughout the semester (usually two per lecture) and are due several weeks later. Given these rules for completing the practicals, it is not possible to retry them. Passing the practicals is not a prerequisite for taking the exam.

Last update: Straka Milan, RNDr., Ph.D. (10.05.2020)
Literature -
  • Christopher M. Bishop: Pattern Recognition and Machine Learning. Springer Verlag. 2006.
  • John Platt: Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. 1998.
  • Tianqi Chen, Carlos Guestrin: XGBoost: A Scalable Tree Boosting System. 2016.
  • https://scikit-learn.org/
Last update: Straka Milan, RNDr., Ph.D. (10.05.2020)
Requirements to the exam -

The exam is written and consists of questions randomly chosen from a publicly available list. The exam requirements correspond to the course syllabus, at the level of detail presented in the lectures.

Last update: Straka Milan, RNDr., Ph.D. (15.06.2020)
Syllabus -

Basic machine learning concepts

  • supervised learning, unsupervised learning, reinforcement learning
  • fitting, generalization, overfitting, regularization
  • data generating distribution, train/development/test set
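
The train/development/test split above can be sketched in plain numpy; the dataset shapes and the 70/15/15 ratio below are illustrative assumptions, not part of the course materials:

```python
import numpy as np

# Hypothetical dataset: 100 examples drawn from a data-generating distribution.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# Shuffle once, then carve out 70% train / 15% development / 15% test.
permutation = rng.permutation(len(X))
train_idx, dev_idx, test_idx = np.split(permutation, [70, 85])

X_train, y_train = X[train_idx], y[train_idx]
X_dev, y_dev = X[dev_idx], y[dev_idx]
X_test, y_test = X[test_idx], y[test_idx]
```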

Linear regression

  • analytical solution
  • a solution based on stochastic gradient descent (SGD)
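
Both solutions can be compared in a few lines of numpy; the synthetic data and hyperparameters below are illustrative assumptions:

```python
import numpy as np

# Synthetic regression data with known weights and a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w + 0.01 * rng.normal(size=200)

# Analytical solution of min ||Xw - y||^2 (least squares).
w_analytic, *_ = np.linalg.lstsq(X, y, rcond=None)

# SGD solution: repeatedly step against the per-example squared-error gradient.
w_sgd = np.zeros(2)
lr = 0.05
for epoch in range(100):
    for i in rng.permutation(len(X)):
        grad = (X[i] @ w_sgd - y[i]) * X[i]
        w_sgd -= lr * grad
```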

Classification

  • binary classification via perceptron
  • binary classification using logistic regression
  • multiclass classification using logistic regression
  • deriving sigmoid and softmax functions from the maximum entropy principle
  • classification with a multilayer perceptron (MLP)
  • naive Bayes classifier
  • maximum margin binary classifiers
  • Support vector machines (SVM)
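
Binary logistic regression, one item from the list above, can be sketched with plain gradient descent on the negative log-likelihood; the nearly separable synthetic data and the learning rate are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic, nearly linearly separable binary data.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
w_true = np.array([3.0, -2.0])
y = (X @ w_true + 0.1 * rng.normal(size=300) > 0).astype(float)

# Batch gradient descent on the negative log-likelihood (cross-entropy).
w = np.zeros(2)
lr = 0.5
for _ in range(500):
    p = sigmoid(X @ w)
    w -= lr * X.T @ (p - y) / len(X)

accuracy = np.mean((sigmoid(X @ w) > 0.5) == (y == 1))
```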

Text representation

  • TF-IDF
  • Word embeddings
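
TF-IDF weighting can be computed directly from term and document counts. The toy corpus below is an illustrative assumption, and the plain (unsmoothed) IDF shown here differs from the smoothed variant used by scikit-learn's TfidfVectorizer:

```python
import numpy as np

# Toy corpus; vocabulary built from whitespace tokenization.
docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab = sorted({w for d in docs for w in d.split()})

tf = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)
df = (tf > 0).sum(axis=0)        # document frequency per term
idf = np.log(len(docs) / df)     # plain IDF; "the" occurs everywhere, so idf = 0
tfidf = tf * idf
```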

Decision trees

  • classification and regression trees (CART)
  • random forests
  • gradient boosting decision trees (GBDT)
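
A single split of a CART classification tree can be sketched as a decision stump that exhaustively minimizes weighted Gini impurity; the one-dimensional toy data are an illustrative assumption:

```python
import numpy as np

def gini(y):
    # Gini impurity of a binary label array.
    if len(y) == 0:
        return 0.0
    p = np.bincount(y, minlength=2) / len(y)
    return 1.0 - np.sum(p ** 2)

def best_stump(x, y):
    # Try thresholds between consecutive sorted values; pick the split
    # minimizing the weighted Gini impurity of the two children.
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_t, best_score = None, np.inf
    for i in range(1, len(x_sorted)):
        t = (x_sorted[i - 1] + x_sorted[i]) / 2
        left, right = y_sorted[:i], y_sorted[i:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

x = np.array([0.1, 0.2, 0.3, 0.8, 0.9, 1.0])
y = np.array([0, 0, 0, 1, 1, 1])
threshold, impurity = best_stump(x, y)
```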

Clustering

  • K-Means algorithm
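
K-means (Lloyd's algorithm) fits in a dozen lines of numpy; the two-blob synthetic data and the iteration count are illustrative assumptions:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    # Lloyd's algorithm: alternate assignment and centroid-update steps.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        distances = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Two well-separated Gaussian blobs around (0, 0) and (5, 5).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.2, (50, 2)), rng.normal(5, 0.2, (50, 2))])
centroids, labels = kmeans(X, k=2)
```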

Dimensionality reduction

  • singular value decomposition
  • principal component analysis (PCA)
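
PCA via the SVD of the centered data matrix can be sketched as follows; the correlated synthetic data (most variance along the direction (1, 1)) are an illustrative assumption:

```python
import numpy as np

# Correlated 2-D data: both coordinates share the latent variable z.
rng = np.random.default_rng(4)
z = rng.normal(size=500)
X = np.column_stack([z, z]) + 0.1 * rng.normal(size=(500, 2))

# PCA via SVD of the centered data matrix.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
components = Vt                            # principal directions (rows)
explained_variance = S ** 2 / (len(X) - 1)

# Project onto the first principal component.
X_reduced = X_centered @ components[0]
```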

Training

  • dataset preparation, classification features design
  • constructing loss functions according to the maximum likelihood estimation principle
  • first-order gradient methods (SGD) and second-order methods
  • regularization
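
Minimizing squared error is the maximum likelihood estimate under Gaussian noise; adding an L2 penalty yields the closed-form ridge solution, whose weights are shrunk relative to ordinary least squares. The data and the regularization strength `lam` below are illustrative assumptions:

```python
import numpy as np

# Synthetic regression data with known weights.
rng = np.random.default_rng(5)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.normal(size=50)

lam = 10.0  # hypothetical L2 regularization strength

# Ordinary least squares vs. the ridge solution
#   w_ridge = (X^T X + lam * I)^{-1} X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
```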

Statistical testing

  • Student's t-test
  • Chi-squared test
  • correlation coefficients
  • paired bootstrap test
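
The paired bootstrap test can be sketched directly: resample test examples with replacement and observe how often one model fails to beat the other. The per-example correctness arrays below are simulated, illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical per-example correctness of two models on the same test set.
correct_a = rng.random(1000) < 0.75   # model A: ~75% accuracy
correct_b = rng.random(1000) < 0.70   # model B: ~70% accuracy

# Paired bootstrap: resample examples with replacement and record the
# accuracy difference on each resampled test set.
n_resamples = 2000
diffs = np.empty(n_resamples)
for i in range(n_resamples):
    idx = rng.integers(0, len(correct_a), size=len(correct_a))
    diffs[i] = correct_a[idx].mean() - correct_b[idx].mean()

# Fraction of resamples where A is not better than B.
p_value = np.mean(diffs <= 0)
```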

Used Python libraries

  • numpy (representation and manipulation of n-dimensional arrays)
  • scikit-learn (construction of machine learning pipelines)
  • matplotlib (visualization)
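
A minimal sketch of how numpy and scikit-learn fit together; the synthetic data and the choice of scaler and model are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data stored as numpy arrays.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# A scikit-learn pipeline chains preprocessing and a model
# behind a single fit/predict interface.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)
train_accuracy = model.score(X, y)
```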

This course is also part of the inter-university programme prg.ai Minor. It pools the best of AI education in Prague to provide students with a deeper and broader insight into the field of artificial intelligence. More information is available at prg.ai/minor.

Last update: Libovický Jindřich, Mgr., Ph.D. (12.03.2024)
Entry requirements -

Basic programming skills in Python and basic knowledge of differential calculus and linear algebra (working with vectors and matrices) are required; knowledge of probability and statistics is recommended.

Last update: Straka Milan, RNDr., Ph.D. (08.10.2021)
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html