Introduction to Machine Learning with R - NPFL054

Title:	Úvod do strojového učení v systému R
Guaranteed by:	Institute of Formal and Applied Linguistics (32-UFAL)
Faculty:	Faculty of Mathematics and Physics
Valid:	from 2024
Semester:	summer
E-Credits:	5
Hours per week, examination:	summer s.:2/2, C+Ex [HT]
Capacity:	unlimited
Min. number of students:	unlimited
4EU+:	no
Virtual mobility / capacity:	no
State of the course:	cancelled
Language:	Czech, English
Teaching methods:	full-time
Additional information:	https://ufal.mff.cuni.cz/course/npfl054

Guarantor:	doc. Mgr. Barbora Vidová Hladká, Ph.D.; RNDr. Martin Holub, Ph.D.
Class:	DS, Mathematical Linguistics; Informatics Bc.; Informatics Mgr. - Mathematical Linguistics
Classification:	Informatics > Informatics, Software Applications, Computer Graphics and Geometry, Database Systems, Didactics of Informatics, Discrete Mathematics, External Subjects, General Subjects, Computer and Formal Linguistics, Optimization, Programming, Software Engineering, Theoretical Computer Science
Incompatibility:	NPFL129
Interchangeability:	NPFL129
Is incompatible with:	NPFL129
Is interchangeable with:	NPFL129

Annotation -

Lectures cover both the theoretical background and the practical algorithms of Machine Learning (ML). The emphasis is on a comprehensive understanding of the ML process, which includes data analysis, choice of ML algorithm, tuning of learning parameters, statistical evaluation, and model assessment. Lab sessions provide practical experience with ML tasks using existing R libraries. Homework assignments are practical exercises in R. The last assignment is the most extensive: it involves comprehensively processing a typical, not very demanding problem and writing a report on solution variants and their evaluation.

Last update: Vidová Hladká Barbora, doc. Mgr., Ph.D. (15.05.2020)

Aim of the course -

The aim of the course is to present the Machine Learning process from both a theoretical and a practical point of view. Students become familiar with the theoretical foundations of selected algorithms and learn to solve Machine Learning problems in practice using libraries of the statistical system R. Students must be able to comprehensively solve an example machine learning problem and to analyze and describe solution variants and their evaluation.

Last update: Vidová Hladká Barbora, doc. Mgr., Ph.D. (15.05.2020)

Course completion requirements -

During the term, students have to 1) present an easy homework assignment, 2) submit two homework assignments so that the total score exceeds the required score limit, and 3) pass two written tests so that the total score exceeds the required score limit.

Obtaining the course credit is a prerequisite for taking the exam.

More details about homework assignments and tests are available on the course web site.

Last update: Vidová Hladká Barbora, doc. Mgr., Ph.D. (29.04.2021)

Literature -

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani: An Introduction to Statistical Learning. Springer, 2013.

Lantz, Brett: Machine Learning with R. Packt Publishing, 2013.

Last update: Vidová Hladká Barbora, doc. Mgr., Ph.D. (15.05.2020)

Requirements to the exam -

The exam is oral. However, the results of written tests and homework assignments are taken into account. Obtaining the course credit is a prerequisite for taking the exam.

The examination requirements correspond to the course syllabus. More details are available on the course web site.

Last update: Vidová Hladká Barbora, doc. Mgr., Ph.D. (15.05.2020)

Syllabus -

Machine learning - basic concepts, examples of practical applications, theoretical foundations. Supervised and unsupervised learning. Classification and regression tasks. Classification into two or more classes. Training and test examples. Feature vectors. Target variable and prediction function. Machine learning development process. Curse of dimensionality. Clustering.

Decision tree learning. Learning algorithm, splitting criteria and pruning. Random forests.

Linear and logistic regression. Least squares methods. Discriminative classifiers.

Instance-based learning. k-NN algorithm.

Naive Bayes classifier. Bayesian belief networks.

Support Vector Machines. Large-margin and soft-margin classifiers. Kernel functions.

Ensemble methods. Unstable learning algorithms. Bagging and boosting. AdaBoost algorithm.

Parameters in machine learning. Hyperparameter tuning. Searching the parameter space. Gradient descent algorithm. Maximum likelihood estimation.

Experiment evaluation. Working with development and test data. Sample error, generalization error. Cross-validation, leave-one-out method. Bootstrap method. Performance measures. Evaluation of binary classifiers. ROC curve.

Statistical tests. Statistical hypotheses, one-sample and two-sample t-tests, chi-square tests. Significance level, p-value. Using statistical tests for classifier evaluation. Confidence intervals.

Overfitting. How to recognize and avoid it. Regularization. Bias-variance decomposition.

General principles of feature selection. Feature selection using information gain, greedy algorithms.

Dimensionality reduction, Principal Component Analysis.

Foundations of Neural Networks. Single Perceptron, Single Layer Perceptron. The architecture of multi-layer feed-forward models and the idea of back-propagation training. Remarks on deep learning.

Last update: Vidová Hladká Barbora, doc. Mgr., Ph.D. (15.05.2020)
