Seminar on Data Mining - NAIL121
Title: Seminář dobývání znalostí
Guaranteed by: Department of Theoretical Computer Science and Mathematical Logic (32-KTIML)
Faculty: Faculty of Mathematics and Physics
Valid from: 2023
Semester: summer
E-Credits: 4
Hours per week, examination: summer s.:1/2, MC [HT]
Capacity: unlimited
Min. number of students: 1
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech
Teaching methods: full-time
Guarantor: Mgr. Marta Vomlelová, Ph.D.
Class: Informatika Bc.
Annotation -
Last update: doc. Mgr. Robert Šámal, Ph.D. (01.06.2018)
The lectures introduce machine learning tools and the use of library functions. Seminar participants analyze a given dataset and submit their results as a seminar paper.
Aim of the course -
Last update: Mgr. Marta Vomlelová, Ph.D. (14.05.2021)

The course provides basic experience with data preprocessing and machine learning algorithms.

Course completion requirements -
Last update: doc. Mgr. Robert Šámal, Ph.D. (01.06.2018)

Students must analyze a given dataset, present the results, and submit the analysis in written form.

Literature -
Last update: doc. Mgr. Robert Šámal, Ph.D. (01.06.2018)

Willi Richert, Luis Pedro Coelho: Building Machine Learning Systems with Python, Packt Publishing 2013

Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani: An Introduction to Statistical Learning with Applications in R, Springer 2013

Syllabus -
Last update: doc. Mgr. Robert Šámal, Ph.D. (01.06.2018)

The seminar provides hands-on experience in data analysis. It builds on the lecture Introduction to Machine Learning.

The lectures introduce machine learning tools and the use of library functions. Seminar participants analyze a given dataset and submit their results as a seminar paper.

The lectures cover:

  • plots (scatter plots, box plots, other basic plots, and plot annotations)
  • groupby function and group statistics
  • simple classification and regression models
  • evaluation with respect to different error functions
  • outlier identification and missing-data handling.
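
A minimal sketch of three of the topics above (group statistics, missing-data handling, and outlier identification), using only the Python standard library. The toy records, the function names, the choice of median imputation, and the 1.5 × IQR rule are illustrative assumptions, not the seminar's prescribed method. In practice a data-frame library would typically be used.

```python
from collections import defaultdict
from statistics import mean, median, quantiles

# Hypothetical toy records: (group label, measurement); None marks a missing value.
records = [
    ("a", 1.0), ("a", 2.0), ("a", None), ("a", 3.0),
    ("b", 10.0), ("b", 11.0), ("b", 12.0), ("b", 11.0),
    ("b", 10.0), ("b", 12.0), ("b", 11.0), ("b", 100.0),
]

def group_values(rows):
    """Collect the non-missing values of each group (a hand-rolled group-by)."""
    groups = defaultdict(list)
    for label, value in rows:
        if value is not None:
            groups[label].append(value)
    return dict(groups)

def group_stats(groups):
    """Compute count, mean, and median per group."""
    return {
        label: {"count": len(vals), "mean": mean(vals), "median": median(vals)}
        for label, vals in groups.items()
    }

def impute_with_group_median(rows, groups):
    """Replace each missing value with the median of its own group."""
    medians = {label: median(vals) for label, vals in groups.items()}
    return [
        (label, medians[label] if value is None else value)
        for label, value in rows
    ]

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

groups = group_values(records)
stats = group_stats(groups)
imputed = impute_with_group_median(records, groups)
outliers_b = iqr_outliers(groups["b"])
```

In this toy data, the single extreme value in group "b" pulls the group mean well above the median, which is one reason robust statistics are often preferred when outliers are suspected.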

Depending on the specific dataset, we may further focus on:

  • time series,
  • TF-IDF vectorization of text,
  • clustering and the Apriori algorithm.
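
As an illustration of the text vectorization topic, here is a minimal sketch of TF-IDF weighting in plain Python. It assumes whitespace tokenization and the textbook weight tf × log(N / df); the function name and sample documents are hypothetical, and library implementations commonly use smoothed or normalized variants instead.

```python
import math
from collections import Counter

def tfidf(documents):
    """Return one {term: weight} dictionary per document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical sample documents.
docs = ["data mining seminar", "data analysis with python", "time series analysis"]
vectors = tfidf(docs)
```

Terms that occur in fewer documents receive higher weights, and under this unsmoothed formula a term that occurs in every document receives weight zero.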