Course, academic year 2024/2025
Seminar on Data Mining - NAIL121
Title: Seminář dobývání znalostí
Guaranteed by: Department of Theoretical Computer Science and Mathematical Logic (32-KTIML)
Faculty: Faculty of Mathematics and Physics
Actual: from 2023
Semester: summer
E-Credits: 4
Hours per week, examination: summer s.:1/2, MC [HT]
Capacity: unlimited
Min. number of students: 1
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Guarantor: Mgr. Marta Vomlelová, Ph.D.
Teacher(s): Mgr. Marta Vomlelová, Ph.D.
Class: Informatika Bc.
Annotation -
The lectures introduce machine learning tools and the use of library functions. Seminar participants analyze a given dataset and submit their results as a seminar paper.
Last update: Šámal Robert, doc. Mgr., Ph.D. (01.06.2018)
Aim of the course -

The course provides basic experience with data preprocessing and machine learning algorithms.

Last update: Vomlelová Marta, Mgr., Ph.D. (14.05.2021)
Course completion requirements -

Students must analyze a given dataset, present the results, and submit the analysis in written form.

Last update: Šámal Robert, doc. Mgr., Ph.D. (01.06.2018)
Literature -

Willi Richert, Luis Pedro Coelho: Building Machine Learning Systems with Python, Packt Publishing 2013

Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani: An Introduction to Statistical Learning with Applications in R, Springer 2013

Last update: Vomlelová Marta, Mgr., Ph.D. (15.05.2024)
Syllabus -

The seminar provides practical experience in data analysis. It extends the lecture course Introduction to Machine Learning.

The lectures introduce machine learning tools and the use of library functions. Seminar participants analyze a given dataset and submit their results as a seminar paper.

The lectures cover:

  • plots (scatter plots, box plots, other basic plots, and plot annotations)
  • the groupby function and group statistics
  • simple classification and regression models
  • evaluation with respect to different error functions
  • outlier identification and handling of missing data.
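
As an illustration of evaluation with respect to different error functions, the sketch below compares two sets of predictions under mean absolute error (MAE) and root mean squared error (RMSE). The data and the names pred_a and pred_b are made up for this example and are not taken from the course materials; the point is only that the choice of error function can change which model looks better.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference; penalizes large errors more."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical targets and two hypothetical sets of predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
pred_a = [4.0, 6.0, 8.0, 10.0]  # off by 1 everywhere
pred_b = [3.0, 5.0, 7.0, 12.0]  # exact three times, off by 3 once

mae_a = mean_absolute_error(y_true, pred_a)
mae_b = mean_absolute_error(y_true, pred_b)
rmse_a = root_mean_squared_error(y_true, pred_a)
rmse_b = root_mean_squared_error(y_true, pred_b)

print(f"MAE:  A={mae_a:.2f}  B={mae_b:.2f}")
print(f"RMSE: A={rmse_a:.2f}  B={rmse_b:.2f}")
```

In this toy case MAE favors the second set of predictions while RMSE favors the first, because squaring gives the single large error more weight.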

Depending on the specific dataset, we may further focus on:

  • maps (geopandas),
  • time series,
  • TF-IDF vectorization of text,
  • clustering and the Apriori algorithm.
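
To make the TF-IDF item concrete, here is a minimal sketch of the textbook weighting (term frequency times the logarithm of the inverse document frequency), written with the standard library only. The three example documents are invented for illustration. Library implementations may differ in detail; for instance, scikit-learn's TfidfVectorizer by default smooths the IDF term and L2-normalizes each vector, so its numbers would not match this sketch.

```python
import math
from collections import Counter

# Hypothetical toy corpus, not taken from the course materials.
docs = [
    "data mining finds patterns in data",
    "text mining works with text documents",
    "clustering groups similar documents",
]

# Simple whitespace tokenization; real text usually needs more preprocessing.
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents each term appears.
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))


def tfidf(tokens):
    """Return a {term: weight} mapping for one tokenized document."""
    counts = Counter(tokens)
    return {
        term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
    }


vectors = [tfidf(tokens) for tokens in tokenized]

# Print the terms of the first document, highest weight first.
for term, weight in sorted(vectors[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {weight:.3f}")
```

Terms that occur often in one document but in few others, such as "data" in the first document, receive the highest weights, while terms shared across documents, such as "mining", are down-weighted.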

Last update: Vomlelová Marta, Mgr., Ph.D. (15.05.2024)
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html