Course, academic year 2023/2024
Seminar on Data Mining - NAIL114
Title: Seminář dobývání znalostí
Guaranteed by: Department of Theoretical Computer Science and Mathematical Logic (32-KTIML)
Faculty: Faculty of Mathematics and Physics
Valid: from 2020
Semester: summer
E-Credits: 4
Hours per week, examination: summer s.:1/2, MC [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: not taught
Language: Czech, English
Teaching methods: full-time
Guarantor: Mgr. Marta Vomlelová, Ph.D.
Annotation -
Last update: doc. RNDr. Pavel Töpfer, CSc. (30.01.2018)
The lectures introduce machine learning tools and the use of library functions. Seminar participants analyze a given data set and submit their results as a seminar paper.
Course completion requirements -
Last update: doc. RNDr. Pavel Töpfer, CSc. (30.01.2018)

Students must analyze a given data set, present the results, and submit the analysis in written form.

Literature -
Last update: doc. RNDr. Pavel Töpfer, CSc. (30.01.2018)

Willi Richert, Luis Pedro Coelho: Building Machine Learning Systems with Python, Packt Publishing 2013

Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani: An Introduction to Statistical Learning with Applications in R, Springer 2013

Syllabus -
Last update: doc. RNDr. Pavel Töpfer, CSc. (30.01.2018)

The seminar provides practical experience in data analysis. It builds on the lecture Introduction to Machine Learning.

The lectures introduce machine learning tools and the use of library functions. Seminar participants analyze a given dataset and submit their results as a seminar paper.

The lectures cover:

  • graphs (scatter plots, box plots, other basic graphs, and graph annotations)
  • groupby function and group statistics
  • simple classification and regression models
  • evaluation with respect to different error functions
  • outlier identification and missing data handling.
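
Three of the topics above (group statistics, missing data handling, and outlier identification) can be illustrated with a minimal sketch. It is not course material: the toy records, the function names, the choice of median imputation, and Tukey's 1.5 × IQR rule are all assumptions made for illustration, and only the Python standard library is used, although the seminar itself may rely on other libraries.

```python
from collections import defaultdict
from statistics import mean, median, quantiles

# Hypothetical toy records: (group label, measurement); None marks a missing value.
records = [
    ("a", 1.0), ("a", 2.0), ("a", None), ("a", 3.0),
    ("b", 10.0), ("b", 12.0), ("b", 11.0), ("b", 13.0),
    ("b", 12.0), ("b", 11.0), ("b", 50.0),
]

def group_values(rows):
    """Collect the values of each group, like a minimal group-by."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)

def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Impute per group first, then compute group statistics and flag outliers.
groups = {key: impute_median(vals) for key, vals in group_values(records).items()}
group_means = {key: mean(vals) for key, vals in groups.items()}
outliers = {key: iqr_outliers(vals) for key, vals in groups.items()}
```

In this sketch the value 50.0 in group "b" is flagged as an outlier, and it also pulls the group mean well above the typical values, which is one reason to inspect outliers before reporting group statistics.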

Depending on the specific dataset, we may further focus on:

  • time series,
  • TF-IDF vectorization of text,
  • clustering and the Apriori algorithm.
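
As an illustration of TF-IDF vectorization, the following minimal sketch weights each term by its relative frequency in a document multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. The function name `tfidf`, the whitespace tokenization, and the sample documents are assumptions made for this example. This is the plain textbook formula, and library implementations may use smoothed or normalized variants.

```python
import math
from collections import Counter

def tfidf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical sample documents.
docs = ["data mining seminar", "data analysis", "text mining"]
weights = tfidf(docs)
```

In the first document, "seminar" occurs in no other document and therefore receives a higher weight than "data", which occurs in two of the three.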

 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html