Course, academic year 2018/2019
Seminar on Data Mining - NAIL121
Title in Czech: Seminář dobývání znalostí
Guaranteed by: Department of Theoretical Computer Science and Mathematical Logic (32-KTIML)
Faculty: Faculty of Mathematics and Physics
Valid: from 2017
Semester: summer
E-Credits: 4
Hours per week, examination: summer semester: 1/2, MC [hours/week]
Capacity: unlimited
Min. number of students: unlimited
State of the course: taught
Language: Czech
Teaching methods: full-time
Guarantor: Mgr. Marta Vomlelová, Ph.D.
Class: Informatika Bc.
Annotation -
Last update: doc. Mgr. Robert Šámal, Ph.D. (01.06.2018)
The lectures introduce machine learning tools and the use of library functions. Participants of the seminar analyze a given data set and submit their results as a seminar paper.
Course completion requirements -
Last update: doc. Mgr. Robert Šámal, Ph.D. (01.06.2018)

Students must analyze a given data set, present the results, and submit the analysis in written form.

Literature -
Last update: doc. Mgr. Robert Šámal, Ph.D. (01.06.2018)

Willi Richert, Luis Pedro Coelho: Building Machine Learning Systems with Python, Packt Publishing 2013

Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani: An Introduction to Statistical Learning with Applications in R, Springer 2013

Syllabus -
Last update: doc. Mgr. Robert Šámal, Ph.D. (01.06.2018)

The seminar provides practical experience in data analysis. It extends the lecture Introduction to Machine Learning.

The lectures introduce machine learning tools and the use of library functions. Participants of the seminar analyze a given data set and submit their results as a seminar paper.

The lectures cover:

  • plots (scatter plots, box plots, other basic plots, and plot annotations)
  • the groupby function and group statistics
  • simple classification and regression models
  • evaluation with respect to different error functions
  • identifying outliers and handling missing data.
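Several of the topics above can be combined in one short workflow. The following is a minimal sketch with pandas and NumPy, assuming a small made-up data set. The column names "group", "x" and "y", the group-mean imputation, the 1.5 × IQR outlier rule and the straight-line model are illustrative choices, not necessarily the ones used in the seminar.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data set (illustrative column names): one x value is
# missing and one y value is an obvious outlier.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b", "b"],
    "x": [1.0, np.nan, 3.0, 1.0, 2.0, 3.0, 4.0],
    "y": [2.1, 3.9, 6.2, 2.0, 4.1, 5.9, 30.0],
})

# Group statistics with groupby: count, mean and median of y per group.
stats = df.groupby("group")["y"].agg(["count", "mean", "median"])

# Missing data: one simple strategy fills x with the mean of its group.
df["x"] = df["x"].fillna(df.groupby("group")["x"].transform("mean"))

# Outliers: flag y values outside 1.5 * IQR of the quartiles (Tukey's rule).
q1, q3 = df["y"].quantile([0.25, 0.75])
iqr = q3 - q1
df["outlier"] = (df["y"] < q1 - 1.5 * iqr) | (df["y"] > q3 + 1.5 * iqr)

# Simple regression model: least-squares line y ~ slope * x + intercept,
# fitted only on rows not flagged as outliers.
clean = df[~df["outlier"]]
slope, intercept = np.polyfit(clean["x"], clean["y"], deg=1)
pred = slope * clean["x"] + intercept

# Evaluation under two different error functions: MAE and RMSE.
residuals = clean["y"] - pred
mae = float(residuals.abs().mean())
rmse = float(np.sqrt((residuals ** 2).mean()))

print(stats)
print(f"slope={slope:.2f} intercept={intercept:.2f} MAE={mae:.3f} RMSE={rmse:.3f}")
```

MAE weights all residuals equally, while RMSE penalizes large residuals more, so comparing both on the same model shows how the choice of error function changes the assessment; RMSE is never smaller than MAE.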

Depending on the specific dataset, we may further focus on:

  • time series,
  • TF-IDF vectorization of text,
  • clustering and the Apriori algorithm.
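To illustrate two of these optional topics, the rough sketch below computes TF-IDF weights and mines frequent itemsets with the Apriori algorithm in plain Python. It assumes whitespace tokenization, unsmoothed idf = log(N / df) and toy data; the function names tfidf and apriori are hypothetical. In practice a library implementation would normally be used, for example scikit-learn's TfidfVectorizer, which by default applies a smoothed idf and L2 normalization, so its numbers differ from these.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    tf = term count / document length, idf = log(N / df), unsmoothed.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: (c / len(tokens)) * math.log(n_docs / df[term])
         for term, c in Counter(tokens).items()}
        for tokens in tokenized
    ]


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    singles = {frozenset([i]) for t in transactions for i in t}
    supports = {s: support(s) for s in singles}
    current = {s for s, sup in supports.items() if sup >= min_support}
    frequent = {s: supports[s] for s in current}
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        supports = {c: support(c) for c in candidates}
        current = {c for c, sup in supports.items() if sup >= min_support}
        frequent.update({c: supports[c] for c in current})
        k += 1
    return frequent


docs = ["the cat sat", "the dog sat", "the cat ran"]
weights = tfidf(docs)
# "the" occurs in every document, so its idf and weight are 0.0.
print(weights[0])

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
itemsets = apriori(baskets, min_support=0.5)
print(itemsets)
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is discarded without counting its support if any of its subsets is already known to be infrequent.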

Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html