Course, academic year 2018/2019
Statistical Methods in Data Mining Systems - NDBI031
Title in Czech: Statistické metody v systémech pro dobývání znalostí z dat
Guaranteed by: Department of Theoretical Computer Science and Mathematical Logic (32-KTIML)
Faculty: Faculty of Mathematics and Physics
Actual: from 2016 to 2019
Semester: winter
E-Credits: 3
Hours per week, examination: winter semester: 1/1 (lectures/practicals) [hours/week], C+Ex (course credit and examination)
Capacity: unlimited
Min. number of students: unlimited
State of the course: taught
Language: Czech
Teaching methods: full-time
Additional information: http://www.cs.cas.cz/~martin/vyuka.html
Guarantor: doc. RNDr. Ing. Martin Holeňa, CSc.
Class: Informatics, Master's programme (Mgr.) - Mathematical Linguistics
Classification: Informatics > Database Systems
Annotation
Last update: T_KTI (05.04.2016)
Data mining relies methodologically on machine learning, statistics, and the theory of databases. This is the first of two lectures dealing with its connection to statistics. It reviews the statistical methods implemented in key examples of the three main kinds of commercial data mining systems, as well as in one academic system used for teaching data mining at several Czech universities, including ours. This lecture is loosely continued by the summer term lecture NAIL105 Internet and Classification Methods.
Aim of the course
Last update: doc. RNDr. Ing. Martin Holeňa, CSc. (29.06.2019)

To teach the basic methods of knowledge discovery in data based on machine learning, statistics, and database theory.

Course completion requirements
Last update: doc. RNDr. Ing. Martin Holeňa, CSc. (29.06.2019)

Demonstration of the results of the practical exercises.

Literature
Last update: T_KTI (05.04.2016)

M. Berthold, D. J. Hand: Intelligent Data Analysis. Berlin: Springer, 1999.

Teaching methods
Last update: HOLENA/MFF.CUNI.CZ (04.10.2008)

A two-hour lecture every two weeks. Students who wish to obtain course credit supplement the lectures with exercises in the Matlab environment, which likewise take roughly 2 hours every two weeks. Students may work on the exercises independently at home and contact the instructor for consultations as needed.

Requirements for the exam
Last update: HOLENA/MFF.CUNI.CZ (02.10.2008)

Demonstration of the results of the practical exercises.

Syllabus
Last update: doc. RNDr. Ing. Martin Holeňa, CSc. (24.04.2006)

Data mining, which has existed as a separate area at the overlap of mathematics and computer science since the early 1990s, relies methodologically on machine learning, statistics, and the theory of databases. Whereas machine learning and database methods are covered by other lectures, the present lecture is the first of two dealing with the connection between data mining and statistics. It reviews the statistical methods implemented in key examples of the three main kinds of commercial data mining systems, as well as in one academic system used for teaching data mining at several Czech universities, including ours. This lecture is loosely continued by the summer term lecture DBI029: Statistical aspects of data mining.

  • Data mining and its connection to statistics
  • Main types of data mining systems
  • Statistical methods in Clementine, an example of a general data mining system
  • Statistical methods in DecisionSite, an example of a system for on-line decision support by means of data mining
  • Matlab as an example of a more universal system including data mining methods
  • Descriptive statistics in Matlab
  • Linear regression and its generalizations in Matlab
  • Multivariate statistical analysis in Matlab
  • Hypothesis testing in Matlab
  • 4FT-Miner: an academic data mining system combining observational logic and the analysis of four-fold tables
  • Quantifiers of observational logic based on parameter estimation
  • Quantifiers of observational logic based on hypothesis testing
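The last two syllabus items contrast observational-logic quantifiers that estimate a parameter with quantifiers that test a hypothesis. The following Python sketch illustrates that contrast on a four-fold table; it is not taken from 4FT-Miner, and the function names are hypothetical. It assumes the usual GUHA notation, in which a counts the objects satisfying both the antecedent and the succedent and b those satisfying the antecedent only. As examples it uses two standard quantifiers: founded implication (estimated confidence a/(a+b) of at least p, with at least base supporting objects) and lower critical implication (a one-sided binomial test of the hypothesis that the conditional probability of the succedent given the antecedent is at most p).

```python
from math import comb


def founded_implication(a: int, b: int, p: float, base: int) -> bool:
    """Estimation-based quantifier: the estimated confidence a/(a+b)
    must be at least p, and the support count a at least base."""
    if a + b == 0:
        return False
    return a / (a + b) >= p and a >= base


def binomial_upper_tail(a: int, n: int, p: float) -> float:
    """P(X >= a) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(a, n + 1))


def lower_critical_implication(a: int, b: int, p: float, alpha: float, base: int) -> bool:
    """Test-based quantifier: H0 'P(succedent | antecedent) <= p' is rejected
    at level alpha by a one-sided binomial test (upper tail of a successes
    out of a+b trials), and the support count a must be at least base."""
    n = a + b
    if n == 0:
        return False
    return binomial_upper_tail(a, n, p) <= alpha and a >= base


# Two hypothetical tables with the same estimated confidence 0.9
# (the cells c and d are not needed by these two quantifiers).
print(founded_implication(9, 1, p=0.8, base=5))                      # True
print(lower_critical_implication(9, 1, p=0.8, alpha=0.05, base=5))   # False
print(lower_critical_implication(90, 10, p=0.8, alpha=0.05, base=5)) # True
```

Both tables pass the estimation-based quantifier, but only the larger one passes the test-based one: 9 successes out of 10 trials have an upper-tail probability of about 0.376 under p = 0.8, which is far from significant at level 0.05.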

 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html