Course, academic year 2023/2024
Statistical Methods in Data Mining Systems - NDBI031
Title: Statistické metody v systémech pro dobývání znalostí z dat
Guaranteed by: Department of Theoretical Computer Science and Mathematical Logic (32-KTIML)
Faculty: Faculty of Mathematics and Physics
Actual: from 2020
Semester: winter
E-Credits: 2
Hours per week, examination: winter s.:1/1, C+Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech
Teaching methods: full-time
Additional information: http://www.cs.cas.cz/~martin/vyuka.html
Guarantor: prof. RNDr. Ing. Martin Holeňa, CSc.
Class: Informatika Mgr. - Matematická lingvistika
Classification: Informatics > Database Systems
Is incompatible with: NDBX031
Is interchangeable with: NDBX031
Annotation -
Last update: T_KTI (05.04.2016)
Data mining relies methodologically on machine learning, statistics, and the theory of databases. This is the first of two lectures dealing with its connection to statistics. It reviews statistical methods implemented in key examples of the three main kinds of commercial data mining systems, as well as in one academic system used in teaching data mining at several Czech universities, including ours. This lecture is loosely followed by the summer term lecture NAIL105 Internet and Classification Methods.
Aim of the course -
Last update: prof. RNDr. Ing. Martin Holeňa, CSc. (29.06.2019)

Teach the basic statistical methods for data mining.

Course completion requirements -
Last update: prof. RNDr. Ing. Martin Holeňa, CSc. (29.06.2019)

Presentation of the results of homework assignments from the seminars.

Literature -
Last update: T_KTI (05.04.2016)

M. Berthold, D.J. Hand. Intelligent Data Analysis. Berlin, Springer, 1999

Teaching methods -
Last update: HOLENA/MFF.CUNI.CZ (04.10.2008)

A 2-hour lecture once every 2 weeks, which students interested in obtaining the course credit supplement with exercises in the Matlab environment, requiring roughly another 2 hours per 2 weeks. Students may work on the exercises independently at home and contact the teacher for consultations as needed.

Exam requirements -
Last update: HOLENA/MFF.CUNI.CZ (02.10.2008)

Presentation of the exercise results.

Syllabus -
Last update: prof. RNDr. Ing. Martin Holeňa, CSc. (24.04.2006)

Data mining, which has existed since the early 1990s as a separate area at the overlap between mathematics and computer science, relies methodologically on machine learning, statistics, and the theory of databases. Whereas machine learning and database methods are covered by other lectures, the present lecture is the first of two dealing with the connection between data mining and statistics. It reviews statistical methods implemented in key examples of the three main kinds of commercial data mining systems, as well as in one academic system used in teaching data mining at several Czech universities, including ours. This lecture is loosely followed by the summer term lecture DBI029: Statistical aspects of data mining.

  • Data mining and its connection to statistics
  • Main types of data mining systems
  • Statistical methods in Clementine, an example of a general data mining system
  • Statistical methods in DecisionSite, an example of a system for on-line decision support by means of data mining
  • Matlab as an example of a more universal system including data mining methods
  • Descriptive statistics in Matlab
  • Linear regression and its generalizations in Matlab
  • Multivariate statistical analysis in Matlab
  • Hypothesis testing in Matlab
  • 4FT-Miner - an academic data mining system combining observational logic and the analysis of four-fold tables
  • Quantifiers of observational logic based on parameter estimation
  • Quantifiers of observational logic based on hypothesis testing
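
As an informal illustration of the last three syllabus items, the sketch below builds a four-fold table for two binary attributes and evaluates two statistics on it: a point estimate of a conditional probability and a one-sided Fisher exact test. It is written in Python rather than Matlab, it is not the 4FT-Miner implementation, and the quantifiers treated in the course may be defined differently. The function names, the attribute names "A" and "B", and the data are all hypothetical.

```python
from math import comb


def four_fold_table(rows, antecedent, succedent):
    """Count the cells (a, b, c, d) of a four-fold table.

    a: antecedent and succedent both true
    b: antecedent true, succedent false
    c: antecedent false, succedent true
    d: both false
    """
    a = b = c = d = 0
    for row in rows:
        x, y = bool(row[antecedent]), bool(row[succedent])
        if x and y:
            a += 1
        elif x:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    return a, b, c, d


def confidence(a, b):
    """Point estimate of P(succedent | antecedent): a / (a + b)."""
    return a / (a + b)


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact test p-value.

    Probability, under independence with fixed margins, of observing
    at least `a` objects in the top-left cell (hypergeometric tail).
    """
    n = a + b + c + d
    r = a + b  # objects satisfying the antecedent
    k = a + c  # objects satisfying the succedent
    tail = sum(comb(r, i) * comb(n - r, k - i) for i in range(a, min(r, k) + 1))
    return tail / comb(n, k)


# Hypothetical data: eight objects described by two binary attributes.
rows = (
    [{"A": 1, "B": 1}] * 3
    + [{"A": 1, "B": 0}]
    + [{"A": 0, "B": 1}]
    + [{"A": 0, "B": 0}] * 3
)

a, b, c, d = four_fold_table(rows, "A", "B")
print("four-fold table:", (a, b, c, d))
print("estimated P(B | A):", confidence(a, b))
print("Fisher one-sided p-value:", fisher_right_tail(a, b, c, d))
```

A quantifier based on parameter estimation would compare a value such as the estimated conditional probability with a chosen threshold, whereas one based on hypothesis testing would compare a p-value such as the one above with a chosen significance level.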
