Thesis (Selection of subject) (version: 368)
Thesis details
A study of applying copulas in data mining
Thesis title in Czech: Dobývání znalostí z dat pomocí kopulí
Thesis title in English: A study of applying copulas in data mining
Key words in Czech: data mining, vztahy mezi atributy, pravděpodobnostní vztahy, kopule, typy kopulí
English key words: data mining, relationships between attributes, probabilistic relationships, copulas, kinds of copulas
Academic year of topic announcement: 2012/2013
Thesis type: diploma thesis
Thesis language: English
Department: Department of Theoretical Computer Science and Mathematical Logic (32-KTIML)
Supervisor: prof. RNDr. Ing. Martin Holeňa, CSc.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 13.03.2013
Date of assignment: 13.03.2013
Confirmed by Study dept. on: 15.03.2013
Date and time of defence: 15.05.2013 10:00
Date of electronic submission: 11.04.2013
Date of submission of printed version: 11.04.2013
Date of defence held: 15.05.2013
Opponents: Mgr. David Hauzar, Ph.D.
 
 
 
Guidelines
First, the student will become familiar with copula theory, with emphasis on the copula families used in existing applications. He will then study methods for fitting copulas to data, as well as measures for assessing how well a copula fits the data. Based on the studied literature, he will choose several copula families and implement standard methods for fitting them to data, including assessment of the quality of the fit by the chosen measures. The implementation will be done in the Matlab environment. Using the implemented methods, he will test the suitability of the selected copula families for fitting data. The student will use at least two datasets that have been used in publications and one dataset provided by his supervisor.
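As an illustration of the fit-and-assess workflow described in the guidelines, here is a minimal Python sketch (the thesis itself prescribes Matlab). It assumes a single bivariate Clayton family, estimates its parameter by inversion of Kendall's tau (for Clayton, tau = theta / (theta + 2), so theta = 2 tau / (1 - tau)), and measures the fit with a Cramér-von Mises distance between the empirical copula and the fitted copula, evaluated at the pseudo-observations. The family, the estimator, the statistic and all function names (`fit_clayton_itau`, `cramer_von_mises`, `demo`, ...) are hypothetical illustrative choices, not the methods used in the thesis. Turning the statistic into a p-value would additionally require a parametric bootstrap, which is omitted here.

```python
import math
import random


def pseudo_observations(xs):
    """Rank-transform a sample into (0, 1): R_i / (n + 1). Assumes no ties."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return [r / (n + 1) for r in ranks]


def kendall_tau(xs, ys):
    """O(n^2) Kendall's tau (tau-a); assumes continuous data without ties."""
    n = len(xs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if prod > 0:
                s += 1  # concordant pair
            elif prod < 0:
                s -= 1  # discordant pair
    return 2 * s / (n * (n - 1))


def clayton_cdf(u, v, theta):
    """Bivariate Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def fit_clayton_itau(xs, ys):
    """Inversion of Kendall's tau for Clayton: theta = 2 tau / (1 - tau)."""
    tau = kendall_tau(xs, ys)
    if tau <= 0:
        raise ValueError("Clayton with theta > 0 requires positive dependence")
    return 2.0 * tau / (1.0 - tau)


def cramer_von_mises(us, vs, cdf):
    """S_n = sum_i (C_n(U_i, V_i) - C(U_i, V_i))^2, with C_n the empirical copula."""
    n = len(us)
    stat = 0.0
    for ui, vi in zip(us, vs):
        emp = sum(1 for uj, vj in zip(us, vs) if uj <= ui and vj <= vi) / n
        stat += (emp - cdf(ui, vi)) ** 2
    return stat


def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], avoids 0 ** -theta
        w = 1.0 - rng.random()
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def demo(n=300, theta=2.0, seed=0):
    """Simulate data, fit Clayton, and compare its fit with the independence copula."""
    rng = random.Random(seed)
    pairs = sample_clayton(n, theta, rng)
    # Hypothetical increasing marginal transforms; rank-based methods ignore them.
    xs = [10.0 * u ** 2 for u, _ in pairs]
    ys = [math.exp(v) for _, v in pairs]
    us, vs = pseudo_observations(xs), pseudo_observations(ys)
    theta_hat = fit_clayton_itau(xs, ys)
    s_fit = cramer_von_mises(us, vs, lambda a, b: clayton_cdf(a, b, theta_hat))
    s_indep = cramer_von_mises(us, vs, lambda a, b: a * b)
    return theta_hat, s_fit, s_indep


if __name__ == "__main__":
    theta_hat, s_fit, s_indep = demo()
    print(f"theta_hat={theta_hat:.3f}  S_n(Clayton)={s_fit:.4f}  S_n(independence)={s_indep:.4f}")
```

Because both the estimator and the statistic depend on the data only through ranks, the increasing marginal transforms in `demo` do not change the result; comparing the statistic for the fitted family with the one for the independence copula gives a rough sense of scale when ranking candidate families.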
References
See http://www.cs.cas.cz/~martin/diplomka48.htm
Preliminary scope of work in English
Copulas are functions that have been used in probability theory since the 1950s to describe the relationship between a multivariate cumulative distribution function and its univariate marginal distributions. With the increasing significance of probabilistic approaches in computer science, copulas have found applications in this field as well. During the last decade they have been applied in evolutionary algorithms that estimate probability distributions (estimation of distribution algorithms, EDAs) and also in the rapidly growing area of data mining. There, copulas make it possible to discover interesting relationships between attributes that cannot be found using traditional methods. So far, practical applications of copulas are found only in finance, where models created in the process of data mining are used for prediction. There are, however, many different kinds of copulas. This is partly because copulas are often required to have particular properties (e.g., Archimedean copulas) and partly because new copulas are easy to create by parametrization. Dozens of copula families obtained by parametrization have already been described in the literature, yet almost no attention has been paid to the differences between copula families from the data mining point of view. The proposed master's thesis aims to contribute to such research.
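For reference, the relationship between a joint distribution and its marginals mentioned above is Sklar's theorem, and the Archimedean construction shows how one parametrized generator yields a whole copula family. The statements below are standard; the bivariate case is given for the Archimedean part, and the Clayton and Gumbel families are chosen here only as examples.

```latex
% Sklar's theorem: for a d-dimensional CDF H with continuous marginals F_1, ..., F_d
% there is a unique copula C : [0,1]^d -> [0,1] such that
H(x_1, \dots, x_d) = C\bigl(F_1(x_1), \dots, F_d(x_d)\bigr).

% Bivariate Archimedean copula with a continuous, strictly decreasing, convex
% generator \varphi : [0,1] \to [0,\infty], \varphi(1) = 0, and pseudo-inverse \varphi^{[-1]}:
C(u, v) = \varphi^{[-1]}\bigl(\varphi(u) + \varphi(v)\bigr).

% Clayton family, \theta > 0:
\varphi_\theta(t) = \frac{t^{-\theta} - 1}{\theta}
\quad\Longrightarrow\quad
C_\theta(u, v) = \bigl(u^{-\theta} + v^{-\theta} - 1\bigr)^{-1/\theta}.

% Gumbel family, \theta \ge 1:
\varphi_\theta(t) = (-\ln t)^{\theta}
\quad\Longrightarrow\quad
C_\theta(u, v) = \exp\Bigl(-\bigl[(-\ln u)^{\theta} + (-\ln v)^{\theta}\bigr]^{1/\theta}\Bigr).
```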
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html