hidden - assigned and confirmed by the Study Dept.
Date of registration:
04.11.2013
Date of assignment:
06.11.2013
Confirmed by Study dept. on:
18.11.2013
Date and time of defence:
05.09.2016 10:00
Date of electronic submission:
27.07.2016
Date of submission of printed version:
28.07.2016
Date of proceeded defence:
05.09.2016
Reviewers:
doc. RNDr. Martin Kruliš, Ph.D.
Guidelines
The thesis targets efficient processing of large data sets, e-science data in particular. To exploit modern hardware such as multi-core and many-core processors and GPGPUs, contemporary algorithms and methods should be redesigned and optimized for parallel processing. The author should analyze Random Decision Forests algorithms, analyze how they scale with the size of the data sets, and identify their bottlenecks and limitations. Based on this analysis, a pilot parallel implementation of Random Decision Forests should be designed, implemented, and optimized for a selected platform. Its performance on both synthetic and real e-science data should be analyzed and compared with non-optimized approaches.
References
Tin Kam Ho: Decision Forests, Bell Laboratories, Lucent Technologies
V. Svetnik et al.: Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling, J. Chem. Inf. Comput. Sci., 2003
Leo Breiman: Random Forests, Machine Learning, 2001