The thesis focuses on efficient processing of large data sets, e-science data in particular. To exploit modern hardware such as multi-core and many-core processors and GPGPUs, contemporary algorithms and methods must be redesigned and optimized for parallel processing. The author should analyze Random Decision Forests (RDFo) algorithms, assess how they scale with the size of the data set, and identify their bottlenecks and limitations. Based on this analysis, a pilot parallel implementation of RDFo should be designed, implemented, and optimized for a selected platform. Its performance should be evaluated on both synthetic and real e-science data and compared with non-optimized approaches.
Recommended literature
Tin Kam Ho: Decision Forests, Bell Laboratories, Lucent Technologies
V. Svetnik et al.: Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling, J. Chem. Inf. Comput. Sci., 2003
Leo Breiman: Random Forests, Machine Learning, 2001