Thesis (Selection of subject)Thesis (Selection of subject)(version: 368)
Thesis details
   Login via CAS
GPU-accelerated Mahalanobis-average hierarchical clustering
Thesis title in Czech: Hierarchické shlukování s Mahalanobis-average metrikou akcelerované na GPU
Thesis title in English: GPU-accelerated Mahalanobis-average hierarchical clustering
Key words: shlukování, vysokodimenzionální data, GPU
English key words: clustering, high-dimensional data, GPU
Academic year of topic announcement: 2019/2020
Thesis type: diploma thesis
Thesis language: angličtina
Department: Department of Software Engineering (32-KSI)
Supervisor: RNDr. Miroslav Kratochvíl, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 12.11.2019
Date of assignment: 13.11.2019
Confirmed by Study dept. on: 19.11.2019
Date and time of defence: 01.07.2020 09:00
Date of electronic submission:28.05.2020
Date of submission of printed version:28.05.2020
Date of proceeded defence: 01.07.2020
Opponents: RNDr. Jan Hric
 
 
 
Guidelines
Hierarchical agglomerative clustering algorithms form a common base of many data analysis methods. Recently, the metrics used to choose the agglomeration order have been improved by Fiser et al., who utilized a data-dependent Mahalanobis-style distance that overcomes many clustering deficiencies on realistic data. The quality improvement comes at a cost of increased computational complexity, which makes the algorithm application unrealistic for datasets of more than around 10^5 data points.

This thesis aims to explore the possibilities of accelerating the algorithm by offloading the computation-intensive parts to the GPU. As a main result, the thesis should describe the involved tradeoffs between achievable parallelism, complexity of supporting data structures, workload distribution between CPU and GPU, and overall efficiency, and support the claims by experimental evaluation. Using this, the thesis should propose a method for reaching optimal performance on realistic datasets originating from single-cell cytometry.
References
Fišer, K., Sieger, T., Schumich, A., Wood, B., Irving, J., Mejstříková, E., & Dworzak, M. N. (2012). Detection and monitoring of normal and leukemic cell populations with hierarchical clustering of flow cytometry data. Cytometry Part A, 81(1), 25-34.

Day, W. H., & Edelsbrunner, H. (1984). Efficient algorithms for agglomerative hierarchical clustering methods. Journal of classification, 1(1), 7-24.

Tomov, S., Nath, R., Ltaief, H., & Dongarra, J. (2010, April). Dense linear algebra solvers for multicore with GPU accelerators. In 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) (pp. 1-8). IEEE.

Bleuse, R., Kedad‐Sidhoum, S., Monna, F., Mounié, G., & Trystram, D. (2015). Scheduling independent tasks on multi‐cores with GPU accelerators. Concurrency and Computation: Practice and Experience, 27(6), 1625-1638.

Barrachina, S., Castillo, M., Igual, F. D., Mayo, R., & Quintana-Orti, E. S. (2008, April). Evaluation and tuning of the level 3 CUBLAS for graphics processors. In 2008 IEEE International Symposium on Parallel and Distributed Processing (pp. 1-8). IEEE.
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html