GPU-accelerated Mahalanobis-average hierarchical clustering
Thesis title in Czech: | Hierarchické shlukování s Mahalanobis-average metrikou akcelerované na GPU |
---|---|
Thesis title in English: | GPU-accelerated Mahalanobis-average hierarchical clustering |
Key words: | shlukování, vysokodimenzionální data, GPU |
English key words: | clustering, high-dimensional data, GPU |
Academic year of topic announcement: | 2019/2020 |
Thesis type: | diploma thesis |
Thesis language: | angličtina |
Department: | Department of Software Engineering (32-KSI) |
Supervisor: | RNDr. Miroslav Kratochvíl, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 12.11.2019 |
Date of assignment: | 13.11.2019 |
Confirmed by Study dept. on: | 19.11.2019 |
Date and time of defence: | 01.07.2020 09:00 |
Date of electronic submission: | 28.05.2020 |
Date of submission of printed version: | 28.05.2020 |
Date of proceeded defence: | 01.07.2020 |
Opponents: | RNDr. Jan Hric |
Guidelines |
Hierarchical agglomerative clustering algorithms form a common base of many data analysis methods. Recently, the metrics used to choose the agglomeration order have been improved by Fiser et al., who utilized a data-dependent Mahalanobis-style distance that overcomes many clustering deficiencies on realistic data. The quality improvement comes at a cost of increased computational complexity, which makes the algorithm application unrealistic for datasets of more than around 10^5 data points.
This thesis aims to explore the possibilities of accelerating the algorithm by offloading the computation-intensive parts to the GPU. As a main result, the thesis should describe the involved tradeoffs between achievable parallelism, complexity of supporting data structures, workload distribution between CPU and GPU, and overall efficiency, and support the claims by experimental evaluation. Using this, the thesis should propose a method for reaching optimal performance on realistic datasets originating from single-cell cytometry. |
References |
Fišer, K., Sieger, T., Schumich, A., Wood, B., Irving, J., Mejstříková, E., & Dworzak, M. N. (2012). Detection and monitoring of normal and leukemic cell populations with hierarchical clustering of flow cytometry data. Cytometry Part A, 81(1), 25-34.
Day, W. H., & Edelsbrunner, H. (1984). Efficient algorithms for agglomerative hierarchical clustering methods. Journal of classification, 1(1), 7-24. Tomov, S., Nath, R., Ltaief, H., & Dongarra, J. (2010, April). Dense linear algebra solvers for multicore with GPU accelerators. In 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) (pp. 1-8). IEEE. Bleuse, R., Kedad‐Sidhoum, S., Monna, F., Mounié, G., & Trystram, D. (2015). Scheduling independent tasks on multi‐cores with GPU accelerators. Concurrency and Computation: Practice and Experience, 27(6), 1625-1638. Barrachina, S., Castillo, M., Igual, F. D., Mayo, R., & Quintana-Orti, E. S. (2008, April). Evaluation and tuning of the level 3 CUBLAS for graphics processors. In 2008 IEEE International Symposium on Parallel and Distributed Processing (pp. 1-8). IEEE. |