Parallelization of Clustering Algorithms
|Thesis title in Czech:|
|Thesis title in English:||Parallelization of Clustering Algorithms|
|Key words:||dolování dat, shluková analýza, paralelizace, GPU, CUDA|
|English key words:||data mining, cluster analysis, parallelization, GPU, CUDA|
|Academic year of topic announcement:||2014/2015|
|Type of assignment:||diploma thesis|
|Department:||Department of Software Engineering (32-KSI)|
|Supervisor:||RNDr. Martin Kruliš, Ph.D.|
|Author:||Bc. Jakub Vlček - assigned and confirmed by the Study Dept.|
|Date of registration:||27.10.2014|
|Date of assignment:||28.10.2014|
|Confirmed by Study dept. on:||20.11.2014|
|The continuously growing volume of unstructured data increases the need for effective means of extracting relevant information from it, so that the data are ready for subsequent machine processing (e.g., storing the data in a structured form or performing search queries). Some of these data mining problems can be solved by cluster analysis, which classifies given objects into groups based on selected features. Each group gathers objects that are considered similar in terms of the recognized characteristics. Unfortunately, the clustering process is rather time-consuming; therefore, it may be virtually useless for very large data collections.
One possible solution is to parallelize the cluster analysis. A parallel implementation can take advantage of current processor architectures that integrate multiple processing cores into a single chip. It may also be possible to utilize specialized computational accelerators that have become economically attractive even for ordinary users. For example, new graphics cards are capable of processing thousands of computational threads concurrently, and these threads can cooperate on fairly general computational tasks.
The main objective of this work is to analyze clustering algorithms from the perspective of massively parallel processing on specialized computing devices, especially graphics cards. The project will also require the design and implementation of several prototype solutions. These prototypes will be experimentally evaluated to assess the applicability of different approaches for various configurations of cluster analysis and for different data sizes.
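To make the parallelization opportunity concrete, the following minimal sketch shows k-means, one clustering algorithm covered by the recommended literature. It is an illustration only, not a prescribed design for the thesis. The function names (`assign_point`, `update_centroids`, `k_means`) and the choice of the first k objects as initial centroids are assumptions made for brevity.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_point(point, centroids):
    """Return the index of the centroid nearest to `point`.

    The result depends only on this one point and the shared centroids,
    so all points can be processed independently of each other.
    """
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point, label in zip(points, labels):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += point[d]
    # An empty cluster keeps its previous centroid (a simplifying assumption).
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else list(centroids[c])
        for c in range(len(centroids))
    ]


def k_means(points, k, max_iterations=100):
    """Plain sequential k-means; returns (labels, centroids)."""
    # Assumption: the first k points serve as initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: independent per point (data-parallel).
        new_labels = [assign_point(p, centroids) for p in points]
        if new_labels == labels:
            break  # No assignment changed, so the clustering has converged.
        labels = new_labels
        # Update step: a reduction over all points of each cluster.
        centroids = update_centroids(points, labels, centroids)
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(points, k=2)
```

In this sketch the assignment step is the natural candidate for massively parallel execution, for example one GPU thread per object, because every call to `assign_point` is independent. The update step is a reduction and would need a different parallel strategy. How the two steps are best divided between devices is exactly the kind of question the prototypes are meant to evaluate.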
|David B. Kirk, Wen-mei W. Hwu: Programming Massively Parallel Processors, Second Edition: A Hands-on Approach, 2012, ISBN: 0124159923
Jason Sanders, Edward Kandrot: CUDA by Example: An Introduction to General-Purpose GPU Programming, NVIDIA 2010, ISBN: 0-13-138768-5
Matthew Scarpino: OpenCL in Action: How to Accelerate Graphics and Computations, Manning Publications 2011, ISBN: 1617290173
Charu C. Aggarwal, Chandan K. Reddy: Data Clustering: Algorithms and Applications (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series), 2013, ISBN: 978-1466558212
Hong-Tao, B., Li-li, H., Dan-tong, O., Zhan-shan, L., He, L.: K-means on commodity GPUs with CUDA. In: Computer Science and Information Engineering, 2009 WRI World Congress on. Volume 3, IEEE (2009) 651–655
Zechner, M., Granitzer, M.: Accelerating k-means on the graphics processor via CUDA. In: Intensive Applications and Services, 2009. INTENSIVE’09. First International Conference on, IEEE (2009) 7–15