Thesis (Selection of subject)

Parallelization of Clustering Algorithms

Thesis title in Czech:
Thesis title in English:	Parallelization of Clustering Algorithms
Key words:	dolování dat, shluková analýza, paralelizace, GPU, CUDA
English key words:	data mining, cluster analysis, parallelization, GPU, CUDA
Academic year of topic announcement:	2014/2015
Thesis type:	diploma thesis
Thesis language:	English
Department:	Department of Software Engineering (32-KSI)
Supervisor:	doc. RNDr. Martin Kruliš, Ph.D.
Author:	Bc. Jakub Vlček - assigned and confirmed by the Study Dept.
Date of registration:	27.10.2014
Date of assignment:	28.10.2014
Confirmed by Study dept. on:	20.11.2014

Guidelines

The continuously increasing amounts of unstructured data emphasize the need for effective means of extracting relevant information from these data, so that they are ready for subsequent machine processing (e.g., storing the data in structured form or performing search queries). Some of these data mining problems can be solved by cluster analysis, which classifies given objects into groups based on selected features. The resulting groups gather objects that are considered similar in terms of the recognized characteristics. Unfortunately, the clustering process is rather time-consuming; therefore, it may be virtually useless for very large data collections.

One possible solution is to parallelize the cluster analysis. A parallel implementation can take advantage of current processor architectures that integrate multiple processing cores into a single chip. It may also be possible to utilize specialized computational accelerators that have become economically attractive even for ordinary users. For example, new graphics cards are capable of processing thousands of computational threads concurrently, and these threads can cooperate on quite generic computational tasks.
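
As a rough illustration of the multi-core approach, the sketch below parallelizes the assignment step of k-means, in which every object is assigned to its nearest centroid. K-means is chosen here only as an example (it appears in the recommended literature); the guidelines do not prescribe a particular algorithm. All names (`Point`, `squared_distance`, `nearest_centroid`, `assign_clusters`) are hypothetical, and the code assumes dense feature vectors of equal dimension.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Assumption: an object is a dense feature vector; all points and
// centroids have the same dimension.
using Point = std::vector<float>;

// Squared Euclidean distance between two feature vectors of equal length.
float squared_distance(const Point& a, const Point& b) {
    float sum = 0.0f;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const float diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Index of the centroid closest to p.
std::size_t nearest_centroid(const Point& p, const std::vector<Point>& centroids) {
    std::size_t best = 0;
    float best_dist = std::numeric_limits<float>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        const float dist = squared_distance(p, centroids[c]);
        if (dist < best_dist) {
            best_dist = dist;
            best = c;
        }
    }
    return best;
}

// Assigns every point to its nearest centroid. The points are split into
// contiguous chunks, one per worker thread. Each thread writes a disjoint
// slice of `labels`, so no locking is needed.
std::vector<std::size_t> assign_clusters(const std::vector<Point>& points,
                                         const std::vector<Point>& centroids,
                                         unsigned num_threads) {
    assert(!centroids.empty());
    assert(num_threads > 0);
    std::vector<std::size_t> labels(points.size());
    const std::size_t chunk = (points.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(points.size(), t * chunk);
        const std::size_t end = std::min(points.size(), begin + chunk);
        if (begin == end) break;  // fewer points than threads
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                labels[i] = nearest_centroid(points[i], centroids);
            }
        });
    }
    for (auto& w : workers) w.join();
    return labels;
}
```

The assignment step is independent for each object, which is why it maps naturally to many threads. On a graphics card, the same decomposition would typically be taken to the extreme, with roughly one thread per object rather than one chunk per core.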

The main objective of this work is to analyze clustering algorithms from the perspective of massively parallel processing on specialized computing devices, especially graphics cards. The project will also require the design and implementation of several prototype solutions. These prototypes will be experimentally evaluated to assess the applicability of different approaches for various configurations of cluster analysis and for different data sizes.

References

David B. Kirk, Wen-mei W. Hwu: Programming Massively Parallel Processors, Second Edition: A Hands-on Approach, 2012, ISBN: 0124159923

Jason Sanders, Edward Kandrot: CUDA by Example: An Introduction to General-Purpose GPU Programming, NVIDIA 2010, ISBN: 0-13-138768-5

Matthew Scarpino: OpenCL in Action: How to Accelerate Graphics and Computations, Manning Publications 2011, ISBN: 1617290173

Charu C. Aggarwal, Chandan K. Reddy: Data Clustering: Algorithms and Applications (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series), 2013, ISBN: 978-1466558212

Hong-Tao, B., Li-li, H., Dan-tong, O., Zhan-shan, L., He, L.: K-means on commodity GPUs with CUDA. In: Computer Science and Information Engineering, 2009 WRI World Congress on. Volume 3., IEEE (2009) 651–655

Zechner, M., Granitzer, M.: Accelerating k-means on the graphics processor via CUDA. In: Intensive Applications and Services, 2009. INTENSIVE’09. First International Conference on, IEEE (2009) 7–15