Parallelization of Clustering Algorithms
Thesis title in Czech: | |
---|---|
Thesis title in English: | Parallelization of Clustering Algorithms |
Keywords (Czech): | dolování dat, shluková analýza, paralelizace, GPU, CUDA |
Keywords (English): | data mining, cluster analysis, parallelization, GPU, CUDA |
Academic year of topic announcement: | 2014/2015 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Department of Software Engineering (32-KSI) |
Supervisor: | doc. RNDr. Martin Kruliš, Ph.D. |
Author: | Bc. Jakub Vlček - assigned and confirmed by the Study Department |
Date of registration: | 27.10.2014 |
Date of assignment: | 28.10.2014 |
Date confirmed by the Study Department: | 20.11.2014 |
Guidelines |
The continuously growing volume of unstructured data emphasizes the need for effective means of extracting relevant information from it, so that the data are ready for subsequent machine processing (e.g., storing the data in structured form or performing search queries). Some of these data mining problems can be solved by cluster analysis, which classifies given objects into groups based on selected features. The resulting groups gather objects that are considered similar in terms of the recognized characteristics. Unfortunately, the clustering process is rather time-consuming; therefore, it may be virtually useless for very large data collections.
One possible solution is to parallelize the cluster analysis. A parallel implementation can take advantage of current processor architectures that integrate multiple processing cores into a single chip. It may also be possible to utilize specialized computational accelerators that have become economically attractive even for ordinary users. For example, new graphics cards are capable of processing thousands of computational threads concurrently, and these threads can cooperate on quite generic computational tasks. The main objective of this work is the analysis of clustering algorithms from the perspective of massively parallel processing on specialized computing devices, especially graphics cards. The project will also require the design and implementation of several prototype solutions. These prototypes will be experimentally evaluated to assess the applicability of different approaches for various configurations of cluster analysis and for different data sizes. |
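As a rough illustration of why cluster analysis is a candidate for massively parallel processing, the sketch below shows one iteration of k-means in plain Python. The choice of k-means is an assumption made here because it appears in the literature list; the assignment does not prescribe a particular algorithm. The function names `assign_points` and `update_centroids` and the sample data are hypothetical and serve only this example.

```python
import math


def assign_points(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:  # every iteration is independent of the others
        best_index, best_dist = 0, math.inf
        for i, c in enumerate(centroids):
            # Squared Euclidean distance between point p and centroid c.
            d = sum((a - b) ** 2 for a, b in zip(p, c))
            if d < best_dist:
                best_index, best_dist = i, d
        labels.append(best_index)
    return labels


def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p, label in zip(points, labels):
        counts[label] += 1
        for j in range(dim):
            sums[label][j] += p[j]
    # Keep the old centroid when a cluster received no points.
    return [
        [s / counts[i] for s in sums[i]] if counts[i] else list(centroids[i])
        for i in range(len(centroids))
    ]


# Hypothetical sample data: two obvious groups in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]

labels = assign_points(points, centroids)
centroids = update_centroids(points, labels, centroids)
```

In this sketch the outer loop of `assign_points` has no dependencies between iterations, so it could plausibly be mapped to one GPU thread per point. The update step accumulates into shared sums and would need some form of parallel reduction instead, which is one of the design questions a GPU prototype would have to address.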
References |
David B. Kirk, Wen-mei W. Hwu: Programming Massively Parallel Processors, Second Edition: A Hands-on Approach, 2012, ISBN: 0124159923
Jason Sanders, Edward Kandrot: CUDA by Example: An Introduction to General-Purpose GPU Programming, NVIDIA 2010, ISBN: 0-13-138768-5
Matthew Scarpino: OpenCL in Action: How to Accelerate Graphics and Computations, Manning Publications 2011, ISBN: 1617290173
Charu C. Aggarwal, Chandan K. Reddy: Data Clustering: Algorithms and Applications (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series), 2013, ISBN: 978-1466558212
Hong-Tao, B., Li-li, H., Dan-tong, O., Zhan-shan, L., He, L.: K-means on commodity GPUs with CUDA. In: Computer Science and Information Engineering, 2009 WRI World Congress on. Volume 3., IEEE (2009) 651–655
Zechner, M., Granitzer, M.: Accelerating k-means on the graphics processor via CUDA. In: Intensive Applications and Services, 2009. INTENSIVE’09. First International Conference on, IEEE (2009) 7–15 |