Thesis details
Parallelization of Clustering Algorithms
Thesis title in Czech:
Thesis title in English: Parallelization of Clustering Algorithms
Keywords (Czech): dolování dat, shluková analýza, paralelizace, GPU, CUDA
Keywords (English): data mining, cluster analysis, parallelization, GPU, CUDA
Academic year of topic announcement: 2014/2015
Thesis type: diploma thesis
Thesis language: English
Department: Department of Software Engineering (32-KSI)
Supervisor: doc. RNDr. Martin Kruliš, Ph.D.
Author: Bc. Jakub Vlček - assigned and confirmed by the Study Department
Date of registration: 27.10.2014
Date of assignment: 28.10.2014
Date of confirmation by the Study Department: 20.11.2014
Guidelines
The continuously growing volume of unstructured data increases the need for effective means of extracting relevant information from it, so that the data are ready for subsequent machine processing (e.g., storing them in structured form or performing search queries). Some of these data mining problems can be solved by cluster analysis, which classifies given objects into groups based on selected features. The resulting groups gather objects that are considered similar in terms of the recognized characteristics. Unfortunately, the clustering process is rather time-consuming; therefore, it may be practically unusable for very large data collections.

One possible solution is to parallelize the cluster analysis. A parallel implementation can take advantage of current processor architectures that integrate multiple processing cores into a single chip. It may also be possible to utilize specialized computational accelerators that have become economically attractive even for ordinary users. For example, modern graphics cards are capable of processing thousands of computational threads concurrently, and these threads can cooperate on quite generic computational tasks.

The main objective of this work is to analyze clustering algorithms from the perspective of massively parallel processing on specialized computing devices, especially graphics cards. The project will also require the design and implementation of several prototype solutions. These prototypes will be experimentally evaluated to assess the applicability of different approaches for various configurations of cluster analysis and for different data sizes.
References
David B. Kirk, Wen-mei W. Hwu: Programming Massively Parallel Processors, Second Edition: A Hands-on Approach, 2012, ISBN: 0124159923

Jason Sanders, Edward Kandrot: CUDA by Example: An Introduction to General-Purpose GPU Programming, NVIDIA 2010, ISBN: 0-13-138768-5

Matthew Scarpino: OpenCL in Action: How to Accelerate Graphics and Computations, Manning Publications 2011, ISBN: 1617290173

Charu C. Aggarwal, Chandan K. Reddy: Data Clustering: Algorithms and Applications (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series), 2013, ISBN: 978-1466558212

Hong-Tao, B., Li-li, H., Dan-tong, O., Zhan-shan, L., He, L.: K-means on commodity GPUs with CUDA. In: 2009 WRI World Congress on Computer Science and Information Engineering, Volume 3, IEEE (2009) 651–655

Zechner, M., Granitzer, M.: Accelerating k-means on the graphics processor via CUDA. In: First International Conference on Intensive Applications and Services (INTENSIVE'09), IEEE (2009) 7–15
 