Course, academic year 2022/2023
Advanced Programming in Parallel Environment - NPRG058
Title: Pokročilé programování v paralelním prostředí
Guaranteed by: Department of Software Engineering (32-KSI)
Faculty: Faculty of Mathematics and Physics
Actual: from 2020
Semester: winter
E-Credits: 6
Hours per week, examination: winter s.:2/2, C+Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Guarantor: doc. RNDr. Martin Kruliš, Ph.D.
RNDr. Jakub Yaghob, Ph.D.
Prerequisite: NPRG042
Annotation -
Last update: T_KSI (27.04.2015)
A practical seminar, continuing the Programming in Parallel Environment lectures, that focuses on more advanced aspects of parallel programming. The main objective is to give students hands-on experience with more complex problems in programming multiprocessor NUMA servers and in employing additional parallel devices, especially GPGPUs (CUDA) and Xeon Phi devices. The students will be given several problems, which will be analyzed during the lectures and implemented by the students as home assignments. The results will be verified and discussed collectively.
Literature -
Last update: T_KSI (01.05.2013)

James Reinders: Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism, O'Reilly

Benedict Gaster, Lee Howes, David R. Kaeli, Perhaad Mistry, Dana Schaa: Heterogeneous Computing with OpenCL, Morgan Kaufmann, 2nd edition (November 27, 2012)

Shane Cook: CUDA Programming: A Developer's Guide to Parallel Computing with GPUs (Applications of GPU Computing Series)

OpenCL - Online Manual

CUDA Online Documentation

Syllabus -
Last update: T_KSI (27.04.2015)

The seminar will present the following problems:

  • Task scheduling on multi-core CPUs and NUMA systems
  • Synchronization on multi-core CPUs and multiprocessor systems
  • Efficiency of data transfers between additional devices and host memory
  • Load balancing between CPU and additional accelerators
  • Transforming problems into data parallel tasks and their mapping to GPUs
  • Shared memory access, cache-aware programming, and atomic operations on GPU
  • Solving irregular workloads on GPUs (persistent threads, dynamic parallelism)
  • Xeon Phi devices and the most important differences between Intel MIC and GPU architectures

Charles University | Information system of Charles University