Course, academic year 2023/2024
Advanced data manipulation for modern HPC - NFPL041
Title: Pokročilé zpracování dat pro moderní HPC
Guaranteed by: Department of Condensed Matter Physics (32-KFKL)
Faculty: Faculty of Mathematics and Physics
Valid: from 2022
Semester: winter
E-Credits: 5
Hours per week, examination: winter s.:2/2, C+Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Guarantor: Ing. Dominik Legut, Ph.D.
Annotation -
Last update: Mgr. Kateřina Mikšová (10.06.2022)
This course prepares participants to process and manipulate large data files. This concerns not only work on HPC supercomputers but also the handling of everyday data. Participants will learn how to work with files of a million lines or a million columns, or files as large as several GiB.
Course completion requirements -
Last update: Mgr. Kateřina Mikšová (10.06.2022)

The course is completed by obtaining the credit and passing an oral exam. The credit is a prerequisite for taking the exam. To obtain the credit, active participation is required: each student must solve a number of problems assigned by the supervisor. Because of these conditions, a second attempt to obtain the credit within the same semester is not possible. The exam requirements follow the course syllabus as presented in the lectures.

Literature -
Last update: Mgr. Kateřina Mikšová (10.06.2022)

Compulsory literature

http://becksteinlab.physics.asu.edu/pages/unix/IntroUnix/vim_basics.html for Unix, vi, sed, etc.

http://cs.lmu.edu/~ray/notes/bash/ for bash

https://www.tutorialspoint.com/awk/index.htm for awk

Recommended literature
http://www.well.ox.ac.uk/~johnb/comp/perl/intro.html

Syllabus -
Last update: Mgr. Kateřina Mikšová (10.06.2022)

This course prepares participants to process and manipulate large data files and to work with HPC supercomputers. Participants will learn to work with files of a million lines or columns, or files as large as several GiB.

1. Unix (Linux) commands for data manipulation at the command-line prompt

2. Handling and editing text data in Unix: vi editor, nano, Midnight Commander, etc.

3. Introduction to scripting in Bash, for and while loops, etc.

4. Introduction to Awk, manipulation of data

5. How to use simple mathematics on the command line

6. Awk, data I/O formats (formatted input and output)

7. Basics of Ed and Sed, replacing strings, more complex constructions

8. Advanced methods - Introduction to Perl

9. Perl II

10. Regular syntax I

11. Regular syntax II

12. Data manipulation to and from HPC systems, display forwarding, use of the scheduler and batch jobs

13. - 14. Practical sessions

 