Course, academic year 2022/2023
UNIX and work with genomic data - MB170C47
Title: UNIX and work with genomic data
Czech title: UNIX a práce s genomickými daty
Guaranteed by: Department of Zoology (31-170)
Faculty: Faculty of Science
Valid from: 2021
Semester: winter
E-Credits: 2
Examination process: winter semester:
Hours per week, examination: winter semester: 0/3, C (credit) [days/semester]
Capacity: unlimited
Min. number of students: unlimited
Virtual mobility / capacity: no
State of the course: taught
Language: English
Note: enabled for web enrollment
Guarantor: RNDr. Radka Reifová, Ph.D.
Teacher(s): Mgr. Václav Janoušek, Ph.D.
RNDr. Libor Mořkovský, Ph.D.
Annotation
Last update: Ing. Jindřiška Peterková (06.09.2021)
The course is taught in person or online, depending on the epidemiological situation and student interest. If student interest is high, the course will take place online to comply with hygiene recommendations.

Recent progress in Next Generation Sequencing (NGS) technologies has led to a huge increase in the amount of data that biologists have to handle. The volume of data typically generated far exceeds the capacity of common data analysis tools on the Microsoft Windows platform. The Unix environment provides efficient tools for handling large amounts of genomic data.

Participants will gain enough skill and confidence in the Unix environment to use it for the analysis of genomic data. They will explore the possibilities of the system through examples of NGS data manipulation, analysis and visualization. The course does not focus on any one particular NGS analysis; instead, it teaches participants how to comfortably use any Unix tool to pursue their scientific goals. The course is recommended for master's and PhD students.

If all participants understand Czech, the course will be taught in Czech; otherwise, it will be taught in English.
Syllabus
Last update: Mgr. Václav Janoušek, Ph.D. (15.09.2015)

I. Introduction to Unix - Learn about the Unix philosophy.

II. Basic Unix - Learn to use the basic commands (cd, ls, ll, mkdir, mv, cp, pwd, htop, screen, grep, less, head, tail, cat, cut, sort, uniq, paste, join), together with globbing and pipes.

III. Advanced Unix - Learn the basics of awk, sed, regular expressions, shell scripting, shell variables, parallel and subshells.

IV. Introduction to Genomics - Learn how ‘genomes’ are made.

V. Data visualization - Learn how to format your data for effective visualization and how to use RStudio, tidyr, dplyr and ggplot2 to explore your data visually.

VI. Read quality assessment - Learn how to use Unix to explore FASTQ files, calculate basic statistics, assess read quality and filter out low-quality reads.

VII. Genome assembly - Learn how to do a (small) genome assembly.

VIII. Variant calling - Learn how to use the original NGS reads and a genome assembly to call variants.

IX. Standard annotation formats - Learn how information on genes, variants and genome properties is stored (GFF, VCF and BED formats) and how to summarize it quickly with tools such as bedtools and vcftools.

X. A lot of practice.

Charles University | Information system of Charles University