Course, academic year 2019/2020
Data analysis in R and Python - MG440P44
Title: Data analysis in R and Python
Czech title: Analýza dat v prostředí R a Python
Guaranteed by: Institute of Petrology and Structural Geology (31-440)
Faculty: Faculty of Science
Actual: from 2019 to 2019
Semester: winter
E-Credits: 4
Examination process: winter s.:
Hours per week, examination: winter s.:1/2, Ex [HT]
Capacity: unlimited
Min. number of students: 3
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: English, Czech
Note: enabled for web enrollment
Guarantor: prof. Mgr. Vojtěch Janoušek, Ph.D.
doc. Mgr. Ondrej Lexa, Ph.D.
Teacher(s): prof. Mgr. Vojtěch Janoušek, Ph.D.
doc. Mgr. Ondrej Lexa, Ph.D.
Annotation -
Last update: prof. Mgr. Vojtěch Janoušek, Ph.D. (04.03.2019)
The course is taught in English when at least one international student is enrolled. This practical course is aimed at senior undergraduate and postgraduate students. It is intended to: a) explain the fundamentals of data processing and visualization in geology, as well as how computing algorithms work in general; b) present the basics of the R and Python programming languages; c) illustrate the usability and versatility of both languages for everyday calculations and for producing publication-quality graphics; d) demonstrate the use of both languages in reproducible research (with a certain bias towards structural geology and whole-rock geochemistry).
Literature -
Last update: doc. RNDr. Petr Jeřábek, Ph.D. (14.03.2019)

Learning materials (only for students):

https://www.natur.cuni.cz/geologie/petrologie/vyukove-materialy/analyza-dat-v-prostredi-r-a-python

Web links:

de Vries A: Using R with Jupyter Notebooks http://blog.revolutionanalytics.com/2015/09/using-r-with-jupyter-notebooks.html

Jupyter: Open source, interactive data science and scientific computing across over 40 programming languages http://jupyter.org/

The R Project for Statistical Computing https://www.r-project.org/

Dive into Python 3 http://www.diveintopython3.net/

Scientific Python Lecture Notes http://www.scipy-lectures.org

Wikipedia: R (programming language) https://en.wikipedia.org/wiki/R_(programming_language)

Wikipedia: Python (programming language) https://en.wikipedia.org/wiki/Python_(programming_language)

 

Literature:

Becker RA, Chambers JM, Wilks AR (1988) The New S Language. Chapman & Hall, London, pp 1-702

Crawley MJ (2007) The R book. John Wiley & Sons, Chichester, pp 1-942

Janoušek V, Moyen JF, Martin H, Erban V, Farrow C (2016) Geochemical Modelling of Igneous Processes - Principles and Recipes in R Language. Bringing the Power of R to a Geochemical Community. Springer-Verlag, Berlin, Heidelberg, pp 1-346

Langtangen HP (2016) A Primer on Scientific Programming with Python. Texts in Computational Science and Engineering, pp 1-992

Maindonald J, Braun J (2003) Data Analysis and Graphics Using R. Cambridge University Press, Cambridge, pp 1-386

Murrell P (2005) R Graphics. Chapman & Hall/CRC, London, pp 1-328

Rollinson HR (1993) Using Geochemical Data: Evaluation, Presentation, Interpretation. Longman, London, pp 1-352

Rossant C (2015) Learning IPython for Interactive Computing and Data Visualization - Second Edition, Packt Publishing, pp 1-175

Requirements to the exam -
Last update: doc. RNDr. Petr Jeřábek, Ph.D. (14.03.2019)

The examination is a practical test, whereby the participants are required to write several short programs in the R and Python programming languages.

Syllabus -
Last update: prof. Mgr. Vojtěch Janoušek, Ph.D. (01.10.2020)

1. Introduction to data analysis and algorithmization I. [OL]

  • The problem of processing large datasets in the natural sciences
  • Why Python? – a bit of history and its current upswing
  • Installation of the software needed for the rest of the course

2. Introduction to data analysis and algorithmization II. [VJ]

  • Why R? – a bit of history and its current upswing
  • How does a computer programme work?
  • Fundamental data types, algorithmization, typical parts of a computer programme, object-oriented programming

3. Fundamentals of the Python language I. [OL]

Introduction to the Jupyter Notebook interactive environment

Introduction to Python, basic features and tools, conventions

  • Data types and their properties, containers and (im)mutable types, iterators, generators
  • Variable declaration
  • Built-in functions and operators
  • Blocks and loops
  • Defining functions
  • Exceptions and errors
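
As an informal illustration of the topics listed for this session (not taken from the course materials), the following minimal Python sketch contrasts a mutable list with an immutable tuple, defines a generator function, and catches an exception. The names running_mean, readings and point are hypothetical and the numbers are made up for the example.

```python
def running_mean(values):
    """Generator that yields the running mean of an iterable of numbers."""
    total = 0.0
    for n, value in enumerate(values, start=1):
        total += value
        yield total / n


# A list is mutable: it can be extended in place.
readings = [2.0, 4.0, 6.0]      # made-up example values
readings.append(8.0)

# A tuple is immutable: item assignment raises TypeError.
point = (1.0, 2.0)
try:
    point[0] = 5.0
except TypeError as err:
    message = f"tuples are immutable: {err}"

# The generator is lazy; list() consumes it and collects the results.
means = list(running_mean(readings))
print(means)
print(message)
```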

4. Fundamentals of the Python language II. [OL]

Extending Python

  • Introduction to NumPy – Numerical Python
  • Basic graphical output – Matplotlib
  • Map making – Basemap
  • Advanced work with NumPy and SciPy
  • Loading and saving data in Python

5. Fundamentals of the R language I. [VJ]

Introduction, fundamental data types and basic operations with them

  • Interactive/batch mode
  • Help and documentation
  • Main data types, attributes
  • Vectors
  • Matrices and arrays
  • Factors
  • Lists

6. Fundamentals of the R language II. [VJ]

Programming and graphics

  • Data import and output from/to files
  • Graphical functions and their main parameters
  • Printing and exporting graphics (PDF, PostScript…)
  • Programming in R – conditional execution, loops, user-defined functions
  • R community, CRAN, mailing lists, useR! conferences
  • Expanding R by additional packages (libraries)

7. Python applications I. [OL]

Calculations and statistics

  • Basic statistics in Python
  • Data analysis in Python – Pandas
8. Python applications II. [OL]

Orientation analysis: processing of directional data

  • Statistics of vector data – mean vector in the plane and on the sphere
  • Statistics of axial data – orientation matrix and calculation of the mean orientation direction
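
The two bullet points above can be sketched in NumPy roughly as follows. This is a minimal sketch under stated assumptions, not the course's own code: the function names (mean_vector_2d, mean_vector_sphere, orientation_matrix, mean_axis) are hypothetical, directions are assumed to be given either as azimuths in degrees or as unit vectors stored row-wise, and the mean axis is taken as the eigenvector of the orientation matrix with the largest eigenvalue.

```python
import numpy as np


def mean_vector_2d(azimuths_deg):
    """Mean direction (degrees, 0-360) and mean resultant length of planar vector data."""
    theta = np.radians(np.asarray(azimuths_deg, dtype=float))
    c, s = np.cos(theta).mean(), np.sin(theta).mean()
    mean_dir = np.degrees(np.arctan2(s, c)) % 360.0
    return mean_dir, np.hypot(c, s)


def mean_vector_sphere(unit_vectors):
    """Unit mean vector and mean resultant length of vector data on the sphere."""
    v = np.asarray(unit_vectors, dtype=float)
    resultant = v.sum(axis=0)
    length = np.linalg.norm(resultant)
    return resultant / length, length / len(v)


def orientation_matrix(unit_vectors):
    """Normalized orientation matrix (sum of outer products divided by n)."""
    v = np.asarray(unit_vectors, dtype=float)
    return v.T @ v / len(v)


def mean_axis(unit_vectors):
    """Mean axis of axial data: eigenvector with the largest eigenvalue.

    The sign of the returned vector is arbitrary, as expected for an axis.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(orientation_matrix(unit_vectors))
    return eigenvectors[:, np.argmax(eigenvalues)]


# Made-up example: two opposite vectors describe the same axis.
axes = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
print(mean_vector_2d([80.0, 100.0]))
print(mean_axis(axes))
```

The example with two opposite vectors shows why axial data need the orientation matrix: their vector sum is zero, so a mean vector is undefined, whereas the eigenvector approach still recovers the common axis.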

9. R applications I. [VJ]

Calculations and statistics

  • Simple geochemical recalculations
  • On usefulness of matrices
  • Descriptive statistics in R
  • Working with large and complex datasets

10. R applications II. [VJ]

  • Graphics in R – examples from whole-rock geochemistry
  • Binary diagrams and Harker plots
  • Ternary diagrams
  • Spiderplots
  • Calculating simple petrogenetic models, including graphical output
 