Course, academic year 2022/2023
  
Data analysis in R and Python - MG440P44
Title: Data analysis in R and Python
Czech title: Analýza dat v prostředí R a Python
Guaranteed by: Institute of Petrology and Structural Geology (31-440)
Faculty: Faculty of Science
Actual: from 2020
Semester: winter
E-Credits: 4
Examination process: winter s.:
Hours per week, examination: winter s.:1/2 Ex [hours/week]
Capacity: unlimited
Min. number of students: 3
Virtual mobility / capacity: no
State of the course: taught
Language: English, Czech
Note: enabled for web enrollment
Guarantor: prof. Mgr. Vojtěch Janoušek, Ph.D.
doc. Mgr. Ondrej Lexa, Ph.D.
Teacher(s): prof. Mgr. Vojtěch Janoušek, Ph.D.
doc. Mgr. Ondrej Lexa, Ph.D.
Annotation -
Last update: prof. Mgr. Vojtěch Janoušek, Ph.D. (04.03.2019)
The course is taught in English when at least one international student is enrolled. This practical course is aimed at senior undergraduate and postgraduate students. It is intended to: a) explain the fundamentals of data processing and visualization in geology, as well as how computing algorithms work in general; b) present the basics of the R and Python programming languages; c) illustrate the usability and versatility of both languages for everyday calculations and for producing publication-quality graphics; d) demonstrate the use of both languages in reproducible research (with an emphasis on structural geology and whole-rock geochemistry).
Literature -
Last update: doc. RNDr. Petr Jeřábek, Ph.D. (14.03.2019)

Learning materials (only for students):

https://www.natur.cuni.cz/geologie/petrologie/vyukove-materialy/analyza-dat-v-prostredi-r-a-python

Web links:

de Vries A: Using R with Jupyter Notebooks http://blog.revolutionanalytics.com/2015/09/using-r-with-jupyter-notebooks.html

Jupyter: Open source, interactive data science and scientific computing across over 40 programming languages http://jupyter.org/

The R Project for Statistical Computing https://www.r-project.org/

Dive into Python 3 http://www.diveintopython3.net/

Scientific Python Lecture Notes http://www.scipy-lectures.org

Wikipedia: R (programming language) https://en.wikipedia.org/wiki/R_(programming_language)

Wikipedia: Python (programming language) https://en.wikipedia.org/wiki/Python_(programming_language)

 

Literature:

Becker RA, Chambers JM, Wilks AR (1988) The New S Language. Chapman & Hall, London, pp 1-702

Crawley MJ (2007) The R book. John Wiley & Sons, Chichester, pp 1-942

Janoušek V, Moyen JF, Martin H, Erban V, Farrow C (2016) Geochemical Modelling of Igneous Processes - Principles and Recipes in R Language. Bringing the Power of R to a Geochemical Community. Springer-Verlag, Berlin, Heidelberg, pp 1-346

Langtangen HP (2016) A Primer on Scientific Programming with Python. Texts in Computational Science and Engineering, pp 1-992

Maindonald J, Braun J (2003) Data Analysis and Graphics Using R. Cambridge University Press, Cambridge, pp 1-386

Murrell P (2005) R Graphics. Chapman & Hall/CRC, London, pp 1-328

Rollinson HR (1993) Using Geochemical Data: Evaluation, Presentation, Interpretation. Longman, London, pp 1-352

Rossant C (2015) Learning IPython for Interactive Computing and Data Visualization, 2nd edn. Packt Publishing, pp 1-175

Requirements to the exam -
Last update: doc. RNDr. Petr Jeřábek, Ph.D. (14.03.2019)

The examination is a practical test in which participants write several short programs in the R and Python programming languages.

Syllabus -
Last update: prof. Mgr. Vojtěch Janoušek, Ph.D. (01.10.2020)

1. Introduction to data analysis and algorithmization I. [OL]

  • Reproducible research
  • Data Analysis in Earth Sciences
  • Why Python?
  • Let’s install our scientific computing environment

2. Introduction to data analysis and algorithmization II. [VJ]

  • Why R? – a bit of history and its current upswing
  • How does a computer programme work?
  • Fundamental data types, algorithmization, typical parts of a computer programme, object-oriented programming

3. Fundamentals of the Python language I. [OL]

Introduction to Jupyter Notebooks and JupyterLab

Python crash course, basics of Python programming

  • Variables and simple data types
  • Advanced data types
  • Built-in functions and operators
  • Blocks and loops
  • User-defined functions
  • Errors and exceptions

4. Fundamentals of the Python language II. [OL]

Scientific Python

  • Introduction to NumPy
  • Visualizations with Matplotlib and Seaborn
  • Data input and output

5. Fundamentals of the R language I. [VJ]

Introduction, fundamental data types and basic operations with them

  • Interactive/batch mode
  • Help and documentation
  • Main data types, attributes
  • Vectors
  • Matrices and arrays
  • Factors
  • Lists

6. Fundamentals of the R language II. [VJ]

Programming and graphics

  • Data import and output from/to files
  • Graphical functions and their main parameters
  • Printing and exporting graphics (PDF, PostScript…)
  • Programming in R – conditional execution, loops, user-defined functions
  • R community, CRAN, mailing lists, useR! conferences
  • Expanding R by additional packages (libraries)

7. Python applications I. [OL]

Calculations and statistics

  • Advanced NumPy and SciPy
  • Data analysis and manipulation with Pandas

8. Python applications II.

Directional statistics

  • Basics of directional statistics in 2D and 3D
  • Advanced analyses of 3D orientational data – APSG
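
For orientation only, here is a minimal sketch of the kind of calculation that falls under 2D directional statistics. It is not taken from the course materials and does not use APSG; the function name circular_mean and the sample azimuths are hypothetical. It assumes the data are directions (not axes) given as azimuths in degrees, and it uses only the Python standard library.

```python
import math

def circular_mean(azimuths_deg):
    """Return (mean direction in degrees, mean resultant length) for 2D directions.

    Hypothetical helper for illustration; assumes directional (not axial) data.
    """
    if not azimuths_deg:
        raise ValueError("need at least one azimuth")
    n = len(azimuths_deg)
    # Treat each azimuth as a unit vector and average the components.
    c = sum(math.cos(math.radians(a)) for a in azimuths_deg) / n
    s = sum(math.sin(math.radians(a)) for a in azimuths_deg) / n
    # The direction of the averaged vector is the mean direction.
    mean_dir = math.degrees(math.atan2(s, c)) % 360.0
    # Its length (0 to 1) measures how tightly the data cluster.
    resultant = math.hypot(c, s)
    return mean_dir, resultant

# Two azimuths on either side of north (made-up sample values).
mean_dir, r_bar = circular_mean([350.0, 10.0])
```

The arithmetic mean of 350 and 10 is 180, which points the wrong way; averaging unit vectors instead gives a mean direction at north (0, equivalently 360), with a mean resultant length close to 1 because the two directions nearly coincide.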

9. R applications I. [VJ]

Calculations and statistics

  • Simple geochemical recalculations
  • On the usefulness of matrices
  • Descriptive statistics in R
  • Working with large and complex datasets

10. R applications II. [VJ]

  • Graphics in R – examples from whole-rock geochemistry
  • Binary diagrams and Harker plots
  • Ternary diagrams
  • Spiderplots
  • Calculating simple petrogenetic models, including graphical output
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html