Course, academic year 2024/2025
Data analysis in R and Python - MG440P44
Title: Data analysis in R and Python
Czech title: Analýza dat v prostředí R a Python
Guaranteed by: Institute of Petrology and Structural Geology (31-440)
Faculty: Faculty of Science
Actual: from 2024
Semester: winter
E-Credits: 4
Examination process: winter s.:
Hours per week, examination: winter s.:1/2, Ex [HT]
Capacity: 30
Min. number of students: 3
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: English, Czech
Note: enabled for web enrollment
Guarantor: prof. Mgr. Vojtěch Janoušek, Ph.D.
doc. Mgr. Ondrej Lexa, Ph.D.
Teacher(s): prof. Mgr. Vojtěch Janoušek, Ph.D.
doc. Mgr. Ondrej Lexa, Ph.D.
Annotation -
The course is taught in English when at least one international student is enrolled. This practical course is aimed at senior undergraduate and postgraduate students. It is intended to: a) explain the fundamentals of data processing and visualization in geology, as well as how computing algorithms work in general; b) present the basics of the R and Python programming languages; c) illustrate the usability and versatility of both languages for everyday calculations and for producing publication-quality graphics; d) demonstrate the use of both languages in reproducible research (with a certain bias towards structural geology and whole-rock geochemistry).
Last update: Janoušek Vojtěch, prof. Mgr., Ph.D. (04.03.2019)
Literature -

Learning materials (only for students):

Google Classroom

Web links:

de Vries A: Using R with Jupyter Notebooks http://blog.revolutionanalytics.com/2015/09/using-r-with-jupyter-notebooks.html

Jupyter: Open source, interactive data science and scientific computing across over 40 programming languages http://jupyter.org/

The R Project for Statistical Computing https://www.r-project.org/

Dive into Python 3 http://www.diveintopython3.net/

Scientific Python Lecture Notes http://www.scipy-lectures.org

Wikipedia: R (programming language) https://en.wikipedia.org/wiki/R_(programming_language)

Wikipedia: Python (programming language) https://en.wikipedia.org/wiki/Python_(programming_language)

 

Literature:

Becker RA, Chambers JM, Wilks AR (1988) The New S Language. Chapman & Hall, London, pp 1-702

Crawley MJ (2007) The R book. John Wiley & Sons, Chichester, pp 1-942

Janoušek V, Moyen JF, Martin H, Erban V, Farrow C (2016) Geochemical Modelling of Igneous Processes - Principles and Recipes in R Language. Bringing the Power of R to a Geochemical Community. Springer-Verlag, Berlin, Heidelberg, pp 1-346

Langtangen HP (2016) A Primer on Scientific Programming with Python. Texts in Computational Science and Engineering, pp 1-992

Maindonald J, Braun J (2003) Data Analysis and Graphics Using R. Cambridge University Press, Cambridge, pp 1-386

Murrell P (2005) R Graphics. Chapman & Hall/CRC, London, pp 1-328

Rollinson HR (1993) Using Geochemical Data: Evaluation, Presentation, Interpretation. Longman, London, pp 1-352

Rossant C (2015) Learning IPython for Interactive Computing and Data Visualization - Second Edition, Packt Publishing, pp 1-175

Last update: Janoušek Vojtěch, prof. Mgr., Ph.D. (25.09.2024)
Requirements to the exam -

The examination is a practical test, whereby the participants are required to write several short programs in the R and Python programming languages.

Last update: Jeřábek Petr, doc. RNDr., Ph.D. (14.03.2019)
Syllabus -

1. Introduction to data analysis and algorithmization I. [OL]

  • Reproducible research
  • Data Analysis in Earth Sciences
  • Why Python?
  • Let’s install our scientific computing environment

2. Introduction to data analysis and algorithmization II. [VJ]

  • Why R? – a bit of history and its current upswing
  • How does a computer programme work?
  • Fundamental data types, algorithmization, typical parts of a computer programme, object-oriented programming

3.  Fundamentals of the Python language I. [OL] 

Introduction to Jupyter Notebooks and JupyterLab

Python crash course, basics of Python programming

  • Variables and simple data types
  • Advanced data types
  • Built-in functions and operators
  • Blocks and loops
  • User-defined functions
  • Errors and exceptions
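
As an illustration only (not taken from the course materials), the topics above can be combined in a few lines of plain Python. The function names `mean_of` and `safe_mean` are hypothetical and chosen for this sketch: a user-defined function loops over a list, raises an exception on invalid input, and a second function catches that exception.

```python
def mean_of(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if len(values) == 0:
        # Signal invalid input with a built-in exception type.
        raise ValueError("cannot average an empty sequence")
    total = 0.0
    for v in values:  # the indented block is the loop body
        total += v
    return total / len(values)


def safe_mean(values):
    """Return the mean, or None when the input is empty."""
    try:
        return mean_of(values)
    except ValueError:
        return None


if __name__ == "__main__":
    print(mean_of([1, 2, 3]))
    print(safe_mean([]))
```

Returning `None` for empty input is just one possible design; letting the exception propagate is equally valid when the caller should decide how to react.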

4.  Fundamentals of the Python language II. [OL]

Scientific Python

  • Introduction to NumPy
  • Visualizations with Matplotlib and Seaborn
  • Data input and output

5. Fundamentals of the R language I. [VJ]
Introduction, fundamental data types and basic operations with them

  • Interactive/batch mode
  • Help and documentation
  • Main data types, attributes
  • Vectors
  • Matrices and arrays
  • Factors
  • Lists

6. Fundamentals of the R language II. [VJ]

Programming and graphics

  • Data import from and export to files
  • Graphical functions and their main parameters
  • Printing and exporting graphics (PDF, PostScript…)
  • Programming in R – conditional execution, loops, user-defined functions
  • R community, CRAN, mailing lists, useR! conferences
  • Expanding R by additional packages (libraries)

7. Python applications I. [OL]

Calculations and statistics

  • Advanced NumPy and SciPy
  • Data analysis and manipulation with Pandas

8.  Python applications II. 

Directional statistics

  • Basics of directional statistics in 2D and 3D
  • Advanced analyses of 3D orientational data – APSG
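
To give a flavour of what 2D directional statistics involves, here is a minimal sketch using only the Python standard library. It is not the APSG API and is not taken from the course materials; the function name `mean_direction` is hypothetical. Each azimuth is treated as a unit vector, the vectors are summed, and the direction and normalised length of the resultant are reported.

```python
import math


def mean_direction(azimuths_deg):
    """Return (mean azimuth in degrees, mean resultant length R).

    Each azimuth is converted to a unit vector; the mean direction is the
    direction of the vector sum, and R (between 0 and 1) measures how
    tightly the directions cluster.
    """
    if not azimuths_deg:
        raise ValueError("need at least one azimuth")
    s = sum(math.sin(math.radians(a)) for a in azimuths_deg)
    c = sum(math.cos(math.radians(a)) for a in azimuths_deg)
    mean = math.degrees(math.atan2(s, c)) % 360.0  # wrap into [0, 360)
    r = math.hypot(s, c) / len(azimuths_deg)
    return mean, r


if __name__ == "__main__":
    print(mean_direction([80.0, 100.0]))
    print(mean_direction([350.0, 10.0]))
```

The second call shows why ordinary averaging fails for directions: the arithmetic mean of 350° and 10° is 180°, whereas the vector-based mean points north (numerically very close to 0° or, equivalently, 360°).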

9.  R applications I. [VJ]

Calculations and statistics

  • Simple geochemical recalculations
  • On usefulness of matrices
  • Descriptive statistics in R
  • Working with large and complex datasets

10. R applications II. [VJ]

  • Graphics in R – examples from whole-rock geochemistry
  • Binary diagrams and Harker plots
  • Ternary diagrams
  • Spiderplots
  • Calculating simple petrogenetic models, including graphical output
Last update: Janoušek Vojtěch, prof. Mgr., Ph.D. (01.10.2020)