Course, academic year 2022/2023
Theory of Statistical Analysis in R for Linguists - NPFL111
Title: Teoretické základy statistického vyhodnocování jazykových dat v R
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Valid: from 2022
Semester: summer
E-Credits: 3
Hours per week, examination: summer s.:2/0, Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
Virtual mobility / capacity: no
State of the course: not taught
Language: Czech, English
Teaching methods: full-time
Guarantor: Mgr. Silvie Cinková, Ph.D.
Class: Doctoral studies, Mathematical Linguistics
Informatics Mgr. - Mathematical Linguistics
Classification: Informatics > Computer and Formal Linguistics
Incompatibility: NPFL137
Interchangeability: NPFL137
Is incompatible with: NPFL137
Is interchangeable with: NPFL137
Annotation -
Last update: G_I (26.05.2015)
Students of corpus linguistics who have already completed an introductory corpus linguistics seminar can use this course to deepen their competence in statistical data analysis. The course covers statistical theory (in particular, issues specific to corpus linguistics and the distributions typical of language data) as well as the computational skills needed for data analysis in R. Only ordinary computer user skills are required; no programming background is assumed.
Course completion requirements -
Last update: Mgr. Silvie Cinková, Ph.D. (12.05.2022)

Active participation in the lessons (at most 3 absences).

All homework submitted by the deadlines.

If DataCamp is used (free for students), each student must collect 20 000 XP during the course. These points must come from the following R courses:

Introduction to R

Intermediate R

Data Manipulation in R with dplyr

Cleaning Data in R

Data Visualization with ggplot2 Part 1

Data Visualization with ggplot2 Part 2

Working with the RStudio IDE Part 1, Part 2

Importing Data in R Part 1

Students who have already completed these courses must collect the 20 000 XP from other R courses.

Individual exceptions are at the teachers' discretion.

Literature -
Last update: G_I (26.05.2015)

Baayen, R. H.: Analyzing Linguistic Data, Cambridge University Press, Cambridge 2008.

Baayen, R. H.: Word Frequency Distributions. Kluwer Academic Publishers. Dordrecht/Boston/London 2001.

Bartoň, T. - Cvrček, V. - Čermák, F. - Jelínek, T. - Petkevič, V. (2009): Statistiky češtiny. Nakladatelství Lidové noviny, Praha.

Gries, S. Th.: Quantitative Corpus Linguistics with R, Routledge 2009.

Gries, S. Th.: Statistics for Linguistics with R. A Practical Introduction. Mouton De Gruyter 2013 (2nd revised edition).

Oakes, M. P.: Statistics for Corpus Linguistics. Edinburgh University Press, Edinburgh 1998.

Volín, J. (2007): Statistické metody ve fonetickém výzkumu. Praha: Epocha.

Syllabus -
Last update: T_UFAL (13.05.2014)

1. Typical topics of corpus studies, relevance of quantitative methods in linguistics, hypothesis formulation

2. Basic functions of R:

  • file loading, writing, saving
  • functions and arguments
  • vectors, factors, lists, data frames: generation, loading, saving, editing
  • data navigation, regular expressions
  • Descriptive statistics: basic concepts and functions in R
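
The basic R operations listed under point 2 can be illustrated with a minimal sketch. The toy token vector and all object names below (tokens, freq, corpus, at_words) are hypothetical and chosen only for illustration; they are not taken from the course materials.

```r
# Hypothetical toy data: a small vector of word tokens (not from the course).
tokens <- c("the", "cat", "sat", "on", "the", "mat", "the", "end")

# Vectors and factors: treat the tokens as a categorical variable.
token_factor <- factor(tokens)

# Data frames: one row per word type together with its frequency.
freq <- as.data.frame(table(token_factor), stringsAsFactors = FALSE)
names(freq) <- c("word", "count")

# Lists: bundle objects of different types under one name.
corpus <- list(name = "toy", tokens = tokens, freq = freq)

# File writing and loading: round-trip the table through a temporary CSV file.
path <- tempfile(fileext = ".csv")
write.csv(freq, path, row.names = FALSE)
freq_loaded <- read.csv(path, stringsAsFactors = FALSE)

# Data navigation and regular expressions: keep only words ending in "at".
at_words <- freq_loaded[grepl("at$", freq_loaded$word), ]

# Descriptive statistics: token length in characters.
token_length <- nchar(tokens)
mean_length <- mean(token_length)
median_length <- median(token_length)
summary(token_length)
```

Printing freq_loaded or at_words at the console shows the resulting tables; real corpus data would normally be loaded from an existing file rather than typed in as a vector.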

Charles University | Information system of Charles University