|
|
|
||
|
The humanities have seen an irreversible paradigm shift towards Digital Humanities, based on automatic
quantitative analysis of (big) data.
We will teach you:
- to clean and structure data into neat tables;
- to discover trends, recurring patterns, and outliers;
- the basics of modern data visualization.
We use the open-source programming language R together with the RStudio IDE and the tidyverse, a
globally popular collection of professional data science packages.
Last update: Kuboň Vladislav, doc. RNDr., Ph.D. (05.06.2018)
|
|
||
|
The course is completed with an examination, but there is no final test. Instead, your grade is based on the requirements you fulfill during the term:
- Grade C: 30,000 DataCamp XP, active participation (or equivalent: each absence increases your passing limit by 1,000 DataCamp XP), and one home assignment submitted on time and approved by the teacher.
- Grade B: 30,000 DataCamp XP, active participation (or equivalent: each absence increases your passing limit by 1,000 DataCamp XP), and two home assignments submitted on time and approved by the teacher.
- Grade A: 30,000 DataCamp XP, active participation (or equivalent: each absence increases your passing limit by 1,000 DataCamp XP), and three home assignments submitted on time and approved by the teacher.
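The arithmetic behind the grading rule can be sketched in R. This is only an illustration, not an official calculator: the names `required_xp` and `course_grade` are hypothetical, and the sketch assumes that every absence is compensated with extra XP and that only assignments approved by the teacher are counted.

```r
# Hypothetical helpers illustrating the grading rule; not an official calculator.
BASE_XP <- 30000        # XP required with no absences
XP_PER_ABSENCE <- 1000  # extra XP required per missed class

# XP limit a student has to reach, given the number of absences
required_xp <- function(absences) {
  BASE_XP + XP_PER_ABSENCE * absences
}

# Grade implied by the rule: the XP limit must be met for any passing grade,
# and the number of approved home assignments then decides between C, B and A.
course_grade <- function(xp, absences, approved_assignments) {
  if (xp < required_xp(absences) || approved_assignments < 1) {
    return(NA_character_)  # requirements for a passing grade not met
  }
  c("C", "B", "A")[min(approved_assignments, 3)]
}

required_xp(2)             # 32000
course_grade(32000, 2, 2)  # "B"
course_grade(31000, 2, 3)  # NA: below the raised XP limit
```

For example, two absences raise the limit from 30,000 to 32,000 XP, so a student with 31,000 XP does not pass regardless of the number of assignments.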
Only DataCamp XP that you earn during the current term, in the DataCamp courses listed for home assignments, count towards your limit. If you have already completed these courses in the past, you must negotiate an alternative list of DataCamp courses with the teacher in advance.
Your free DataCamp license is valid for six months from the start of the course and cannot be extended. You must complete your assignments within that period. No alternative assignments can be negotiated.
Last update: Cinková Silvie, Mgr., Ph.D. (23.05.2025)
|
|
||
|
Hadley Wickham and Garrett Grolemund. 2017. R for Data Science. O'Reilly. Currently free online: http://r4ds.had.co.nz/
Garrett Grolemund. 2014. Hands-On Programming with R. O'Reilly.
Nina Zumel and John Mount. 2014. Practical Data Science with R. Manning.
Julia Silge and David Robinson. 2017. Text Mining with R: A Tidy Approach. O'Reilly.
Stefan Th. Gries. 2013. Statistics for Linguistics with R: A Practical Introduction. De Gruyter.
Stefan Th. Gries. 2009. Quantitative Corpus Linguistics with R. Routledge.
Matthew L. Jockers. 2014. Text Analysis with R for Students of Literature. Springer.
Natalia Levshina. 2015. How to do Linguistics with R: Data Exploration and Statistical Analysis. John Benjamins.
Simon Munzert, Christian Rubba, Peter Meissner, and Dominic Nyhuis. 2015. Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. Wiley.
Last update: Kuboň Vladislav, doc. RNDr., Ph.D. (05.06.2018)
|
|
||
|
The course is completed with an examination. The examination does not include a final test; instead, it consists of an assessment of the student's work over the whole semester according to the following criteria:
- Good: 30,000 DataCamp XP, active attendance in classes (or the equivalent in DataCamp XP: one missed class = 1,000 extra XP), 1 independent home assignment submitted on time.
- Very good: 30,000 DataCamp XP, active attendance in classes (or the equivalent in DataCamp XP: one missed class = 1,000 extra XP), 2 independent home assignments submitted on time.
- Excellent: 30,000 DataCamp XP, active attendance in classes (or the equivalent in DataCamp XP: one missed class = 1,000 extra XP), 3 independent home assignments submitted on time.
Only XP earned in the current semester and in the prescribed courses count towards the DataCamp XP limit (a student who has already completed these courses in the past must arrange an individual alternative assignment with the teacher).
The deadline for completing the study obligations assigned on the DataCamp platform is limited by the validity of the license (exactly 6 months from the first scheduled class of the semester). Substitute work outside DataCamp is not possible.
Last update: Cinková Silvie, Mgr., Ph.D. (23.05.2025)
|
|
||
|
1. Basic concepts of R, advantages of R in data analysis as a subdiscipline of programming
2. Tables, vectors, loading a table file, vector as a table column, variable types as vector classes, selection (subsetting) of elements, rows and columns in base R
3. ggplot2 graphics library, mapping variables to aesthetic scales, types of graphs and scales (geom_, scale_ functions)
4. Data wrangling - dplyr library: selection and manipulation of rows (filter, slice, arrange) and columns (select, rename, mutate, if_else, case_when)
5. Data wrangling - groups (group_by, across, rowwise), aggregation (count, summarize)
6. Table joins (SQL-like)
7. "tidy data" concept, conversion between "wider" and "longer" table format for use with dplyr and ggplot2, tidyr (pivot_longer, pivot_wider, unite and separate)
8. Operations on strings, regular expressions incl. "look-around"
9. The concept of iteration in R: vectorization, loop, apply family functions and map family functions from the purrr library in common user situations
10. Text mining with the help of automatic syntactic annotation, interaction with the API of the UDPipe syntactic parser
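To give a flavor of items 2, 8 and 9, here is a minimal base R sketch. It is not taken from the course materials: the example strings and the variable names are made up for illustration, and only the built-in `iris` dataset is assumed.

```r
# Item 2: subsetting rows and columns of a table in base R
setosa_sepals <- iris[iris$Species == "setosa", c("Sepal.Length", "Sepal.Width")]
nrow(setosa_sepals)  # 50
class(iris$Species)  # "factor": a table column is a vector with a class

# Item 8: look-ahead in a regular expression (base R needs perl = TRUE)
# Matches "data" only when it is immediately followed by a hyphen.
words <- c("data-driven", "dataset", "metadata")
has_data_hyphen <- grepl("data(?=-)", words, perl = TRUE)
has_data_hyphen  # TRUE FALSE FALSE

# Item 9: the same computation as a loop, with sapply(), and vectorized
lengths_cm <- c(5.1, 4.9, 4.7)

lengths_mm_loop <- numeric(length(lengths_cm))
for (i in seq_along(lengths_cm)) {
  lengths_mm_loop[i] <- lengths_cm[i] * 10
}
lengths_mm_apply <- sapply(lengths_cm, function(x) x * 10)
lengths_mm_vec <- lengths_cm * 10  # vectorized: usually the idiomatic choice
```

All three variants in the last part produce the same vector; the vectorized form is the shortest and typically the fastest.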
Favorite datasets: gapminder (https://www.gapminder.org/data/), the built-in datasets iris and diamonds, and corpora.
Last update: Cinková Silvie, Mgr., Ph.D. (22.05.2023)
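As an illustration of what the data wrangling topics look like on one of these datasets, the following sketch applies dplyr and tidyr to the built-in `iris` table. It assumes the dplyr and tidyr packages are installed; the column names `petal_area`, `flower_id`, `measurement` and `value_cm` are invented for this example and are not part of the course materials.

```r
library(dplyr)
library(tidyr)

# Filter rows, add a column, then aggregate by group
petal_summary <- iris %>%
  filter(Sepal.Length > 5) %>%
  mutate(petal_area = Petal.Length * Petal.Width) %>%
  group_by(Species) %>%
  summarize(n = n(), mean_petal_area = mean(petal_area))

# Reshape the four measurement columns into the "longer" format:
# one row per flower and measurement instead of one row per flower
iris_long <- iris %>%
  mutate(flower_id = row_number()) %>%
  pivot_longer(
    cols = -c(Species, flower_id),
    names_to = "measurement",
    values_to = "value_cm"
  )
```

The longer format has one row per flower and measurement, which is the shape ggplot2 usually expects when a measurement type is mapped to an aesthetic such as color or facet.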
|
|
||
|
English, basic computer literacy, frustration tolerance, and the discipline to do regular homework. No programming skills required.
Last update: Cinková Silvie, Mgr., Ph.D. (23.05.2025)
|