Data Science with R II is an advanced course on data science with applications in R, following the Data Science with R I course. It covers clustering, text mining, support vector machines, neural networks, and network analysis. The main aim of the course is to train students to properly analyze specific datasets with methods outside the standard econometric framework, using the R programming environment.
Last update: Bednařík Petr, PhDr., Ph.D. (06.06.2020)
Data Science with R II is an advanced course on data science with applications in R, following the Data Science with R I course. It covers clustering, text mining, support vector machines, neural networks, and network analysis.
Last update: Čuprová Michaela, Mgr. (02.02.2020)
Aim of the course -
The main aim of the course is to train students to properly analyze specific datasets with methods outside the standard econometric framework, using the R programming environment.
Last update: SCHNELLEROVA (02.09.2019)
Last update: SCHNELLEROVA (25.10.2019)
Literature -
Mandatory literature:
Ledolter, Johannes (2013): Data Mining and Business Analytics with R, Hoboken, New Jersey: John Wiley & Sons.
Toomey, Dan (2014): R for Data Science, Birmingham: Packt Publishing Ltd.
Zumel, Nina & Mount, John (2014): Practical Data Science with R, Shelter Island, New York: Manning Publications Co.
Kabacoff, Robert (2015): R in Action, Shelter Island, New York: Manning Publications Co.
Additional suggested literature:
Grolemund, Garrett (2014): Hands-On Programming with R, Sebastopol: O'Reilly Media Inc.
Ojeda, Tony et al. (2014): Practical Data Science Cookbook, Birmingham: Packt Publishing Ltd.
Last update: Krištoufek Ladislav, prof. PhDr., Ph.D. (18.02.2024)
Last update: Bednařík Petr, PhDr., Ph.D. (15.05.2020)
Teaching methods -
LECTURES & SEMINARS:
Online, pre-recorded videos. Links are available in the Syllabus section.
Q&A SESSIONS:
Offline, in-person (Room 016, 9:30 AM CET, Wednesdays): 12 March (Week 4, IT), 2 April (Week 7, LK), 16 April (Week 9, LK), 30 April (Week 11, LK), 7 May (Week 12, LK)
These sessions may be moved online depending on attendance and other circumstances.
If you have questions prepared for the Q&A session, please send them beforehand if possible.
Software: R/RStudio (free of charge, available on all computers in Room 016)
Last update: Krištoufek Ladislav, prof. PhDr., Ph.D. (18.02.2025)
Last update: SCHNELLEROVA (25.10.2019)
Requirements to the exam -
The final grade consists of three components (100 points in total):
DataCamp Skill Tracks: 40 points (4 x 10)
DataCamp Projects: 40 points (4 x 10)
Paper presentation: 20 points
Grading scale (according to Dean's Provision 17/2018):
A: above 90 (not inclusive)
B: between 80 (not inclusive) and 90 (inclusive)
C: between 70 (not inclusive) and 80 (inclusive)
D: between 60 (not inclusive) and 70 (inclusive)
E: between 50 (not inclusive) and 60 (inclusive)
F: below 50 (inclusive)
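As a quick check of how the point components and the grading scale above combine, here is a minimal sketch in Python. The function name `letter_grade` and the example point values are hypothetical and used for illustration only; they are not part of the official grading procedure.

```python
def letter_grade(points: float) -> str:
    """Map a total score (0-100) to a letter grade using the scale above."""
    # Each lower bound is exclusive: the score must be strictly above it.
    thresholds = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (50, "E")]
    for lower_bound, grade in thresholds:
        if points > lower_bound:
            return grade
    return "F"  # 50 points or fewer


# Hypothetical student; all point values below are made up for illustration.
skill_tracks = [10, 10, 8, 10]  # four Skill Tracks, up to 10 points each
projects = [10, 9, 10, 10]      # four Projects, up to 10 points each
presentation = 13               # paper presentation, up to 20 points

total = sum(skill_tracks) + sum(projects) + presentation
print(total, letter_grade(total))  # prints: 90 B
```

Because every lower bound is exclusive, exactly 90 points gives a B and exactly 50 points gives an F.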
DataCamp.com assignments (use this link to enroll in the DataCamp class and register with your @fsv.cuni.cz email; exchange students should contact me for an invite):
Assignment #1 - by the end of Week #9:
Course: Unsupervised Learning in R
Assignment #2 - by the end of Week #11:
Skill Track: Text Mining
Assignment #3 - by the end of Week #13:
Skill Track: Network Analysis
Assignment #4 - by the end of Week #15:
Skill Track: MLOps Fundamentals
DataCamp.com projects (use this link to enroll in the DataCamp class and register with your @fsv.cuni.cz email; exchange students should contact me for an invite):
Project #1 - by the end of Week #7, one of the following:
What Makes a Pokémon Legendary
Predict Taxi Fares with Random Forests
Project #2 - by the end of Week #9, one of the following:
Degrees That Pay You Back
Clustering Heart Disease Patient Data
Project #3 - by the end of Week #13:
A Text Analysis of Trump's Tweets
Project #4 - by the end of Week #15:
Partnering to Protect You from Peril
Paper presentation:
Link to the presentation video (Google Drive, DropBox, YouTube, whichever you choose) to be submitted via SIS by the end of Week #15
You may work in pairs
Presentation video:
Find a research paper on a topic of your interest (any topic from Data Science with R I or II, but not simple regression models).
Find topical literature.
Present the paper: its motivation, methodology, main results, and contribution to the literature. Importantly, propose how you would extend the paper and point out any issues you found with it.
Make sure that both the work and the presentation are split roughly evenly within the pair.
Up to 15 minutes long (strict). A longer video incurs a 50% penalty.
A late submission incurs a 100% penalty (strict).
Last update: Krištoufek Ladislav, prof. PhDr., Ph.D. (18.02.2025)
Last update: SCHNELLEROVA (25.10.2019)
Syllabus -
All online videos (lectures and coding) are available via Loom, following the planned schedule:
BLOCK I - What remains in Supervised learning (Week 1)