A Data Science course with applications in R that covers advanced topics and follows up on the Data Science with R I course. Data Science with R II covers clustering, text mining, support vector machines, neural networks, and networks.
Last modified: Čuprová Michaela, Mgr. (02.02.2020)
A Data Science course with applications in R that covers advanced topics and follows up on the Data Science with R I course. Data Science with R II covers clustering, text mining, support vector machines, neural networks, and networks. The main aim of the course is to train students to properly analyze specific datasets with methods outside the standard econometric framework, using the R programming environment.
Last modified: Bednařík Petr, PhDr., Ph.D. (06.06.2020)
Aim of the course -
Last modified: SCHNELLEROVA (25.10.2019)
The main aim of the course is to train students to properly analyze specific datasets with methods outside the standard econometric framework, using the R programming environment.
Last modified: SCHNELLEROVA (02.09.2019)
Literature -
Mandatory literature:
Ledolter, Johannes (2013): Data Mining and Business Analytics with R, Hoboken, New Jersey: John Wiley & Sons.
Toomey, Dan (2014): R for Data Science, Birmingham: Packt Publishing Ltd.
Zumel, Nina & Mount, John (2014): Practical Data Science with R, Shelter Island, New York: Manning Publications Co.
Additional suggested literature:
Grolemund, Garrett (2014): Hands-On Programming with R, Sebastopol: O'Reilly Media Inc.
Ojeda, Tony et al. (2014): Practical Data Science Cookbook, Birmingham: Packt Publishing Ltd.
Last modified: Bednařík Petr, PhDr., Ph.D. (15.05.2020)
Mandatory literature:
Ledolter, Johannes (2013): Data Mining and Business Analytics with R, Hoboken, New Jersey: John Wiley & Sons.
Toomey, Dan (2014): R for Data Science, Birmingham: Packt Publishing Ltd.
Zumel, Nina & Mount, John (2014): Practical Data Science with R, Shelter Island, New York: Manning Publications Co.
Kabacoff, Robert (2015): R in Action, Shelter Island, New York: Manning Publications Co.
Additional suggested literature:
Grolemund, Garrett (2014): Hands-On Programming with R, Sebastopol: O'Reilly Media Inc.
Ojeda, Tony et al. (2014): Practical Data Science Cookbook, Birmingham: Packt Publishing Ltd.
Last modified: Krištoufek Ladislav, prof. PhDr., Ph.D. (18.02.2024)
Teaching methods -
Last modified: SCHNELLEROVA (25.10.2019)
LECTURES & SEMINARS:
Online, pre-recorded videos. Links are available in the Syllabus section.
Q&A SESSIONS:
Offline, in-person (Room 016, 9:30 AM CET, Wednesdays): 12 March (Week 4, IT), 2 April (Week 7, LK), 16 April (Week 9, LK), 30 April (Week 11, LK), 7 May (Week 12, LK)
These sessions may be moved online depending on attendance and other circumstances.
If you have questions prepared for the Q&A session, please send them beforehand if possible.
Software: R/RStudio (free software, available on all computers in Room 016)
Last modified: Krištoufek Ladislav, prof. PhDr., Ph.D. (18.02.2025)
Exam requirements -
Last modified: SCHNELLEROVA (25.10.2019)
The final grade consists of three components (100 points in total):
DataCamp Skill Tracks: 40 points (4 x 10)
DataCamp Projects: 40 points (4 x 10)
Paper presentation: 20 points
Grading scale (according to Dean's Provision 17/2018):
A: above 90 (not inclusive)
B: between 80 (not inclusive) and 90 (inclusive)
C: between 70 (not inclusive) and 80 (inclusive)
D: between 60 (not inclusive) and 70 (inclusive)
E: between 50 (not inclusive) and 60 (inclusive)
F: below 50 (inclusive)
DataCamp.com assignments (use this link to enroll in the DataCamp class; register with your @fsv.cuni.cz email; exchange students should contact me for an invitation):
Assignment #1 - by the end of Week #9:
Course: Unsupervised Learning in R
Assignment #2 - by the end of Week #11:
Skill Track: Text Mining
Assignment #3 - by the end of Week #13:
Skill Track: Network Analysis
Assignment #4 - by the end of Week #15:
Skill Track: MLOps Fundamentals
DataCamp.com projects (use this link to enroll in the DataCamp class; register with your @fsv.cuni.cz email; exchange students should contact me for an invitation):
Project #1 - by the end of Week #7, one of the following:
What Makes a Pokémon Legendary
Predict Taxi Fares with Random Forests
Project #2 - by the end of Week #9, one of the following:
Degrees That Pay You Back
Clustering Heart Disease Patient Data
Project #3 - by the end of Week #13:
A Text Analysis of Trump's Tweets
Project #4 - by the end of Week #15:
Partnering to Protect You from Peril
Paper presentation:
Submit a link to the presentation video (Google Drive, Dropbox, YouTube, or another platform of your choice) via SIS by the end of Week #15
You may work in pairs
Presentation video:
Find a research paper on a topic of your interest (covering topics from either Data Science with R I or II, but not simple regression models).
Find related literature on the topic.
Present the paper: its motivation, methodology, main results, and contribution to the literature. Importantly, propose how you would extend the paper and point out any issues you found with it.
Make sure that both the work and the presentation are split approximately evenly within the pair.
The video may be up to 15 minutes long (strict). A longer video is penalized by 50%.
A late submission is penalized by 100% (strict).
Last modified: Krištoufek Ladislav, prof. PhDr., Ph.D. (18.02.2025)
Syllabus -
Last modified: Krištoufek Ladislav, prof. PhDr., Ph.D. (18.02.2024)
All online videos (lectures and coding) are available via Loom (following the planned schedule):
BLOCK I - What remains in Supervised learning (Week 1)