Course, academic year 2023/2024
Data Science with R II - JEM220
Title: Data Science with R II
Czech title: Data Science with R II
Guaranteed by: Institute of Economic Studies (23-IES)
Faculty: Faculty of Social Sciences
Actual: from 2021
Semester: summer
E-Credits: 6
Examination process: summer s.:combined
Hours per week, examination: summer s.:2/0, Ex [HT]
Capacity: unlimited / unknown (unknown)
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: English
Teaching methods: full-time
Note: course can be enrolled in outside the study plan
enabled for web enrollment
priority enrollment if the course is part of the study plan
Guarantor: prof. PhDr. Ladislav Krištoufek, Ph.D.
Teacher(s): prof. PhDr. Ladislav Krištoufek, Ph.D.
Mgr. Ivan Trubelík
Class: Courses for incoming students
Prerequisite: {Prerequisite group for JEM220 (JEM221 or JEM227)}
Annotation -
Last update: PhDr. Petr Bednařík, Ph.D. (06.06.2020)
An applied data science course in R that covers advanced topics and follows on from Data Science with R I. Data Science with R II covers clustering, text mining, support vector machines, neural networks, and network analysis. The main aim of the course is to train students to analyze specific datasets properly with methods outside the standard econometric framework, using the R programming environment.
Aim of the course -
Last update: SCHNELLEROVA (02.09.2019)

The main aim of the course is to train students to analyze specific datasets properly with methods outside the standard econometric framework, using the R programming environment.

Literature -
Last update: prof. PhDr. Ladislav Krištoufek, Ph.D. (18.02.2024)

Mandatory literature:

  • Ledolter, Johannes (2013): Data Mining and Business Analytics with R, Hoboken, New Jersey: John Wiley & Sons.
  • Toomey, Dan (2014): R for Data Science, Birmingham: Packt Publishing Ltd.
  • Zumel, Nina & Mount, John (2014): Practical Data Science with R, Shelter Island, New York: Manning Publications Co.
  • Kabacoff, Robert (2015): R in Action, Shelter Island, New York: Manning Publications Co.

Additional suggested literature:

  • Grolemund, Garrett (2014): Hands-On Programming with R, Sebastopol: O'Reilly Media Inc.
  • Ojeda, Tony et al. (2014): Practical Data Science Cookbook, Birmingham: Packt Publishing Ltd.
Teaching methods -
Last update: prof. PhDr. Ladislav Krištoufek, Ph.D. (20.02.2024)

Lectures/seminars: Online, pre-recorded videos - HERE.
Mass consultations:

  • Offline, in-person (Room 016, 9:30 AM CET): 13 March (Week 4), 10 April (Week 8), 17 April (Week 9), 1 May (Week 11), 15 May (Week 13)
  • These may be moved online depending on attendance and other circumstances.
  • If you have questions prepared for the mass consultations, please send them beforehand if possible.

Software: R/RStudio (freeware, available on all computers in Room 016)

Requirements to the exam -
Last update: prof. PhDr. Ladislav Krištoufek, Ph.D. (18.02.2024)

The final grade consists of three components:

  • DataCamp Skill Tracks: 40 points (4 x 10)
  • DataCamp Projects: 40 points (4 x 10)
  • Paper presentation: 20 points

Grading scale (according to Dean's Provision 17/2018):

  • A: above 90 (not inclusive)
  • B: between 80 (not inclusive) and 90 (inclusive)
  • C: between 70 (not inclusive) and 80 (inclusive)
  • D: between 60 (not inclusive) and 70 (inclusive)
  • E: between 50 (not inclusive) and 60 (inclusive)
  • F: below 50 (inclusive)
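
The scale above can be read as a lookup on the total number of points, where each band excludes its lower bound and includes its upper bound. A minimal sketch in Python, assuming the total is the plain sum of the three components on a 0-100 scale; the function name letter_grade and the example point values are hypothetical and not part of any course tooling:

```python
def letter_grade(points: float) -> str:
    """Map total points (0-100) to a letter grade.

    Each band excludes its lower bound and includes its upper bound,
    so exactly 90 points is a B and exactly 50 points is an F.
    """
    if not 0 <= points <= 100:
        raise ValueError("points must be between 0 and 100")
    # (lower bound, grade): the grade applies when points is strictly above the bound
    bands = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (50, "E")]
    for lower, grade in bands:
        if points > lower:
            return grade
    return "F"


# Hypothetical student: 36 (Skill Tracks) + 34 (Projects) + 20 (presentation)
total = 36 + 34 + 20
print(total, letter_grade(total))  # prints: 90 B
```

Note the boundary cases: a total of exactly 90 is still a B, and a total of exactly 50 is an F.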

DataCamp.com assignments (use this link to enroll in the DataCamp class; register with your @fsv.cuni.cz email; exchange students should contact me for an invite):

  • Assignment #1 - by the end of Week #9:
    • Course: Unsupervised Learning in R
  • Assignment #2 - by the end of Week #11:
    • Skill Track: Text Mining
  • Assignment #3 - by the end of Week #13:
    • Skill Track: Network Analysis
  • Assignment #4 - by the end of Week #15:
    • Skill Track: MLOps Fundamentals

DataCamp.com projects (use this link to enroll in the DataCamp class; register with your @fsv.cuni.cz email; exchange students should contact me for an invite):

  • Project #1 - by the end of Week #7, one of the following:
    • What Makes a Pokémon Legendary
    • Predict Taxi Fares with Random Forests
  • Project #2 - by the end of Week #9, one of the following:
    • Degrees That Pay You Back
    • Clustering Heart Disease Patient Data
  • Project #3 - by the end of Week #13:
    • A Text Analysis of Trump's Tweets
  • Project #4 - by the end of Week #15:
    • Partnering to Protect You from Peril

Paper presentation:

  • Link to the presentation video (Google Drive, DropBox, YouTube, whichever you choose) to be submitted via SIS by the end of Week #15
  • You may work in pairs
  • Presentation video:
    • Find a research paper on your topic of interest (topics from either Data Science with R I or II, not simple regression models, though).
    • Find topical literature.
    • Present the paper, motivation, methodology, main results, how it contributes to the literature, and, importantly, propose how you would extend the paper or if you found any issues with the paper.
    • If you work in a pair, make sure that both the preparation and the presentation itself are split roughly evenly between the two of you.
    • Up to 15 minutes long (strict). A longer video incurs a 50% penalty.
    • Late submission incurs a 100% penalty (strict).
Syllabus -
Last update: prof. PhDr. Ladislav Krištoufek, Ph.D. (18.02.2024)

 

All online videos (lectures and coding) are available via Loom, following the planned schedule: link will come.

BLOCK I - What remains in Supervised learning (Week 1)

  • Neural networks (T 10, L 14)

BLOCK II - Unsupervised learning (Weeks 2-6)

  • Clustering (T 1, ZM 8, L 15)
  • Association rules (ZM 8, L 16) 
  • Principal component analysis and principal component regression (L 17-18)

BLOCK III - Text mining (Weeks 7-8)

  • Text mining (T 2-3, L 19)

BLOCK IV - Network analysis (Weeks 9-10)

  • Network analysis (L 20)

BLOCK V - Documentation and presentation of results (Weeks 11-13)

  • Documentation and deployment (ZM 10)
  • Creating a package (K 20)
  • Producing effective presentations (ZM 11)
  • Creating dynamic reports (K 21)
Entry requirements -
Last update: prof. PhDr. Ladislav Krištoufek, Ph.D. (26.09.2019)

Data Science with R I (JEM221) or the earlier Data Science with R (JEM181) is a prerequisite for the course.

Registration requirements -
Last update: prof. PhDr. Ladislav Krištoufek, Ph.D. (26.09.2019)

Data Science with R I (JEM221) or the earlier Data Science with R (JEM181) is a prerequisite for the course.

 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html