Course, academic year 2021/2022
  
Data Science - NDBI048
Title: Data Science
Guaranteed by: Department of Software Engineering (32-KSI)
Faculty: Faculty of Mathematics and Physics
Actual: from 2021
Semester: winter
E-Credits: 5
Hours per week, examination: winter s.: 2/2, C+Ex [hours/week]
Capacity: unlimited
Min. number of students: unlimited
Virtual mobility / capacity: no
State of the course: taught
Language: Czech
Teaching methods: full-time
Additional information: https://www.ksi.mff.cuni.cz/~holubova/NDBI048/
Guarantor: doc. RNDr. Irena Holubová, Ph.D.
Class: Informatics Mgr. - Software Systems
Classification: Informatics > Database Systems
Annotation -
Last update: RNDr. Filip Zavoral, Ph.D. (17.03.2021)
The course will provide a practical introduction to data science. The lectures will discuss the phases of a data science project and the related technologies and methods. In the practicals, the individual steps will be applied to real-world data. Part of the lectures will also focus on the specifics of Big Data. The added value will be practical experience from data science projects at the Profinit company, which is rarely found in textbooks. The course is intended for students of the Big Data Processing specialization as well as students of other specializations who want to gain a basic overview of the field of data science.
Course completion requirements -
Last update: RNDr. Filip Zavoral, Ph.D. (16.03.2021)

During the practicals, students will receive (or choose and have approved by the instructors) a suitable real-world data set. Using this data set, the students will gradually experiment with the methods discussed in the lectures. The results of this continuous data processing will be described in two written reports (one in the middle and one at the end of the semester), which will be evaluated with points. The credit will be awarded for reaching a required minimum number of points. Points above this minimum will be added to the points gained in the written exam.

Literature - Czech
Last update: RNDr. Filip Zavoral, Ph.D. (16.03.2021)

Sinan Ozdemir: Principles of Data Science

Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort, Abhijit Dasgupta: Practical Data Science Cookbook

Frank Kane: Hands-On Data Science and Python Machine Learning

Syllabus -
Last update: RNDr. Filip Zavoral, Ph.D. (16.03.2021)

What is data science, typical use cases. Data science decathlon (an overview of related methods, algorithms and technologies). Map of follow-up lectures, organization of the course, requirements for credit / exam.

Motivation and problems of data science - a view from industry. Limits of statistical methods, bias.

Technologies for data science I: overview of popular representatives (technology stack), Python and data science.

Phases of a data science project, the CRISP-DM methodology. Business understanding, data understanding.

Methods of data exploration and visualization.

Creating a useful and understandable report.

Data preparation (cleaning, transformation, feature extraction, ...).

Modeling I: basic statistical models and performance evaluation.

Modeling II: applied Bayesianism.

Data science in modern database systems.

Big Data science, MapReduce and data science.

Apache Spark and data science.

Technologies for data science II: MLOps, versioning, documentation, ...

Business view of a data science project.

 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html