Course, academic year 2023/2024
Application of Big Data Technologies in Data Science - NDBI047
Title: Aplikace Big Data technologií v Data Science
Guaranteed by: Department of Software Engineering (32-KSI)
Faculty: Faculty of Mathematics and Physics
Actual: from 2022
Semester: summer
E-Credits: 4
Hours per week, examination: summer semester: 1/2 (lectures/practicals), C+Ex (credit and examination) [hours/week]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: cancelled
Language: Czech
Teaching methods: full-time
Guarantor: doc. RNDr. Irena Holubová, Ph.D.
Class: Informatics (Master's) - elective
Classification: Informatics > Database Systems
Annotation -
Last update: RNDr. Michal Kopecký, Ph.D. (12.05.2018)
A practically oriented course that follows the introductory lecture on Big Data technologies (NDBI040). The aim is to teach students how to use Big Data technologies from the Hadoop and Spark family to analyze and process Big Data. The course is taught by professionals from the company Profinit and is based on their experience from real-world Data Science projects in banking, telecommunications and IoT.
Course completion requirements -
Last update: Mgr. Jan Hučín (07.02.2020)

During the semester, students get access to the Metacentrum Hadoop cluster and learn how to create large computational MapReduce tasks.
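The MapReduce model can be illustrated with the classic word-count task. The sketch below assumes the Hadoop Streaming style, in which the mapper and reducer are ordinary programs that exchange tab-separated key/value lines, and the framework sorts the mapper output by key before the reducer reads it. The script name `wordcount.py`, the `map`/`reduce` command-line switch and the `run_locally` helper are hypothetical illustrations, not part of the course materials.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map phase: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Reduce phase: records must arrive sorted by key, as Hadoop guarantees."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    # groupby only merges adjacent equal keys, hence the sorted-input requirement
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Hypothetical local driver imitating map -> sort/shuffle -> reduce."""
    mapped = sorted(mapper(lines), key=lambda record: record.split("\t", 1)[0])
    return list(reducer(mapped))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Assumed invocation: `wordcount.py map` as the mapper and
    # `wordcount.py reduce` as the reducer, both reading stdin.
    phase = mapper if sys.argv[1] == "map" else reducer
    for record in phase(sys.stdin):
        print(record)
```

For example, `run_locally(["the cat", "The dog the"])` yields `["cat\t1", "dog\t1", "the\t3"]`. On a cluster, the sorting step is performed by Hadoop rather than by Python; the script would be handed to the streaming jar via its `-mapper` and `-reducer` options (with `-files` to ship it), and the jar location depends on the installation.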

The credit is granted based on a combination of a theoretical test and a task involving a non-trivial analysis of a larger data set.

The oral exam includes a discussion of the task and of the theoretical foundations of Hadoop and its components.

Literature -
Last update: RNDr. Michal Kopecký, Ph.D. (12.05.2018)
  • Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale, Tom White, 4th edition, O'Reilly 2015
  • Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst, Dean Abbott, Wiley 2014
  • Big Data a NoSQL databáze, Irena Holubová, Jiří Kosek, Karel Minařík, David Novák, Grada 2015

Syllabus -
Last update: doc. RNDr. Irena Holubová, Ph.D. (09.02.2021)

=======================

IMPORTANT NOTICE: From the summer semester 2021 onward, the course will no longer be taught. In the winter semester 2021, it will be replaced by a new course.

=======================

1. L: Contribution of Big Data technologies in Data Science projects

P: Hadoop basics, cluster access

2. P: Technologies and skills needed for Hadoop I -- refresher (Linux, regular expressions, SQL)

3. L: Storing data on Hadoop -- HDFS, Hive, formats and compression

P: Storing data on Hadoop -- practical classes

4. P: MapReduce approach and typical tasks for it

5. L: Spark RDD

P: Technologies and skills needed for Hadoop II -- Python and its use in Spark

6. P: Spark RDD -- practical classes

7. L: Spark SQL

P: Spark RDD and SQL -- practical classes

8. P: Data Science project and big data technologies

9. no classes (holiday)

10.--14. Data Science project and big data technologies
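A typical task for the MapReduce approach (syllabus item 4) is aggregation by key, which Hive and Spark SQL express as `GROUP BY`. The plain-Python sketch below, which needs no cluster, computes a per-key average; the `(customer_id, amount)` records are hypothetical. It carries `(sum, count)` pairs instead of averages, because partial sums and counts can be merged in any order, so the same merge step can safely serve as both combiner and reducer.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, float]   # hypothetical (customer_id, amount) input record
Partial = Tuple[float, int]  # (sum, count) accumulated for one key


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, Partial]]:
    """Emit (key, (value, 1)) so that partial results can be added up later."""
    for key, amount in records:
        yield key, (amount, 1)


def shuffle(pairs: Iterable[Tuple[str, Partial]]) -> Dict[str, List[Partial]]:
    """Group values by key; on Hadoop this is done by the framework."""
    groups: Dict[str, List[Partial]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def combine(values: Iterable[Partial]) -> Partial:
    """Merge (sum, count) pairs; associative, so usable as combiner or reducer."""
    total, count = 0.0, 0
    for partial_sum, partial_count in values:
        total += partial_sum
        count += partial_count
    return total, count


def average_per_key(records: Iterable[Record]) -> Dict[str, float]:
    """Full pipeline: map, shuffle, reduce, then divide once per key."""
    result: Dict[str, float] = {}
    for key, values in shuffle(map_phase(records)).items():
        total, count = combine(values)
        result[key] = total / count
    return result
```

For the amounts 10, 20 and 30 under one key, averaging the partial averages 15.0 (from 10 and 20) and 30.0 would give 22.5, whereas merging `(30.0, 2)` with `(30.0, 1)` gives the correct 60.0 / 3 = 20.0.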

 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html