

Application of Big Data Technologies in Data Science - NDBI047

Title:	Aplikace Big Data technologií v Data Science
Guaranteed by:	Department of Software Engineering (32-KSI)
Faculty:	Faculty of Mathematics and Physics
Valid from:	2022
Semester:	summer
E-Credits:	4
Hours per week, examination:	summer semester: 1/2 (lectures/practicals), C+Ex (credit and examination)
Capacity:	unlimited
Min. number of students:	unlimited
4EU+:	no
Virtual mobility / capacity:	no
State of the course:	cancelled
Language:	Czech
Teaching methods:	full-time

Guarantor:	doc. RNDr. Irena Holubová, Ph.D.
Class:	Informatika Mgr. (Computer Science, Master's) - elective
Classification:	Informatics > Database Systems


Annotation

Last update: RNDr. Michal Kopecký, Ph.D. (12.05.2018)

A practically oriented course that follows the introductory lecture on Big Data Technologies (NDBI040). Its aim is to teach students how to use Big Data technologies from the Hadoop and Spark families to analyze and process Big Data. The course is taught by professionals from the company Profinit and is based on their experience from real-world Data Science projects in banking, telecommunications and IoT.

Course completion requirements

Last update: Mgr. Jan Hučín (07.02.2020)

During the semester, students get access to the Metacentrum Hadoop cluster and learn how to create large computational MapReduce tasks.

The credit is granted based on a combination of a theoretical test and a task involving a non-trivial analysis of a larger data set.

The oral exam includes a discussion of the task and of the theoretical fundamentals of Hadoop and its components.

Literature

Last update: RNDr. Michal Kopecký, Ph.D. (12.05.2018)

Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale, Tom White, 4th edition, O'Reilly, 2015

Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst, Dean Abbott, Wiley, 2014

Big Data a NoSQL databáze (in Czech), Irena Holubová, Jiří Kosek, Karel Minařík, David Novák, Grada, 2015

Syllabus

Last update: doc. RNDr. Irena Holubová, Ph.D. (09.02.2021)

=======================

IMPORTANT NOTICE: From the summer semester 2021, the course will no longer be taught. In the winter semester 2021, it will be replaced by a new course.

=======================

(L = lecture, P = practical class)

1. L: Contribution of Big Data technologies to Data Science projects

P: Hadoop basics, cluster access

2. P: Prerequisite technologies and skills for Hadoop I -- a refresher (Linux, regular expressions, SQL)

3. L: Storing data on Hadoop -- HDFS, Hive, formats and compression

P: Storing data on Hadoop -- practical class

4. P: The MapReduce approach and its typical tasks

5. L: Spark RDD

P: Prerequisite technologies and skills for Hadoop II -- Python and its use in Spark

6. P: Spark RDD -- practical class

7. L: Spark SQL

P: Spark RDD and SQL -- practical class

8. P: A Data Science project using Big Data technologies

9. No classes (holiday)

10.-14. A Data Science project using Big Data technologies