Data Intensive Computing - NPFL102

Title:	Distribuované zpracování rozsáhlých dat (Czech; "Distributed Processing of Large-Scale Data")
Guaranteed by:	Institute of Formal and Applied Linguistics (32-UFAL)
Faculty:	Faculty of Mathematics and Physics
Actual:	from 2016 to 2016
Semester:	summer
E-Credits:	3
Hours per week, examination:	summer s.:0/2, C [HT]
Capacity:	unlimited
Min. number of students:	unlimited
4EU+:	no
Virtual mobility / capacity:	no
State of the course:	not taught
Language:	Czech, English
Teaching methods:	full-time
Note:	course can be enrolled in outside the study plan; enabled for web enrollment

Guarantor:	RNDr. Milan Straka, Ph.D.

Annotation -

Last update: T_UFAL (05.05.2015)

The course introduces methods for processing huge data sets in a distributed environment. The introductory sessions explain the technological difficulties that arise in such environments. A presentation of the (Sun/Oracle/Son of) Grid Engine follows, and then the MapReduce framework is introduced. The main part of the course is devoted to the Apache Spark framework, a spiritual successor to Hadoop. Depending on audience interest, the final sessions may be devoted to the OpenMPI framework or to distributed machine learning tools (MLlib, Mahout, Vowpal Wabbit).
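As a rough illustration of the MapReduce model named in the annotation (a minimal single-process sketch, not course material), the following Python code counts words using separate map, shuffle, and reduce steps. The function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical and chosen only for this example; a real framework such as Hadoop or Spark would distribute these steps across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence inside one process."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big clusters", "data moves slowly"]
    print(word_count(docs))
```

Because each call to `map_phase` depends on only one document and each call to `reduce_phase` on only one key, both steps could in principle run in parallel on different machines; the shuffle step is where data has to move between them.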

Literature -

Data-Intensive Text Processing with MapReduce; Jimmy Lin and Chris Dyer; Morgan & Claypool Publishers, 2010

Hadoop: The Definitive Guide; Tom White; 2010

Son of Grid Engine - https://arc.liv.ac.uk/trac/SGE

Apache Spark - https://spark.apache.org/

OpenMPI - http://www.open-mpi.org/

Apache Mahout - https://mahout.apache.org/

Vowpal Wabbit - https://github.com/JohnLangford/vowpal_wabbit/wiki

Syllabus -

Technological difficulties with processing big data

(Sun/Oracle/Son of) Grid Engine - architecture, commands

MapReduce Framework - principles

Apache Spark - architecture, algorithm implementation

optionally OpenMPI - architecture, provided operations

optionally Mahout, Vowpal Wabbit - machine learning algorithms