SubjectsSubjects(version: 953)
Course, academic year 2023/2024
   Login via CAS
Streaming algorithms for Big Data - NTIN114
Title: Proudové algoritmy pro velká data
Guaranteed by: Computer Science Institute of Charles University (32-IUUK)
Faculty: Faculty of Mathematics and Physics
Actual: from 2023 to 2023
Semester: summer
E-Credits: 3
Hours per week, examination: summer s.:2/0, Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech
Teaching methods: full-time
Teaching methods: full-time
Guarantor: Mgr. Pavel Veselý, Ph.D.
Class: Informatika Mgr. - Teoretická informatika
Classification: Informatics > Theoretical Computer Science
Annotation -
This course focuses on streaming algorithms that are used to sequentially process massive datasets using low memory. As a result, they produce a concise summary of the dataset that approximately preserves relevant data properties. We will cover basic problems such as counting distinct elements, estimating quantiles and random sampling, as well as algorithms for graph or geometric streams. We will also discuss how to prove lower bounds in this model using communication complexity. Since streaming algorithms typically employ randomness, we will assume knowledge on the level of the course
Last update: Pangrác Ondřej, RNDr., Ph.D. (15.05.2023)
Literature
  • Graham Cormode, Ke Yi: Small summaries for Big Data, 2020.
  • Amit Chakrabarti: Lecture Notes on Data Stream Algorithms, 2020.
  • Jelani Nelson: Sketching Algorithms (lecture notes), 2020.

Last update: Pangrác Ondřej, RNDr., Ph.D. (15.05.2023)
Syllabus -
  • approximate counting with less than log(n) bits
  • counting distinct elements and frequency moments
  • estimating quantiles and distribution functions
  • random sampling based on items' frequencies
  • lower bounds using communication complexity
  • streaming algorithms for graphs: matching, counting triangles, etc.
  • geometric streaming algorithms: estimating the diameter, clustering using

coresets

  • streaming algorithms for vectors and matrices
  • adversarially robust streaming algorithms

Last update: Pangrác Ondřej, RNDr., Ph.D. (15.05.2023)
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html