Thesis (Selection of subject) (version: 285)
Assignment details
Data Compression in NoSQL Document Databases
Thesis title in Czech: Data Compression in NoSQL Document Databases
Thesis title in English: Data Compression in NoSQL Document Databases
Key words: Big Data, JSON, data compression, document databases
English key words: Big Data, JSON, data compression, document databases
Academic year of topic announcement: 2018/2019
Type of assignment: diploma thesis
Thesis language: English
Department: Department of Software Engineering (32-KSI)
Supervisor: doc. RNDr. Irena Holubová, Ph.D.
Author:
Guidelines
Big Data denotes data sets whose high volume, velocity, variety, and veracity require special, distributed NoSQL databases. One type of these, the so-called document databases, store semi-structured data, usually in the JSON or XML format. At the same time, numerous approaches exist for compressing semi-structured (usually XML) data. Some document databases (e.g., MongoDB) have even begun to incorporate data compression features.
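
As a rough illustration of why compression is attractive for document collections, the following minimal Python sketch compares compressing JSON documents one by one with compressing them together as a single block. The sample documents, field names, and helper functions are hypothetical and are not taken from any particular database; zlib from the Python standard library merely stands in for a general-purpose compressor.

```python
import json
import zlib

# Hypothetical sample data: 1,000 documents that share the same field names,
# as is common within one collection of a document database.
documents = [
    {"user_id": i, "name": f"user{i}", "active": i % 2 == 0, "tags": ["a", "b"]}
    for i in range(1000)
]


def encode(doc):
    """Serialize one document to UTF-8 encoded JSON bytes."""
    return json.dumps(doc).encode("utf-8")


def raw_total(docs):
    """Total size in bytes when every document is stored uncompressed."""
    return sum(len(encode(d)) for d in docs)


def per_document_total(docs):
    """Total size in bytes when every document is compressed on its own."""
    return sum(len(zlib.compress(encode(d))) for d in docs)


def make_block(docs):
    """Concatenate all documents into one newline-delimited block."""
    return b"\n".join(encode(d) for d in docs)


raw_size = raw_total(documents)
per_doc_size = per_document_total(documents)
block = make_block(documents)
compressed_block = zlib.compress(block)
block_size = len(compressed_block)

print(f"raw JSON:               {raw_size} bytes")
print(f"compressed per document: {per_doc_size} bytes")
print(f"compressed as one block: {block_size} bytes")

# Compression is lossless: the original block is recovered exactly.
restored = zlib.decompress(compressed_block)
```

Because documents in one collection tend to repeat the same field names, the block variant can exploit redundancy across documents, which per-document compression cannot see.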

The aim of the thesis is to analyze XML compression approaches with regard to the specifics of distributed NoSQL document databases (i.e., replication, weak data consistency, mutual references, duplicates, schemalessness, etc.). Based on this analysis, the author will propose and implement a compression-based extension of a selected document database (e.g., MongoDB). The properties of the proposal will be demonstrated experimentally.
References
Holubová, I. - Kosek, J. - Minařík, K. - Novák, D.: Big Data a NoSQL databáze. Grada, Prague, Czech Republic, October 2015. ISBN 978-80-247-5466-6. (http://www.ksi.mff.cuni.cz/bigdata/)

Sakr, S.: XML compression techniques: A survey and comparison. J. Comput. Syst. Sci. 75(5), pages 303-322 (2009). (http://www.msit2005.mut.ac.th/msit_media/1_2552/ITEC0950/Materials/2009074193411OX.pdf)

New Compression Options in MongoDB 3.0. (https://www.mongodb.com/blog/post/new-compression-options-mongodb-30)
Preliminary scope of work
The aim of the thesis is to examine the current use of data compression in NoSQL document databases that store data in the JSON and/or XML format. Since compression is currently used relatively little in this area, while many data compression methods exist, the author will in the next phase propose a suitable application (and probably an adaptation) of a selected proven compression method. The proposal will be implemented and evaluated experimentally.