Thesis (Selection of subject) (version: 368)
Thesis details
Optimization of Processing of Data Files in System DIRAC
Thesis title in Czech: Optimization of Processing of Data Files in System DIRAC
Thesis title in English: Optimization of Processing of Data Files in System DIRAC
Key words (translated from Czech): DIRAC system, NoSQL databases, efficient processing of data files, querying over metadata
English key words: DIRAC system, NoSQL databases, efficient processing of data files, metadata querying
Academic year of topic announcement: 2014/2015
Thesis type: Bachelor's thesis
Thesis language: English
Department: Department of Software Engineering (32-KSI)
Supervisor: doc. RNDr. Irena Holubová, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 23.05.2015
Date of assignment: 25.05.2015
Confirmed by Study dept. on: 02.06.2015
Date and time of defence: 02.02.2016 00:00
Date of electronic submission: 03.12.2015
Date of submission of printed version: 04.12.2015
Date of completed defence: 02.02.2016
Opponents: RNDr. Martin Svoboda, Ph.D.
Advisors: RNDr. Jiří Chudoba, Ph.D.; Dagmar Adamová
Guidelines
The aim of the thesis is to study and extend the processing of data files stored in the distributed system DIRAC, in particular the handling of their associated metadata. The author will analyze the current functionality and extend it with more sophisticated operations on data files and their metadata, such as creating, deleting, and updating data sets, and basic querying over metadata. The required functionality will be discussed with the advisors of the thesis.
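To make the planned data-set operations concrete, the following is a minimal sketch, assuming that a data set is defined by a stored metadata query and resolved against the catalog on demand. The catalog is modelled as a plain Python dictionary mapping file names to metadata, and `DatasetStore`, `matches`, and `OPS` are hypothetical names chosen for illustration; this is not the DIRAC API and not the method developed in the thesis. Python is used because DIRAC itself is written in Python.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical in-memory stand-in for a file catalog: file name -> metadata.
# NOT the DIRAC API; all names here are illustrative assumptions.
Catalog = Dict[str, Dict[str, Any]]
Query = Dict[str, Tuple[str, Any]]  # field name -> (operator, value)

# Supported comparison operators for the illustrative query language.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "in": lambda a, b: a in b,  # metadata value is one of the given values
}


def matches(meta: Dict[str, Any], query: Query) -> bool:
    """Return True if the file's metadata satisfies every condition."""
    for name, (op, value) in query.items():
        if name not in meta or not OPS[op](meta[name], value):
            return False
    return True


@dataclass
class DatasetStore:
    """Data sets stored as named metadata queries (an assumed design)."""
    catalog: Catalog
    datasets: Dict[str, Query] = field(default_factory=dict)

    def create(self, name: str, query: Query) -> None:
        if name in self.datasets:
            raise KeyError(f"dataset {name!r} already exists")
        self.datasets[name] = dict(query)

    def update(self, name: str, query: Query) -> None:
        if name not in self.datasets:
            raise KeyError(f"no dataset {name!r}")
        self.datasets[name] = dict(query)

    def delete(self, name: str) -> None:
        del self.datasets[name]  # raises KeyError if the data set is unknown

    def files(self, name: str) -> List[str]:
        """Resolve the stored query against the current catalog contents."""
        query = self.datasets[name]
        return sorted(f for f, meta in self.catalog.items() if matches(meta, query))
```

For example, after `store.create("raw13", {"type": ("=", "raw"), "energy": (">", 10)})`, calling `store.files("raw13")` lists the matching files, and files added to the catalog later are picked up without rewriting the data set. Storing the query rather than a frozen list of files is one possible design choice; a static file list is the obvious alternative.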
In the second phase of the work, the author will identify the parts of the current metadata storage that are suitable for storage in a NoSQL database. Based on an analysis of existing NoSQL systems, the author will select the appropriate system(s), propose suitable storage strategies, and experimentally evaluate their properties on real-world data. The results will be compared with those of the current storage strategy, which is based on a relational database.
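One storage strategy that such an evaluation could consider is sketched below, assuming the relational side keeps one row per (file, metadata name, value) triple; this row layout is an illustrative assumption, not a description of DIRAC's actual schema. The sketch groups those rows into one document per file, in the way a document-oriented NoSQL store could hold them, and runs an equality-only query over the result. `rows_to_documents`, `find`, and the `_id`/`meta` field names are hypothetical; no particular NoSQL product's API is used.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# Assumed relational layout (illustration only): one row per
# (file name, metadata name, value).
Row = Tuple[str, str, Any]


def rows_to_documents(rows: Iterable[Row]) -> List[Dict[str, Any]]:
    """Group key-value rows into one document per file, sorted by file name."""
    grouped: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for name, key, value in rows:
        grouped[name][key] = value
    # Field names "_id" and "meta" are arbitrary choices for this sketch.
    return [{"_id": name, "meta": meta} for name, meta in sorted(grouped.items())]


def find(docs: Iterable[Dict[str, Any]], **conditions: Any) -> List[str]:
    """Equality-only metadata query: each file is checked as a single record."""
    return [
        d["_id"]
        for d in docs
        if all(k in d["meta"] and d["meta"][k] == v for k, v in conditions.items())
    ]
```

In this sketch, a query on several metadata fields inspects one record per file, whereas the row-per-value layout has to combine several rows per file (for example with self-joins); the cost is that updating a single field touches a larger record.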
References
Tsaregorodtsev, A.: DIRAC Distributed Computing Services. Journal of Physics: Conference Series, 513 (2014).

Tsaregorodtsev, A. – Poss, S.: DIRAC File Replica and Metadata Catalog. Journal of Physics: Conference Series, 396(3) (2012).

Sadalage, P.J. – Fowler, M.: NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence.

Redmond, E. – Wilson, J.R.: Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement.
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html