Optimization of Processing of Data Files in System DIRAC
Thesis title in Czech: | Optimization of Processing of Data Files in System DIRAC |
---|---|
Thesis title in English: | Optimization of Processing of Data Files in System DIRAC |
Key words: | Systém DIRAC, NoSQL databáze, efektivní zpracování datových souborů, dotazování nad metadaty |
English key words: | System DIRAC, NoSQL databases, efficient processing of data files, metadata querying |
Academic year of topic announcement: | 2014/2015 |
Thesis type: | Bachelor's thesis |
Thesis language: | English |
Department: | Department of Software Engineering (32-KSI) |
Supervisor: | doc. RNDr. Irena Holubová, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 23.05.2015 |
Date of assignment: | 25.05.2015 |
Confirmed by Study dept. on: | 02.06.2015 |
Date and time of defence: | 02.02.2016 00:00 |
Date of electronic submission: | 03.12.2015 |
Date of submission of printed version: | 04.12.2015 |
Date of completed defence: | 02.02.2016 |
Opponents: | RNDr. Martin Svoboda, Ph.D. |
Advisors: | RNDr. Jiří Chudoba, Ph.D.; Dagmar Adamová |
Guidelines |
The aim of the thesis is to study and extend the processing of data files stored in the distributed system DIRAC, in particular the handling of their related metadata. The author will analyze the current functionality and extend it with more sophisticated operations on data files and their metadata, such as creating, deleting, and updating data sets, and basic querying over metadata. The functionality will be discussed with the consultants of the thesis.
In the second phase of the work, the author will identify the parts of the current metadata storage that are suitable for storing in a NoSQL database. Based on an analysis of existing NoSQL systems, the author will select one or more appropriate systems, propose one or more suitable storage strategies, and experimentally evaluate their features using real-world data. The results will also be compared with the features of the current storage strategy, which is based on a relational database. |
References |
Tsaregorodtsev, A.: DIRAC Distributed Computing Services. Journal of Physics: Conference Series. 2014, 513
Tsaregorodtsev, A. - Poss, S.: DIRAC File Replica and Metadata Catalog. Journal of Physics: Conference Series. 2012, 396(3)
Sadalage, P.J. - Fowler, M.: NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence
Redmond, E. - Wilson, J.R.: Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement |