Thesis details
Unified architecture for database scanners in Manta Flow
Thesis title in Czech: Jednotná architektura pro skenery databází v Manta Flow
Thesis title in English: Unified architecture for database scanners in Manta Flow
Academic year of topic announcement: 2023/2024
Thesis type: diploma thesis
Thesis language:
Department: Department of Distributed and Dependable Systems (32-KDSS)
Supervisor: doc. RNDr. Pavel Parízek, Ph.D.
Author: hidden - assigned and confirmed by the Study Department
Date of registration: 23.10.2023
Date of assignment: 24.10.2023
Date of confirmation by the Study Department: 28.11.2023
Guidelines
Manta Flow is a leading platform for data lineage analysis that has been under development for nearly a decade. During that time, the developers of Manta Flow have added modules, called scanners, that retrieve metadata from more than ten different databases. Although all the scanners for database systems are built on similar principles, little code is reused across their implementations, which makes maintenance and conceptual changes error-prone and time-consuming.

The main goal of this project is (1) to propose a new unified architecture for the extraction process of database scanners that maximizes code reuse in the metadata extractors while allowing for variability in the aspects where different database technologies require different approaches, and (2) to demonstrate the proposed architecture, and thereby validate its feasibility, by implementing a new scanner for the MySQL-related databases (i.e., MySQL, MariaDB, and SingleStore). For at least one of the MySQL-related databases, the scanner will support metadata and DDL extraction, parsing and resolving of the extracted DDLs, and data flow analysis.
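The three capabilities named above (extraction, parsing and resolving, data flow analysis) can be read as consecutive stages of one pipeline. The following Java sketch only illustrates that reading. All type names (`ExtractedDdl`, `ResolvedStatement`, `LineageEdge`, `ScannerPipeline`) are hypothetical and are not taken from Manta Flow, the stages are passed in as plain functions, and the code assumes Java 16 or later.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical output of the extraction stage: one object and its DDL text. */
record ExtractedDdl(String objectName, String ddl) {}

/** Hypothetical output of parsing and resolving: a target and the objects it reads from. */
record ResolvedStatement(String target, List<String> sources) {}

/** Hypothetical output of data flow analysis: one directed lineage edge. */
record LineageEdge(String from, String to) {}

final class ScannerPipeline {

    private ScannerPipeline() {}

    /**
     * Runs the stages after extraction in a fixed order:
     * each extracted DDL is parsed and resolved, then analyzed,
     * and the resulting edges are collected into one list.
     */
    static List<LineageEdge> run(
            List<ExtractedDdl> extracted,
            Function<ExtractedDdl, ResolvedStatement> parseAndResolve,
            Function<ResolvedStatement, List<LineageEdge>> analyzeDataFlow) {
        return extracted.stream()
                .map(parseAndResolve)
                .map(analyzeDataFlow)
                .flatMap(List::stream)
                .toList();
    }
}
```

Passing the stages as functions keeps the ordering in one place while leaving each stage replaceable, which is one possible way to vary behavior per database.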

Work on this project will include the following necessary steps:
- Analysis of the existing Manta scanners that extract metadata from databases, in order to identify (i) common functionality and (ii) aspects where a single common approach is not suitable.
- Analysis of the family of MySQL-related databases (MySQL, MariaDB, and SingleStore) to identify shared functionality and differences that are relevant to metadata extraction and data lineage.
- Design of the core shared functionality, and design of a mechanism for extending the core to support the extractors' inherent differences.
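The last step, a shared core plus a mechanism for extending it, could for instance take the shape of the template method pattern. The sketch below is only an illustration under stated assumptions, not the architecture the thesis is expected to produce. `AbstractDatabaseExtractor`, `MetadataSource`, `ExtractedObject`, and `MySqlExtractor` are hypothetical names that do not come from Manta Flow, and the code assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical result type: one extracted table with its DDL (not a Manta Flow type). */
record ExtractedObject(String schema, String name, String ddl) {}

/** Hypothetical connection abstraction, so the sketch runs without a real database. */
interface MetadataSource {
    /** Returns rows of the form {schema, name}. */
    List<String[]> query(String sql);

    /** Returns the DDL text produced by the given statement. */
    String fetchDdl(String sql);
}

/** Shared core: the extraction workflow lives here and is reused by every dialect. */
abstract class AbstractDatabaseExtractor {

    /** Template method: fixed order of common steps, dialect-specific parts in hooks. */
    public final List<ExtractedObject> extract(MetadataSource source) {
        List<ExtractedObject> result = new ArrayList<>();
        for (String[] row : source.query(listTablesQuery())) {
            String schema = row[0];
            String name = row[1];
            if (isSystemSchema(schema)) {
                continue; // shared filtering step, dialect-specific schema list
            }
            String ddl = source.fetchDdl(showCreateTableQuery(schema, name));
            result.add(new ExtractedObject(schema, name, ddl));
        }
        return result;
    }

    /** Extension point: query that lists tables as (schema, name) rows. */
    protected abstract String listTablesQuery();

    /** Extension point: statement that returns the DDL of one table. */
    protected abstract String showCreateTableQuery(String schema, String table);

    /** Extension point with a default that a dialect may widen. */
    protected boolean isSystemSchema(String schema) {
        return schema.equalsIgnoreCase("information_schema");
    }
}

/** Dialect-specific part for MySQL; only the differing pieces are written here. */
final class MySqlExtractor extends AbstractDatabaseExtractor {

    @Override
    protected String listTablesQuery() {
        return "SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES";
    }

    @Override
    protected String showCreateTableQuery(String schema, String table) {
        // Simplification: backticks inside identifiers are not escaped in this sketch.
        return "SHOW CREATE TABLE `" + schema + "`.`" + table + "`";
    }

    @Override
    protected boolean isSystemSchema(String schema) {
        return super.isSystemSchema(schema)
                || schema.equalsIgnoreCase("mysql")
                || schema.equalsIgnoreCase("performance_schema")
                || schema.equalsIgnoreCase("sys");
    }
}
```

In this shape, an extractor for MariaDB or SingleStore would be another small subclass that overrides only the hooks where its behavior differs. Whether inheritance or composition is the better extension mechanism is exactly the kind of question the design step has to answer.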

An important desired property of the new architecture is that the changes it requires in existing scanners (i) must be compatible with other components of the Manta Flow platform, (ii) should not affect the scanners' overall functionality, and (iii) should not degrade their performance.
References
1. MySQL, https://www.mysql.com/
2. MariaDB, https://mariadb.org/
3. SingleStore, https://www.singlestore.com/
4. Joshua Bloch. Effective Java, 3rd edition. Addison-Wesley Professional, 2017.
5. Neal Ford, Mark Richards, Pramod Sadalage, and Zhamak Dehghani. Software Architecture: The Hard Parts. O'Reilly Media, 2021.
6. Ondřej Hlaváč. Analýza datových toků ve skriptech v SAP Hana dialektu SQL. Master's thesis, Czech Technical University in Prague, Faculty of Information Technology, 2022. https://dspace.cvut.cz/handle/10467/101112
7. Kyrylo Bulat. Dataflow analysis of Google BigQuery scripts. Master's thesis, Czech Technical University in Prague, Faculty of Information Technology, 2021. https://dspace.cvut.cz/handle/10467/92930
 