Thesis details
Streamlining Usability of Enterprise Data Quality Management Tools for Data Engineers
Thesis title in Czech: Zjednodušení použitelnosti nástrojů pro správu kvality dat pro datové inženýry
Title in English: Streamlining Usability of Enterprise Data Quality Management Tools for Data Engineers
Keywords: data quality management|data engineering|performance evaluation
Keywords in English: data quality management|data engineering|performance evaluation
Academic year of announcement: 2023/2024
Thesis type: bachelor's thesis
Thesis language: English
Department: Department of Distributed and Dependable Systems (32-KDSS)
Supervisor: doc. Ing. Lubomír Bulej, Ph.D.
Author: Zdeněk Tomis - assigned by the supervisor
Date of registration: 02.02.2024
Date of assignment: 10.02.2024
Guidelines
Data quality management relies on profiling, validation, cleansing, and monitoring to ensure that data is accurate, consistent, complete, and relevant for its intended purpose. Many data quality management activities are implemented as ETL (Extract-Transform-Load) processes, which rely on various programming languages to define complex data transformations. These include SQL for database operations, Python for advanced data processing, and other (proprietary) languages for capturing data quality rules.

One drawback of ETL-based solutions is that they are often optimized for server deployment in the cloud or in corporate data centers, which makes deployment in a local environment challenging due to runtime requirements and setup complexity. This in turn creates a barrier to local use of these languages and the related tools by users such as data engineers.

This problem also affects the Ataccama Data Quality Expression Language and its related tooling, which sets the context for this thesis. The goal of the thesis is to analyze the requirements and limitations related to local deployment of data quality management tools, together with the specific needs of data engineers, and to design and develop a prototype solution that enables the use of a common data quality expression language and tools regardless of the deployment environment. The performance and scalability of the proposed solution must be experimentally evaluated to assess its viability in real-world scenarios.
References
[1] Aho, A. V., Lam, M., Sethi, R., Ullman, J. Compilers: Principles, Techniques, and Tools. 2nd Edition, Pearson, 2006.
[2] Scott, M. Programming Language Pragmatics. 4th Edition, Morgan Kaufmann, 2015.
[3] Olson, J. E. Data Quality: The Accuracy Dimension. Morgan Kaufmann, 2003.
 
Charles University | Charles University Information System