Streamlining Usability of Enterprise Data Quality Management Tools for Data Engineers
Thesis title in Czech: | Zjednodušení použitelnosti nástrojů pro správu kvality dat pro datové inženýry |
---|---|
Title in English: | Streamlining Usability of Enterprise Data Quality Management Tools for Data Engineers |
Keywords: | data quality management|data engineering|performance evaluation |
Keywords in English: | data quality management|data engineering|performance evaluation |
Academic year of topic announcement: | 2023/2024 |
Thesis type: | bachelor's thesis |
Thesis language: | English |
Department: | Department of Distributed and Dependable Systems (32-KDSS) |
Supervisor: | doc. Ing. Lubomír Bulej, Ph.D. |
Author: | Zdeněk Tomis - assigned by the supervisor |
Date of registration: | 02.02.2024 |
Date of assignment: | 10.02.2024 |
Guidelines |
Data quality management relies on profiling, validation, cleansing, and monitoring to ensure that data is accurate, consistent, complete, and relevant for its intended purpose. Many data quality management activities are implemented as ETL (Extract-Transform-Load) processes, which use various programming languages to define complex data transformations. These include SQL for database operations, Python for advanced data processing, and other (proprietary) languages for capturing data quality rules.
One drawback of ETL-based solutions is that they are often optimized for server deployment in the cloud or in corporate data centers, which makes deployment in a local environment challenging due to runtime requirements and setup complexity. This in turn poses a barrier to the use of these languages and related tools in a local context by users such as data engineers. This problem also manifests in the case of the Ataccama Data Quality Expression Language and its related tooling, which sets the context for this thesis. The goal of the thesis is to analyze the requirements and limitations related to local deployment of data quality management tools and the specific needs of data engineers, and to design and develop a prototype solution that enables the use of a common data quality expression language and tools regardless of the deployment environment. The performance and scalability of the proposed solution must be experimentally evaluated to assess its viability in real-world scenarios. |
References |
[1] Aho, A. V., Lam, M., Sethi, R., Ullman, J. Compilers: Principles, Techniques, and Tools. 2nd Edition, Pearson, 2006.
[2] Scott, M. Programming Language Pragmatics. 4th Edition, Morgan Kaufmann, 2015.
[3] Olson, J. E. Data Quality: The Accuracy Dimension. Morgan Kaufmann, 2003. |