Performance and Usability Improvements for Data Lineage Analysis of C# Programs
| Field | Value |
|---|---|
| Thesis title in Czech | Zlepšení použitelnosti a výkonu analýzy datových toků programů v jazyce C# |
| Thesis title in English | Performance and Usability Improvements for Data Lineage Analysis of C# Programs |
| Key words (Czech) | statická analýza, data lineage, C#, embedded code |
| English key words | static analysis, data lineage, C#, embedded code |
| Academic year of topic announcement | 2022/2023 |
| Thesis type | diploma thesis |
| Thesis language | English |
| Department | Department of Distributed and Dependable Systems (32-KDSS) |
| Supervisor | doc. RNDr. Pavel Parízek, Ph.D. |
| Author | Mgr. Jan Kleprlík (assigned and confirmed by the Study Dept.) |
| Date of registration | 13.12.2022 |
| Date of assignment | 13.12.2022 |
| Confirmed by Study Dept. on | 20.12.2022 |
| Date and time of defence | 14.02.2024 09:00 |
| Date of electronic submission | 11.01.2024 |
| Date of submission of printed version | 11.01.2024 |
| Date of defence held | 14.02.2024 |
| Opponents | RNDr. David Bednárek, Ph.D. |
Guidelines
MANTA Flow is a data lineage platform that also includes the C# Scanner, which analyzes data flows in C# programs. The first version of the MANTA C# Scanner was developed as a student team software project at MFF CUNI [1] and was later extended in a master thesis [2]. While the current version of the C# Scanner handles all core features of the C# language and commonly used libraries (e.g., collections, strings, and Entity Framework Core), it offers limited support for important real-world scenarios in which C# programs are used as part of data management and processing systems. In addition, the C# Scanner is too slow, particularly for processing embedded fragments of C# source code, and it does not scale well to large programs.
The main goals of this master thesis project are as follows: (1) improve the performance and scalability of the C# Scanner by designing and implementing various optimizations at different levels of abstraction; (2) add new features needed to make the C# Scanner more usable in practice. The new features include support for ASP.NET endpoints [3], partial support for popular ETL tools such as SQL Server Integration Services (SSIS) [4] and Azure Data Factory (ADF) [5], and modifications of the C# Scanner to support the execution of embedded C# fragments. An important related task is to also improve the precision of the data lineage analysis for C#. Implementing the performance optimizations will involve many changes to the core modules of the C# Scanner and should follow established design patterns for developing highly efficient software.
References
1. MANTA C# Scanner. Team software project, Charles University, Prague, 2020.
2. Dalibor Zeman. Extending Data Lineage Analysis Towards .NET Frameworks. Master thesis, Charles University, Prague, 2021.
3. ASP.NET. https://dotnet.microsoft.com/en-us/apps/aspnet
4. SQL Server Integration Services. https://learn.microsoft.com/en-us/sql/integration-services/sql-server-integration-services?view=sql-server-ver16
5. Azure Data Factory. https://learn.microsoft.com/en-us/azure/data-factory/