Thesis (Selection of subject) (version: 368)
Thesis details
Performance and Usability Improvements for Data Lineage Analysis of C# Programs
Thesis title in Czech: Zlepšení použitelnosti a výkonu analýzy datových toků programů v jazyce C#
Thesis title in English: Performance and Usability Improvements for Data Lineage Analysis of C# Programs
Key words: statická analýza|data lineage|C#|embedded code
English key words: static analysis|data lineage|C#|embedded code
Academic year of topic announcement: 2022/2023
Thesis type: diploma thesis
Thesis language: English
Department: Department of Distributed and Dependable Systems (32-KDSS)
Supervisor: doc. RNDr. Pavel Parízek, Ph.D.
Author: Mgr. Jan Kleprlík - assigned and confirmed by the Study Dept.
Date of registration: 13.12.2022
Date of assignment: 13.12.2022
Confirmed by Study dept. on: 20.12.2022
Date and time of defence: 14.02.2024 09:00
Date of electronic submission: 11.01.2024
Date of submission of printed version: 11.01.2024
Date of completed defence: 14.02.2024
Opponents: RNDr. David Bednárek, Ph.D.

Guidelines
MANTA Flow is a data lineage platform that includes a C# scanner, which analyzes data flows in C# programs. The first version of the MANTA C# Scanner was developed as a student team software project at MFF CUNI and was later extended by a master thesis. While the current version of the C# scanner handles all the core features of the C# language and commonly used libraries (e.g., collections, strings, and Entity Framework Core), it has limited support for important real-world scenarios in which C# programs are used in the context of data management and processing systems. In addition, the C# scanner is not fast enough, especially for processing embedded fragments of C# source code, and it does not handle large programs well.

The main goals of this master thesis project are the following:
(1) Improve performance and scalability of the C# scanner by designing and implementing various optimizations at different levels of abstraction.
(2) Add new features that are necessary to extend the usability of the C# scanner in practice.
The new features to be added include support for ASP.NET endpoints, partial support for popular ETL tools such as SQL Server Integration Services (SSIS) and Azure Data Factory (ADF), and modification of the C# scanner to support execution of embedded C# fragments. An important related task is to improve the precision of the data lineage analysis for C#. The performance optimizations will involve many changes to core modules of the C# scanner and should follow established design patterns used in the development of highly efficient software.
References
1. MANTA C# Scanner. Team software project, Charles University, Prague, 2020.
2. Dalibor Zeman. Extending Data Lineage Analysis Towards .NET Frameworks. Master thesis, Charles University, Prague, 2021.
3. ASP.NET. https://dotnet.microsoft.com/en-us/apps/aspnet
4. SQL Server Integration Services. https://learn.microsoft.com/en-us/sql/integration-services/sql-server-integration-services?view=sql-server-ver16
5. Azure Data Factory. https://learn.microsoft.com/en-us/azure/data-factory/
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html