Thesis (Selection of subject) (version: 368)
Thesis details
Extending Data Lineage Analysis for Python with Runtime Types
Thesis title in Czech: Rozšíření analýzy datových toků pro jazyk Python o podporu běhových typů
Thesis title in English: Extending Data Lineage Analysis for Python with Runtime Types
Key words: Python|datové toky|typová inference|Manta
English key words: Python|data flow|data lineage|type inference|Manta
Academic year of topic announcement: 2022/2023
Thesis type: diploma thesis
Thesis language: English
Department: Department of Distributed and Dependable Systems (32-KDSS)
Supervisor: doc. RNDr. Pavel Parízek, Ph.D.
Author: Mgr. Václav Luňák - assigned and confirmed by the Study Dept.
Date of registration: 08.06.2023
Date of assignment: 14.06.2023
Confirmed by Study dept. on: 28.06.2023
Date and time of defence: 14.02.2024 09:00
Date of electronic submission: 10.01.2024
Date of submission of printed version: 10.01.2024
Date the defence took place: 14.02.2024
Opponents: Mgr. Tomáš Petříček, Ph.D.
 
 
 
Guidelines
An important component of the data lineage analysis platform Manta Flow is the scanner for Python scripts, mainly because of the high popularity and wide use of Python in data management and data analytics. However, the current version of the Python scanner computes only very approximate analysis results, due to (i) the dynamic nature of Python and (ii) missing support for inferring precise information about the runtime types of program variables. One particular source of this approximation is the limited ability to precisely determine the set of possible targets of a given function invocation.
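
The imprecision around call targets can be illustrated with a small, self-contained Python sketch. It is not taken from the Manta scanner; the class and function names (CsvSource, DbSource, candidate_targets, targets_for_type) are hypothetical and chosen only for illustration. The sketch contrasts a purely name-based approximation, which reports every class that defines the method, with resolution based on a known runtime type.

```python
class CsvSource:
    def load(self):
        return "rows from file"


class DbSource:
    def load(self):
        return "rows from table"


def run(source):
    # Statically, `source.load` could refer to either method above;
    # only the runtime type of `source` decides which one is invoked.
    return source.load()


def candidate_targets(method_name, classes):
    """Name-based approximation: every class defining the method is a target."""
    return sorted(c.__name__ for c in classes if method_name in vars(c))


def targets_for_type(method_name, runtime_type):
    """With a known runtime type, look the method up along the type's MRO."""
    for klass in runtime_type.__mro__:
        if method_name in vars(klass):
            return f"{klass.__name__}.{method_name}"
    return None
```

Without type information, candidate_targets("load", [CsvSource, DbSource]) reports both classes, while targets_for_type("load", DbSource) narrows the call in run to a single method.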

The main goal of this project is to extend the Python scanner with support for computing information about runtime types of program variables (expressions) and using it within the data lineage analysis. We expect that successful completion of this project will involve the following specific tasks:
- Design and implementation of the module for processing the class hierarchy of Python applications.
- Development of an efficient algorithm for inference and tracking of runtime types of expressions used in analyzed Python code, which should enable more precise identification of possible target functions at each call.
- Adding support for analysis of callbacks and function pointers.
All the information provided by these extensions will be used to improve both the precision and the performance of the scanner. The actual impact of these changes on overall precision and performance will be thoroughly tested and empirically evaluated.
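
As a rough illustration of how the first two tasks fit together, the following sketch builds a class hierarchy from source text with the standard ast module and uses a set of inferred receiver types to narrow the targets of a method call. It is a minimal sketch under stated assumptions, not the design used in the thesis or in Manta Flow: all function names are hypothetical, only base classes written as plain names are handled, and the depth-first base lookup is a simplification of Python's actual C3 method resolution order.

```python
import ast

EXAMPLE = '''
class Source:
    def load(self): ...
    def close(self): ...

class CsvSource(Source):
    def load(self): ...

class DbSource(Source):
    pass
'''


def build_class_hierarchy(source):
    """Map each class name to (list of base names, set of method names)."""
    hierarchy = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # Assumption: only bases given as simple names are recorded.
            bases = [b.id for b in node.bases if isinstance(b, ast.Name)]
            methods = {n.name for n in node.body
                       if isinstance(n, ast.FunctionDef)}
            hierarchy[node.name] = (bases, methods)
    return hierarchy


def resolve_method(hierarchy, class_name, method):
    """Depth-first lookup through base classes (simplified, not full C3)."""
    if class_name not in hierarchy:
        return None
    bases, methods = hierarchy[class_name]
    if method in methods:
        return f"{class_name}.{method}"
    for base in bases:
        found = resolve_method(hierarchy, base, method)
        if found is not None:
            return found
    return None


def call_targets(hierarchy, inferred_types, method):
    """Possible targets of `receiver.method()` given the receiver's inferred types."""
    resolved = (resolve_method(hierarchy, t, method) for t in inferred_types)
    return {target for target in resolved if target is not None}
```

For example, with hierarchy = build_class_hierarchy(EXAMPLE), the call call_targets(hierarchy, {"DbSource"}, "load") yields only Source.load, because DbSource inherits the method, whereas a receiver that may be either CsvSource or DbSource yields both CsvSource.load and Source.load.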
References
1. Python language, https://www.python.org/
2. Python Type Checking (Guide), https://realpython.com/python-type-checking/
3. PEP 484 - Type Hints, https://peps.python.org/pep-0484/
4. Yun Peng, Cuiyun Gao, Zongjie Li, Bowei Gao, David Lo, Qirun Zhang, and Michael R. Lyu. Static Inference Meets Deep Learning: A Hybrid Type Inference Approach for Python. ICSE 2022.
5. Types in Python, https://pyre-check.org/docs/types-in-python/
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html