Thesis (Selection of subject) (version: 368)
Thesis details
Abstract interpretation of pandas
Thesis title in Czech: Abstract interpretation of pandas
Thesis title in English: Abstract interpretation of pandas
Key words: pandas|python|program analysis|abstract interpretation
English key words: pandas|python|program analysis|abstract interpretation
Academic year of topic announcement: 2023/2024
Thesis type: Bachelor's thesis
Thesis language:
Department: Department of Distributed and Dependable Systems (32-KDSS)
Supervisor: Mgr. Tomáš Petříček, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 23.03.2024
Date of assignment: 25.03.2024
Confirmed by Study dept. on: 28.03.2024
Guidelines
Pandas is a popular and widely used Python library for data exploration. Code written with pandas is highly dynamic: column names of a data frame are represented as strings, and many operations transform the structure of the data frame. This can be a source of errors. On the one hand, data frame libraries in statically typed languages offer stronger guarantees [1, 2], but they are not widely used. On the other hand, attempts to add static checking to Python have so far focused on the core language [4] and not on advanced libraries such as pandas.

The aim of this thesis is to design and implement a code analysis tool for the pandas library that is capable of checking common kinds of errors such as accesses to misspelt or non-existent columns. The tool will leverage the abstract interpretation framework [3]. It will model key pandas data structures such as data frames and series in a way that makes it possible to check errors related to column naming and column types. The tool can draw inspiration from PDChecker [5] and should be evaluated through a number of small but realistic case studies.
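To make the intended kind of analysis concrete, the following is a minimal sketch of an abstract domain for data frames, assuming that a frame is approximated by a mapping from column names to dtype names. The names AbstractFrame and Analyzer, the transfer functions, and the "any" top element are hypothetical choices made for illustration; they are not taken from PDChecker [5] or from any existing tool, and the sketch does not import or execute pandas.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Dict, List


@dataclass(frozen=True)
class AbstractFrame:
    """Abstract value for a data frame: column name -> abstract dtype name."""
    columns: Dict[str, str]


class Analyzer:
    """Hypothetical analyzer that collects column errors instead of raising."""

    def __init__(self) -> None:
        self.errors: List[str] = []

    def _check(self, frame: AbstractFrame, name: str) -> bool:
        # Report an access to a column the abstract frame does not contain.
        if name in frame.columns:
            return True
        hint = get_close_matches(name, list(frame.columns), n=1)
        suffix = f"; did you mean '{hint[0]}'?" if hint else ""
        self.errors.append(f"unknown column '{name}'{suffix}")
        return False

    def getitem(self, frame: AbstractFrame, name: str) -> str:
        """Transfer function for df[name]: abstract dtype of the series."""
        return frame.columns[name] if self._check(frame, name) else "any"

    def rename(self, frame: AbstractFrame, mapping: Dict[str, str]) -> AbstractFrame:
        """Transfer function for df.rename(columns=mapping)."""
        for old in mapping:
            self._check(frame, old)  # assumption: flag renames of missing columns
        return AbstractFrame({mapping.get(c, c): t for c, t in frame.columns.items()})

    def drop(self, frame: AbstractFrame, names: List[str]) -> AbstractFrame:
        """Transfer function for df.drop(columns=names)."""
        for name in names:
            self._check(frame, name)
        return AbstractFrame({c: t for c, t in frame.columns.items() if c not in names})

    @staticmethod
    def join(a: AbstractFrame, b: AbstractFrame) -> AbstractFrame:
        """Merge two branches: keep columns present in both; widen dtypes to 'any'."""
        return AbstractFrame({
            c: (t if b.columns[c] == t else "any")
            for c, t in a.columns.items() if c in b.columns
        })


# Abstractly "run" a small pipeline without touching any real data.
analyzer = Analyzer()
frame = AbstractFrame({"name": "object", "price": "float64", "qty": "int64"})
renamed = analyzer.rename(frame, {"qty": "quantity"})
price_dtype = analyzer.getitem(renamed, "price")   # valid access
analyzer.getitem(renamed, "qty")                   # stale name after the rename
analyzer.getitem(renamed, "pirce")                 # misspelt column name
merged = Analyzer.join(renamed, analyzer.drop(renamed, ["name"]))

for message in analyzer.errors:
    print(message)
```

In this sketch the join keeps only columns that are present on both branches, so that a later access is accepted only when the column definitely exists. A real tool would also need to interpret Python control flow and a much larger part of the pandas API.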
References
[1] Kuang-Chen Lu, Ben Greenman, and Shriram Krishnamurthi (2022). Types for Tables: A Language Design Benchmark. The Art, Science, and Engineering of Programming, Vol. 6, Issue 2, Article 8.
[2] Tomas Petricek (2017). Data exploration through dot-driven development. In 31st European Conference on Object-Oriented Programming (ECOOP 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[3] Bruno Blanchet (2002). Introduction to Abstract Interpretation. Available at: https://bblanche.gitlabpages.inria.fr/absint.pdf
[4] Guido van Rossum, Jukka Lehtosalo, and Łukasz Langa (2014). PEP 484 – Type Hints. Available at: https://peps.python.org/pep-0484/
[5] Y. Zhuang and M. Y. Lu (2022). Enabling Type Checking on Columns in Data Frame Libraries by Abstract Interpretation. IEEE Access, Vol. 10, pp. 14418-14428.
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html