Abstract interpretation of pandas
Title in Czech: | Abstract interpretation of pandas |
---|---|
Title in English: | Abstract interpretation of pandas |
Keywords: | pandas, python, program analysis, abstract interpretation |
Keywords in English: | pandas, python, program analysis, abstract interpretation |
Academic year of announcement: | 2023/2024 |
Type of thesis: | bachelor's thesis |
Language of thesis: | |
Department: | Department of Distributed and Dependable Systems (32-KDSS) |
Supervisor: | Mgr. Tomáš Petříček, Ph.D. |
Student: | hidden (assigned and confirmed by the Study Department) |
Registration date: | 23.03.2024 |
Assignment date: | 25.03.2024 |
Date confirmed by the Study Department: | 28.03.2024 |
Guidelines
Pandas is a popular and widely used library for data exploration in Python. Code written with pandas is highly dynamic: the column names of a data frame are represented as strings, and many operations transform the structure of the data frame. This makes such code prone to errors. Data frame libraries in statically typed languages offer stronger guarantees [1, 2], but they are not widely used. Meanwhile, efforts to add static checking to Python have so far focused on the core language [4] rather than on advanced libraries such as pandas.
The aim of this thesis is to design and implement a code analysis tool for the pandas library that can detect common kinds of errors, such as accesses to misspelt or non-existent columns. The tool will be based on the abstract interpretation framework [3]. It will model key pandas data structures, such as data frames and series, in a way that makes it possible to check errors related to column names and column types. The tool can draw inspiration from PDChecker [5] and should be evaluated on a number of small but realistic case studies.
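To make the goal more concrete, the following is a minimal sketch of how an analysis might represent a data frame abstractly. It is not the design of PDChecker or of this thesis: the names `AbstractFrame`, `get_column`, `assign_column`, `drop_columns` and `demo`, the flat lattice of column types, and the split of a schema into columns that are definitely present and columns that may be present are all hypothetical. Instead of parsing Python source, it models a few pandas operations as direct calls to transfer functions, assumes the schema of the input file is known, and uses a join to merge the states of two branches, as abstract interpretation does at control-flow merge points.

```python
from __future__ import annotations

import difflib
from dataclasses import dataclass

# Hypothetical flat lattice of column types: concrete type names below, ANY on top.
ANY = "any"


def join_type(a: str, b: str) -> str:
    """Least upper bound of two column types in the flat lattice."""
    return a if a == b else ANY


@dataclass
class AbstractFrame:
    """Abstract data frame: a schema instead of concrete rows.

    definite: columns present on every path, mapped to their abstract type.
    maybe:    columns present on at least one path, but not on all of them.
    """
    definite: dict[str, str]
    maybe: frozenset[str] = frozenset()

    def join(self, other: AbstractFrame) -> AbstractFrame:
        """Merge the states reaching a control-flow merge point."""
        common = self.definite.keys() & other.definite.keys()
        definite = {c: join_type(self.definite[c], other.definite[c]) for c in common}
        every = self.definite.keys() | other.definite.keys() | self.maybe | other.maybe
        return AbstractFrame(definite, frozenset(every - common))


def get_column(frame: AbstractFrame, name: str, errors: list[str]) -> str:
    """Transfer function for df[name]: returns the column type, records problems."""
    if name in frame.definite:
        return frame.definite[name]
    if name in frame.maybe:
        errors.append(f"warning: column {name!r} may not exist")
        return ANY
    # Unknown column: suggest the closest known name, if any, as a likely typo fix.
    known = sorted(frame.definite.keys() | frame.maybe)
    hint = difflib.get_close_matches(name, known, n=1)
    suffix = f" (did you mean {hint[0]!r}?)" if hint else ""
    errors.append(f"error: column {name!r} does not exist{suffix}")
    return ANY


def assign_column(frame: AbstractFrame, name: str, typ: str) -> AbstractFrame:
    """Transfer function for df[name] = <value of type typ>."""
    return AbstractFrame({**frame.definite, name: typ}, frame.maybe - {name})


def drop_columns(frame: AbstractFrame, names: list[str], errors: list[str]) -> AbstractFrame:
    """Transfer function for df.drop(columns=names)."""
    definite = dict(frame.definite)
    for name in names:
        if name in definite:
            del definite[name]
        elif name not in frame.maybe:
            errors.append(f"error: cannot drop missing column {name!r}")
    return AbstractFrame(definite, frame.maybe - set(names))


def demo() -> list[str]:
    """Analyse a small, hypothetical pandas program step by step."""
    errors: list[str] = []
    # df = pd.read_csv(...), assuming the schema of the file is known
    df = AbstractFrame({"name": "str", "price": "float", "qty": "int"})
    # if discount: df["total"] = df["price"] * df["qty"]   (else: df unchanged)
    get_column(df, "price", errors)
    get_column(df, "qty", errors)
    df = assign_column(df, "total", "float").join(df)
    get_column(df, "prcie", errors)   # misspelt column
    get_column(df, "total", errors)   # defined on one branch only
    df = drop_columns(df, ["qty"], errors)
    get_column(df, "qty", errors)     # used after being dropped
    return errors


if __name__ == "__main__":
    for message in demo():
        print(message)
```

Because the join keeps a column as definitely present only when both branches define it, reading `total` after the conditional assignment yields a warning rather than an error, while the misspelt `prcie` and the dropped `qty` are reported as errors. A real tool would derive these transfer-function calls from the Python program itself, which this sketch does not attempt.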
References
[1] Kuang-Chen Lu, Ben Greenman, and Shriram Krishnamurthi (2022). Types for Tables: A Language Design Benchmark. The Art, Science, and Engineering of Programming, Vol. 6, Issue 2, Article 8.
[2] Tomas Petricek (2017). Data exploration through dot-driven development. In 31st European Conference on Object-Oriented Programming (ECOOP 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[3] Bruno Blanchet (2002). Introduction to Abstract Interpretation. Available online at: https://bblanche.gitlabpages.inria.fr/absint.pdf
[4] Guido van Rossum, Jukka Lehtosalo, and Łukasz Langa (2014). PEP 484 – Type Hints. Available online at: https://peps.python.org/pep-0484/
[5] Y. Zhuang and M. Y. Lu (2022). Enabling Type Checking on Columns in Data Frame Libraries by Abstract Interpretation. IEEE Access, Vol. 10, pp. 14418-14428.