Thesis topics (Thesis selection)
Thesis details
Abstract interpretation of pandas
Thesis title in Czech: Abstract interpretation of pandas
Thesis title in English: Abstract interpretation of pandas
Keywords: pandas|python|program analysis|abstract interpretation
Keywords in English: pandas|python|program analysis|abstract interpretation
Academic year of topic announcement: 2023/2024
Thesis type: bachelor's thesis
Thesis language:
Department: Department of Distributed and Dependable Systems (32-KDSS)
Supervisor: Mgr. Tomáš Petříček, Ph.D.
Author: hidden - assigned and confirmed by the Study Department
Date of registration: 23.03.2024
Date of assignment: 25.03.2024
Date of confirmation by the Study Department: 28.03.2024
Guidelines
Pandas is a popular and widely used Python library for data exploration. Code written in pandas is highly dynamic: column names of a data frame are represented as strings, and multiple operations transform the structure of the data frame. This can be a source of errors. On the one hand, data frame libraries in statically typed languages offer greater guarantees [1, 2], but are not widely used. On the other hand, attempts to add checking to Python have so far focused on the core language [4] and not on advanced libraries such as pandas.

The aim of this thesis is to design and implement a code analysis tool for the pandas library that is capable of checking common kinds of errors such as accesses to misspelt or non-existent columns. The tool will leverage the abstract interpretation framework [3]. It will model key pandas data structures such as data frames and series in a way that makes it possible to check errors related to column naming and column types. The tool can draw inspiration from PDChecker [5] and should be evaluated through a number of small but realistic case studies.
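
To make the idea concrete, the following is a minimal sketch of what column checking by abstract interpretation could look like. It is not the tool described in this assignment and not the design of PDChecker [5]. The names `AbstractFrame`, `ColumnChecker` and `check` are hypothetical and introduced only for illustration. The sketch assumes Python 3.9 or later (for the shape of `ast.Subscript`), never imports or runs pandas, and understands only three patterns: a `DataFrame(...)` call with a dict literal, `rename(columns={...})`, and string subscripts such as `df["col"]`.

```python
import ast
from dataclasses import dataclass


@dataclass
class AbstractFrame:
    """Abstract value for a data frame: column name -> abstract dtype."""
    columns: dict


def literal_dtype(node):
    """Abstract dtype of a homogeneous list literal, else 'unknown'."""
    if isinstance(node, ast.List) and node.elts and all(
            isinstance(e, ast.Constant) for e in node.elts):
        kinds = {type(e.value).__name__ for e in node.elts}
        if len(kinds) == 1:
            return kinds.pop()
    return "unknown"


class ColumnChecker(ast.NodeVisitor):
    """Hypothetical checker: tracks abstract frames and records column errors."""

    def __init__(self):
        self.env = {}     # variable name -> AbstractFrame
        self.errors = []  # diagnostics as plain strings

    def visit_Assign(self, node):
        value = self.abstract_eval(node.value)
        for target in node.targets:
            if isinstance(target, ast.Name):
                if value is None:
                    self.env.pop(target.id, None)  # now unknown: stop tracking
                else:
                    self.env[target.id] = value

    def visit_Expr(self, node):
        self.abstract_eval(node.value)

    def abstract_eval(self, node):
        """Return an AbstractFrame, or None when the value is unknown."""
        if isinstance(node, ast.Name):
            return self.env.get(node.id)
        if isinstance(node, ast.Subscript):
            frame, key = self.abstract_eval(node.value), node.slice
            if (isinstance(frame, AbstractFrame) and isinstance(key, ast.Constant)
                    and isinstance(key.value, str)
                    and key.value not in frame.columns):
                self.errors.append(
                    f"line {node.lineno}: unknown column {key.value!r}; "
                    f"known columns: {sorted(frame.columns)}")
            return None  # series are not modelled in this sketch
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            if (node.func.attr == "DataFrame" and node.args
                    and isinstance(node.args[0], ast.Dict)):
                d = node.args[0]
                return AbstractFrame({k.value: literal_dtype(v)
                                      for k, v in zip(d.keys, d.values)
                                      if isinstance(k, ast.Constant)})
            if node.func.attr == "rename":
                frame = self.abstract_eval(node.func.value)
                mapping = next((kw.value for kw in node.keywords
                                if kw.arg == "columns"), None)
                if isinstance(frame, AbstractFrame) and isinstance(mapping, ast.Dict):
                    renames = {k.value: v.value
                               for k, v in zip(mapping.keys, mapping.values)
                               if isinstance(k, ast.Constant)
                               and isinstance(v, ast.Constant)}
                    # Like pandas' default, unmatched keys are silently ignored.
                    return AbstractFrame({renames.get(c, c): t
                                          for c, t in frame.columns.items()})
        return None


def check(source):
    """Analyse pandas-style source text without executing it."""
    checker = ColumnChecker()
    checker.visit(ast.parse(source))
    return checker.errors


SCRIPT = '''
import pandas as pd
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45]})
people = df.rename(columns={"name": "full_name"})
people["age"]
people["name"]
'''

for message in check(SCRIPT):
    print(message)
# prints: line 6: unknown column 'name'; known columns: ['age', 'full_name']
```

The sketch records an abstract dtype for each column but does not yet check it, and it treats every value it does not understand as unknown, so it stays silent rather than guessing. It also ignores control flow and column assignments such as `df["new"] = ...`. A tool built on the abstract interpretation framework [3] would additionally need a join operation for branches and a fixpoint computation for loops.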
References
[1] Kuang-Chen Lu, Ben Greenman, and Shriram Krishnamurthi (2022). Types for Tables: A Language Design Benchmark. In The Art, Science, and Engineering of Programming, 2022, Vol. 6, Issue 2, Article 8
[2] Petricek, Tomas (2017). Data exploration through dot-driven development. In 31st European Conference on Object-Oriented Programming (ECOOP 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[3] Blanchet, Bruno (2002). Introduction to Abstract Interpretation. Available online at: https://bblanche.gitlabpages.inria.fr/absint.pdf
[4] Guido van Rossum, Jukka Lehtosalo, Łukasz Langa (2014). PEP 484 – Type Hints. Available at: https://peps.python.org/pep-0484/
[5] Zhuang, Y., & Lu, M. Y. (2022). Enabling Type Checking on Columns in Data Frame Libraries by Abstract Interpretation. IEEE Access, 10, 14418-14428.
 
Charles University | Charles University Information System