Thesis (Selection of subject) (version: 368)
Thesis details
Design of LLM prompts for iterative data exploration
Thesis title in Czech: Návrh LLM promptu pro iterativní data exploraci
Thesis title in English: Design of LLM prompts for iterative data exploration
Key words: velký jazykový model|LLM prompting|iterativní data explorace|asistované generování dotazů|porovnání promptovacích strategií
English key words: large language model|LLM prompting|iterative data exploration|assisted query generation|prompting strategies comparison
Academic year of topic announcement: 2023/2024
Thesis type: Bachelor's thesis
Thesis language:
Department: Department of Distributed and Dependable Systems (32-KDSS)
Supervisor: Mgr. Tomáš Petříček, Ph.D.
Author: Mikoláš Fromm - assigned and confirmed by the Study Dept.
Date of registration: 25.09.2023
Date of assignment: 26.09.2023
Confirmed by Study dept. on: 23.11.2023
Date of electronic submission: 07.05.2024
Opponents: doc. Mgr. Martin Nečaský, Ph.D.
 
 
 
Guidelines
Large language models (LLMs) [4] are increasingly used to create data exploration scripts [3]. However, generating an entire script in a single step makes it difficult for users to understand and validate the result. In contrast, "iterative prompting" [1, 5] makes it possible to build a programmatic data exploration tool in which the user is repeatedly offered a range of options and constructs a script by choosing one of them at each step. However, doing so is less convenient than specifying a query in natural language.

The aim of the thesis is to combine the two approaches. It will design an example integration of an LLM with an iterative prompting data exploration system. The integration will be subject to design evaluation and performance benchmarking that compare several approaches to building such a system. In the resulting system, the user will write a query in natural language, and the system developed for the thesis will use an LLM (with an appropriately constructed prompt, possibly inspired by emerging prompt patterns [2]) to iteratively advise the user on which of the options offered by the "iterative prompting" system to choose. As with other conversational agents [6], this may increase the user's understanding of the problem [7]. The thesis work will consist of developing a system for data exploration (focusing on tabular data) based on iterative prompting and integrating it with an LLM. It will then explore and evaluate different ways of constructing LLM prompts for obtaining recommendations to control the system.
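
The interaction described above can be illustrated with a minimal Python sketch. It is not the system developed in the thesis: the function names (`build_prompt`, `parse_choice`, `recommend`, `explore`), the prompt wording, and the nested-dictionary representation of the offered options are all assumptions made for illustration. The LLM is passed in as a plain callable, so no particular API is assumed. For brevity, the sketch accepts each recommendation automatically, whereas in the envisaged system the user would make the final choice.

```python
from typing import Callable, Dict, List, Optional, Sequence


def build_prompt(query: str, chosen: Sequence[str], options: Sequence[str]) -> str:
    """Assemble a prompt asking the model to pick one numbered option.

    The wording is a hypothetical example, not a prompt from the thesis.
    """
    numbered = "\n".join(f"{i}. {opt}" for i, opt in enumerate(options, start=1))
    history = " -> ".join(chosen) if chosen else "(nothing chosen yet)"
    return (
        f"User query: {query}\n"
        f"Steps chosen so far: {history}\n"
        f"Available options:\n{numbered}\n"
        "Reply with the number of the single best next option."
    )


def parse_choice(reply: str, options: Sequence[str]) -> Optional[str]:
    """Return the option named by the first integer in the reply, if valid."""
    for token in reply.replace(".", " ").split():
        if token.isdigit():
            index = int(token)
            if 1 <= index <= len(options):
                return options[index - 1]
            return None  # the number is out of range
    return None  # the reply contains no number


def recommend(
    query: str,
    chosen: Sequence[str],
    options: Sequence[str],
    ask_llm: Callable[[str], str],
) -> Optional[str]:
    """Ask the model (any callable from prompt to reply) for one recommendation."""
    return parse_choice(ask_llm(build_prompt(query, chosen, options)), options)


def explore(
    query: str,
    option_tree: Dict[str, dict],
    ask_llm: Callable[[str], str],
    max_steps: int = 10,
) -> List[str]:
    """Walk a nested dict of options, following one recommendation per step.

    Each key is an offered option; its value holds the options offered next.
    Recommendations are accepted automatically here (a simplification).
    """
    chosen: List[str] = []
    node = option_tree
    while node and len(chosen) < max_steps:
        pick = recommend(query, chosen, list(node), ask_llm)
        if pick is None:
            break  # no usable recommendation; stop instead of guessing
        chosen.append(pick)
        node = node[pick]
    return chosen
```

Calling `explore` with a stub such as `lambda prompt: "1"` in place of a real model makes it possible to exercise the loop without network access. Different prompt-construction strategies could then be compared by swapping `build_prompt` while keeping the rest fixed.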
References
[1] Petricek, T., 2022. The Gamma: Programmatic Data Exploration for Non-programmers. In 2022 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) (pp. 1-7). IEEE.
[2] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J. and Schmidt, D.C., 2023. A prompt pattern catalog to enhance prompt engineering with ChatGPT. Available at: https://arxiv.org/pdf/2302.11382.pdf
[3] Maddigan, P. and Susnjak, T., 2023. Chat2VIS: Generating data visualisations via natural language using ChatGPT, Codex and GPT-3 large language models. IEEE Access.
[4] OpenAI. ChatGPT API documentation. Available at: https://platform.openai.com/docs/introduction, Accessed 9/2023
[5] Petricek, T., 2017. Data exploration through dot-driven development. In 31st European Conference on Object-Oriented Programming (ECOOP 2017).
[6] Fast, E., Chen, B., Mendelsohn, J., Bassen, J. and Bernstein, M.S., 2018, April. Iris: A conversational agent for complex tasks. In Proceedings of the 2018 CHI conference on human factors in computing systems (pp. 1-12).
[7] Reicherts, L. and Rogers, Y., 2020, July. Do make me think! How CUIs can support cognitive processes. In Proceedings of the 2nd Conference on Conversational User Interfaces (pp. 1-4).
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html