Thesis topics (Topic selection) (version: 368)
Thesis detail
Vícejazyčné víceoborové generování školních testů, které se obtížně řeší automaticky
Thesis title in Czech: Vícejazyčné víceoborové generování školních testů, které se obtížně řeší automaticky
Title in English: Multilingual multidomain generation of school tests that are hard to solve automatically
Keywords (Czech): generování přirozeného jazyka|velké jazykové modely
Keywords in English: natural language generation|large language models
Academic year of announcement: 2023/2024
Thesis type: master's thesis
Thesis language:
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor / advisor: Mgr. Rudolf Rosa, Ph.D.
Student: hidden - assigned and confirmed by the study department
Date of registration: 22.04.2024
Date of assignment: 22.04.2024
Date of confirmation by the study department: 22.04.2024
Guidelines
With the advance of Large Language Models (LLMs), it has been found that many tests commonly used at schools and universities can, to a certain degree, be solved automatically by providing the test question as the input prompt for an LLM and using the generated output as the answer.

The goal of the thesis is to explore ways of automatically generating test questions that are hard to solve automatically in this way.

The suggested approach is to iteratively generate test questions with an LLM and to verify whether the LLM can generate a satisfactory answer, retaining the questions that it fails to answer.
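The generate-and-verify loop described above could be sketched as follows. This is a minimal illustration, not the thesis method: `generate_question` and `solver_succeeds` are hypothetical stand-ins for the two LLM calls (question generation and answer verification), simulated here with a fixed question pool so the sketch runs without any API.

```python
import random

# Hypothetical question pool simulating an LLM question generator.
# The boolean marks whether a solver LLM would answer correctly.
QUESTION_POOL = [
    ("What is 2 + 2?", True),                                  # easy to solve
    ("Prove that P != NP.", False),                            # solver fails
    ("Name the capital of France.", True),                     # easy to solve
    ("Give a closed form for the Collatz stopping time.", False),
]

def generate_question(rng):
    """Stand-in for prompting a generator LLM for a new test question."""
    return rng.choice(QUESTION_POOL)

def solver_succeeds(question, solvable):
    """Stand-in for asking a solver LLM and grading its answer."""
    return solvable

def hard_questions(n, seed=0):
    """Iteratively generate questions, keeping only those the solver fails on."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        question, solvable = generate_question(rng)
        if not solver_succeeds(question, solvable):
            kept.append(question)
    return kept

if __name__ == "__main__":
    for q in hard_questions(2):
        print(q)
```

In a real setup, the pool and the boolean flag would be replaced by actual generator and solver LLM calls plus an answer-grading step, but the control flow (generate, attempt to solve, keep the failures) would stay the same.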

The work on the thesis is expected to include the following components:
- reviewing existing literature on employing LLMs for test generation and test answer generation
- gathering and processing data representing various school test questions and answers
- designing a setup utilizing LLMs to generate test questions
- designing a setup utilizing LLMs to generate answers to the test questions
- devising and implementing methods to evaluate the quality of the generated questions
- devising and implementing methods to evaluate the correctness of the generated answers
- comparing the performance on open-book and closed-book questions and answers
- comparing the performance across multiple domains (e.g. mathematics, geography, history)
- comparing the performance across multiple levels (e.g. primary school, secondary school, university)
- comparing the performance across multiple languages
- comparing the performance across multiple available LLMs
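The comparisons listed above reduce to aggregating solver accuracy along one axis at a time (domain, level, or language). A small sketch with made-up records; the field layout and values are purely illustrative:

```python
from collections import defaultdict

# Illustrative evaluation records: (domain, level, language, solver_was_correct)
RESULTS = [
    ("mathematics", "primary", "cs", True),
    ("mathematics", "university", "cs", False),
    ("geography", "primary", "en", True),
    ("geography", "secondary", "en", True),
    ("history", "university", "cs", False),
    ("history", "secondary", "en", False),
]

def accuracy_by(results, key_index):
    """Aggregate solver accuracy along one axis of the records
    (0 = domain, 1 = level, 2 = language)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in results:
        key = record[key_index]
        total[key] += 1
        correct[key] += record[3]
    return {k: correct[k] / total[k] for k in total}

if __name__ == "__main__":
    print(accuracy_by(RESULTS, 0))  # accuracy per domain
```

The same aggregation works for any additional axis (e.g. open-book vs. closed-book, or the LLM used) by extending the record tuple with that field.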
References
- Denny, P., Khosravi, H., Hellas, A., Leinonen, J., & Sarsa, S. (2023). Can We Trust AI-Generated Educational Content? Comparative Analysis of Human and AI-Generated Learning Resources (arXiv:2306.10509). arXiv. https://doi.org/10.48550/arXiv.2306.10509
- Doughty, J., Wan, Z., Bompelli, A., Qayum, J., Wang, T., Zhang, J., Zheng, Y., Doyle, A., Sridhar, P., Agarwal, A., Bogart, C., Keylor, E., Kultur, C., Savelka, J., & Sakr, M. (2024). A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education. Proceedings of the 26th Australasian Computing Education Conference, 114–123. https://doi.org/10.1145/3636243.3636256
- Elkins, S., Kochmar, E., Cheung, J. C. K., & Serban, I. (2023). How Useful are Educational Questions Generated by Large Language Models? (arXiv:2304.06638). arXiv. https://doi.org/10.48550/arXiv.2304.06638
- Perkoff, E. M., Bhattacharyya, A., Cai, J., & Cao, J. (2023). Comparing Neural Question Generation Architectures for Reading Comprehension. In E. Kochmar, J. Burstein, A. Horbach, R. Laarmann-Quante, N. Madnani, A. Tack, V. Yaneva, Z. Yuan, & T. Zesch (Eds.), Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023) (pp. 556–566). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.bea-1.47
- Savelka, J., Agarwal, A., Bogart, C., & Sakr, M. (2023). Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code (arXiv:2303.08033). arXiv. https://doi.org/10.48550/arXiv.2303.08033
Univerzita Karlova | Informační systém UK