In-Context Learning in Large Language Models for NLP Tasks
| Thesis title in Czech | Kontextové učení velkých jazykových modelů v úlohách zpracování jazyka |
|---|---|
| Title in English | In-Context Learning in Large Language Models for NLP Tasks |
| Academic year of announcement | 2025/2026 |
| Thesis type | dissertation thesis |
| Thesis language | |
| Department | Institute of Formal and Applied Linguistics (32-UFAL) |
| Supervisor | doc. RNDr. Ondřej Bojar, Ph.D. |
| Author | |
Guidelines
In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including machine translation. One key factor behind this success is the massive size of their training data. A particularly interesting capability of these models is their ability to carry out tasks based on instructions and examples provided in plain language as part of their input, a phenomenon known as in-context or few-shot learning.
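For illustration only, the following is a minimal sketch (in Python) of how such a few-shot prompt could be assembled for translation. The instruction wording, the `build_few_shot_prompt` helper, and the Czech–English demonstration pairs are illustrative assumptions, not part of the proposed work.

```python
# Minimal sketch of assembling a few-shot (in-context) prompt for translation.
# The instruction wording and the demonstration pairs are illustrative assumptions.

def build_few_shot_prompt(examples, source_sentence, src_lang="Czech", tgt_lang="English"):
    """Concatenate k demonstration pairs with the test source into a single prompt."""
    lines = [f"Translate from {src_lang} to {tgt_lang}."]
    for src, tgt in examples:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    lines.append(f"{src_lang}: {source_sentence}")
    lines.append(f"{tgt_lang}:")  # the model is expected to continue from here
    return "\n".join(lines)

if __name__ == "__main__":
    demos = [
        ("Dobrý den.", "Good morning."),
        ("Děkuji za pomoc.", "Thank you for the help."),
    ]
    print(build_few_shot_prompt(demos, "Kde je nádraží?"))
```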
Despite the empirical success of in-context learning (ICL), its underlying mechanisms remain an active area of research. ICL allows LLMs to adapt to new tasks at inference time solely by conditioning on examples presented in the prompt. This phenomenon has raised the question of whether the models are truly learning or merely locating relevant patterns known from the training data and mimicking them in a shallow way. Recent studies have identified architectural components such as induction heads, specialized attention patterns within Transformer layers, that appear to be responsible for this behavior. How and why ICL works in LLMs thus remains an open question, with implications for model trustworthiness and explainability and, at the technical level, for the design of model post-training strategies.

In the field of Machine Translation (MT), LLMs have begun to replace traditional neural machine translation (NMT) systems. Moreover, they are able to perform other tasks present in translation workflows, such as named entity recognition (NER), automatic post-editing (APE), terminology extraction, or terminology-aware MT. Adapting an LLM for MT usually relies on iterative processes such as supervised fine-tuning (SFT) or continual pre-training (CP). Recent work has begun to study the relationship between SFT and ICL, showing that with a few in-context examples, good responses to users' requests can also be obtained without relying on costly SFT. Several challenges nevertheless remain, e.g., achieving comparable performance in less-resourced languages, or the ability to reason about language from a linguistic point of view, as documented by the low scores on test sets derived from the International Linguistics Olympiad.

The goal of the thesis is to explore ICL for language processing tasks, with a primary focus on translation and linguistic reasoning. This involves studying how LLMs generalize from a limited number of examples and whether they can infer and apply structural language rules. Particular attention will be devoted to less-resourced languages. The set of experiments in the thesis will naturally be limited by the availability of data and computing resources. For this reason, the thesis will also experiment with a small number of languages, either in individual or in multilingual models. Highly multilingual models can serve as a good starting point, but further training can degrade their abilities in languages other than the target ones.
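As a hedged illustration of the kind of experiment envisaged above, the sketch below compares translation quality as a function of the number of in-context examples, reusing the `build_few_shot_prompt` helper from the earlier sketch. `translate_with_llm` is a hypothetical placeholder for whatever model interface is eventually used, and the choice of sacreBLEU as the metric is an assumption, not a commitment of the thesis.

```python
# Sketch of a shot-count ablation for in-context MT. Assumes the
# build_few_shot_prompt helper defined in the previous sketch;
# translate_with_llm is a hypothetical stand-in for the actual model call.
import sacrebleu

def translate_with_llm(prompt: str) -> str:
    """Placeholder: send the prompt to the chosen LLM and return its continuation."""
    raise NotImplementedError("plug in the actual model interface here")

def evaluate_shot_counts(dev_pairs, test_pairs, shot_counts=(0, 1, 5)):
    """Return BLEU scores keyed by the number of in-context demonstrations."""
    scores = {}
    references = [tgt for _, tgt in test_pairs]
    for k in shot_counts:
        demos = dev_pairs[:k]  # naive selection; example retrieval is itself a research question
        hypotheses = [
            translate_with_llm(build_few_shot_prompt(demos, src))
            for src, _ in test_pairs
        ]
        scores[k] = sacrebleu.corpus_bleu(hypotheses, [references]).score
    return scores
```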
References
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998–6008).
Xu, H., Sharaf, A., Chen, Y., Tan, W., Shen, L., Van Durme, B., Murray, K., & Kim, Y. J. (2024). Contrastive preference optimization: Pushing the boundaries of LLM performance in machine translation. In Proceedings of the 41st International Conference on Machine Learning (Vol. 235, pp. 55204–55224). JMLR.org.
Zhu, H., Liang, Y., Xu, W., & Xu, H. (2025). Evaluating large language models for in-context learning of linguistic patterns in unseen low resource languages. In Proceedings of the First Workshop on Language Models for Low-Resource Languages (pp. 414–426). Association for Computational Linguistics.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
Alves, D. M., Pombal, J., Guerreiro, N. M., Martins, P. H., Alves, J., Farajian, A., Peters, B., Rei, R., Fernandes, P., Agrawal, S., Colombo, P., de Souza, J. G. C., & Martins, A. F. T. (2024). Tower: An open multilingual large language model for translation-related tasks. arXiv preprint arXiv:2402.17733.
Lin, B. Y., Ravichander, A., Lu, X., Dziri, N., Sclar, M., Chandu, K., Bhagavatula, C., & Choi, Y. (2024). The unlocking spell on base LLMs: Rethinking alignment via in-context learning. In The Twelfth International Conference on Learning Representations.
Sánchez, E., Alastruey, B., Ropers, C., Stenetorp, P., Artetxe, M., & Costa-jussà, M. R. (2024). Linguini: A benchmark for language-agnostic linguistic reasoning. arXiv preprint arXiv:2409.12126.