Thesis (Selection of subject)

Evaluation of gender bias of Large Language Models in natural contexts

Thesis title in Czech:	Evaluace genderové zaujatosti velkých jazykových modelů v přirozených kontextech
Thesis title in English:	Evaluation of gender bias of Large Language Models in natural contexts
Key words:	genderová zaujatost|velké jazykové modely|evaluace
English key words:	gender bias|large language models|evaluation
Academic year of topic announcement:	2023/2024
Thesis type:	diploma thesis
Thesis language:
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	RNDr. David Mareček, Ph.D.
Author:	hidden - assigned and confirmed by the Study Dept.
Date of registration:	08.01.2024
Date of assignment:	08.01.2024
Confirmed by Study dept. on:	08.01.2024

Guidelines

Large language models are trained on vast amounts of data collected from the internet, and this data often reflects the biases present in society. As a result, language models can inadvertently perpetuate and even amplify these biases. For example, the models often learn and reproduce stereotypes about gender roles, i.e., they may associate certain professions or qualities with a specific gender.

Many evaluation datasets exist for measuring gender bias in language models. Almost all of them are created artificially, either by filling words into templates or by asking annotators to write sentences that may contain stereotypical gender biases. In addition, they usually evaluate bias only for specific groups of words, such as professions (e.g., doctor vs. nurse).
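To make the template-based approach concrete, the following is a minimal sketch of how such an artificial dataset is typically assembled. The templates, word list, and the helper name `fill_templates` are hypothetical and chosen only for illustration; they are not taken from any particular existing dataset.

```python
from itertools import product

# Hypothetical templates and word lists, chosen only for illustration.
TEMPLATES = [
    "The {word} said that {pronoun} would arrive soon.",
    "I asked the {word} whether {pronoun} had finished.",
]
PROFESSIONS = ["doctor", "nurse"]
PRONOUNS = ["he", "she"]


def fill_templates(templates, words, pronouns):
    """Return one record per (template, word, pronoun) combination."""
    return [
        {"word": w, "pronoun": p, "sentence": t.format(word=w, pronoun=p)}
        for t, w, p in product(templates, words, pronouns)
    ]


examples = fill_templates(TEMPLATES, PROFESSIONS, PRONOUNS)
```

Every sentence produced this way shares the same few syntactic frames and covers only the listed words, which is the limitation this topic aims to address.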

The goal of this thesis is to build a new evaluation dataset for detecting gender bias that is based on real texts and evaluates biases across the whole dictionary (we suppose that words such as ‘yoga’, ‘children’, ‘clamp’, and ‘tire’ are also sources of stereotypical bias).
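One possible way to quantify bias for arbitrary words is sketched below. This is an assumption made for illustration, not the method prescribed by the topic. It assumes that for each real sentence we already have the probabilities a model assigns to a male and a female word in a gendered slot; the sketch then averages the log-ratio of these probabilities over all sentences in which a given word occurs. The function names and the input values are hypothetical.

```python
import math
from collections import defaultdict


def gender_score(p_male, p_female):
    """Log-ratio of male vs. female probability.

    Positive values lean male, negative values lean female, 0 is neutral.
    """
    return math.log(p_male / p_female)


def per_word_scores(records):
    """Average the sentence-level score over every sentence a word occurs in."""
    by_word = defaultdict(list)
    for tokens, p_male, p_female in records:
        score = gender_score(p_male, p_female)
        for tok in set(tokens):  # count each word once per sentence
            by_word[tok].append(score)
    return {tok: sum(v) / len(v) for tok, v in by_word.items()}


# Placeholder inputs: made-up probabilities, NOT measurements from any model.
records = [
    (["yoga", "class"], 0.2, 0.8),
    (["tire", "change"], 0.8, 0.2),
    (["yoga", "tire"], 0.5, 0.5),
]
scores = per_word_scores(records)
```

Because the score is computed per word rather than per predefined category, the same procedure applies to any word in the dictionary, not only to professions.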

The outputs of the thesis could be:
- analysis of gender bias for different types of words
- causal tracing of such gender bias in Transformers
- application of the new dataset to existing methods for mitigating gender bias in large language models

References

1. Vig et al.: Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias. arXiv, 2020 (https://arxiv.org/pdf/2004.12265.pdf)

2. Stanczak and Augenstein: A Survey on Gender Bias in Natural Language Processing. arXiv, 2021 (https://arxiv.org/pdf/2112.14168.pdf)

3. Meng et al.: Locating and Editing Factual Associations in GPT. 36th Conference on Neural Information Processing Systems, 2022 (https://proceedings.neurips.cc/paper_files/paper/2022/file/6f1d43d5a82a37e89b0665b33bf3a182-Paper-Conference.pdf)