Evaluation of gender bias of Large Language Models in natural contexts
Thesis title in Czech: | Evaluace genderové zaujatosti velkých jazykových modelů v přirozených kontextech |
---|---|
Thesis title in English: | Evaluation of gender bias of Large Language Models in natural contexts |
Keywords (Czech): | genderová zaujatost|velké jazykové modely|evaluace |
Keywords (English): | gender bias|large language models|evaluation |
Academic year of topic announcement: | 2023/2024 |
Thesis type: | diploma thesis |
Thesis language: | |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | RNDr. David Mareček, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Department |
Date of registration: | 08.01.2024 |
Date of assignment: | 08.01.2024 |
Date of confirmation by the Study Department: | 08.01.2024 |
Guidelines |
Large Language Models are trained on vast amounts of data collected from the internet, and this data often reflects the biases present in society. As a result, language models can inadvertently perpetuate and even amplify biases. For example, the models often learn and reproduce stereotypes about gender roles, i.e., they may associate certain professions or qualities with a specific gender.
Many evaluation datasets measure the amount of gender bias in language models. Almost all of them are created artificially, either by filling words into templates or by asking annotators to write sentences that may contain stereotypical gender biases. They also usually evaluate the bias of only specific groups of words, such as professions (e.g. doctor vs. nurse). The goal of this thesis is to build a new evaluation dataset for detecting gender bias that is based on real texts and evaluates biases across the whole dictionary (we suppose that words like ‘yoga’, ‘children’, ‘clamp’, ‘tire’ are also sources of stereotypical bias). The outputs of the thesis could be:
- analysis of gender bias on different types of words
- causal tracing of such gender bias in Transformers
- use of the new dataset in the existing methods for mitigation of gender bias in large language models |
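For orientation, the template-based approach that the guidelines contrast with real texts can be sketched as follows. This is a minimal illustration only: the template, the word list, the `gender_skew` log-ratio score, and the probabilities are all hypothetical and do not come from any existing dataset or from the thesis itself.

```python
import math

# Hypothetical template and word list, chosen only for illustration.
TEMPLATE = "The {word} said that [PRONOUN] would be late."
WORDS = ["doctor", "nurse", "yoga", "tire"]


def fill_template(template, words):
    """Return one probe sentence per word by filling the {word} slot."""
    return {word: template.format(word=word) for word in words}


def gender_skew(p_male, p_female):
    """Log-ratio of the probabilities assigned to a male vs. a female pronoun.

    This score is an assumption made for the sketch: 0 means no preference,
    a positive value means a male skew, a negative value a female skew.
    """
    return math.log(p_male / p_female)


probes = fill_template(TEMPLATE, WORDS)

# Made-up probabilities standing in for a real model's output.
toy_probs = {"doctor": (0.6, 0.3), "nurse": (0.2, 0.8)}
skews = {word: gender_skew(*probs) for word, probs in toy_probs.items()}

for word, sentence in probes.items():
    print(word, "->", sentence)
```

In this sketch, a template written with professions in mind yields an unnatural sentence for a word such as ‘yoga’, which shows why a single template does not extend easily to arbitrary vocabulary.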
References |
1. Vig et al.: Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias. arXiv, 2020. (https://arxiv.org/pdf/2004.12265.pdf)
2. Stanczak and Augenstein: A Survey on Gender Bias in Natural Language Processing. arXiv, 2021. (https://arxiv.org/pdf/2112.14168.pdf)
3. Meng et al.: Locating and Editing Factual Associations in GPT. 36th Conference on Neural Information Processing Systems, 2022. (https://proceedings.neurips.cc/paper_files/paper/2022/file/6f1d43d5a82a37e89b0665b33bf3a182-Paper-Conference.pdf) |