Large Language Models (LLMs) may exhibit discrimination towards certain individuals, especially those characterized by multiple attributes (known as intersectional bias). Discovering intersectional bias in LLMs is challenging, as it involves complex inputs spanning multiple attributes (e.g., race and gender). To address this challenge, we propose HInter, a testing technique that synergistically combines mutation analysis, dependency parsing and metamorphic oracles to automatically detect intersectional bias in LLMs. HInter generates test inputs by systematically applying multiple mutations to sentences, validates the inputs via a dependency invariant and detects biases by comparing the LLM's responses to the original and mutated sentences. We evaluate HInter using six LLM architectures and 18 LLM models (GPT-3.5, Llama2, BERT, etc.) and find that 14.61% of the inputs generated by HInter expose intersectional bias. Results also show that our dependency invariant reduces false positives (incorrect test inputs) by an order of magnitude. Finally, we observe that 16.62% of intersectional bias errors are hidden, meaning that their corresponding atomic cases do not trigger biases. Overall, this work emphasizes the importance of testing LLMs for intersectional bias.
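At a high level, the workflow described above can be pictured as a metamorphic test loop: mutate sensitive attributes in a sentence, keep only mutants whose dependency structure matches the original, and flag a bias when the model answers the original and mutated inputs differently. The following is a minimal illustrative sketch of that idea, assuming spaCy for dependency parsing; the attribute lexicons, the query_llm stub and the exact invariant check are hypothetical placeholders, not HInter's actual implementation.

# Illustrative sketch of a metamorphic intersectional-bias test.
# The lexicons, query_llm() and the invariant check are assumptions,
# not HInter's concrete design.
import itertools
import spacy

nlp = spacy.load("en_core_web_sm")

# Hypothetical sensitive-attribute lexicons used for word-level mutation.
MUTATIONS = {
    "gender": {"he": "she", "his": "her", "man": "woman"},
    "race": {"European": "African", "white": "black"},
}

def mutate(sentence: str, attrs: list[str]) -> str:
    """Apply the mutations of each selected attribute to the sentence."""
    tokens = sentence.split()
    for attr in attrs:
        lexicon = MUTATIONS[attr]
        tokens = [lexicon.get(t, t) for t in tokens]
    return " ".join(tokens)

def dependency_invariant(original: str, mutated: str) -> bool:
    """Keep a mutant only if its dependency structure matches the original."""
    deps = lambda s: [(t.dep_, t.head.i, t.i) for t in nlp(s)]
    return deps(original) == deps(mutated)

def query_llm(sentence: str) -> str:
    """Placeholder for querying the model under test."""
    raise NotImplementedError

def exposes_intersectional_bias(sentence: str) -> bool:
    """Metamorphic oracle: a differing response on the original and an
    intersectionally mutated sentence signals potential bias."""
    for attrs in itertools.combinations(MUTATIONS, 2):  # e.g., gender + race
        mutant = mutate(sentence, list(attrs))
        if not dependency_invariant(sentence, mutant):
            continue  # discard likely-invalid mutated inputs
        if query_llm(sentence) != query_llm(mutant):
            return True  # responses differ only due to the mutated attributes
    return False

In this sketch, the dependency check plays the role of the validity filter that, per the results above, reduces incorrect test inputs by an order of magnitude before the metamorphic comparison is made.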
Disciplines: Computer science
Author, co-author: SOUANI, Badr; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)