Article (Scientific journals)
HInter: Exposing Hidden Intersectional Bias in Large Language Models
SOUANI, Badr; SOREMEKUN, Ezekiel; PAPADAKIS, Mike et al.
n.d. In arXiv
 

Files


Full Text
2503.11962v1.pdf
Author postprint (7.9 MB)

Details



Keywords :
Computer Science - Computation and Language; Computer Science - Artificial Intelligence; 68T50, 68T05
Abstract :
[en] Large Language Models (LLMs) may exhibit discrimination against certain individuals, especially those characterized by multiple attributes (a.k.a. intersectional bias). Discovering intersectional bias in LLMs is challenging, as it involves complex inputs spanning multiple attributes (e.g., race and gender). To address this challenge, we propose HInter, a test technique that synergistically combines mutation analysis, dependency parsing and metamorphic oracles to automatically detect intersectional bias in LLMs. HInter generates test inputs by systematically mutating sentences using multiple mutations, validates inputs via a dependency invariant and detects biases by comparing the LLM's responses on the original and mutated sentences. We evaluate HInter using six LLM architectures and 18 LLM models (GPT-3.5, Llama2, BERT, etc.) and find that 14.61% of the inputs generated by HInter expose intersectional bias. Results also show that our dependency invariant reduces false positives (incorrect test inputs) by an order of magnitude. Finally, we observe that 16.62% of intersectional bias errors are hidden, meaning that their corresponding atomic cases do not trigger biases. Overall, this work emphasizes the importance of testing LLMs for intersectional bias.
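
The abstract describes a three-step pipeline: mutate a sentence along sensitive attributes, keep only mutants whose dependency structure is preserved, and flag bias when the model under test answers the original and the mutant differently. The Python sketch below illustrates this flow under stated assumptions: the attribute lexicons, the exact form of the dependency invariant, and the llm callable are hypothetical stand-ins, not the paper's implementation (see the full text above for the actual method).

from typing import Callable, Dict, Optional

import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Hypothetical single-attribute lexicons; HInter's actual mutation
# dictionaries are described in the paper, not reproduced here.
GENDER: Dict[str, str] = {"he": "she", "his": "her", "him": "her"}
RACE: Dict[str, str] = {"European": "African"}

def mutate(sentence: str, lexicon: Dict[str, str]) -> str:
    """Atomic mutation: swap sensitive-attribute words via a lexicon."""
    return " ".join(lexicon.get(tok, tok) for tok in sentence.split())

def dependency_invariant(original: str, mutant: str) -> bool:
    """One plausible reading of the dependency invariant: accept a mutant
    only if mutation preserved the sentence's dependency-relation sequence."""
    deps = lambda s: [tok.dep_ for tok in nlp(s)]
    return deps(original) == deps(mutant)

def hinter_probe(sentence: str, llm: Callable[[str], str]) -> Optional[str]:
    """Metamorphic oracle: an unbiased LLM should respond identically to a
    sentence and its intersectional (gender x race) mutant. Returns the
    mutant if it exposes a response difference, else None."""
    mutant = mutate(mutate(sentence, GENDER), RACE)  # intersectional mutant
    if mutant == sentence or not dependency_invariant(sentence, mutant):
        return None  # discard ineffective or structurally invalid mutants
    if llm(sentence) != llm(mutant):
        return mutant  # potential intersectional bias exposed
    return None

# Usage with a placeholder model under test (plug in a real LLM client):
if __name__ == "__main__":
    fake_llm = lambda s: "positive"  # trivially consistent stand-in
    print(hinter_probe("the European manager said he deserved his bonus", fake_llm))

The parse-based filter is presumably what cuts false positives by an order of magnitude, as the abstract reports: a mutation that changes the dependency structure has typically produced an ungrammatical or semantically shifted input rather than a valid bias probe.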
Disciplines :
Computer science
Author, co-author :
SOUANI, Badr ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
SOREMEKUN, Ezekiel ;  University of Luxembourg
PAPADAKIS, Mike ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Yokoyama, Setsuko
Chattopadhyay, Sudipta
Le Traon, Yves
External co-authors :
yes
Language :
English
Title :
HInter: Exposing Hidden Intersectional Bias in Large Language Models
Original title :
[en] HInter: Exposing Hidden Intersectional Bias in Large Language Models
Publication date :
n.d.
Journal title :
arXiv
Available on ORBilu :
since 19 April 2025

Statistics


Number of views
122 (4 by Unilu)
Number of downloads
101 (2 by Unilu)
