Bias; Domain-Specific Language; Ethics; Large Language Models; Model-Driven Engineering; Red Teaming; Testing; Development teams; Domains specific languages; Enhanced software; Ethical concerns; Language model; Large language model; Model-driven Engineering; Red teaming; Software-systems; Modeling and Simulation
Résumé :
[en] Large language models (LLMs) are increasingly integrated into software systems to enhance them with generative AI capabilities. But LLMs may reflect a biased behavior, resulting in systems that could discriminate against gender, age or ethnicity, among other ethical concerns. Society and upcoming regulations will force companies and development teams to ensure their AI-enhanced software is ethically fair. To facilitate such ethical assessment, we propose LangBiTe, a model-driven solution to specify ethical requirements, and customize and automate the testing of ethical biases in LLMs. The evaluation can raise awareness on the biases of the LLM-based components of the system and/or trigger a change in the LLM of choice based on the requirements of that particular application. The model-driven approach makes both the requirements specification and the test generation platform-independent, and provides end-to-end traceability between the requirements and their assessment. We have implemented an open-source tool set, available on GitHub, to support the application of our approach.
Disciplines :
Sciences informatiques
Auteur, co-auteur :
Morales, Sergio ; Universitat Oberta de Catalunya, Barcelona, Spain
Clarisó, Robert ; Universitat Oberta de Catalunya, Barcelona, Spain
CABOT, Jordi ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PI Cabot
Co-auteurs externes :
yes
Langue du document :
Anglais
Titre :
A DSL for Testing LLMs for Fairness and Bias
Date de publication/diffusion :
22 septembre 2024
Nom de la manifestation :
Proceedings of the ACM/IEEE 27th International Conference on Model Driven Engineering Languages and Systems
Lieu de la manifestation :
Linz, Aut
Date de la manifestation :
22-09-2024 => 27-09-2024
Manifestation à portée :
International
Titre de l'ouvrage principal :
Proceedings - MODELS 2024: ACM/IEEE 27th International Conference on Model Driven Engineering Languages and Systems
FNR16544475 - Better Smart Software Faster (Besser) - An Intelligent Low-code Infrastructure For Smart Software, 2020 (01/01/2022-...) - Jordi Cabot
Subventionnement (détails) :
This work has been partially funded by the AIDOaRt project (ECSEL Joint Undertaking, grant agreement 101007350); the research network RED2022-134647-T (MCIN/AEI/10.13039/501100011033); the Luxembourg National Research Fund (FNR) PEARL program (grant agreement 16544475); and the Spanish government (PID2020-114615RB-I00/AEI/10.13039/501100011033, project LOCOSS).
Sarah Alnegheimish, Alicia Guo, and Yi Sun. 2022. Using natural sentence prompts for understanding biases in language models. In Human Language Technologies. ACL, 2824–2830.
Jacqui Ayling and Adriane Chapman. 2022. Putting AI ethics to work: Are the tools fit for purpose? AI and Ethics 2, 3 (2022), 405–429.
Christine Basta, Marta R. Costa-Jussà, and Noe Casas. 2019. Evaluating the underlying gender bias in contextualized word embeddings. In Gender Bias in NLP. ACL, 33–39.
Rishabh Bhardwaj, Navonil Majumder, and Soujanya Poria. 2021. Investigating gender bias in BERT. Cognitive Computation 13, 4 (2021), 1008–1018.
Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, and Aylin Caliskan. 2023. Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT’23). ACM, 1493–1504.
Charlotte Bird, Eddie Ungless, and Atoosa Kasirzadeh. 2023. Typology of Risks of Generative Text-to-Image Models. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (AIES’23). ACM, 396–410.
Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in NeurIPS 29 (2016).
Myra Cheng, Esin Durmus, and Dan Jurafsky. 2023. Marked personas: Using natural language prompts to measure stereotypes in language models. arXiv preprint arXiv:2305.18189 (2023).
European Comission. 2019. Ethics guidelines for trustworthy AI. Retrieved February 29, 2024 from https://ec.europa.eu/digital-single-market/en/news/ethicsguidelines-trustworthy-ai
S. R. Dalal, A. Jain, N. Karunanithi, J. M. Leaton, C. M. Lott, G. C. Patton, and B. M. Horowitz. 1999. Model-based testing in practice. In Proceedings of the 21st International Conference on Software Engineering (ICSE’99). Association for Computing Machinery, 285–294.
Jwala Dhamala, Tony Sun, Varun Kumar, Satyapriya Krishna, Yada Pruksachatkun, Kai-Wei Chang, and Rahul Gupta. 2021. Bold: dataset and metrics for measuring biases in open-ended language generation. In Conf. on Fairness, Accountability, and Transparency. ACM, 862–872.
Ray Eitel-Porter. 2021. Beyond the promise: Implementing ethical AI. AI and Ethics 1 (2021), 73–80.
European Union. 2024. The artificial intelligence act. Retrieved February 29, 2024 from https://artificialintelligenceact.eu
Jessica Fjeld, Nele Achten, Hannah Hilligoss, Adam Nagy, and Madhulika Srikumar. 2020. Principled artificial intelligence: Mapping consensus in ethical and rights-based approaches to principles for AI. Berkman Klein Center Research Publication 2020-1 (2020).
Luciano Floridi, Josh Cowls, Monica Beltrametti, et al. 2018. AI4People - An ethical framework for a good AI society: Opportunities, risks, principles, and recommendations. Minds and Machines 28 (2018), 689–707. Issue 4.
Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A. Smith. 2020. RealToxicityPrompts: Evaluating neural toxic degeneration in language models. In EMNLP. ACL, 3356–3369.
Joan Giner-Miguelez, Abel Gómez, and Jordi Cabot. 2023. A domain-specific language for describing machine learning datasets. Journal of Computer Languages 76 (2023), 101209.
Delaram Golpayegani, Harshvardhan J Pandit, and Dave Lewis. 2022. AIRO: An ontology for representing AI risks based on the proposed EU AI act and ISO risk management standards. In Int. Conf. on Semantic Systems, Vol. 55. 51.
Renata Guizzardi, Glenda Amaral, Giancarlo Guizzardi, and John Mylopoulos. 2023. An ontology-based approach to engineering ethicality requirements. SoSyM (2023), 1–27.
Thilo Hagendorff. 2020. The ethics of AI ethics: An evaluation of guidelines. Minds and machines 30, 1 (2020), 99–120.
Andrew Harrison, Dayana Spagnuelo, and Ilaria Tiddi. 2021. An ontology for ethical AI principles. Semantic Web Journal (2021).
Javier Camacho Ibáñez and Mónica Villas Olmeda. 2022. Operationalising AI ethics: How are companies bridging the gap between practice and principles? An exploratory study. AI & Society 37, 4 (2022), 1663–1687.
Abu Zafer Javed, Paul A. Strooper, and Geoffrey N. Watson. 2007. Automated generation of test cases using model-driven architecture. In AST’07. IEEE, 3–3.
Anna Jobin, Marcello Ienca, and Effy Vayena. 2019. The global landscape of AI ethics guidelines. Nature Machine Intelligence 1, 9 (2019), 389–399.
Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W Black, and Yulia Tsvetkov. 2019. Measuring bias in contextualized word representations. arXiv preprint arXiv:1906.07337 (2019).
Qinghua Lu, Liming Zhu, Xiwei Xu, Jon Whittle, David Douglas, and Conrad Sanderson. 2022. Software engineering for responsible AI: An empirical study and operationalised patterns. In ICSE-SEIP. ACM, 241–242.
Jessica Morley, Anat Elhalal, Francesca Garcia, Libby Kinsey, Jakob Mökander, and Luciano Floridi. 2021. Ethics as a service: A pragmatic operationalisation of AI ethics. Minds and Machines 31, 2 (2021), 239–256.
Jessica Morley, Libby Kinsey, Anat Elhalal, Francesca Garcia, Marta Ziosi, and Luciano Floridi. 2021. Operationalising AI ethics: Barriers, enablers and next steps. AI & Society (2021), 1–13.
Mohamed Mussa, Samir Ouchani, Waseem Al Sammane, and Abdelwahab Hamou-Lhadj. 2009. A survey of model-driven testing techniques. In International Conference on Quality Software. 167–172.
Moin Nadeem, Anna Bethke, and Siva Reddy. 2021. StereoSet: Measuring stereotypical bias in pretrained language models. In Proceedings of the 59th Annual Meeting of the AACL and the 11th International Joint Conference on NLP. ACL, 5356–5371.
Iman Naja, Milan Markovic, Peter Edwards, and Caitlin Cottrill. 2021. A semantic framework to support AI system accountability and audit. In ESWC. Springer, 160–176.
Arshaan Nazir, Thadaka Kalyan Chakravarthy, David Amore Cecchini, Rakshit Khajuria, Prikshit Sharma, Ali Tarik Mirik, Veysel Kocaman, and David Talby. 2024. LangTest: A comprehensive evaluation library for custom LLM and NLP models. Software Impacts 19 (2024), 100619.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 1, Article 140 (jan 2020), 67 pages.
Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng. 2019. The woman worked as a babysitter: On biases in language generation. In EMNLPIJCNLP. ACL, 3407–3412.
Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, et al. 2022. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv preprint arXiv:2206.04615 (2022).
Bernd Carsten Stahl, Job Timmermans, and Brent Daniel Mittelstadt. 2016. The Ethics of Computing: A Survey of the Computing-Oriented Literature. ACM Comput. Surv. 48, 4 (2016).
The White House 2023. Executive order on the safe, secure, and trustworthy development and use of artificial intelligence. Retrieved February 29, 2024 from https://www.whitehouse.gov/briefing-room/presidentialactions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthydevelopment-and-use-of-artificial-intelligence
UNESCO. 2021. Recommendation on the ethics of artificial intelligence. Retrieved February 29, 2024 from https://unesdoc.unesco.org/ark:/48223/pf0000380455
Yuxuan Wan, Wenxuan Wang, Pinjia He, Jiazhen Gu, Haonan Bai, and Michael Lyu. 2023. BiasAsker: Measuring the bias in conversational AI system. arXiv preprint arXiv:2305.12434 (2023).
Laura Weidinger, John Mellor, Maribeth Rauh, et al. 2021. Ethical and social risks of harm from language models. arXiv preprint arXiv:2112.04359 (2021).
Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018. Gender bias in coreference resolution: Evaluation and debiasing methods. arXiv preprint arXiv:1804.06876 (2018).
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, Hao Zhang, Joseph E Gonzalez, and Ion Stoica. 2023. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. In Advances in Neural Information Processing Systems, Vol. 36. Curran Associates, Inc., 46595–46623.
Xi Zhiheng, Zheng Rui, and Gui Tao. 2023. Safety and ethical concerns of large language models. In Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 4: Tutorial Abstracts). 9–16.