Paper published in a book (Scientific congresses, symposiums and conference proceedings)
Can Contributing More Put You at a Higher Leakage Risk? The Relationship Between Shapley Value and Training Data Leakage Risks in Federated Learning
EL MESTARI, Soumia Zohra; Zuziak, Maciej; LENZINI, Gabriele et al.
2025 • In Can Contributing More Put You at a Higher Leakage Risk? The Relationship Between Shapley Value and Training Data Leakage Risks in Federated Learning
[en] Federated Learning (FL) is a crucial approach for training large-scale AI models while preserving data locality, eliminating the need for centralised data storage. In collaborative learning settings, ensuring data quality is essential, and in FL, maintaining privacy requires limiting the knowledge accessible to the central orchestrator, which evaluates and manages client contributions. Accurately measuring and regulating the marginal impact of each client's contribution requires specialised techniques. This work examines the relationship between one such technique, Shapley Values, and a client's vulnerability to Membership Inference Attacks (MIAs). Such a correlation would suggest that the contribution index could reveal high-risk participants, potentially allowing a malicious orchestrator to identify and exploit the most vulnerable clients. Conversely, if no such relationship is found, it would indicate that contribution metrics do not inherently expose information exploitable for powerful privacy attacks. Our empirical analysis in a cross-silo FL setting demonstrates that leveraging contribution metrics in federated environments does not substantially amplify privacy risks.
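For reference, the contribution index studied in this work is the Shapley value (Shapley, 1952). For a set of clients N and a coalition utility v, where v(S) may be taken, for instance, as the validation performance of a model trained on coalition S ⊆ N, client i's Shapley value is

\[
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!} \bigl( v(S \cup \{i\}) - v(S) \bigr).
\]

Exact computation is exponential in |N|, which is why FL-specific approximations such as GTG-Shapley (Liu et al., 2021) are used in practice.

The analysis the abstract describes can be pictured as a correlation test between two per-client quantities: the contribution index and a measured MIA vulnerability. The Python sketch below is illustrative only, with hypothetical random numbers standing in for the paper's measurements; it shows the shape of such a test, not the authors' actual pipeline.

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical per-client measurements from a cross-silo federation of 10 clients:
shapley_values = rng.normal(loc=0.05, scale=0.02, size=10)  # contribution index per client
mia_success = rng.uniform(low=0.5, high=0.7, size=10)       # e.g., per-client MIA attack accuracy

# Rank correlation (Spearman, 1904) between contribution and vulnerability.
# A coefficient near zero would be consistent with the paper's finding that
# contribution metrics do not substantially amplify privacy risks.
rho, p_value = spearmanr(shapley_values, mia_success)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.3f}")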
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > IRiSC - Socio-Technical Cybersecurity
Disciplines :
Computer science
Author, co-author :
EL MESTARI, Soumia Zohra ✱; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > IRiSC
Zuziak, Maciej ✱; National Research Council, Pisa, Italy
LENZINI, Gabriele; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > IRiSC
Rinzivillo, Salvatore; National Research Council, Pisa, Italy
✱ These authors have contributed equally to this work.
External co-authors :
yes
Language :
English
Title :
Can Contributing More Put You at a Higher Leakage Risk? The Relationship Between Shapley Value and Training Data Leakage Risks in Federated Learning
Original title :
[en] Can Contributing More Put You at a Higher Leakage Risk? The Relationship Between Shapley Value and Training Data Leakage Risks in Federated Learning
Publication date :
2025
Event name :
22nd International Conference on Security and Cryptography - SECRYPT
Event organizer :
INSTICC
Event place :
Bilbao, Spain
Event date :
June 2025
Event number :
22
Audience :
International
Main work title :
Can Contributing More Put You at a Higher Leakage Risk? The Relationship Between Shapley Value and Training Data Leakage Risks in Federated Learning
Publisher :
SciTePress
ISBN/EAN :
978-989-758-760-3
Pages :
275-286
Peer reviewed :
Peer reviewed
Focus Area :
Computational Sciences
European Projects :
H2020 - 956562 - LeADS - Legality Attentive Data Scientists
Funders :
European Union. Marie Skłodowska-Curie Actions
Abadi, M., Chu, A., Goodfellow, I., et al. (2016). Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318.
Cohen, G., Afshar, S., Tapson, J., et al. (2017). EMNIST: Extending MNIST to handwritten letters. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 2921–2926. IEEE.
Dickey, D. A. and Fuller, W. A. (1979). Distribution of the Estimators for Autoregressive Time Series With a Unit Root. Journal of the American Statistical Association, 74(366):427–431.
El Mestari, S. Z., Lenzini, G., and Demirci, H. (2024). Preserving data privacy in machine learning systems. Computers & Security, 137:103605.
Ghorbani, A. and Zou, J. (2019). Data Shapley: Equitable Valuation of Data for Machine Learning. arXiv:1904.02868 [cs, stat].
Granger, C. W. J. (1969). Investigating Causal Relations by Econometric Models and Cross-spectral Methods. Econometrica, 37(3):424–438.
Gu, Y., Bai, Y., and Xu, S. (2022). CS-MIA: Membership inference attack based on prediction confidence series in federated learning. Journal of Information Security and Applications, 67:103201.
Guo, W., Wang, Y., and Jiang, P. (2023). Incentive mechanism design for federated learning with Stackelberg game perspective in the industrial scenario. Computers & Industrial Engineering, 184(C).
Hestness, J., Narang, S., et al. (2017). Deep learning scaling is predictable, empirically. arXiv preprint arXiv:1712.00409.
Jain, A., Patel, H., et al. (2020). Overview and importance of data quality for machine learning tasks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 3561–3562.
Jia, R., Dao, D., et al. (2019). Towards Efficient Data Valuation Based on the Shapley Value. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, pages 1167–1176. PMLR.
Kairouz, P., McMahan, H. B., et al. (2021). Advances and Open Problems in Federated Learning. arXiv:1912.04977 [cs, stat].
Koh, P. W. and Liang, P. (2017). Understanding black-box predictions via influence functions. In International Conference on Machine Learning, pages 1885–1894. PMLR.
Le, Y. and Yang, X. (2015). Tiny ImageNet visual recognition challenge. CS 231N, 7(7):3.
Li, H., Meng, D., et al. (2020a). Knowledge federation: A unified and hierarchical privacy-preserving AI framework. In 2020 IEEE International Conference on Knowledge Graph (ICKG), pages 84–91. IEEE.
Li, Z., Lin, T., Shang, X., and Wu, C. (2023). Revisiting Weighted Aggregation in Federated Learning with Neural Networks. arXiv:2302.10911 [cs].
Li, Z., Sharma, V., and Mohanty, S. P. (2020b). Preserving Data Privacy via Federated Learning: Challenges and Solutions. IEEE Consumer Electronics Magazine, 9(3):8–16.
Liu, Z., Chen, Y., et al. (2021). GTG-Shapley: Efficient and Accurate Participant Contribution Evaluation in Federated Learning. arXiv:2109.02053 [cs].
Long, Y., Bindschaedler, V., et al. (2018). Understanding membership inferences on well-generalized learning models. arXiv preprint arXiv:1802.04889.
McMahan, H. B., Moore, E., et al. (2017). Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282. PMLR.
Melis, L., Song, C., et al. (2018). Inference attacks against collaborative learning. CoRR, abs/1805.04049.
Nasr, M., Shokri, R., and Houmansadr, A. (2019). Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In 2019 IEEE Symposium on Security and Privacy (SP), pages 739–753.
Pearson, K. (1895). Note on Regression and Inheritance in the Case of Two Parents. Proceedings of the Royal Society of London, 58:240–242.
Shapley, L. S. (1952). A Value for N-Person Games. Technical report, RAND Corporation.
Shokri, R., Stronati, M., et al. (2017). Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE.
Song, M., Wang, Z., Zhang, Z., et al. (2020). Analyzing user-level privacy attack against federated learning. IEEE Journal on Selected Areas in Communications, 38(10):2430–2444.
Song, T., Tong, Y., and Wei, S. (2019). Profit Allocation for Federated Learning. In 2019 IEEE International Conference on Big Data (Big Data), pages 2577–2586.
Spearman, C. (1904). The Proof and Measurement of Association between Two Things. The American Journal of Psychology, 15(1):72–101.
Thakkar, O. D., Ramaswamy, S., et al. (2021). Understanding unintended memorization in language models under federated learning. In Proceedings of the Third Workshop on Privacy in Natural Language Processing, pages 1–10, Online. Association for Computational Linguistics.
Wang, G., Dang, C. X., and Zhou, Z. (2019a). Measure contribution of participants in federated learning. In 2019 IEEE International Conference on Big Data (Big Data), pages 2597–2604. IEEE.
Wang, G., Dang, C. X., and Zhou, Z. (2019b). Measure Contribution of Participants in Federated Learning. arXiv:1909.08525 [cs, stat].
Wang, J., Charles, Z., et al. (2021). A Field Guide to Federated Optimization. arXiv:2107.06917 [cs].
Wang, T., Rausch, J., et al. (2020). A Principled Approach to Data Valuation for Federated Learning. In Yang, Q., Fan, L., and Yu, H., editors, Federated Learning: Privacy and Incentive, Lecture Notes in Computer Science, pages 153–167. Springer International Publishing, Cham.
Wei, S., Tong, Y., et al. (2020). Efficient and Fair Data Valuation for Horizontal Federated Learning. In Yang, Q., Fan, L., and Yu, H., editors, Federated Learning: Privacy and Incentive, Lecture Notes in Computer Science, pages 139–152. Springer International Publishing, Cham.
Yang, Q., Liu, Y., Chen, T., and Tong, Y. (2019). Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2):1–19.
Yousefpour, A., Shilov, I., et al. (2021). Opacus: User-friendly differential privacy library in PyTorch. arXiv preprint arXiv:2109.12298.
Zhang, J., Zhang, J., et al. (2020). GAN enhanced membership inference: A passive local attack in federated learning. In ICC 2020 - 2020 IEEE International Conference on Communications (ICC), pages 1–6.
Zheng, S., Cao, Y., and Yoshikawa, M. (2023). Secure Shapley Value for Cross-Silo Federated Learning. arXiv:2209.04856 [cs].