Article (Scientific journals)
When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair
Luo, Wenqiang; Keung, Jacky; Yang, Boyang et al.
2025, in ACM Transactions on Software Engineering and Methodology
Peer Reviewed verified by ORBi
Files
Full Text: 3733599.pdf (author postprint, 6.97 MB)
Details
Keywords :
privacy; federated learning; empirical study; LLM; fine-tuning
Abstract :
[en] Software systems evolve rapidly and inevitably introduce bugs at an increasing rate, leading to significant maintenance costs. While large language models (LLMs) have demonstrated remarkable potential in enhancing software development and maintenance practices, particularly in automated program repair (APR), they rely heavily on high-quality code repositories. Most code repositories are proprietary assets that capture the diversity and nuances of real-world industrial software practice, which public datasets cannot fully represent. However, obtaining such data from various industries is hindered by data privacy concerns, as companies are reluctant to share their proprietary codebases. Moreover, there has been no in-depth investigation of collaborative software development that learns from private, decentralized data for program repair while preserving data privacy. To address this gap, we investigate federated learning as a privacy-preserving method for fine-tuning LLMs on proprietary, decentralized data to support collaborative software development and maintenance. We fine-tune on the private industrial dataset TutorCode, evaluate on the EvalRepair-Java benchmark, and assess whether federated fine-tuning enhances program repair. We then explore how code heterogeneity (i.e., variations in coding style, complexity, and embedding) and different federated learning algorithms affect bug fixing, providing practical implications for real-world collaboration in software development. Our evaluation reveals that federated fine-tuning can significantly enhance program repair, achieving increases of up to 16.67% in Top@10 and 18.44% in Pass@10, comparable even to the bug-fixing capabilities of centralized learning. Moreover, the negligible impact of code heterogeneity implies that industries can collaborate effectively despite diverse data distributions. Different federated algorithms also demonstrate unique strengths across LLMs, suggesting that tailoring the optimization process to specific LLM characteristics can further improve program repair.
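To illustrate the federated fine-tuning setting the abstract describes, the sketch below shows FedAvg-style aggregation, the canonical federated learning algorithm, in which each participant trains locally on its private code and only model parameters (never the proprietary data) are shared and averaged. This is a minimal, hypothetical example for intuition; it is not the authors' implementation, and the paper compares several federated algorithms beyond plain averaging.

```python
from typing import Dict, List

def fedavg(client_weights: List[Dict[str, float]],
           client_sizes: List[int]) -> Dict[str, float]:
    """Average client model parameters, weighted by local dataset size.

    Each client fine-tunes on its own private repository and uploads only
    its parameters; the server never sees the underlying code.
    """
    total = sum(client_sizes)
    aggregated = {}
    for name in client_weights[0]:
        aggregated[name] = sum(
            w[name] * n / total
            for w, n in zip(client_weights, client_sizes)
        )
    return aggregated

# Three simulated companies, each holding one scalar parameter after a
# round of local fine-tuning; the third holds twice as much data.
clients = [{"w": 1.0}, {"w": 2.0}, {"w": 4.0}]
sizes = [1, 1, 2]
print(fedavg(clients, sizes)["w"])  # (1*1 + 2*1 + 4*2) / 4 = 2.75
```

In a real LLM setting the parameter dictionaries would hold tensors (e.g., LoRA adapter weights) rather than scalars, and the weighting scheme is exactly where the algorithms the paper compares differ from one another.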
Disciplines :
Computer science
Author, co-author :
Luo, Wenqiang ;  Department of Computer Science, City University of Hong Kong, China
Keung, Jacky ;  Department of Computer Science, City University of Hong Kong, China
Yang, Boyang ;  Jisuan Institute of Technology, Beijing JudaoYouda Network Technology Co. Ltd., China
Ye, He ;  School of Computer Science, Carnegie Mellon University, USA
Le Goues, Claire ;  School of Computer Science, Carnegie Mellon University, USA
Bissyandé, Tegawendé ; University of Luxembourg
Tian, Haoye ;  School of Computing and Information Systems, University of Melbourne, Australia
Le, Xuan Bach D. ;  School of Computing and Information Systems, University of Melbourne, Australia
External co-authors :
yes
Language :
English
Title :
When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair
Publication date :
May 2025
Journal title :
ACM Transactions on Software Engineering and Methodology
ISSN :
1049-331X
Publisher :
Association for Computing Machinery (ACM)
Peer reviewed :
Peer Reviewed verified by ORBi
Available on ORBilu :
since 15 December 2025