When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair

Luo, Wenqiang; Keung, Jacky; Yang, Boyang; Ye, He; Le Goues, Claire; BISSYANDE, Tegawendé; Tian, Haoye; Le, Xuan Bach D.

doi:10.1145/3733599

Article (Scientific journals)

2025 • In ACM Transactions on Software Engineering and Methodology

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/10993/66853

DOI
10.1145/3733599

Files

Full Text

3733599.pdf

Author postprint (6.97 MB)

All documents in ORBilu are protected by a user license.

Details

Keywords :

privacy; federated learning; empirical study; LLM; fine tuning

Abstract :

Software systems have been evolving rapidly and inevitably introducing bugs at an increasing rate, leading to significant maintenance costs. While large language models (LLMs) have demonstrated remarkable potential in enhancing software development and maintenance practices, particularly in automated program repair (APR), they rely heavily on high-quality code repositories. Most code repositories are proprietary assets that capture the diversity and nuances of real-world industrial software practices, which public datasets cannot fully represent. However, obtaining such data from various industries is hindered by data privacy concerns, as companies are reluctant to share their proprietary codebases. In addition, no prior work has investigated in depth how collaborative software development can learn from private, decentralized data for program repair while preserving data privacy. To address this gap, we investigate federated learning as a privacy-preserving method for fine-tuning LLMs on proprietary and decentralized data to boost collaborative software development and maintenance. We use the private industrial dataset TutorCode for fine-tuning and the EvalRepair-Java benchmark for evaluation, and assess whether federated fine-tuning enhances program repair. We then explore how code heterogeneity (i.e., variations in coding style, complexity, and embedding) and different federated learning algorithms affect bug fixing, to provide practical implications for real-world software development collaboration. Our evaluation reveals that federated fine-tuning can significantly enhance program repair, achieving increases of up to 16.67% for Top@10 and 18.44% for Pass@10, and performing comparably even to centralized learning in bug fixing. Moreover, the negligible impact of code heterogeneity implies that industries can collaborate effectively despite diverse data distributions. Different federated algorithms also demonstrate unique strengths across LLMs, suggesting that tailoring the optimization process to specific LLM characteristics can further improve program repair.
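For readers new to federated fine-tuning, the following is a minimal sketch of the server-side aggregation step of FedAvg, a standard federated learning algorithm. Each client fine-tunes on its own private code and sends back only model parameters, and the server averages them, weighting each client by its number of local training examples. This is an illustration under stated assumptions, not the study's implementation: this record does not say which federated algorithms were compared or which parameters clients exchange. The `State` type (parameter name mapped to a flat list of floats) and the `fedavg` function are hypothetical names chosen for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical model state: parameter name -> flat list of float values.
State = Dict[str, List[float]]


def fedavg(client_updates: List[Tuple[State, int]]) -> State:
    """FedAvg aggregation sketch: weighted mean of client parameters,
    where each client's weight is its share of the total training examples."""
    if not client_updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    aggregated: State = {}
    # Assumes every client reports the same parameter names and shapes.
    for name, values in client_updates[0][0].items():
        acc = [0.0] * len(values)
        for state, n in client_updates:
            weight = n / total  # larger local datasets count for more
            for i, v in enumerate(state[name]):
                acc[i] += weight * v
        aggregated[name] = acc
    return aggregated


if __name__ == "__main__":
    # Two hypothetical clients with 100 and 300 local examples.
    client_a = ({"w": [1.0, 2.0]}, 100)
    client_b = ({"w": [3.0, 4.0]}, 300)
    print(fedavg([client_a, client_b]))  # {'w': [2.5, 3.5]}
```

With 100 and 300 examples the client weights are 0.25 and 0.75, so the aggregated `w` is `[2.5, 3.5]`. Weighting by example count keeps a small client from pulling the global model as strongly as a large one; other federated algorithms modify this aggregation rule or the clients' local objective instead.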

Disciplines :

Computer science

Author, co-author :

Luo, Wenqiang ; Department of Computer Science, City University of Hong Kong, China

Keung, Jacky ; Department of Computer Science, City University of Hong Kong, China

Yang, Boyang ; Jisuan Institute of Technology, Beijing JudaoYouda Network Technology Co. Ltd., China

Ye, He ; School of Computer Science, Carnegie Mellon University, USA

Le Goues, Claire ; School of Computer Science, Carnegie Mellon University, USA

BISSYANDE, Tegawendé ; University of Luxembourg

Tian, Haoye ; School of Computing and Information Systems, University of Melbourne, Australia

Le, Xuan Bach D. ; School of Computing and Information Systems, University of Melbourne, Australia

External co-authors :

yes

Language :

English

Title :

When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair

Publication date :

May 2025

Journal title :

ACM Transactions on Software Engineering and Methodology

ISSN :

1049-331X

Publisher :

Association for Computing Machinery (ACM)

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

https://dl.acm.org/doi/pdf/10.1145/3733599

Available on ORBilu :

since 15 December 2025

Statistics

Number of views

20 (2 by Unilu)

Number of downloads

7 (0 by Unilu)