Article (Scientific journals)
Open Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning
Lin, Zhihao; Ma, Wei; Lin, Tao et al.
2025 In ACM Transactions on Software Engineering and Methodology, 34 (5), p. 1-24
Peer Reviewed verified by ORBi
 

Files


Full Text
2024-TOSEM2024-1.pdf
Author postprint (15 MB)

Details



Keywords :
Data Privacy; Federated Learning; Open Source Code Model; Software Engineering; Code understanding; Collaborative softwares; Engineering tasks; Language model; Open source code model; Open-source; Open-source code; Software engineering model; Software engineering tools; Software Quality; Software
Abstract :
[en] Large language models (LLMs) have become instrumental in advancing software engineering (SE) tasks, showing their efficacy in code understanding and beyond. AI code models have demonstrated their value not only in code generation but also in defect detection, strengthening security measures and improving overall software quality. They are emerging as crucial tools for both developing software and maintaining its quality. As with traditional SE tools, open source collaboration is key to building excellent products. With AI models, however, the essential need is data: collaboration on AI-based SE models hinges on maximizing the sources of high-quality data. Yet data, especially high-quality data, often hold commercial or sensitive value, making them less accessible to open source AI-based SE projects. This presents a significant barrier to the development and enhancement of AI-based SE tools within the SE community. Researchers therefore need solutions that enable open source AI-based SE models to tap into resources held by different organizations. To address this challenge, our position article investigates one solution for facilitating access to diverse organizational resources for open source AI models while respecting privacy and commercial sensitivities. We introduce a governance framework centered on federated learning (FL), designed to foster the joint development and maintenance of open source AI code models while safeguarding data privacy and security. We also present guidelines for developers on AI-based SE tool collaboration, covering data requirements, model architecture, updating strategies, and version control. Because data characteristics significantly influence FL, we examine the effect of code data heterogeneity on FL performance across six data distribution scenarios, four code models, and four of the most common FL algorithms.
Our experimental findings highlight the potential of FL for the collaborative development and maintenance of AI-based SE models. We also discuss the key issues to be addressed in the co-construction process and future research directions.
Disciplines :
Computer science
Author, co-author :
Lin, Zhihao ;  Beihang University, Beijing, China
Ma, Wei ;  Singapore Management University, Singapore, Singapore
Lin, Tao ;  Westlake University, Hangzhou, China
Zheng, Yaowen ;  Nanyang Technological University, Singapore, Singapore
Ge, Jingquan ;  Nanyang Technological University, Singapore, Singapore
Wang, Jun ;  University of Luxembourg, Esch-sur-Alzette, Luxembourg
KLEIN, Jacques  ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
BISSYANDE, Tegawendé  ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
Liu, Yang ;  Nanyang Technological University, Singapore, Singapore
Li, Li ;  Beihang University, Beijing, China
External co-authors :
yes
Language :
English
Title :
Open Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning
Publication date :
24 May 2025
Journal title :
ACM Transactions on Software Engineering and Methodology
ISSN :
1049-331X
Publisher :
Association for Computing Machinery
Volume :
34
Issue :
5
Pages :
1-24
Peer reviewed :
Peer Reviewed verified by ORBi
Available on ORBilu :
since 15 December 2025

Statistics


Number of views
0 (0 by Unilu)
Number of downloads
0 (0 by Unilu)

Scopus citations®
1
Scopus citations® without self-citations
1
OpenCitations
0
OpenAlex citations
4
WoS citations
1
