Open Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning

Data Privacy; Federated Learning; Open Source Code Model; Software Engineering; Code understanding; Collaborative softwares; Engineering tasks; Language model; Open-source; Open-source code; Software engineering model; Software engineering tools; Software Quality; Software

Abstract :

Large language models (LLMs) have become instrumental in advancing software engineering (SE) tasks, showing their efficacy in code understanding and beyond. AI code models have demonstrated their value not only in code generation but also in defect detection, enhancing security measures, and improving overall software quality. They are emerging as crucial tools for both software development and the maintenance of software quality. As with traditional SE tools, open source collaboration is key to building excellent products. With AI models, however, the essential need is data: collaboration on AI-based SE models hinges on maximizing the sources of high-quality data. Yet data, especially high-quality data, often hold commercial or sensitive value, making them less accessible to open source AI-based SE projects. This presents a significant barrier to the development and enhancement of AI-based SE tools within the SE community. Researchers therefore need to find ways of enabling open source AI-based SE models to tap into resources held by different organizations. To address this challenge, our position article investigates one solution for facilitating access to diverse organizational resources for open source AI models while ensuring that privacy and commercial sensitivities are respected. We introduce a governance framework centered on federated learning (FL), designed to foster the joint development and maintenance of open source AI code models while safeguarding data privacy and security. Additionally, we present guidelines for developers on AI-based SE tool collaboration, covering data requirements, model architecture, updating strategies, and version control. Given the significant influence of data characteristics on FL, our research examines the effect of code data heterogeneity on FL performance. We consider six different scenarios of data distribution, four code models, and four of the most common FL algorithms.
Our experimental findings highlight the potential for employing FL in the collaborative development and maintenance of AI-based SE models. We also discuss the key issues to be addressed in the co-construction process and future research directions.
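
As an illustration of the federated learning idea the abstract relies on, the following is a minimal sketch of one server-side aggregation round in the style of federated averaging (FedAvg), a widely used FL algorithm. The abstract does not name the four algorithms it evaluates, so the choice of FedAvg here is an assumption, and the function name `fedavg`, the parameter layout, and the toy values are all hypothetical. Each organization trains locally and shares only model parameters; the server combines them weighted by local dataset size, so raw code data never leaves its owner.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its local dataset size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of training examples must be positive")

    aggregated: Weights = {}
    for name, reference in client_weights[0].items():
        # Weighted mean of each scalar parameter across all clients.
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return aggregated


if __name__ == "__main__":
    # Two hypothetical organizations with unequal amounts of local code data.
    org_a = {"layer.weight": [1.0, 2.0]}
    org_b = {"layer.weight": [3.0, 6.0]}
    print(fedavg([org_a, org_b], client_sizes=[1, 3]))
```

Size-proportional weighting is only one possible design choice; under heterogeneous (non-IID) data distributions such as those the abstract studies, other aggregation rules may behave differently.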

Disciplines :

Computer science

Author, co-author :

Lin, Zhihao ; Beihang University, Beijing, China

Ma, Wei ; Singapore Management University, Singapore, Singapore

Lin, Tao ; Westlake University, Hangzhou, China

Zheng, Yaowen ; Nanyang Technological University, Singapore, Singapore

Ge, Jingquan ; Nanyang Technological University, Singapore, Singapore

Wang, Jun ; University of Luxembourg, Esch-sur-Alzette, Luxembourg

KLEIN, Jacques ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX

BISSYANDE, Tegawendé ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX

Liu, Yang ; Nanyang Technological University, Singapore, Singapore

Li, Li ; Beihang University, Beijing, China

External co-authors :

yes

Language :

English

Title :

Open Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning

Publication date :

24 May 2025

Journal title :

ACM Transactions on Software Engineering and Methodology

ISSN :

1049-331X

Publisher :

Association for Computing Machinery

Pages :

1-24

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

https://dl.acm.org/doi/10.1145/3708529

Available on ORBilu :

since 15 December 2025
