Neural Network; Software Evolution Analysis; Bill of Material
Abstract :
Neural networks have become integral to many fields due to their exceptional performance. The open-source community has witnessed a rapid influx of neural network (NN) repositories with fast-paced iterations, making it crucial for practitioners to analyze their evolution to guide development and stay ahead of trends. While extensive research has explored traditional software evolution using Software Bills of Materials (SBOMs), these are ill-suited for NN software, which relies on pre-defined modules and pre-trained models (PTMs) with distinct component structures and reuse patterns. Conceptual AI Bills of Materials (AIBOMs) also lack practical implementations for large-scale evolutionary analysis. To fill this gap, we introduce the Neural Network Bill of Material (NNBOM), a comprehensive dataset construct tailored for NN software. We create a large-scale NNBOM database from 55,997 curated PyTorch GitHub repositories, cataloging their third-party libraries (TPLs), PTMs, and modules. Leveraging this database, we conduct a comprehensive empirical study of neural network software evolution across software scale, component reuse, and inter-domain dependency, providing maintainers and developers with a holistic view of its long-term trends. Building on these findings, we develop two prototype applications, Multi repository Evolution Analyzer and Single repository Component Assessor and Recommender, to demonstrate the practical value of our analysis.
Disciplines :
Computer science
Author, co-author :
Ren, Xiaoning; University of Science and Technology of China, China
Ye, Yuhang; University of Science and Technology of China, China
Wu, Xiongfei; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Wu, Yueming; Huazhong University of Science and Technology, China
Xue, Yinxing; Institute of AI for Industries, China
External co-authors :
yes
Language :
English
Title :
Demystifying the Evolution of Neural Networks with BOM Analysis: Insights from a Large-Scale Study of 55,997 GitHub Repositories
Publication date :
2025
Event name :
Automated Software Engineering
Event place :
Seoul, South Korea
Event date :
16-20 November 2025
Audience :
International
Main work title :
Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE 2025)
Publisher :
IEEE Computer Society, Los Alamitos, CA, United States