Studying and exposing software vulnerabilities is important to ensure software security, safety, and reliability. Software engineers often inject vulnerabilities into their programs to test the reliability of their test suites, vulnerability detectors, and security measures. However, state-of-the-art vulnerability injection methods only capture code syntax and patterns: they do not learn the intent of the vulnerability and are limited to the syntax of the original dataset. To address this challenge, we propose the first intent-based vulnerability injection method that learns both the program syntax and the vulnerability intent. Our approach combines NLP methods with semantic-preserving program mutations (at the bytecode level) to inject code vulnerabilities. Given a dataset of known vulnerabilities (containing benign and vulnerable code pairs), our approach first employs semantic-preserving program mutations to transform the existing dataset into semantically similar code. Then, it learns the intent of the vulnerability via neural machine translation (Seq2Seq) models. The key insight is to employ Seq2Seq to learn the intent (context) of the vulnerable code in a manner that is agnostic of the specific program instance. We evaluate our approach using 1275 vulnerabilities belonging to five CWEs from the Juliet test suite, examining its effectiveness in producing compilable and vulnerable code. Our results show that INTJECT is effective: almost all (99%) of the code produced by our approach is vulnerable and compilable. We also demonstrate that the vulnerable programs generated by INTJECT are semantically similar to the withheld original vulnerable code. Finally, we show that our mutation-based data transformation approach outperforms its alternatives, namely data obfuscation and using the original data.
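As a rough illustration of the data transformation step described in the abstract, the sketch below applies one simple semantics-preserving change (consistent renaming of local variables) to both sides of a benign/vulnerable code pair, yielding a new pair with the same behavior but a different surface syntax. It is not the paper's implementation: the abstract states that the mutations operate at the bytecode level, whereas this toy works on source text. The code pair, the `rename_identifiers` and `augment_pair` helpers, and the `v0`/`v1` names are all hypothetical.

```python
import re

# Hypothetical benign/vulnerable pair (an unchecked increment that may overflow).
# These snippets are illustrative only and are not taken from the Juliet test suite.
BENIGN = "if (data < Integer.MAX_VALUE) { int result = data + 1; }"
VULNERABLE = "int result = data + 1;"

# Matches identifier-like tokens (also keywords; those are simply left unmapped).
IDENT = re.compile(r"\b[A-Za-z_]\w*\b")


def rename_identifiers(code, mapping):
    """Rename identifiers consistently; tokens absent from the mapping are kept."""
    return IDENT.sub(lambda m: mapping.get(m.group(0), m.group(0)), code)


def augment_pair(benign, vulnerable, mapping):
    """Apply the same renaming to both sides so the pair stays aligned."""
    return (rename_identifiers(benign, mapping),
            rename_identifiers(vulnerable, mapping))


# Assumed renaming of the two local variables.
mapping = {"data": "v0", "result": "v1"}
new_benign, new_vulnerable = augment_pair(BENIGN, VULNERABLE, mapping)

print(new_benign)
print(new_vulnerable)
```

Applying the same mapping to both snippets keeps the benign-to-vulnerable difference intact, which is the property a translation model would need in order to learn the change rather than the particular variable names.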
Author, co-author :
PETIT, Benjamin; University of Namur, Namur, Belgium
Khanfir, Ahmed ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Soremekun, Ezekiel ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Perrouin, Gilles; University of Namur, Namur, Belgium
Papadakis, Michail; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Life Sciences and Medicine (DLSM)
External co-authors :
IntJect: Vulnerability Intent Bug Seeding
Publication date :
Event name :
22nd IEEE International Conference on Software Quality, Reliability, and Security
Event date :
Main work title :
22nd IEEE International Conference on Software Quality, Reliability, and Security