IntJect: Vulnerability Intent Bug Seeding

Petit, Benjamin; Khanfir, Ahmed; Soremekun, Ezekiel; Perrouin, Gilles; Papadakis, Michail


Paper published in a book (Scientific congresses, symposiums and conference proceedings)


2022 • In 22nd IEEE International Conference on Software Quality, Reliability, and Security

Peer reviewed

Permalink
https://hdl.handle.net/10993/53858


Files

Full Text

Vulnerability_Injection_QRS_2022_3.pdf

Publisher postprint (608.84 kB)


Details

Keywords :

Software Vulnerabilities; Vulnerability injection; Software Security

Abstract :

Studying and exposing software vulnerabilities is important to ensure software security, safety, and reliability. Software engineers often inject vulnerabilities into their programs to test the reliability of their test suites, vulnerability detectors, and security measures. However, state-of-the-art vulnerability injection methods only capture code syntax and patterns; they do not learn the intent of the vulnerability and are therefore limited to the syntax of the original dataset. To address this challenge, we propose the first intent-based vulnerability injection method, which learns both the program syntax and the vulnerability intent. Our approach combines NLP methods with semantic-preserving program mutations (at the bytecode level) to inject code vulnerabilities. Given a dataset of known vulnerabilities (containing pairs of benign and vulnerable code), our approach first applies semantic-preserving program mutations to transform the existing dataset into semantically similar code. It then learns the intent of the vulnerability via neural machine translation (Seq2Seq) models. The key insight is to employ Seq2Seq to learn the intent (context) of the vulnerable code in a manner that is agnostic of the specific program instance. We evaluate our approach on 1,275 vulnerabilities belonging to five CWEs from the Juliet test suite, examining its effectiveness in producing compilable and vulnerable code. Our results show that IntJect is effective: almost all (99%) of the code it produces is both vulnerable and compilable. We also demonstrate that the vulnerable programs generated by IntJect are semantically similar to the withheld original vulnerable code. Finally, we show that our mutation-based data transformation approach outperforms its alternatives, namely data obfuscation and using the original data unchanged.
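The data-preparation idea in the abstract (transform benign/vulnerable pairs into semantically similar variants, then train a Seq2Seq model on them) can be illustrated with a minimal Python sketch. This is not IntJect's implementation: the paper mutates programs at the bytecode level, whereas this sketch swaps in a simpler source-level transformation, consistent renaming of local variables, which preserves semantics only if the new names do not clash with existing identifiers. The function names (`tokenize`, `rename`, `augment_pair`, `to_parallel_lines`), the `var<k>_<i>` naming scheme, and the caller-supplied set of local variable names are all hypothetical. The point it shows is that the same renaming is applied to both the benign and the vulnerable side of a pair, so each variant keeps the same vulnerability-introducing edit while its surface syntax changes.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Simplified Java-like lexer (hypothetical, for illustration only).
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"'           # string literal, kept as one token
    r'|[A-Za-z_$][A-Za-z0-9_$]*'   # identifier or keyword
    r'|\d+'                        # integer literal
    r'|[<>=!]=|&&|\|\||\+\+|--'    # common two-character operators
    r'|\S'                         # any other single non-space character
)


def tokenize(code: str) -> List[str]:
    """Split Java-like source into a flat list of tokens."""
    return TOKEN_RE.findall(code)


def rename(tokens: List[str], mapping: Dict[str, str]) -> List[str]:
    """Replace tokens that appear in `mapping`; keep all others.

    String-literal tokens include their quotes, so they never match a name.
    """
    return [mapping.get(tok, tok) for tok in tokens]


def augment_pair(benign: str, vulnerable: str, local_vars: Iterable[str],
                 n_variants: int) -> List[Tuple[str, str]]:
    """Return the original pair plus `n_variants` consistently renamed copies.

    Assumes the names in `local_vars` are only used as local variables
    (not also as fields or methods), so renaming them preserves semantics.
    """
    b_toks, v_toks = tokenize(benign), tokenize(vulnerable)
    names = sorted(set(local_vars))
    pairs = [(" ".join(b_toks), " ".join(v_toks))]
    for k in range(1, n_variants + 1):
        # Hypothetical naming scheme; assumes no identifier already
        # has the form var<k>_<i>.
        mapping = {name: f"var{k}_{i}" for i, name in enumerate(names)}
        # The same mapping on both sides keeps the benign-to-vulnerable edit.
        pairs.append((" ".join(rename(b_toks, mapping)),
                      " ".join(rename(v_toks, mapping))))
    return pairs


def to_parallel_lines(pairs: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Join pairs into source/target texts, one example per line."""
    src = "\n".join(b for b, _ in pairs) + "\n"
    tgt = "\n".join(v for _, v in pairs) + "\n"
    return src, tgt


if __name__ == "__main__":
    benign = "if (index >= 0 && index < array.length) { array[index] = 1; }"
    vulnerable = "if (index >= 0) { array[index] = 1; }"
    pairs = augment_pair(benign, vulnerable, {"index", "array"}, n_variants=2)
    src, tgt = to_parallel_lines(pairs)
    print(src)
    print(tgt)
```

The example pair is a hypothetical CWE-129-style case (improper validation of an array index) in which the vulnerable version drops the upper-bound check. Each `(benign, vulnerable)` line pair becomes one source/target training example; Seq2Seq toolkits such as OpenNMT typically consume parallel text files with one whitespace-tokenized example per line, which is the shape `to_parallel_lines` produces.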

Disciplines :

Computer science

Author, co-author :

Petit, Benjamin; University of Namur, Namur, Belgium

Khanfir, Ahmed; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

Soremekun, Ezekiel; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

Perrouin, Gilles; University of Namur, Namur, Belgium

Papadakis, Michail; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Life Sciences and Medicine (DLSM)

External co-authors :

yes

Language :

English

Title :

IntJect: Vulnerability Intent Bug Seeding

Publication date :

2022

Event name :

22nd IEEE International Conference on Software Quality, Reliability, and Security

Event date :

2022

Main work title :

22nd IEEE International Conference on Software Quality, Reliability, and Security

Peer reviewed :

Peer reviewed

Focus Area :

Security, Reliability and Trust

Available on ORBilu :

since 16 January 2023

