Learning from what we know: How to perform vulnerability prediction using noisy historical data

GARG, Aayush; DEGIOVANNI, Renzo Gaston; JIMENEZ, Matthieu; CORDY, Maxime; PAPADAKIS, Mike; LE TRAON, Yves

doi:10.1007/s10664-022-10197-4

Article (Scientific journals)

2022 • In Empirical Software Engineering

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/10993/45529

Details

Disciplines :

Computer science

Author, co-author :

GARG, Aayush ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

DEGIOVANNI, Renzo Gaston ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

JIMENEZ, Matthieu ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

CORDY, Maxime ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

PAPADAKIS, Mike ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

LE TRAON, Yves ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

Language :

English

Publication date :

20 September 2022

Journal title :

Empirical Software Engineering

ISSN :

1573-7616

Publisher :

Springer, Netherlands

Focus Area :

Security, Reliability and Trust

Additional URL :

https://github.com/garghub/TROVON

Available on ORBilu :

since 15 January 2021

Statistics

Number of views

243 (48 by Unilu)

Number of downloads

58 (6 by Unilu)
