Web Application Vulnerability Prediction using Hybrid Program Analysis and Machine Learning

SHAR, Lwin Khin; BRIAND, Lionel; Tan, Hee Beng Kuan

doi:10.1109/TDSC.2014.2373377

Article (Scientific journals)

2015 • In IEEE Transactions on Dependable and Secure Computing, 12 (6), p. 688-707

Peer reviewed

Permalink
https://hdl.handle.net/10993/18549

DOI
10.1109/TDSC.2014.2373377

Files

Full Text

Web Application Vulnerability Prediction using Hybrid Program Analysis and Machine Learning.pdf

Author preprint (33.91 MB)

All documents in ORBilu are protected by a user license.

Details

Keywords :

Vulnerability prediction; security measures; input validation and sanitization

Abstract :

Due to limited time and resources, web software engineers need support in identifying vulnerable code. A practical approach to predicting vulnerable code would enable them to prioritize security auditing efforts. In this paper, we propose using a set of hybrid (static+dynamic) code attributes that characterize input validation and input sanitization code patterns and are expected to be significant indicators of web application vulnerabilities. Because static and dynamic program analyses complement each other, both techniques are used to extract the proposed attributes in an accurate and scalable way. Current vulnerability prediction techniques rely on the availability of data labeled with vulnerability information for training. For many real world applications, past vulnerability data is often not available or at least not complete. Hence, to address both situations where labeled past data is fully available or not, we apply both supervised and semi-supervised learning when building vulnerability predictors based on hybrid code attributes. Given that semi-supervised learning is entirely unexplored in this domain, we describe how to use this learning scheme effectively for vulnerability prediction. We performed empirical case studies on seven open source projects where we built and evaluated supervised and semi-supervised models. When cross validated with fully available labeled data, the supervised models achieve an average of 77% recall and 5% probability of false alarm for predicting SQL injection, cross site scripting, remote code execution and file inclusion vulnerabilities. With a low amount of labeled data, when compared to the supervised model, the semi-supervised model showed an average improvement of 24% higher recall and 3% lower probability of false alarm, thus suggesting semi-supervised learning may be a preferable solution for many real world applications where vulnerability data is missing.
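
The abstract reports results as recall and probability of false alarm. As a minimal sketch, assuming the standard confusion-matrix definitions (recall = TP / (TP + FN), probability of false alarm = FP / (FP + TN)), the two measures can be computed as below. The function name and the toy labels are hypothetical illustrations; they are not taken from the paper or its data.

```python
from typing import Sequence, Tuple


def recall_and_pf(actual: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (recall, probability of false alarm) for binary predictions.

    True marks a vulnerable code unit, False a non-vulnerable one.
    Assumes the standard definitions; the name is hypothetical.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # vulnerable, flagged
    fn = sum(1 for a, p in pairs if a and not p)      # vulnerable, missed
    fp = sum(1 for a, p in pairs if not a and p)      # safe, flagged
    tn = sum(1 for a, p in pairs if not a and not p)  # safe, not flagged

    # Guard against empty classes to avoid division by zero.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, pf


# Toy data (made up): 4 vulnerable units and 6 non-vulnerable units.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

recall, pf = recall_and_pf(actual, predicted)
print(f"recall = {recall:.2f}, pf = {pf:.2f}")
```

In this toy case three of the four vulnerable units are flagged and one of the six non-vulnerable units is flagged by mistake. A good predictor in the abstract's sense has high recall and low probability of false alarm.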

Disciplines :

Computer science

Author, co-author :

SHAR, Lwin Khin ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

BRIAND, Lionel ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)

Tan, Hee Beng Kuan; Nanyang Technological University > School of Electrical and Electronic Engineering

External co-authors :

yes

Language :

English

Title :

Web Application Vulnerability Prediction using Hybrid Program Analysis and Machine Learning

Publication date :

2015

Journal title :

IEEE Transactions on Dependable and Secure Computing

ISSN :

1545-5971

Publisher :

IEEE

Volume :

12

Issue :

6

Pages :

688-707

Peer reviewed :

Peer reviewed

Available on ORBilu :

since 28 October 2014

Statistics

Number of views

530 (30 by Unilu)

Number of downloads

10 (4 by Unilu)

Scopus citations®

122

Scopus citations® without self-citations

122

OpenAlex citations

136

WoS citations™