Inputs from Hell: Learning Input Distributions for Grammar-Based Test Generation

SOREMEKUN, Ezekiel; Pavese, Esteban; Havrikov, Nikolas; Grunske, Lars; Zeller, Andreas

doi:10.1109/TSE.2020.3013716

Article (Scientific journals)

2022 • In IEEE Transactions on Software Engineering, 48 (4), pp. 1138-1153

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/10993/46098

Full Text

inputs-from-hell.pdf (author preprint, 599.31 kB)

All documents in ORBilu are protected by a user license.

Details

Keywords :

test case generation; probabilistic grammars; input samples

Abstract :

Grammars can serve as producers for structured test inputs that are syntactically correct by construction. A probabilistic grammar assigns probabilities to individual productions, thus controlling the distribution of input elements. Using the grammars as input parsers, we show how to learn input distributions from input samples, allowing us to create inputs that are similar to the sample; by inverting the probabilities, we can create inputs that are dissimilar to the sample. This allows for three test generation strategies:

1) “Common inputs”: by learning from common inputs, we can create inputs that are similar to the sample; this is useful for regression testing.

2) “Uncommon inputs”: learning from common inputs and inverting probabilities yields inputs that are strongly dissimilar to the sample; this is useful for completing a test suite with “inputs from hell” that test uncommon features, yet are syntactically valid.

3) “Failure-inducing inputs”: learning from inputs that caused failures in the past gives us inputs that share similar features and thus also have a high chance of triggering bugs; this is useful for testing the completeness of fixes.

Our evaluation on three common input formats (JSON, JavaScript, CSS) shows the effectiveness of these approaches. Results show that “common inputs” reproduced 96% of the methods induced by the samples. In contrast, for almost all subjects (95%), the “uncommon inputs” covered significantly different methods from the samples. Learning from failure-inducing samples reproduced all exceptions (100%) triggered by the failure-inducing samples and discovered new exceptions not found in any of the samples learned from.
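
The abstract combines three mechanisms: counting which productions sample inputs use, turning those counts into production probabilities, and optionally inverting the probabilities before generating new inputs. The Python sketch below illustrates these mechanisms on a hypothetical two-rule grammar. It is not the authors' implementation. The grammar, the parse-tree format, the add-one smoothing, and the inversion rule (weight each alternative by 1/p, then renormalize per nonterminal) are all assumptions chosen for illustration, and the paper's exact scheme may differ. Parsing is skipped; the samples are supplied as ready-made parse trees.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy grammar (not from the paper): each nonterminal maps to a
# list of alternatives; symbols in angle brackets are nonterminals.
Grammar = Dict[str, List[Tuple[str, ...]]]
Probs = Dict[str, List[float]]
Tree = Tuple[str, list]  # (symbol, children); terminals have no children

GRAMMAR: Grammar = {
    "<value>": [("<number>",), ("true",), ("null",)],
    "<number>": [("0",), ("1",), ("42",)],
}


def count_choices(tree: Tree, grammar: Grammar, counts: Counter) -> None:
    """Count which alternative each nonterminal node of a parse tree used."""
    symbol, children = tree
    if symbol not in grammar:
        return  # terminal leaf
    expansion = tuple(child[0] for child in children)
    counts[(symbol, grammar[symbol].index(expansion))] += 1
    for child in children:
        count_choices(child, grammar, counts)


def learn_probabilities(trees: List[Tree], grammar: Grammar,
                        smoothing: float = 1.0) -> Probs:
    """Estimate per-alternative probabilities from sample parse trees.

    Adding `smoothing` to every count (an assumption) keeps unseen
    alternatives at p > 0, so they can still be inverted.
    """
    counts: Counter = Counter()
    for tree in trees:
        count_choices(tree, grammar, counts)
    probs: Probs = {}
    for nt, alts in grammar.items():
        weights = [counts[(nt, i)] + smoothing for i in range(len(alts))]
        total = sum(weights)
        probs[nt] = [w / total for w in weights]
    return probs


def invert_probabilities(probs: Probs) -> Probs:
    """Favor rare alternatives: weight each by 1/p, then renormalize."""
    inverted: Probs = {}
    for nt, ps in probs.items():
        reciprocals = [1.0 / p for p in ps]
        total = sum(reciprocals)
        inverted[nt] = [r / total for r in reciprocals]
    return inverted


def generate(symbol: str, grammar: Grammar, probs: Probs,
             rng: random.Random) -> str:
    """Expand `symbol`, picking alternatives by the given probabilities.

    There is no depth limit, which is safe only for this non-recursive grammar.
    """
    if symbol not in grammar:
        return symbol
    (alt,) = rng.choices(grammar[symbol], weights=probs[symbol])
    return "".join(generate(s, grammar, probs, rng) for s in alt)


def leaf(text: str) -> Tree:
    return (text, [])


# Three hypothetical sample inputs, already parsed: "42", "42", "true".
SAMPLES: List[Tree] = [
    ("<value>", [("<number>", [leaf("42")])]),
    ("<value>", [("<number>", [leaf("42")])]),
    ("<value>", [leaf("true")]),
]

common = learn_probabilities(SAMPLES, GRAMMAR)   # "common inputs"
uncommon = invert_probabilities(common)          # "uncommon inputs"
rng = random.Random(0)
similar = [generate("<value>", GRAMMAR, common, rng) for _ in range(5)]
dissimilar = [generate("<value>", GRAMMAR, uncommon, rng) for _ in range(5)]
```

With these samples, `common` assigns the `<number>` → `42` alternative probability 0.6, while `uncommon` lowers it to 1/7 and shifts weight to the unseen `0` and `1`. Passing failure-inducing parse trees instead of common ones to `learn_probabilities` would correspond to the third strategy.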

Disciplines :

Computer science

Author, co-author :

SOREMEKUN, Ezekiel ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

Pavese, Esteban; Humboldt-Universität zu Berlin, Berlin, Germany > Department of Computer Science

Havrikov, Nikolas; CISPA Helmholtz Center for Information Security, Saarbrücken, Germany

Grunske, Lars; Humboldt-Universität zu Berlin, Berlin, Germany > Department of Computer Science

Zeller, Andreas; CISPA Helmholtz Center for Information Security, Saarbrücken, Germany

External co-authors :

yes

Language :

English

Publication date :

01 April 2022

Journal title :

IEEE Transactions on Software Engineering

ISSN :

0098-5589

eISSN :

1939-3520

Publisher :

Institute of Electrical and Electronics Engineers, New York, United States

Volume :

48

Issue :

4

Pages :

1138 - 1153

Focus Area :

Security, Reliability and Trust

Available on ORBilu :

since 05 February 2021
