Paper published in a book (Scientific congresses, symposiums and conference proceedings)
Finding Data Compatibility Bugs with JSON Subschema Checking
Habib, Andrew; Shinnar, Avraham; Hirzel, Martin; Pradel, Michael
2021, in The 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA)
Peer reviewed

Files


Full Text
JSONSubschema_issta21_paper.pdf
Publisher postprint (484.81 kB)

All documents in ORBilu are protected by a user license.




Details



Keywords :
JSON schema; data compatibility bugs; subschema checking
Abstract :
JSON is a data format used pervasively in web APIs, cloud computing, NoSQL databases, and increasingly also machine learning. To ensure that JSON data is compatible with an application, one can define a JSON schema and use a validator to check data against the schema. However, because validation can happen only once concrete data occurs during an execution, it may detect data compatibility bugs too late or not at all. Examples include evolving the schema for a web API, which may unexpectedly break client applications, or accidentally running a machine learning pipeline on incorrect data. This paper presents a novel way of detecting a class of data compatibility bugs via JSON subschema checking. Subschema checks find bugs before concrete JSON data is available and across all possible data specified by a schema. For example, one can check whether evolving a schema would break API clients or whether two components of a machine learning pipeline have incompatible expectations about data. Deciding whether one JSON schema is a subschema of another is non-trivial because the JSON Schema specification language is rich. Our key insight to address this challenge is to first reduce the richness of schemas by canonicalizing and simplifying them, and then to reason about the subschema question on simpler schema fragments using type-specific checkers. We apply our subschema checker to thousands of real-world schemas from different domains. In all experiments, the approach is correct whenever it gives an answer (100% precision and correctness), which is the case for most schema pairs (93.5% recall), clearly outperforming the state-of-the-art tool. Moreover, the approach reveals 43 previously unknown bugs in popular software, most of which have already been fixed, showing that JSON subschema checking helps find data compatibility bugs early.
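The two-step idea in the abstract (canonicalize a schema, then decide the subschema question per type on simpler fragments) can be illustrated with a toy sketch. This is not the authors' tool or its API: `canonicalize` and `is_subschema` are hypothetical names, and the sketch assumes a tiny fragment of JSON Schema that supports only the `type`, `minimum` and `maximum` keywords. It is deliberately conservative. A `True` answer should hold for this fragment, but some genuine subschema pairs (for example, a `number` range containing a single integer) get `False`.

```python
import math
from typing import Any, Dict

# Hypothetical canonical form for this sketch: one constraint record per JSON type.
Canonical = Dict[str, Dict[str, float]]

ALL_TYPES = {"null", "boolean", "integer", "number", "string", "array", "object"}
NUMERIC = {"integer", "number"}


def canonicalize(schema: Dict[str, Any]) -> Canonical:
    """Split a schema into one type-specific fragment per JSON type it admits."""
    unknown = set(schema) - {"type", "minimum", "maximum"}
    if unknown:
        raise NotImplementedError(f"keywords outside this sketch: {unknown}")
    types = schema.get("type", sorted(ALL_TYPES))  # no "type" admits every type
    if isinstance(types, str):
        types = [types]
    fragments: Canonical = {}
    for t in types:
        if t not in ALL_TYPES:
            raise ValueError(f"unknown type: {t}")
        if t in NUMERIC:
            # minimum/maximum constrain only numeric values
            fragments[t] = {
                "minimum": schema.get("minimum", -math.inf),
                "maximum": schema.get("maximum", math.inf),
            }
        else:
            fragments[t] = {}  # no supported keyword constrains other types
    return fragments


def _range_within(inner: Dict[str, float], outer: Dict[str, float]) -> bool:
    return inner["minimum"] >= outer["minimum"] and inner["maximum"] <= outer["maximum"]


def is_subschema(s1: Dict[str, Any], s2: Dict[str, Any]) -> bool:
    """Conservatively check that every value valid under s1 is valid under s2."""
    c1, c2 = canonicalize(s1), canonicalize(s2)
    for t, frag in c1.items():
        if t in NUMERIC:
            if frag["minimum"] > frag["maximum"]:
                continue  # empty range admits no values
            # an integer is also a number, so either fragment in s2 may cover it
            covers = ["integer", "number"] if t == "integer" else ["number"]
            if not any(c in c2 and _range_within(frag, c2[c]) for c in covers):
                return False
        elif t not in c2:
            return False
    return True


# Usage: tightening a producer's output schema keeps old consumers working,
# while loosening it may break them.
old = {"type": "number", "minimum": 0}
print(is_subschema({"type": "integer", "minimum": 0, "maximum": 10}, old))  # True
print(is_subschema({"type": ["number", "null"], "minimum": 0}, old))        # False
```

Here, `is_subschema(new, old)` returning `False` flags that data conforming to an evolved schema `new` may be rejected by consumers written against `old`. Handling the full language (`properties`, `items`, `allOf`/`anyOf`/`not`, `$ref`, and so on) is where the real canonicalization and simplification work lies, and this sketch does not attempt it.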
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Trustworthy Software Engineering (TruX)
Disciplines :
Computer science
Author, co-author :
Habib, Andrew; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > TruX
Shinnar, Avraham; IBM Research
Hirzel, Martin; IBM Research
Pradel, Michael; University of Stuttgart
External co-authors :
yes
Language :
English
Title :
Finding Data Compatibility Bugs with JSON Subschema Checking
Publication date :
11 July 2021
Event name :
The 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA)
Event place :
Virtual, Denmark
Event date :
July 11 - 17, 2021
Audience :
International
Main work title :
The 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA)
Pages :
620-632
Peer reviewed :
Peer reviewed
Focus Area :
Security, Reliability and Trust
European Projects :
H2020 - 949014 - NATURAL - Natural Program Repair
Funders :
CE - European Commission [BE]
Available on ORBilu :
since 13 February 2022

Statistics


Number of views
141 (10 by Unilu)
Number of downloads
20 (0 by Unilu)

Scopus citations®
12
Scopus citations® without self-citations
9
OpenCitations
4
WoS citations
6
