Finding Data Compatibility Bugs with JSON Subschema Checking

Habib, Andrew; Shinnar, Avraham; Hirzel, Martin; Pradel, Michael

doi:10.1145/3460319.3464796


Paper published in a book (Scientific congresses, symposiums and conference proceedings)

2021 • In The 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA)

Peer reviewed

Permalink
https://hdl.handle.net/10993/50268

Files

Full Text

JSONSubschema_issta21_paper.pdf

Publisher postprint (484.81 kB)

All documents in ORBilu are protected by a user license.

Details

Keywords :

JSON schema; data compatibility bugs; subschema checking

Abstract :

JSON is a data format used pervasively in web APIs, cloud computing, NoSQL databases, and increasingly also machine learning. To ensure that JSON data is compatible with an application, one can define a JSON schema and use a validator to check data against the schema. However, because validation can happen only once concrete data occurs during an execution, it may detect data compatibility bugs too late or not at all. Examples include evolving the schema for a web API, which may unexpectedly break client applications, or accidentally running a machine learning pipeline on incorrect data. This paper presents a novel way of detecting a class of data compatibility bugs via JSON subschema checking. Subschema checks find bugs before concrete JSON data is available and across all possible data specified by a schema. For example, one can check if evolving a schema would break API clients or if two components of a machine learning pipeline have incompatible expectations about data. Deciding whether one JSON schema is a subschema of another is non-trivial because the JSON Schema specification language is rich. Our key insight to address this challenge is to first reduce the richness of schemas by canonicalizing and simplifying them, and to then reason about the subschema question on simpler schema fragments using type-specific checkers. We apply our subschema checker to thousands of real-world schemas from different domains. In all experiments, the approach is correct whenever it gives an answer (100% precision and correctness), which is the case for most schema pairs (93.5% recall), clearly outperforming the state-of-the-art tool. Moreover, the approach reveals 43 previously unknown bugs in popular software, most of which have already been fixed, showing that JSON subschema checking helps find data compatibility bugs early.
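The abstract's core idea, first canonicalizing schemas and then answering the subschema question with type-specific checks on simpler fragments, can be illustrated on a deliberately tiny slice of JSON Schema. The Python sketch below is not the authors' checker and does not reproduce their algorithm. It assumes schemas use only the keywords `type`, `minimum`, and `maximum` (inclusive bounds) over the simple types `null`, `boolean`, `string`, `integer`, and `number`; the names `canonicalize`, `_numeric_part`, and `is_subschema` are hypothetical. Anything outside that fragment raises `ValueError` instead of producing a guess.

```python
import math
from typing import Any, Dict, FrozenSet, Optional, Tuple

# Hypothetical toy fragment: only these keywords and types are understood.
SUPPORTED_KEYWORDS = {"type", "minimum", "maximum"}
SIMPLE_TYPES = {"null", "boolean", "string", "integer", "number"}


def canonicalize(schema: Dict[str, Any]) -> Tuple[FrozenSet[str], float, float]:
    """Reduce a schema to (set of types, inclusive lower bound, inclusive upper bound)."""
    unknown = set(schema) - SUPPORTED_KEYWORDS
    if unknown or "type" not in schema:
        raise ValueError(f"outside the toy fragment: {schema!r}")
    raw = schema["type"]
    types = {raw} if isinstance(raw, str) else set(raw)
    if not types <= SIMPLE_TYPES:
        raise ValueError(f"unsupported types: {sorted(types - SIMPLE_TYPES)}")
    # minimum/maximum apply to every number, so "number" already covers "integer".
    if "number" in types:
        types.discard("integer")
    return frozenset(types), schema.get("minimum", -math.inf), schema.get("maximum", math.inf)


def _numeric_part(types: FrozenSet[str], lo: float, hi: float) -> Tuple[Optional[str], float, float]:
    """Return the numeric kind ("integer", "number", or None) and its effective bounds."""
    if "number" in types:
        return "number", lo, hi
    if "integer" in types:
        # Only whole numbers are valid, so round finite bounds inward.
        lo = math.ceil(lo) if math.isfinite(lo) else lo
        hi = math.floor(hi) if math.isfinite(hi) else hi
        return "integer", lo, hi
    return None, lo, hi


def is_subschema(s1: Dict[str, Any], s2: Dict[str, Any]) -> bool:
    """True iff every instance valid under s1 is also valid under s2 (toy fragment only)."""
    t1, lo1, hi1 = canonicalize(s1)
    t2, lo2, hi2 = canonicalize(s2)
    # Check for null/boolean/string: unconstrained here, so type-set inclusion suffices.
    if not (t1 - {"integer", "number"}) <= t2:
        return False
    # Check for numbers: compare the numeric parts as intervals.
    kind1, a1, b1 = _numeric_part(t1, lo1, hi1)
    if kind1 is None or a1 > b1:
        return True  # s1 admits no numbers at all, so nothing numeric can fail
    kind2, a2, b2 = _numeric_part(t2, lo2, hi2)
    if kind2 is None:
        return False
    if kind1 == "number" and kind2 == "integer":
        # A real interval with more than one point always contains non-integers.
        if a1 != b1 or not float(a1).is_integer():
            return False
    return a2 <= a1 and b1 <= b2
```

In the API-evolution scenario from the abstract, narrowing `{"type": "integer", "minimum": 0}` by adding `"maximum": 100` yields a subschema of the original, so `is_subschema(new, old)` returns `True` while the reverse check returns `False`. Raising on unsupported keywords, rather than guessing, keeps every answer the sketch does give trustworthy at the cost of answering fewer pairs; this loosely mirrors the precision-versus-recall trade-off the abstract reports, but the real tool handles far more of the language.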

Research center :

Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Trustworthy Software Engineering (TruX)

Disciplines :

Computer science

Author, co-author :

Habib, Andrew; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > TruX

Shinnar, Avraham; IBM Research

Hirzel, Martin; IBM Research

Pradel, Michael; University of Stuttgart

External co-authors :

yes

Language :

English

Publication date :

11 July 2021

Event name :

The 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA)

Event place :

Virtual, Denmark

Event date :

11–17 July 2021

Audience :

International

Main work title :

The 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA)

Pages :

620–632

Focus Area :

Security, Reliability and Trust

Additional URL :

https://dl.acm.org/doi/10.1145/3460319.3464796

European Projects :

H2020 - 949014 - NATURAL - Natural Program Repair

Funders :

European Commission (EC)

Available on ORBilu :

since 13 February 2022

Statistics

Number of views

284 (10 by Unilu)

Number of downloads

93 (0 by Unilu)