Reference : Finding Data Compatibility Bugs with JSON Subschema Checking
Scientific congresses, symposiums and conference proceedings : Paper published in a book
Engineering, computing & technology : Computer science
Security, Reliability and Trust
http://hdl.handle.net/10993/50268
Finding Data Compatibility Bugs with JSON Subschema Checking
English
Habib, Andrew [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > TruX]
Shinnar, Avraham [IBM Research]
Hirzel, Martin [IBM Research]
Pradel, Michael [University of Stuttgart]
11-Jul-2021
The 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA)
620-632
Yes
No
International
The 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA)
July 11 - 17, 2021
Virtual
Denmark
[en] JSON schema ; data compatibility bugs ; subschema checking
[en] JSON is a data format used pervasively in web APIs, cloud computing, NoSQL databases, and increasingly also machine learning. To ensure that JSON data is compatible with an application, one can define a JSON schema and use a validator to check data against the schema. However, because validation can happen only once concrete data occurs during an execution, it may detect data compatibility bugs too late or not at all. Examples include evolving the schema for a web API, which may unexpectedly break client applications, or accidentally running a machine learning pipeline on incorrect data. This paper presents a novel way of detecting a class of data compatibility bugs via JSON subschema checking. Subschema checks find bugs before concrete JSON data is available and across all possible data specified by a schema. For example, one can check whether evolving a schema would break API clients or whether two components of a machine learning pipeline have incompatible expectations about data. Deciding whether one JSON schema is a subschema of another is non-trivial because the JSON Schema specification language is rich. Our key insight to address this challenge is to first reduce the richness of schemas by canonicalizing and simplifying them, and then to reason about the subschema question on simpler schema fragments using type-specific checkers. We apply our subschema checker to thousands of real-world schemas from different domains. In all experiments, the approach is correct whenever it gives an answer (100% precision and correctness), which is the case for most schema pairs (93.5% recall), clearly outperforming the state-of-the-art tool. Moreover, the approach reveals 43 previously unknown bugs in popular software, most of which have already been fixed, showing that JSON subschema checking helps find data compatibility bugs early.
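To make the subschema idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' tool or algorithm: it is a toy, hypothetical type-specific checker that handles only the keywords `type` (restricted to `integer` and `number`), `minimum`, and `maximum`, and it declines to answer (returns `None`) for anything outside that fragment. The function names `is_subschema` and `_bounds` and the example schemas are assumptions made for illustration.

```python
import math

SUPPORTED_KEYWORDS = {"type", "minimum", "maximum"}
NUMERIC_TYPES = ("integer", "number")


def _bounds(schema):
    """Return the (lowest, highest) value the schema admits."""
    lo = schema.get("minimum", -math.inf)
    hi = schema.get("maximum", math.inf)
    if schema["type"] == "integer":
        # Only whole numbers are valid, so tighten finite bounds inward.
        if lo != -math.inf:
            lo = math.ceil(lo)
        if hi != math.inf:
            hi = math.floor(hi)
    return lo, hi


def is_subschema(sub, sup):
    """True if every instance valid under `sub` is also valid under `sup`.

    Returns None when either schema is outside the toy fragment,
    i.e. the checker declines to answer instead of guessing.
    """
    if set(sub) - SUPPORTED_KEYWORDS or set(sup) - SUPPORTED_KEYWORDS:
        return None
    if sub.get("type") not in NUMERIC_TYPES or sup.get("type") not in NUMERIC_TYPES:
        return None

    sub_lo, sub_hi = _bounds(sub)
    sup_lo, sup_hi = _bounds(sup)

    if sub_lo > sub_hi:
        return True  # `sub` admits no value at all, so it is trivially included

    if sub["type"] == "number" and sup["type"] == "integer":
        # A real interval fits inside the integers only if it is a single whole number.
        if not (sub_lo == sub_hi and float(sub_lo).is_integer()):
            return False

    return sup_lo <= sub_lo and sub_hi <= sup_hi


# Hypothetical API evolution: a client was written against the old schema,
# and the producer now emits data described by the new schema.
old_schema = {"type": "integer", "minimum": 0, "maximum": 100}
new_schema = {"type": "number", "minimum": 0}

print(is_subschema(new_schema, old_schema))  # new data may break the old client
print(is_subschema(old_schema, new_schema))  # old data is still accepted
```

Under these assumptions the first check reports an incompatibility without any concrete JSON document, which is the kind of early detection the abstract describes. A real checker would first canonicalize and simplify schemas and would cover the other JSON types and keywords.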
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Trustworthy Software Engineering (TruX)
Researchers
10.1145/3460319.3464796
https://dl.acm.org/doi/10.1145/3460319.3464796
H2020 ; 949014 - NATURAL - Natural Program Repair

File(s) associated to this reference

Fulltext file(s):

File : JSONSubschema_issta21_paper.pdf
Version : Publisher postprint
Size : 473.45 kB
Access : Open access

