Paper published in a book (Colloquia, congresses, scientific conferences and proceedings)
DataPrism: Exposing Disconnect between Data and Systems
Galhotra, Sainyam; Fariha, Anna; DE PAULA LOURENCO, Raoni et al.
2022. In SIGMOD 2022 - Proceedings of the 2022 International Conference on Management of Data
Peer reviewed
 

Documents


Full text
3514221.3517864.pdf
Author postprint (2.56 MB)

All documents in ORBilu are protected by a user license.




Details



Keywords:
causal testing; data profiles; debugging; root-cause identification; Central component; Data driven; Driven system; Health monitoring system; Property; Root cause; Software; Information Systems
Abstract:
As data is a central component of many modern systems, the cause of a system malfunction may reside in the data, and, specifically, in particular properties of the data. For example, a health-monitoring system that is designed under the assumption that weight is reported in lbs will malfunction when encountering weight reported in kilograms. Like software debugging, which aims to find bugs in the source code or runtime conditions, our goal is to debug data to identify potential sources of disconnect between the assumptions about some data and the systems that operate on that data. We propose DataPrism, a framework to identify data properties (profiles) that are the root causes of performance degradation or failure of a data-driven system. Such identification is necessary to repair data and resolve the disconnect between data and systems. Our technique is based on causal reasoning through interventions: when a system malfunctions for a dataset, DataPrism alters the data profiles and observes changes in the system's behavior due to the alteration. Unlike statistical observational analysis that reports mere correlations, DataPrism reports causally verified root causes of the system malfunction, expressed in terms of data profiles. We empirically evaluate DataPrism on seven real-world and several synthetic data-driven systems that fail on certain datasets due to a diverse set of reasons. In all cases, DataPrism identifies the root causes precisely while requiring orders of magnitude fewer interventions than prior techniques.
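The intervention idea in the abstract can be illustrated with a minimal sketch. This is not DataPrism's algorithm: it is a naive loop that tries one candidate profile at a time, whereas the paper reports needing far fewer interventions than prior techniques. The toy system, the dataset, and every name below (`system_malfunction`, `to_lbs`, `age_to_int`, `find_root_causes`) are hypothetical and chosen only to mirror the lbs versus kilograms example.

```python
# Hypothetical toy system: flags a patient as "at risk" when weight > 250,
# silently assuming that weight is reported in lbs.
def system_malfunction(rows):
    """Return the fraction of rows where the system disagrees with ground truth."""
    errors = sum((row["weight"] > 250) != row["at_risk"] for row in rows)
    return errors / len(rows)

# Hypothetical failing dataset: weight is in kilograms and age is a string.
failing = [
    {"weight": 120.0, "age": "61", "at_risk": True},
    {"weight": 70.0, "age": "35", "at_risk": False},
    {"weight": 130.0, "age": "48", "at_risk": True},
    {"weight": 64.0, "age": "29", "at_risk": False},
]

# Each intervention alters the data so that one candidate profile holds.
def to_lbs(row):
    row["weight"] = row["weight"] * 2.20462  # kilograms to lbs
    return row

def age_to_int(row):
    row["age"] = int(row["age"])
    return row

INTERVENTIONS = {"weight_in_lbs": to_lbs, "age_is_numeric": age_to_int}

def find_root_causes(rows, interventions, system, tolerance=0.0):
    """Naive one-profile-at-a-time search (an assumption, not the paper's method).

    A profile is reported as a root cause only if altering the data to satisfy
    it brings the malfunction score down to the tolerance.
    """
    baseline = system(rows)
    causes = []
    for name, fix in interventions.items():
        repaired = [fix(dict(row)) for row in rows]  # copy rows before altering
        score = system(repaired)
        if score < baseline and score <= tolerance:
            causes.append(name)
    return causes

root_causes = find_root_causes(failing, INTERVENTIONS, system_malfunction)
print(root_causes)
```

In this sketch, converting the weights removes the malfunction while converting the ages changes nothing, so only the weight-unit profile is reported. The conclusion comes from observing the system after altering the data, not from a correlation between the profile and the failures.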
Disciplines:
Computer science
Author, co-author:
Galhotra, Sainyam; University of Chicago, Chicago, United States
Fariha, Anna; Microsoft, Seattle, United States
DE PAULA LOURENCO, Raoni; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal; NYU - New York University [US-NY]
Freire, Juliana; New York University, New York, United States
Meliou, Alexandra; University of Massachusetts Amherst, Amherst, United States
Srivastava, Divesh; AT&T Chief Data Office, Bedminster, United States
External co-authors:
yes
Document language:
English
Title:
DataPrism: Exposing Disconnect between Data and Systems
Publication date:
10 June 2022
Event name:
Proceedings of the 2022 International Conference on Management of Data
Event location:
Philadelphia, USA
Event date:
12-06-2022 to 17-06-2022
Main work title:
SIGMOD 2022 - Proceedings of the 2022 International Conference on Management of Data
Publisher:
Association for Computing Machinery
ISBN/EAN:
978-1-4503-9249-5
Peer reviewed:
Peer reviewed
Funding body:
ACM SIGMOD
Available on ORBilu:
since 22 November 2023

Statistics

Number of views
53 (including 1 from Unilu)
Number of downloads
102 (including 0 from Unilu)

Scopus® citations
9
Scopus® citations (excluding self-citations)
6
OpenCitations
2
OpenAlex citations
8
