Article (Scientific journals)
An Empirical Study on Data Distribution-Aware Test Selection for Deep Learning Enhancement
HU, Qiang; GUO, Yuejun; CORDY, Maxime et al.
2022, in ACM Transactions on Software Engineering and Methodology
Peer reviewed (verified by ORBi)
 

Documents


Full text
TOSEM_DAT.pdf
Author preprint (1.81 MB)

Details



Keywords:
deep learning testing; test selection; data distribution
Abstract:
[en] Like traditional software, which is constantly evolving, deep neural networks (DNNs) need to evolve as test data grows rapidly, for continuous enhancement, e.g., to adapt to distribution shift in a new deployment environment. However, manually labeling all the collected test data is labor-intensive. Test selection addresses this problem by strategically choosing a small subset to label. By retraining with the selected set, DNNs can then achieve competitive accuracy. Unfortunately, existing selection metrics have three main limitations: 1) they use different retraining processes; 2) they ignore data distribution shifts; 3) they are insufficiently evaluated. To fill this gap, we first conduct a systematic empirical study to reveal the impact of the retraining process and data distribution on model enhancement. Then, based on our findings, we propose a novel distribution-aware test (DAT) selection metric. Experimental results reveal that retraining using both the training and selected data outperforms using only the selected data. No single selection metric performs best across the various data distributions. By contrast, DAT effectively alleviates the impact of distribution shifts and outperforms the compared metrics by up to 5 times and 30.09% accuracy improvement for model enhancement on simulated and in-the-wild distribution shift scenarios, respectively.
Disciplines:
Computer science
Author, co-author:
HU, Qiang; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
GUO, Yuejun; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
CORDY, Maxime; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Xie, Xiaofei; Singapore Management University
Ma, Lei; University of Alberta
PAPADAKIS, Mike; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > Computer Science and Communications Research Unit (CSC)
LE TRAON, Yves; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
External co-authors:
yes
Document language:
English
Title:
An Empirical Study on Data Distribution-Aware Test Selection for Deep Learning Enhancement
Publication/release date:
2022
Journal title:
ACM Transactions on Software Engineering and Methodology
ISSN:
1049-331X
Peer reviewed:
Peer reviewed (verified by ORBi)
FnR project:
FNR12669767 - Testing Self-learning Systems, 2018 (01/09/2019-31/08/2022) - Yves Le Traon
Research project title:
CORE project C18/IS/12669767/STELLAR/LeTraon
Funding body:
FNR - Fonds National de la Recherche
Available on ORBilu:
since 12 February 2022

Statistics

Number of views
652 (of which 125 Unilu)
Number of downloads
381 (of which 36 Unilu)

Scopus® citations
37
Scopus® citations, excluding self-citations
26
OpenCitations
0
OpenAlex citations
39
WoS citations
33