Article (Scientific journals)
Automated anomaly detection for categorical data by repurposing a form filling recommender system
Belgacem, Hichem; Li, Xiaochen; BIANCULLI, Domenico et al.
2024. In Journal of Data and Information Quality, 16 (3), p. 1-28
Peer reviewed Dataset
 

Files


Full Text
hichem_laff_error_detection.pdf
Author postprint (759.75 kB) Creative Commons License - Attribution
Details



Keywords :
data quality; data anomaly detection; categorical data; machine learning
Abstract :
Data quality is crucial in modern software systems, such as data-driven decision support systems. However, data quality is affected by data anomalies, that is, instances that deviate from most of the data. These anomalies affect the reliability and trustworthiness of software systems, and may propagate and cause further issues. Although many anomaly detection approaches have been proposed, they mainly focus on numerical data. Moreover, the few approaches targeting anomaly detection for categorical data do not yield consistent results across datasets. In this paper, we propose a novel anomaly detection approach for categorical data named LAFF-AD (LAFF-based Anomaly Detection), which takes advantage of the learning ability of a state-of-the-art form filling tool (LAFF) to perform value inference on suspicious data. LAFF-AD runs a variant of LAFF that predicts the possible values of a suspicious categorical field in the suspicious instance. LAFF-AD then compares the output of LAFF to the recorded values in the suspicious instance, and uses a heuristic-based strategy to detect categorical data anomalies. We evaluated LAFF-AD by assessing its effectiveness and efficiency on six datasets. Our experimental results show that LAFF-AD can accurately detect a wide range of data anomalies, with recall values between 0.6 and 1 and a precision value of at least 0.808. Furthermore, LAFF-AD is efficient, taking at most 7000 s and 735 ms to perform training and prediction, respectively.
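The predict-then-compare idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use LAFF: a naive conditional-frequency table stands in for the predictor, and the function names (`train`, `predict`, `is_anomaly`), the single context field, and the `top_k` and `min_conf` thresholds are all hypothetical choices made for illustration only.

```python
from collections import Counter, defaultdict


def train(rows, target, context):
    """Count target values per context value (a naive stand-in for LAFF)."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[context]][row[target]] += 1
    return table


def predict(table, row, context):
    """Return candidate target values with relative frequencies, most likely first."""
    counts = table.get(row[context])
    if not counts:
        return []  # context value never seen during training
    total = sum(counts.values())
    return [(value, n / total) for value, n in counts.most_common()]


def is_anomaly(table, row, target, context, top_k=1, min_conf=0.8):
    """Flag the recorded value if a confident prediction disagrees with it.

    Hypothetical heuristic: only trust the predictor when its best
    candidate reaches min_conf, then check the top_k candidates.
    """
    candidates = predict(table, row, context)
    if not candidates or candidates[0][1] < min_conf:
        return False  # predictor not confident enough: do not flag
    top_values = [value for value, _ in candidates[:top_k]]
    return row[target] not in top_values


# Toy history: the currency field is almost always EUR when country is LU.
history = [{"country": "LU", "currency": "EUR"}] * 9
history.append({"country": "LU", "currency": "USD"})
model = train(history, target="currency", context="country")

suspicious = {"country": "LU", "currency": "USD"}
flagged = is_anomaly(model, suspicious, target="currency", context="country")
```

In this toy setting the suspicious record is flagged because the stand-in predictor is confident about a different value; a record whose context was never seen in the history is left unflagged, since no confident prediction is available.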
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation
Disciplines :
Computer science
Author, co-author :
Belgacem, Hichem ;  Luxembourg Institute of Science and Technology, Esch-sur-Alzette, Luxembourg
Li, Xiaochen ;  Dalian University of Technology, Dalian, China
BIANCULLI, Domenico  ;  University of Luxembourg
Briand, Lionel ;  Lero SFI Centre for Software Research and University of Limerick, Limerick, Ireland ; University of Ottawa, Ottawa, Canada
External co-authors :
yes
Language :
English
Title :
Automated anomaly detection for categorical data by repurposing a form filling recommender system
Publication date :
04 October 2024
Journal title :
Journal of Data and Information Quality
ISSN :
1936-1955
Publisher :
Association for Computing Machinery (ACM)
Volume :
16
Issue :
3
Pages :
1-28
Peer reviewed :
Peer reviewed
Focus Area :
Security, Reliability and Trust
FnR Project :
FNR17373407 - Automated Log Smell Detection And Removal, 2022 (01/09/2023-31/08/2026) - Domenico Bianculli
Name of the research project :
U-AGR-7285 - C22/IS/17373407/LOGODOR - BIANCULLI Domenico
Funders :
FNR - Fonds National de la Recherche
NSERC - Natural Sciences and Engineering Research Council
SFI - Science Foundation Ireland
Funding number :
C22/IS/17373407/LOGODOR
Funding text :
This research was funded in whole, or in part, by the Luxembourg National Research Fund (FNR), grant reference C22/IS/17373407/LOGODOR. Lionel Briand was in part supported by the Canada Research Chair and Discovery Grant programs of the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Science Foundation Ireland grant 13/RC/2094-2. For the purpose of open access, and in fulfillment of the obligations arising from the grant agreement, the authors have applied a Creative Commons Attribution 4.0 International (CC BY 4.0) license to any Author Accepted Manuscript version arising from this submission.
Available on ORBilu :
since 23 September 2024
