Paper published in a journal (Scientific congresses, symposiums and conference proceedings)
OPTWIN: Drift Identification with Optimal Sub-Windows
DALLE LUCCA TOSI, Mauro; THEOBALD, Martin
2024. In 2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW)
Peer reviewed
 

Files


Full Text
Optwin_extended.pdf
Author postprint (1.55 MB)
Details



Keywords :
concept drift; drift detection; data streams
Abstract :
Online Learning (OL) is a subfield of Machine Learning (ML) that is gaining increasing attention in academia and industry. A long-standing challenge in OL is the presence of concept drifts, which are commonly defined as unforeseeable changes in the statistical properties of an incoming data stream over time. State-of-the-art concept-drift detectors, however, still exhibit high false-positive rates in their drift identification, which leads to an undue amount of computational resources being spent by the underlying ML algorithm on retraining its model. In this paper, we propose OPTWIN, our “OPTimal WINdow” concept-drift detector suited for both classification and regression problems. The novelty of OPTWIN lies in identifying where to split a sliding window W of error rates produced by an ML model into two provably optimal sub-windows, such that the split occurs at the earliest event at which a statistically significant difference according to either the t- or the f-test occurred. Specifically, OPTWIN reaches this result by (1) considering both the mean and the variance of the error rates, and (2) reducing the cost of detecting this optimal split from O(log|W|) to O(1). We assessed OPTWIN within the MOA framework, using ADWIN, DDM, EDDM, STEPD, and ECDD as baselines over 12 synthetic and real-world datasets with both sudden and gradual concept drifts. In our experiments, OPTWIN surpasses the F1-score of the baselines in a statistically significant manner while maintaining a lower detection delay and saving up to 21% of the time spent on retraining the models.
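The two-sub-window comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it takes the split point as given (OPTWIN's contribution is choosing it optimally, which is not reproduced here), tests only for an increase in the mean or variance of the error rates, and uses caller-supplied critical values `t_crit` and `f_crit` in place of values looked up from the t and F distributions. The function names `split_statistics` and `drift_detected` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def split_statistics(errors, split):
    """Return (t, f) comparing errors[:split] (historical) with errors[split:] (recent)."""
    hist, recent = errors[:split], errors[split:]
    if len(hist) < 2 or len(recent) < 2:
        raise ValueError("each sub-window needs at least two observations")
    v_hist, v_recent = variance(hist), variance(recent)
    eps = 1e-12  # assumption: small constant to avoid division by zero on constant sub-windows
    # Welch-style t statistic for the difference in mean error rate.
    t = (mean(recent) - mean(hist)) / sqrt(
        v_hist / len(hist) + v_recent / len(recent) + eps
    )
    # Variance ratio (F statistic), recent over historical.
    f = (v_recent + eps) / (v_hist + eps)
    return t, f


def drift_detected(errors, split, t_crit, f_crit):
    """Flag a drift if either the mean or the variance of the errors increased.

    t_crit and f_crit are hypothetical, caller-chosen critical values; in practice
    they would be derived from the t and F distributions at a chosen confidence level.
    """
    t, f = split_statistics(errors, split)
    return t > t_crit or f > f_crit


if __name__ == "__main__":
    stable = [0.10, 0.12, 0.09, 0.11] * 5
    degraded = [0.50, 0.55, 0.45, 0.52] * 5
    print(drift_detected(stable + stable, 20, t_crit=3.0, f_crit=3.0))
    print(drift_detected(stable + degraded, 20, t_crit=3.0, f_crit=3.0))
```

Checking both statistics means that a change in the spread of the error rates can be flagged even when the mean error rate has not moved, which is the motivation the abstract gives for considering the variance alongside the mean.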
Disciplines :
Computer science
Author, co-author :
DALLE LUCCA TOSI, Mauro; University of Luxembourg > Faculty of Science, Technology and Medicine > Department of Computer Science > Team Martin THEOBALD
THEOBALD, Martin; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
External co-authors :
no
Language :
English
Title :
OPTWIN: Drift Identification with Optimal Sub-Windows
Publication date :
13 May 2024
Event name :
2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW)
Event date :
13-16 May 2024
Audience :
International
Journal title :
2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW)
ISSN :
2473-3490
eISSN :
1943-2895
Publisher :
IEEE
Peer reviewed :
Peer reviewed
Focus Area :
Computational Sciences
FnR Project :
FNR12252781 - Data-driven Computational Modelling And Applications, 2017 (01/09/2018-28/02/2025) - Andreas Zilian
Name of the research project :
R-AGR-3440 - PRIDE17/12252781 DRIVEN_Common - ZILIAN Andreas
Funders :
FNR - Luxembourg National Research Fund
Funding number :
12252781
Funding text :
This work is funded by the Luxembourg National Research Fund under the PRIDE program (PRIDE17/12252781)
Available on ORBilu :
since 01 July 2024
