Online Learning (OL) is a subfield of Machine Learning (ML) that is gaining increasing attention in academia and industry. A long-standing challenge in OL is the presence of concept drifts, which are commonly defined as unforeseeable changes in the statistical properties of an incoming data stream over time. State-of-the-art concept-drift detectors, however, still exhibit high false-positive rates in their drift identification, which leads the underlying ML algorithm to spend an undue amount of computational resources on retraining its model. In this paper, we propose OPTWIN, our “OPTimal WINdow” concept drift detector, suited for both classification and regression problems. The novelty of OPTWIN lies in identifying where to split a sliding window W of error rates produced by an ML model into two provably optimal sub-windows, such that the split occurs at the earliest event at which a statistically significant difference according to either the t-test or the F-test occurred. Specifically, OPTWIN reaches this result by (1) considering both the mean and the variance of the error rates, and (2) improving the cost of detecting this optimal split from O(log|W|) to O(1). We assessed OPTWIN over the MOA framework, using ADWIN, DDM, EDDM, STEPD, and ECDD as baselines over 12 synthetic and real-world datasets with both sudden and gradual concept drifts. In our experiments, OPTWIN surpasses the F1-score of the baselines in a statistically significant manner while maintaining a lower detection delay and saving up to 21% of the time spent on retraining the models.
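To make the two-sub-window idea concrete, here is a minimal sketch of comparing the mean and variance of error rates on either side of a given split. It is not the OPTWIN algorithm: it does not compute the provably optimal split or achieve the O(1) cost described above. The function names, the `split` argument, and the critical values `t_crit` and `f_crit` are hypothetical placeholders; in practice the thresholds would be derived from the t and F distributions at a chosen confidence level.

```python
from math import sqrt
from statistics import mean, variance


def sub_window_stats(errors, split):
    """Return (t_stat, f_stat) comparing errors[:split] with errors[split:].

    Illustrative only: a Welch-style t statistic on the means and a
    variance ratio, not the optimal-split computation from the paper.
    """
    hist, new = errors[:split], errors[split:]
    if len(hist) < 2 or len(new) < 2:
        raise ValueError("each sub-window needs at least two observations")

    m_hist, m_new = mean(hist), mean(new)
    v_hist, v_new = variance(hist), variance(new)

    # Standard error of the difference in means (unequal variances).
    se = sqrt(v_hist / len(hist) + v_new / len(new))
    if se > 0:
        t_stat = (m_new - m_hist) / se
    else:
        # Both sub-windows are constant: any increase in the mean is decisive.
        t_stat = float("inf") if m_new > m_hist else 0.0

    if v_hist > 0:
        f_stat = v_new / v_hist
    else:
        f_stat = float("inf") if v_new > 0 else 1.0

    return t_stat, f_stat


def drift_at_split(errors, split, t_crit=3.0, f_crit=3.0):
    """Flag a drift if the newer sub-window has a significantly higher mean
    or variance. t_crit and f_crit are assumed placeholder thresholds."""
    t_stat, f_stat = sub_window_stats(errors, split)
    return t_stat > t_crit or f_stat > f_crit


if __name__ == "__main__":
    stable = [0, 0, 0, 1] * 10                # 25% error rate throughout
    drifted = [0, 0, 0, 1] * 5 + [1, 1, 0, 1] * 5  # error rate rises at index 20
    print(drift_at_split(stable, 20))
    print(drift_at_split(drifted, 20))
```

The test is one-sided because only a worsening error rate would normally justify retraining; that choice is an assumption of this sketch.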
Disciplines :
Computer science
Author, co-author :
DALLE LUCCA TOSI, Mauro ; University of Luxembourg > Faculty of Science, Technology and Medicine > Department of Computer Science > Team Martin THEOBALD
THEOBALD, Martin ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
External co-authors :
no
Language :
English
Title :
OPTWIN: Drift Identification with Optimal Sub-Windows
Publication date :
13 May 2024
Event name :
2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW)
Event date :
13-16 May 2024
Audience :
International
Journal title :
2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW)