Abstract:
An increasing number of financial software systems rely on machine learning (ML) models to support human decision-makers.
Although these models have shown satisfactory performance in classifying financial transactions, maintaining such ML systems remains a challenge.
After deployment in production, the performance of the models tends to degrade over time due to concept drift.
Methods have been proposed to detect concept drift and to retrain new models upon detection, thereby mitigating the drop in performance.
However, little is known about the effectiveness of such methods in an industrial context.
In particular, their evaluation fails to account for the delay between the detection of a drift and the deployment of a new model.
This delay is inherent to the strict quality assurance and manual validation processes that financial (and other critical) institutions impose on their software systems.
To address this limitation, we formalize the problem of retraining ML models against distribution drift in the presence of delay and propose a novel protocol to evaluate drift detectors.
We report on an empirical study conducted on the transaction system of our industrial partner, BGL BNP Paribas, and two publicly available datasets: Lending Club Loan Data and Electricity.
We release our tool and benchmark on GitHub.
We demonstrate, for the first time, that ignoring deployment delays when evaluating drift detectors overestimates their ability to mitigate performance degradation, by up to 39.86% in our industrial application.
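The core idea can be illustrated with a minimal sketch: a prequential (predict-then-learn) evaluation over a synthetic binary stream with one abrupt concept drift, a simple windowed error-rate detector, and a deployment delay between detection and model replacement. All names, thresholds, and the detector itself are illustrative assumptions, not the detectors or protocol from the paper.

```python
import random

def evaluate(delay, n=2000, drift_at=1000, window=50, threshold=0.4, seed=0):
    """Prequential evaluation of a drift detector under deployment delay.
    The detector fires when the recent error rate exceeds `threshold`;
    the retrained model only goes live `delay` steps after detection.
    (Hypothetical toy setup, not the paper's protocol.)"""
    rng = random.Random(seed)
    concept = 0          # ground truth: label = x XOR concept
    model = 0            # deployed model's concept assumption
    pending = None       # step at which the retrained model goes live
    errors, recent = 0, []
    for t in range(n):
        if t == drift_at:
            concept = 1                      # abrupt concept drift
        x = rng.randint(0, 1)
        y = x ^ concept
        errors_now = int((x ^ model) != y)   # predict before learning
        errors += errors_now
        recent.append(errors_now)
        if len(recent) > window:
            recent.pop(0)
        if pending is not None and t >= pending:
            model = concept                  # deploy retrained model
            pending, recent = None, []
        elif pending is None and len(recent) == window \
                and sum(recent) / window > threshold:
            pending = t + delay              # detection: schedule deployment
    return 1 - errors / n                    # overall accuracy

# Evaluating with zero delay overstates how well the detector
# mitigates the performance drop compared to a delayed deployment.
print(evaluate(delay=0), evaluate(delay=300))
```

Comparing the two accuracies shows the overestimation effect the abstract describes: the same detector looks substantially better when the deployment delay is ignored.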
Research center:
NCER-FT - FinTech National Centre of Excellence in Research
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SerVal - Security, Reasoning & Validation