Abstract:
The substantial increase in AI model training has considerable environmental
implications, mandating more energy-efficient and sustainable AI practices. On
the one hand, data-centric approaches show great potential for training
energy-efficient AI models. On the other hand, instance selection methods have
demonstrated that AI models can be trained on minimised training sets with
negligible performance degradation. Despite the growing interest in both
topics, the impact of data-centric training set selection on energy efficiency
remains unexplored to date. This paper presents an evolution-based sampling
framework aimed at (i) identifying elite training samples tailored to
dataset-model pairs, (ii) comparing model performance and energy-efficiency
gains against typical model training practice, and (iii) investigating the
feasibility of this framework for fostering sustainable model training
practices. To evaluate the proposed framework, we conducted an empirical
experiment involving 8 commonly used AI classification models and 25 publicly
available datasets. The results show that, by training on only the 10% elite
samples, models can achieve up to a 50% performance improvement and remarkable
energy savings of 98% compared to common training practice.