Paper published in a book (colloquia, congresses, scientific conferences and proceedings)
Adaptive Sparsity Level during Training for Efficient Time Series Forecasting with Transformers
Atashgahi, Zahra; Pechenizkiy, Mykola; Veldhuis, Raymond et al.
2024. In ECMLPKDD 2024: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Peer reviewed
 

Documents


Full text
2305.18382.pdf
Author postprint (1.93 MB)

All documents in ORBilu are protected by a user license.




Details



Keywords:
Computer Science - Learning; Sparse Training; Sparse Neural Networks; Time Series
Abstract:
[en] Efficient time series forecasting has become critical for real-world applications, particularly with deep neural networks (DNNs). Efficiency in DNNs can be achieved through sparse connectivity and a reduced model size. However, finding the sparsity level automatically during training remains challenging due to the heterogeneity of loss-sparsity tradeoffs across datasets. In this paper, we propose "Pruning with Adaptive Sparsity Level" (PALS) to automatically seek an optimal balance between loss and sparsity, without the need for a predefined sparsity level. PALS draws inspiration from both sparse training and during-training pruning methods. It introduces a novel "expand" mechanism for training sparse neural networks, allowing the model to dynamically shrink, expand, or remain stable to find a proper sparsity level. In this paper, we focus on achieving efficiency in transformers, known for their excellent time series forecasting performance but high computational cost. Nevertheless, PALS can be applied directly to any DNN; to illustrate this, we also demonstrate its effectiveness on the DLinear model. Experimental results on six benchmark datasets and five state-of-the-art transformer variants show that PALS substantially reduces model size while maintaining performance comparable to the dense model. More interestingly, PALS even outperforms the dense model in 12 and 14 out of 30 cases in terms of MSE and MAE loss, respectively, while reducing the parameter count by 65% and FLOPs by 63% on average. Our code will be made publicly available upon acceptance of the paper.
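The abstract describes the shrink/expand/stay mechanism only at a high level. Purely as an illustration of that idea, the sketch below shows one possible loss-aware controller that prunes or regrows connections in a weight mask between training intervals; the thresholds, the magnitude-based pruning, and the random regrowth rule are assumptions made for this example and are not taken from the PALS paper.

import numpy as np

# Purely illustrative sketch of a loss-aware "shrink / expand / stay" controller.
# The decision rule, thresholds, and prune/regrow heuristics are assumptions for
# this example; they are NOT the actual PALS algorithm.

def adjust_sparsity(weights, mask, prev_loss, curr_loss,
                    step=0.05, tolerance=0.01, rng=None):
    """Decide whether to prune more connections (shrink), regrow some (expand),
    or keep the current sparsity level (stay), based on the recent loss change.

    weights : 2-D float array of layer weights
    mask    : binary array of the same shape (1 = active connection)
    prev_loss, curr_loss : validation losses before/after the last interval
    step      : fraction of all connections pruned or regrown per decision
    tolerance : relative loss increase still considered acceptable
    """
    rng = np.random.default_rng() if rng is None else rng
    rel_change = (curr_loss - prev_loss) / max(abs(prev_loss), 1e-12)

    n_total = mask.size
    n_active = int(mask.sum())
    k = max(1, int(step * n_total))

    if rel_change <= tolerance and n_active > k:
        # Loss is stable or improving: shrink by removing the k weakest
        # (smallest-magnitude) active connections.
        active = np.flatnonzero(mask)
        weakest = active[np.argsort(np.abs(weights).ravel()[active])[:k]]
        mask.ravel()[weakest] = 0
        decision = "shrink"
    elif rel_change > tolerance and n_active + k <= n_total:
        # Loss degraded too much: expand by regrowing k pruned connections,
        # chosen at random and re-initialized to zero.
        pruned = np.flatnonzero(mask == 0)
        regrown = rng.choice(pruned, size=k, replace=False)
        mask.ravel()[regrown] = 1
        weights.ravel()[regrown] = 0.0
        decision = "expand"
    else:
        decision = "stay"

    sparsity = 1.0 - mask.sum() / n_total
    return mask, decision, sparsity


# Toy usage: the loss went up after the last interval, so the controller expands.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
M = (rng.random((8, 8)) > 0.5).astype(np.int64)   # start roughly 50% sparse
M, decision, s = adjust_sparsity(W, M, prev_loss=0.80, curr_loss=0.95, rng=rng)
print(decision, f"sparsity={s:.2f}")

In this toy setup the relative loss increase exceeds the tolerance, so the controller regrows a few connections; in practice such a check would run periodically during training rather than once.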
Disciplines:
Computer science
Author, co-author:
Atashgahi, Zahra;  University of Twente [NL]
Pechenizkiy, Mykola;  Eindhoven University of Technology [NL]
Veldhuis, Raymond;  University of Twente [NL]
MOCANU, Decebal Constantin;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS); University of Twente [NL]; Eindhoven University of Technology [NL]
External co-authors:
Yes
Document language:
English
Title:
Adaptive Sparsity Level during Training for Efficient Time Series Forecasting with Transformers
Publication date:
2024
Event name:
ECMLPKDD 2024: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Event location:
Vilnius, Lithuania
Event dates:
from 9 to 13 September 2024
Event scope:
International
Title of the main work:
ECMLPKDD 2024: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Publisher:
Springer (Lecture Notes in Computer Science, LNCS)
Peer reviewed:
Peer reviewed
Focus area:
Computational Sciences
Sustainable Development Goal (SDG):
9. Industry, innovation and infrastructure
Available on ORBilu:
since 15 January 2024

Statistics


Number of views
183 (including 9 from Unilu)
Number of downloads
69 (including 1 from Unilu)

Scopus® citations
1
Scopus® citations (excluding self-citations)
1
OpenAlex citations
2
