Article (Scientific journals)
Drop-in efficient self-attention approximation method
FRANCOIS, Damien; Saillot, Mathis; KLEIN, Jacques et al.
2025. In Machine Learning, 114 (6)
Peer Reviewed verified by ORBi
 

Files


Full Text
s10994-025-06768-3.pdf
Publisher postprint (1.69 MB) Creative Commons License - Attribution
Details



Keywords :
Attention approximation; Deep learning; Machine learning; Self-attention; Transformers; Approximation methods; Attention mechanisms; Machine-learning; Memory usage; Sequence lengths; State-of-the-art performance; Transformer; Software; Artificial Intelligence
Abstract :
[en] Transformers have achieved state-of-the-art performance in most common tasks to which they have been applied. These achievements are attributed to the Self-Attention mechanism at their core. Self-Attention is understood to map the relationships between the tokens of any given sequence. This exhaustive mapping incurs massive costs in memory and inference time, as Self-Attention scales quadratically with sequence length. Because of this memory and time bottleneck, standard Self-Attention requires increasingly large compute and memory budgets when applied to long input sequences. Efficient Transformers emerged as performant alternatives, demonstrating good scalability and occasionally better tracking of long-range dependencies. Their efficiency gains are obtained through different methods, usually aiming at linear scaling of the attention matrix through sparsification, approximation, or other schemes. Among existing approaches, those using low-rank approximation present particular advantages because of their compatibility with standard Self-Attention-based models, allowing for weight transfers and other time-saving schemes. More recently, hardware-aware versions of Self-Attention (e.g., FlashAttention) have mitigated the memory bottleneck of Self-Attention and alleviated its compute burden. Unfortunately, hardware-aware Self-Attention has stricter hardware compatibility requirements, which keeps Efficient Transformers relevant on older or less powerful hardware. Furthermore, some Efficient Transformers can even be implemented in a hardware-aware manner to further improve training and inference speed. In this paper, we propose a novel linear approximation method for Self-Attention inspired by the CUR approximation method. This method, proposed in two versions (one leveraging FlashAttention), is conceived as a drop-in replacement for standard Self-Attention with weight compatibility. Our method compares favorably to standard Transformers and Efficient Transformers on varied tasks, and demonstrates a significant decrease in memory footprint as well as competitive training speed, even compared to similar methods.
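As an illustration of the scaling argument in the abstract, the NumPy sketch below contrasts standard softmax Self-Attention, which materializes an n x n score matrix, with a generic landmark-style (Nystrom-like) low-rank approximation that attends through m << n sampled keys. This is an illustrative sketch only: the function names and the sampling scheme are assumptions chosen for exposition and do not reproduce the paper's CUR-based method.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_self_attention(Q, K, V):
    """Standard softmax Self-Attention.

    Materializes the full (n, n) score matrix, so memory and compute
    grow quadratically with the sequence length n.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)         # (n, n): the quadratic bottleneck
    return softmax(scores, axis=-1) @ V   # (n, d)

def landmark_attention(Q, K, V, m=32, seed=None):
    """Illustrative low-rank approximation (landmark/Nystrom style).

    NOT the paper's CUR-based method: it only sketches the general idea
    of attending through m << n sampled 'landmark' keys so that no
    (n, n) matrix is ever formed; cost drops to O(n * m).
    """
    rng = np.random.default_rng(seed)
    n, d = K.shape
    idx = rng.choice(n, size=min(m, n), replace=False)  # sample landmark rows of K and V
    K_m, V_m = K[idx], V[idx]                           # (m, d) each
    scores = Q @ K_m.T / np.sqrt(d)                     # (n, m) instead of (n, n)
    return softmax(scores, axis=-1) @ V_m               # (n, d)

if __name__ == "__main__":
    n, d = 1024, 64
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    exact = standard_self_attention(Q, K, V)
    approx = landmark_attention(Q, K, V, m=64, seed=0)
    print("mean absolute difference:", np.abs(exact - approx).mean())

The point of the comparison is purely the shape of the intermediate score matrix: (n, n) in the standard case versus (n, m) in the sampled case, which is what any linear or low-rank attention approximation, including the CUR-inspired method described above, ultimately avoids materializing.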
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Other
Luxembourg Centre for Systems Biomedicine (LCSB): Integrative Cell Signalling (Skupin Group)
Disciplines :
Computer science
Author, co-author :
FRANCOIS, Damien ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
Saillot, Mathis ; LGIPM, Université de Lorraine, Metz, France
KLEIN, Jacques ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
Bissyandé, Tegawendé F. ; SnT, University of Luxembourg, Luxembourg, Luxembourg
SKUPIN, Alexander ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Integrative Cell Signalling ; Department of Neurosciences, University of California San Diego, United States
External co-authors :
yes
Language :
English
Title :
Drop-in efficient self-attention approximation method
Publication date :
25 April 2025
Journal title :
Machine Learning
ISSN :
0885-6125
eISSN :
1573-0565
Publisher :
Springer
Volume :
114
Issue :
6
Peer reviewed :
Peer Reviewed verified by ORBi
Focus Area :
Computational Sciences
Name of the research project :
U-AGR-6008 - IAS AUDACITY IDAE - SKUPIN Alexander
Funders :
Institute for Advanced Studies of the University of Luxembourg
Funding text :
Author Damien François acknowledges financial support of the Institute for Advanced Studies of the University of Luxembourg through the IDAE Audacity Grant (AUDACITY-2021).
Available on ORBilu :
since 21 May 2025

Statistics


Number of views: 97 (6 by Unilu)
Number of downloads: 47 (1 by Unilu)
Scopus citations®: 0
Scopus citations® without self-citations: 0
OpenCitations: 0
OpenAlex citations: 0
WoS citations: 0
