Convergence Analysis of Decentralized ASGD

DALLE LUCCA TOSI, Mauro; THEOBALD, Martin

No full text

Eprint already available on another site (E-prints, Working papers and Research blog)


2023

Permalink
https://hdl.handle.net/10993/56001


Files

Full Text

No document available.


Details

Keywords :

SGD; asynchronous; decentralized; ASGD

Abstract :

Over the last decades, Stochastic Gradient Descent (SGD) has been intensively studied by the Machine Learning community. Despite its versatility and excellent performance, optimizing large models via SGD is still a time-consuming task. To reduce training time, it is common to distribute the training process across multiple devices. Recently, it has been shown that asynchronous SGD (ASGD) will always converge faster than mini-batch SGD. However, despite these improvements in the theoretical bounds, most ASGD convergence-rate proofs still rely on a centralized parameter server, which is prone to becoming a bottleneck when scaling out the gradient computations across many distributed processes. In this paper, we present a novel convergence-rate analysis for decentralized and asynchronous SGD (DASGD) that requires neither partial synchronization among nodes nor restrictive network topologies. Specifically, we provide a bound of O(σ ɛ⁻²) + O(Q S_avg ɛ⁻³ᐟ²) + O(S_avg ɛ⁻¹) for the convergence rate of DASGD, where S_avg is the average staleness between models, Q is a constant that bounds the norm of the gradients, and ɛ is a (small) error that is allowed within the bound. Furthermore, when gradients are not bounded, we prove the convergence rate of DASGD to be O(σ ɛ⁻²) + O(√(Ŝ_avg Ŝ_max) ɛ⁻¹), with Ŝ_avg and Ŝ_max representing loose versions of the average and maximum staleness, respectively. Our convergence proof holds for a fixed stepsize and any non-convex, homogeneous, and L-smooth objective function. We anticipate that our results will be highly relevant to the adoption of DASGD by a broad community of researchers and developers.
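To make the setting more concrete, here is a minimal Python sketch that simulates a toy form of decentralized asynchronous SGD. It is a hypothetical illustration, not the algorithm or proof from the paper: the function name `dasgd_sketch`, the objective f(x) = ½‖x‖² (identical on every node, mimicking the homogeneous setting), the Gaussian gradient noise, the pairwise gossip averaging, and the staleness proxy (global steps since a peer last exchanged models) are all assumptions chosen for readability. Nodes wake up one at a time, take a fixed-stepsize SGD step on their own model copy, and then average with one random peer, so no parameter server or global barrier is involved.

```python
import random


def dasgd_sketch(num_nodes=4, dim=3, steps=4000, lr=0.05, noise=0.1, seed=0):
    """Toy simulation of decentralized asynchronous SGD (hypothetical sketch).

    Assumed objective: f(x) = 0.5 * ||x||^2 on every node (homogeneous data),
    so the true gradient is x; each stochastic gradient adds Gaussian noise.
    Returns the average of all node models and the mean staleness proxy.
    """
    rng = random.Random(seed)
    # Every node starts from the same initial model.
    models = [[1.0] * dim for _ in range(num_nodes)]
    # last_sync[i]: global step at which node i last exchanged models.
    last_sync = [0] * num_nodes
    staleness = []

    for t in range(1, steps + 1):
        # A random node wakes up; there is no global clock or barrier.
        i = rng.randrange(num_nodes)
        # Local SGD step with a fixed stepsize on a noisy gradient of f.
        models[i] = [xk - lr * (xk + rng.gauss(0.0, noise)) for xk in models[i]]

        # Pick a random peer j != i for gossip averaging.
        j = rng.randrange(num_nodes - 1)
        if j >= i:
            j += 1
        # Staleness proxy (assumption): steps since peer j last synced.
        staleness.append(t - last_sync[j])

        # Pairwise averaging replaces the central parameter server.
        avg = [(a + b) / 2 for a, b in zip(models[i], models[j])]
        models[i], models[j] = avg, list(avg)
        last_sync[i] = last_sync[j] = t

    mean_model = [sum(m[k] for m in models) / num_nodes for k in range(dim)]
    return mean_model, sum(staleness) / len(staleness)
```

For example, `mean_model, avg_staleness = dasgd_sketch()` returns the averaged model, which ends up close to the minimizer at the origin, together with the mean staleness proxy. Increasing `num_nodes` tends to raise that proxy, since each node takes part in an exchange less often.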

Disciplines :

Computer science

Author, co-author :

DALLE LUCCA TOSI, Mauro ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

THEOBALD, Martin ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

Language :

English

Title :

Convergence Analysis of Decentralized ASGD

Publication date :

07 September 2023

Source :

https://arxiv.org/abs/2309.03754

Additional URL :

https://arxiv.org/pdf/2309.03754.pdf

FnR Project :

FNR12252781 - Data-driven Computational Modelling And Applications, 2017 (01/09/2018-28/02/2025) - Andreas Zilian

Available on ORBilu :

since 18 September 2023

Statistics

Number of views

338 (7 by Unilu)

Number of downloads

0 (0 by Unilu)