Reference : TensAIR: Online Learning from Data Streams via Asynchronous Iterative Routing
E-prints/Working papers : Already available on another site
Engineering, computing & technology : Computer science
http://hdl.handle.net/10993/54534
TensAIR: Online Learning from Data Streams via Asynchronous Iterative Routing
English
Dalle Lucca Tosi, Mauro
Ellampallil Venugopal, Vinu
Theobald, Martin
2023
No
[en] Online Learning ; Neural Networks ; Asynchronous Stream Processing ; Asynchronous Stochastic Gradient Descent
[en] Online learning (OL) from data streams is an emerging area of research that encompasses numerous challenges from stream processing, machine learning, and networking. Recent extensions of stream-processing platforms, such as Apache Kafka and Flink, already provide basic support for training neural networks in a stream-processing pipeline. However, these extensions are not scalable and flexible enough for many real-world use-cases, since they do not integrate neural-network libraries as first-class citizens into their architectures. In this paper, we present TensAIR, an end-to-end dataflow engine for OL from data streams via a protocol which we refer to as asynchronous iterative routing. TensAIR supports the common dataflow operators, such as Map, Reduce, and Join, and augments them with the data-parallel OL functions train and predict. These belong to the new Model operator, in which an initial TensorFlow model (either freshly initialized or pre-trained) is replicated among multiple decentralized worker nodes. Our decentralized architecture allows TensAIR to efficiently shard incoming data batches across the distributed model replicas, which in turn trigger the model updates via asynchronous stochastic gradient descent. We empirically demonstrate that TensAIR achieves a nearly linear scale-out in terms of (1) the number of worker nodes deployed in the network, and (2) the throughput at which the data batches arrive at the dataflow operators. We exemplify the versatility of TensAIR by investigating both sparse (Word2Vec) and dense (CIFAR-10) use-cases, for which we demonstrate very significant performance improvements in comparison to Kafka, Flink, and Horovod. We also illustrate the magnitude of these improvements by demonstrating real-time concept drift adaptation of a sentiment analysis model trained over a Twitter stream.
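The abstract describes sharding incoming data batches across decentralized model replicas, each of which triggers model updates via asynchronous SGD. The following is a minimal, illustrative sketch of that idea only; the `Replica`, `shard_batch`, and `run_round` names, the scalar linear model, and the learning rate are assumptions for illustration, not the actual TensAIR API or protocol.

```python
# Illustrative sketch of decentralized, data-parallel asynchronous SGD:
# each replica holds a full model copy (here: one scalar weight for y = w*x),
# trains on its shard of the incoming batch, and routes its update to peers
# without any global synchronization barrier.
import threading

LR = 0.05  # learning rate (assumed hyperparameter)

class Replica:
    """One decentralized worker holding a full copy of the model."""
    def __init__(self):
        self.w = 0.0
        self.peers = []       # other replicas to route gradient updates to
        self.lock = threading.Lock()

    def train(self, shard):
        if not shard:
            return
        # Gradient of MSE loss for y = w * x over this replica's shard.
        with self.lock:
            grad = sum(2 * (self.w * x - y) * x for x, y in shard) / len(shard)
        delta = -LR * grad
        # Apply locally, then route the update to all peers. Because each
        # replica trains in its own thread, these updates arrive at peers
        # asynchronously, possibly mid-training.
        self.apply(delta)
        for p in self.peers:
            p.apply(delta)

    def apply(self, delta):
        with self.lock:
            self.w += delta

def shard_batch(batch, n):
    """Split one incoming data batch into n shards, one per replica."""
    return [batch[i::n] for i in range(n)]

def run_round(replicas, batch):
    """Shard one batch and let every replica train on its shard concurrently."""
    threads = [threading.Thread(target=r.train, args=(s,))
               for r, s in zip(replicas, shard_batch(batch, len(replicas)))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In this toy setting the replicas jointly converge toward the true weight even though their updates interleave without coordination, which is the essential property the asynchronous-SGD design relies on.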
https://arxiv.org/abs/2211.10280
FnR ; FNR12252781 > Andreas Zilian > DRIVEN > Data-driven Computational Modelling And Applications > 01/09/2018 > 28/02/2025 > 2017 |