Stream data processing; Big Data; sustainable-throughput
Abstract :
[en] Processing high-throughput data-streams has become a major challenge in areas such as real-time event monitoring, complex dataflow processing, and big data analytics. While there has been tremendous progress in distributed stream processing systems in the past few years, the high-throughput and low-latency (a.k.a. high sustainable-throughput) requirement of modern applications is pushing the limits of traditional data processing infrastructures. This paper introduces a new distributed stream data processing engine (DSPE), called “Asynchronous Iterative Routing” or simply AIR, which implements a light-weight, dynamic sharding protocol. AIR expedites a direct and asynchronous communication among all the worker nodes via multiple Message Passing Interface (MPI) communication channels and thereby completely avoids any additional communication overhead with a dedicated master node. With its unique design, AIR scales out to clusters consisting of up to 8 nodes and 224 cores, performing much better than existing DSPEs, and it performs up to 15 times better than Spark and Flink in terms of sustainable-throughput.
Disciplines :
Computer science
Author, co-author :
ELLAMPALLIL VENUGOPAL, Vinu ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
THEOBALD, Martin ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
Language :
English
Title :
Asynchronous Stream Data Processing using a Light-Weight and High-Performance Dataflow Engine
Publication date :
11 December 2020
Event name :
The Dutch-Belgian DataBase Day (DBDBD) 2020
Event organizer :
Software Languages Lab of the Vrije Universiteit Brussel