Asynchronous Stream Data Processing using a Light-Weight and High-Performance Dataflow Engine
English
Ellampallil Venugopal, Vinu[University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS) >]
Theobald, Martin[University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS) >]
11-Dec-2020
International
The Dutch-Belgian DataBase Day (DBDBD) 2020
11-12-2020
Software Languages Lab of the Vrije Universiteit Brussel
Brussels
Belgium
[en] Stream data processing ; Big Data ; sustainable-throughput
[en] Processing high-throughput data-streams has become a major challenge in areas such as real-time event monitoring, complex dataflow processing, and big data analytics. While there has been tremendous progress in distributed stream processing systems in the past few years, the high-throughput and low-latency (a.k.a. high sustainable-throughput) requirement of modern applications is pushing the limits of traditional data processing infrastructures. This paper introduces a new distributed stream data processing engine (DSPE), called “Asynchronous Iterative Routing” or simply AIR, which implements a light-weight, dynamic sharding protocol. AIR expedites a direct and asynchronous communication among all the worker nodes via multiple Message Passing Interface (MPI) communication channels and thereby completely avoids any additional communication overhead with a dedicated master node. With its unique design, AIR scales out to clusters consisting of up to 8 nodes and 224 cores, performing much better than existing DSPEs, and it performs up to 15 times better than Spark and Flink in terms of sustainable-throughput.