In support of push-based streaming for the computing continuum

MARCU, Ovidiu-Cristian; BOUVRY, Pascal

Paper published in a book (Scientific congresses, symposiums and conference proceedings)

2023 • In 15th Asian Conference on Intelligent Information and Database Systems

Peer reviewed

Permalink
https://hdl.handle.net/10993/55904

Details

Keywords :

streaming; real-time storage; push-based; pull-based; locality

Abstract :

[en] Real-time data architectures are core tools for implementing the edge-to-cloud computing continuum, since streams are a natural abstraction for representing and predicting the needs of such applications. Over the past decade, Big Data architectures have evolved into specialized layers for handling real-time storage and stream processing. Open-source streaming architectures efficiently decouple fast storage and processing engines by implementing stream reads through a pull-based interface exposed by storage. However, how much data the stream source operators must continuously pull from storage, and how often they issue pull-based requests, are configurations left to the application; poor choices can lead to increased use of system resources and reduced overall application performance. To tackle these issues, this paper proposes a unified streaming architecture that integrates co-located fast storage and streaming engines through push-based source integrations, making the data available for processing as soon as storage has them. We empirically evaluate pull-based versus push-based design alternatives of the streaming source reader and discuss the advantages of both approaches.
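
The contrast the abstract draws between pull-based and push-based source readers can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `InMemoryStream`, `PullSource`, `PushSource`, and `demo` are hypothetical, the storage is a single in-process list, and there is no networking or threading. It only shows that a pull-based source must choose a batch size (`max_records`) and issue repeated requests, whereas a push-based source receives each record as soon as storage appends it.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class InMemoryStream:
    """Toy append-only stream standing in for one real-time storage partition
    (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._records: List[bytes] = []
        self._subscribers: List[Callable[[bytes], None]] = []

    def append(self, record: bytes) -> None:
        self._records.append(record)
        # Push path: hand the record to every subscribed source right away.
        for callback in self._subscribers:
            callback(record)

    def read(self, offset: int, max_records: int) -> List[bytes]:
        # Pull path: return at most max_records starting at the given offset.
        return self._records[offset:offset + max_records]

    def subscribe(self, callback: Callable[[bytes], None]) -> None:
        self._subscribers.append(callback)


class PullSource:
    """Source reader that asks storage for data; the batch size is a
    configuration the application has to pick."""

    def __init__(self, stream: InMemoryStream, max_records: int) -> None:
        self.stream = stream
        self.max_records = max_records
        self.offset = 0
        self.requests = 0  # counts pull requests issued to storage

    def poll(self) -> List[bytes]:
        self.requests += 1
        batch = self.stream.read(self.offset, self.max_records)
        self.offset += len(batch)
        return batch


class PushSource:
    """Source reader that registers once and is fed records by storage."""

    def __init__(self, stream: InMemoryStream) -> None:
        self.buffer: Deque[bytes] = deque()
        stream.subscribe(self.buffer.append)

    def drain(self) -> List[bytes]:
        out = list(self.buffer)
        self.buffer.clear()
        return out


def demo() -> Tuple[List[bytes], int, List[bytes]]:
    stream = InMemoryStream()
    pull = PullSource(stream, max_records=2)
    push = PushSource(stream)

    for i in range(5):
        stream.append(f"event-{i}".encode())

    # The pull source needs several requests to catch up with storage.
    pulled: List[bytes] = []
    while len(pulled) < 5:
        pulled.extend(pull.poll())

    # The push source already holds every record without issuing requests.
    return pulled, pull.requests, push.drain()


if __name__ == "__main__":
    pulled, requests, pushed = demo()
    print(f"pull: {len(pulled)} records in {requests} requests")
    print(f"push: {len(pushed)} records, no requests issued")
```

In this toy setting both readers see the same records in the same order; the difference is who drives the transfer. A real system would also have to handle backpressure on the push path, which this sketch omits.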

Disciplines :

Computer science

Author, co-author :

MARCU, Ovidiu-Cristian ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

BOUVRY, Pascal ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

External co-authors :

Language :

English

Title :

In support of push-based streaming for the computing continuum

Publication date :

2023

Event name :

15th Asian Conference on Intelligent Information and Database Systems

Event date :

from 24-07-2023 to 26-07-2023

Audience :

International

Main work title :

15th Asian Conference on Intelligent Information and Database Systems

Peer reviewed :

Peer reviewed

Additional URL :

https://inria.hal.science/hal-04150523

Commentary :

The experiments presented in this paper were carried out using the HPC facilities of the University of Luxembourg – see hpc.uni.lu. This work is partially funded by the SnT-LuxProvide partnership on bridging clouds and supercomputers and by the Fonds National de la Recherche Luxembourg (FNR) POLLUX program under the SERENITY Project (ref. C22/IS/17395419).

Available on ORBilu :

since 08 September 2023
