A Robust Approach for Ensuring Total Order Execution of Replicated Sporadic Tasks in Fault-Tolerant Multiprocessor Real-Time Systems

NAGHAVI, Amin; NAVET, Nicolas

doi:10.1145/3765620

Article (Scientific journals)

2025 • In ACM Transactions on Cyber-Physical Systems, 9 (4), p. 36

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/10993/65645

DOI
10.1145/3765620

Files

Full Text

3765620 (1).pdf

Publisher postprint (102.28 MB)

Details

Keywords :

Real-Time Systems; Multiprocessor Systems; Fault Tolerance; Replication; Total Order

Abstract :

Replication and diversification are commonly used fault-tolerance techniques to mask accidental faults or malicious behavior of compromised nodes in cyber-physical systems. In event-driven systems, executing diversified replicated tasks across multiple nodes can result in different execution orders on different nodes. Implementing a total order protocol for job execution across all nodes ensures consistency and facilitates recovery in case of failures. However, achieving total order comes with significant costs due to the high communication and coordination demands among nodes. Existing solutions require coordination either before each job execution or at each job release. Moreover, some total order protocols may lead to unbounded priority inversion on certain nodes in order to maintain a global execution order. Malicious nodes can deliberately exploit these protocols to launch priority inversion attacks, thereby jeopardizing the timeliness of tasks on healthy nodes in time-critical applications. We propose a total order execution protocol that guarantees bounds on the priority inversion that tasks experience and ensures that tasks meet their deadlines in real-time systems. Our approach withstands priority inversion attacks and leverages common knowledge among nodes rather than relying on communication, allowing them to progress independently while still ensuring a consistent execution order of job replicas across nodes upon their release. Although inter-node communication is not required, the method can benefit from exchanged progress data to reduce job response times. It is compatible with coarsely synchronized clocks and, unlike other total order approaches, which target non-preemptive scheduling, uses progress milestones to enable task preemption. We evaluate our method against existing approaches in terms of acceptance ratio and response times, and study how job response times vary with increasing communication delays when the approach is used.

Disciplines :

Computer science

Author, co-author :

NAGHAVI, Amin ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > APSIA

NAVET, Nicolas ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

External co-authors :

Language :

English

Title :

A Robust Approach for Ensuring Total Order Execution of Replicated Sporadic Tasks in Fault-Tolerant Multiprocessor Real-Time Systems

Publication date :

October 2025

Journal title :

ACM Transactions on Cyber-Physical Systems

ISSN :

2378-962X

eISSN :

2378-9638

Publisher :

Association for Computing Machinery, New York, NY, United States

Volume :

9

Issue :

4

Pages :

36

Peer reviewed :

Peer Reviewed verified by ORBi

Focus Area :

Security, Reliability and Trust

Additional URL :

https://dl.acm.org/doi/10.1145/3765620

FnR Project :

FNR13691843 - ByzRT - Byzrt: Intrusion Resilient Real-time Communication And Computation In Autonomous Systems, 2019 (01/09/2020-31/08/2023) - Marcus Völp

Funders :

FNR - Fonds National de la Recherche

Funding number :

C19/IS/13691843/ByzRT

Available on ORBilu :

since 28 August 2025

Statistics

Number of views

90 (14 by Unilu)

Number of downloads

84 (6 by Unilu)

