Tolerating Node Failures in Multi-Processor Real-Time Systems with Data Dependencies

NAGHAVI, Amin; HU, Tingting; NAVET, Nicolas

doi:10.1109/HPCC67675.2025.00167

Paper published in a journal (Scientific congresses, symposiums and conference proceedings)

2025 • In IEEE International Conference on High Performance Computing and Communications (HPCC)

Peer reviewed

Permalink
https://hdl.handle.net/10993/65426

DOI
10.1109/HPCC67675.2025.00167

Files

Full Text

ICESS-173-Camera-Ready-PDF Express Verified.pdf

Author preprint (1.48 MB)

Details

Keywords :

Real-Time Systems; Fault Tolerant; Cause Effect Chain; Data Dependency; Intrusion Tolerant

Abstract :

In real-time systems with data-dependent tasks, ensuring both correctness and timeliness is critical not only for individual task executions but also for data processing across cause-effect chains. When these systems are deployed on multi-processor platforms, tasks belonging to the same chain might be distributed over multiple nodes. In such situations, data received from other nodes may be unreliable, as those nodes could be compromised by faults or malicious attacks. Prior research on fault-tolerant cause-effect chains has largely focused on crash faults and often fails to ensure that task deadlines are met during fault recovery. This paper presents a method that tolerates node failures caused by both faults and malicious intrusions, while ensuring task deadlines in multi-processor (or multi-core) real-time systems through active replication. Our approach leverages majority voting on outputs from replicated tasks across different nodes, enabling each task to validate incoming data before processing it further along the chain. Additionally, for systems using active replication, we present a formal job-level end-to-end latency analysis for cause-effect chains. To reduce the end-to-end latency of task chains, we propose a replica-to-node mapping strategy that enables improved worst-case response times. Experimental evaluations demonstrate that our latency-aware mapping reduces end-to-end latency compared to the commonly used worst-fit decreasing heuristic, although it may slightly reduce task acceptance ratios at high total utilizations.
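
The two building blocks named in the abstract, majority voting over replica outputs and the worst-fit decreasing baseline for replica-to-node mapping, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the default of three replicas per task, and the per-node utilization bound of 1.0 are all assumptions made here for illustration, and the paper's own latency-aware mapping and response-time analysis are not reproduced.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence


def majority_vote(replica_outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of replicas, else None.

    Hypothetical helper. With 2f + 1 replicas placed on distinct nodes,
    f faulty or compromised nodes cannot outvote the f + 1 correct ones.
    """
    if not replica_outputs:
        return None
    value, count = Counter(replica_outputs).most_common(1)[0]
    return value if count > len(replica_outputs) // 2 else None


def worst_fit_decreasing(
    utilizations: Dict[str, float],
    num_nodes: int,
    replicas: int = 3,       # assumed replication degree
    capacity: float = 1.0,   # assumed per-node utilization bound
) -> Optional[Dict[str, List[int]]]:
    """Map every task's replicas to distinct nodes, least-loaded nodes first.

    Returns {task: [node indices]} or None if some replica does not fit.
    """
    if replicas > num_nodes:
        return None
    load = [0.0] * num_nodes
    mapping: Dict[str, List[int]] = {}
    # "Decreasing": handle the heaviest tasks first.
    ordered = sorted(utilizations.items(), key=lambda kv: kv[1], reverse=True)
    for task, u in ordered:
        # "Worst fit": take the least-loaded nodes, one replica per node.
        chosen = sorted(range(num_nodes), key=lambda n: load[n])[:replicas]
        if any(load[n] + u > capacity for n in chosen):
            return None
        for n in chosen:
            load[n] += u
        mapping[task] = chosen
    return mapping


if __name__ == "__main__":
    # One replica disagrees; the consumer still accepts the majority value.
    print(majority_vote([42, 42, 7]))
    print(worst_fit_decreasing({"sense": 0.5, "plan": 0.25}, num_nodes=4))
```

In this sketch a consuming task would call `majority_vote` on the outputs received from its predecessor's replicas and process the data only when a value is returned; a result of `None` signals that no strict majority exists.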

Disciplines :

Computer science

Author, co-author :

NAGHAVI, Amin ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > APSIA

HU, Tingting ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

NAVET, Nicolas ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

Language :

English

Title :

Tolerating Node Failures in Multi-Processor Real-Time Systems with Data Dependencies

Publication date :

2025

Event name :

IEEE International Conference on Embedded Software and Systems (ICESS)

Event date :

13-15 August

Journal title :

IEEE International Conference on High Performance Computing and Communications (HPCC)

Peer reviewed :

Peer reviewed

Focus Area :

Security, Reliability and Trust

Additional URL :

https://ieeexplore.ieee.org/document/11207347

Available on ORBilu :

since 20 July 2025
