Paper published in a journal (Scientific congresses, symposiums and conference proceedings)
Tolerating Node Failures in Multi-Processor Real-Time Systems with Data Dependencies
NAGHAVI, Amin; HU, Tingting; NAVET, Nicolas
2025, in IEEE International Conference on High Performance Computing and Communications (HPCC)
Peer reviewed

Files


Full Text
ICESS-173-Camera-Ready-PDF Express Verified.pdf
Author preprint (1.48 MB)

All documents in ORBilu are protected by a user license.




Details



Keywords :
Real-Time Systems; Fault Tolerant; Cause Effect Chain; Data Dependency; Intrusion Tolerant
Abstract :
[en] In real-time systems with data-dependent tasks, ensuring both correctness and timeliness is critical not only for individual task executions but also for data processing across cause-effect chains. When these systems are deployed on multi-processor platforms, tasks belonging to the same chain might be distributed over multiple nodes. In such situations, data received from other nodes may be unreliable, as those nodes could be compromised by faults or malicious attacks. Prior research on fault-tolerant cause-effect chains has largely focused on crash faults and often fails to ensure that task deadlines are met during fault recovery. This paper presents a method that uses active replication to tolerate node failures caused by both faults and malicious intrusions, while guaranteeing that task deadlines are met in multi-processor (or multi-core) real-time systems. Our approach leverages majority voting on outputs from replicated tasks across different nodes, enabling each task to validate incoming data before processing it further along the chain. Additionally, for systems using active replication, we present a formal job-level end-to-end latency analysis for cause-effect chains. To reduce the end-to-end latency of task chains, we propose a replica-to-node mapping strategy that improves worst-case response times. Experimental evaluations demonstrate that our latency-aware mapping reduces end-to-end latency compared to the commonly used worst-fit decreasing heuristic, although it may slightly reduce task acceptance ratios at high total utilizations.
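The abstract describes validating incoming data by majority voting over the outputs of task replicas placed on different nodes. The sketch below illustrates such a voter under stated assumptions and is not the paper's implementation: replicas are assumed deterministic, so correct replicas produce equal outputs for the same job. It also assumes the classic bound of n = 2f + 1 replicas to mask up to f faulty or compromised nodes, and a trusted voter on the consuming node. The names `majority_vote` and `replicas_needed` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of replicas, else None.

    Assumption: outputs of correct replicas for the same job compare equal
    (deterministic tasks fed identical inputs).
    """
    if not outputs:
        return None
    # Most frequent value among the replica outputs and how often it occurs.
    value, count = Counter(outputs).most_common(1)[0]
    # Strict majority: more than half of all replicas agree. With at most f
    # faulty replicas out of 2f + 1, the f + 1 correct replicas agree, so the
    # strict-majority value is the correct one.
    if count > len(outputs) // 2:
        return value
    # No strict majority: the consuming task cannot trust the input.
    return None


def replicas_needed(f: int) -> int:
    """Replicas needed so that majority voting masks up to f faulty nodes."""
    return 2 * f + 1


if __name__ == "__main__":
    # Three replicas (f = 1); one node returns a corrupted value.
    print(majority_vote([42, 42, 17]))
    # All replicas disagree: no value can be validated.
    print(majority_vote([1, 2, 3]))
```

Requiring a strict majority, rather than simply taking the most frequent value, is what lets a minority of compromised nodes be outvoted. A tie or complete disagreement is reported as "no valid input" rather than being resolved arbitrarily.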
Disciplines :
Computer science
Author, co-author :
NAGHAVI, Amin ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > APSIA
HU, Tingting ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
NAVET, Nicolas ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
External co-authors :
no
Language :
English
Title :
Tolerating Node Failures in Multi-Processor Real-Time Systems with Data Dependencies
Publication date :
2025
Event name :
IEEE International Conference on Embedded Software and Systems (ICESS)
Event date :
13-15 August
Journal title :
IEEE International Conference on High Performance Computing and Communications (HPCC)
Peer reviewed :
Peer reviewed
Focus Area :
Security, Reliability and Trust
Available on ORBilu :
since 20 July 2025

Statistics


Number of views
191 (29 by Unilu)
Number of downloads
147 (12 by Unilu)

Scopus citations®
0
Scopus citations® without self-citations
0
OpenAlex citations
0
