Abstract:
Replication and diversification are commonly used fault-tolerance techniques for masking accidental faults or the malicious behavior of compromised nodes in cyber-physical systems. In event-driven systems, executing diversified replicated tasks on multiple nodes can lead to different execution orders on different nodes. Implementing a total order protocol for job execution across all nodes ensures consistency and facilitates recovery in case of failures. However, achieving total order is costly because of the high communication and coordination demands among nodes. Existing solutions require coordination either before each job execution or at each job release. Moreover, some total order protocols may cause unbounded priority inversion on certain nodes in order to maintain a global execution order. Malicious nodes can deliberately exploit these protocols to launch priority inversion attacks, thereby jeopardizing the timeliness of tasks on healthy nodes in time-critical applications. We propose a total order execution protocol that bounds the priority inversion that tasks experience and ensures that tasks meet their deadlines in real-time systems. Our approach withstands priority inversion attacks and leverages common knowledge among nodes rather than relying on communication, so nodes can progress independently while still executing job replicas in a consistent order from the moment they are released. Although inter-node communication is not required, the method can use exchanged progress data to reduce job response times. It is compatible with coarsely synchronized clocks and, unlike other total order approaches, which target non-preemptive scheduling, uses progress milestones to enable task preemption. We evaluate our method against existing approaches in terms of acceptance ratio and response times, and study how its job response times vary with increasing communication delays.