Scientific presentation in universities or research centers
Optimized Coordinated Checkpoint/Rollback Protocol using a Dataflow Graph Model
BESSERON, Xavier; Gautier, Thierry


Full Text
Author postprint (1.01 MB)

All documents in ORBilu are protected by a user license.

Keywords :
Grid; Distributed Computing; Fault Tolerance; Dataflow graph
Abstract :
[en] Fault-tolerance protocols play an important role in today's long-running scientific parallel applications. The probability of a failure can be significant because of the number of unreliable components involved in an execution. We present our approach and preliminary results for a new checkpoint/rollback protocol based on a coordinated scheme. The application is described using a dataflow graph, which is an abstract representation of the execution. Thanks to this representation, fault recovery in our protocol requires only a partial restart of other processes. Simulations on a domain decomposition application show that the amount of computation required to restart and the number of processes involved are reduced compared to the classical global rollback protocol.
Disciplines :
Computer science
Author, co-author :
BESSERON, Xavier; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Engineering Research Unit; Laboratoire d'Informatique de Grenoble > MOAIS project
Gautier, Thierry; Laboratoire d'Informatique de Grenoble > MOAIS project
Language :
English
Title :
Optimized Coordinated Checkpoint/Rollback Protocol using a Dataflow Graph Model
Publication date :
22 January 2009
Event name :
Workshop APRETAF : Algorithmes Parallèles, Répartis Et Tolérance Aux Fautes
Event place :
Grenoble, France
Event date :
from 22-01-2009 to 23-01-2009
Available on ORBilu :
since 24 July 2019

