Reference : Optimized Coordinated Checkpoint/Rollback Protocol using a Dataflow Graph Model
Scientific Presentations in Universities or Research Centers : Scientific presentation in universities or research centers
Engineering, computing & technology : Computer science
http://hdl.handle.net/10993/39970
Optimized Coordinated Checkpoint/Rollback Protocol using a Dataflow Graph Model
English
Besseron, Xavier [University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Engineering Research Unit ; Laboratoire d'Informatique de Grenoble > MOAIS project]
Gautier, Thierry [Laboratoire d'Informatique de Grenoble > MOAIS project]
22-Jan-2009
National
Workshop APRETAF : Algorithmes Parallèles, Répartis Et Tolérance Aux Fautes
from 22-01-2009 to 23-01-2009
Grenoble
France
[en] Grid ; Distributed Computing ; Fault Tolerance ; Dataflow graph
[en] Fault-tolerance protocols play an important role in today's long-running scientific parallel applications. The probability of a failure can be significant because of the number of unreliable components involved in an execution. We present our approach and preliminary results for a new checkpoint/rollback protocol based on a coordinated scheme. The application is described using a dataflow graph, which is an abstract representation of the execution. Thanks to this representation, fault recovery in our protocol requires only a partial restart of the other processes. Simulations on a domain decomposition application show that both the amount of computation required to restart and the number of processes involved are reduced compared to the classical global rollback protocol.
http://www-verimag.imag.fr/~apretaf/APRETAF/APRETAF.html

File(s) associated to this reference

Fulltext file(s):

File : talk_2009_apretaf.pdf
Version : Author postprint
Size : 984.11 kB
Access : Open access

