Besseron, Xavier ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Engineering Research Unit ; Laboratoire d'Informatique de Grenoble > MOAIS project
Jafar, Samir; Laboratoire d'Informatique de Grenoble > MOAIS project
Gautier, Thierry; Laboratoire d'Informatique de Grenoble > MOAIS project
Roch, Jean Louis; Laboratoire d'Informatique de Grenoble > MOAIS project
External co-authors :
yes
Language :
English
Title :
CCK: An Improved Coordinated Checkpoint/Rollback Protocol for Dataflow Applications in Kaapi
Publication date :
April 2006
Event name :
IEEE Conference on Information and Communication Technologies: from Theory to Applications (ICTTA'06)
Event place :
Damascus, Syria
Event date :
from 24-04-2006 to 28-04-2006
Audience :
International
Main work title :
2006 2nd International Conference on Information & Communication Technologies
R. Baldoni. A communication-induced checkpointing protocol that ensures rollback-dependency trackability. In Proceedings of the 27th International Symposium on Fault-Tolerant Computing (FTCS '97), page 68. IEEE Computer Society, 1997.
D. Baraff and A. Witkin. Large steps in cloth simulation. In Computer Graphics Proceedings, Annual Conference Series, pages 43-54. SIGGRAPH, 1998.
A. Bouteiller, P. Lemarinier, G. Krawezik, and F. Cappello. Coordinated checkpoint versus message log for fault tolerant MPI. In Proceedings of the 2003 IEEE International Conference on Cluster Computing, Hong Kong, China, 2003.
E. N. Mootaz Elnozahy, L. Alvisi, Y.-M. Wang, and D. B. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Comput. Surv., 34(3):375-408, 2002.
G. Cavalheiro, M. Doreille, F. Galilee, and J.-L. Roch. Athapascan-1: On-line building data flow graph in a parallel language. In IEEE, editor, PACT'98, pages 88-95, Paris, France, October 1998.
L. V. Kale, G. Zheng, and L. Shi. FTC-Charm++: An in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI. In 2004 IEEE International Conference on Cluster Computing, San Diego, CA, September 2004.
R. Revire, J.-L. Roch, and T. Gautier. Athapascan: API for asynchronous parallel programming. Technical Report RT-0276, www-id.imag.fr/software/athl, Projet APACHE, INRIA, February 2003.
T. Ungerer, J. Silc, and B. Robic. Asynchrony in parallel computing: from dataflow to multithreading, pages 1-33. Nova Science Publishers, Inc., 2001.
S. Jafar, T. Gautier, A. Krings, and J.-L. Roch. A checkpoint/recovery model for heterogeneous dataflow computations using work-stealing. In Proceedings of Euro-Par '05 (LNCS), Lisboa, Portugal, August 2005.
S. Jafar, A. Krings, T. Gautier, and J.-L. Roch. Theft-induced checkpointing for reconfigurable dataflow applications. In Proceedings of the IEEE Electro/Information Technology Conference EIT2005, Lincoln, Nebraska, U.S.A., May 2005.
K. M. Chandy and L. Lamport. Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst., 3(1):63-75, 1985.
Laxmikant Kalé, Robert Skeel, Milind Bhandarkar, Robert Brunner, Attila Gursoy, Neal Krawetz, James Phillips, Aritomo Shinozaki, Krishnan Varadarajan, and Klaus Schulten. NAMD2: greater scalability for parallel molecular dynamics. J. Comput. Phys., 151(1):283-312, 1999.
G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel hypergraph partitioning: Applications in VLSI domain. Technical report, 1997.
A. Nguyen-Tuong, A. S. Grimshaw, and M. Hyett. Exploiting data-flow for fault-tolerance in a wide-area parallel system. In Proceedings of the 15th Symposium on Reliable Distributed Systems, pages 2-11, 1996.
F. Pellegrini and J. Roman. Experimental analysis of the dual recursive bipartitioning algorithm for static mapping. Technical Report 1038-96, 1996.
B. Randell. System structure for software fault tolerance. In Proceedings of the International Conference on Reliable Software, pages 437-449, 1975.
R. Revire, F. Zara, and T. Gautier. Efficient and easy parallel implementation of large numerical simulation. In Springer, editor, Proceedings of ParSim03 of EuroPVM/MPI03, pages 663-666, Venice, Italy, 2003.
V. Strumpen. Compiler technology for portable checkpoints. Technical Report MA-02139, MIT Laboratory for Computer Science, Cambridge, 1998.
F. Zara, F. Faure, and J.-M. Vincent. Physical cloth simulation on a PC cluster. In X. Pueyo, D. Bartz, and E. Reinhard, editors, Fourth Eurographics Workshop on Parallel Graphics and Visualization 2002, Blaubeuren, Germany, September 2002.