Varrette, Sébastien ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
Language :
English
Title :
Probabilistic Certification of Divide & Conquer Algorithms on Global Computing Platforms. Application to Fault-Tolerant Exact Matrix-Vector Product
Publication date :
July 2007
Event name :
Proc. of the ACM Intl. Workshop on Parallel Symbolic Computation’07 (PASCO’07)
Event place :
London, ON, Canada
Event date :
July 27-28, 2007
Audience :
International
Main work title :
Proc. of the ACM Intl. Workshop on Parallel Symbolic Computation’07 (PASCO’07)
Publisher :
ACM
ISBN/EAN :
978-1-59593-741-4
Pages :
88-92
Peer reviewed :
Peer reviewed