Fault-Tolerant Average Execution Time Optimization for General Purpose Multi-Processor System-on Chips


Author(s): Vayrynen, Mikael; Singh, Virendra; Larsson, Erik
Date(s)

24/04/2009

Abstract

Owing to developments in semiconductor technology, fault tolerance is important not only for safety-critical systems but also for general-purpose (non-safety-critical) systems. For general-purpose systems, however, the priority is to minimize the average execution time (AET) while ensuring fault tolerance, rather than to guarantee that deadlines are always met. For a given job and a soft (transient) error probability, we define mathematical formulas for AET that include bus communication overhead for both voting (active replication) and rollback-recovery with checkpointing (RRC). For a given multi-processor system-on-chip (MPSoC), we define integer linear programming (ILP) models that minimize AET, including bus communication overhead, when: (1) selecting the number of checkpoints when using RRC, (2) finding the number of processors and the job-to-processor assignment when using voting, and (3) defining the fault-tolerance scheme (voting or RRC) per job and defining its usage for each job. Experiments demonstrate significant savings in AET.

Format

application/pdf

Identifier

http://eprints.iisc.ernet.in/41243/1/Fault-Tolerant.pdf

Vayrynen, Mikael and Singh, Virendra and Larsson, Erik (2009) Fault-Tolerant Average Execution Time Optimization for General Purpose Multi-Processor System-on Chips. In: International Conference on Design Automation and Test in Europe (DATE), 20-24 April 2009, Nice.

Publisher

IEEE

Relation

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5090713&tag=1

http://eprints.iisc.ernet.in/41243/

Keywords #Supercomputer Education & Research Centre
Type

Conference Paper

PeerReviewed