Exploiting operating system services to efficiently checkpoint parallel applications in GENESIS


Author(s): Rough, Justin; Goscinski, Andrzej
Contributor(s)

Zhou, Wanlei

Chi, Xue-bin

Goscinski, Andrzej

Li, Guo-jie

Date(s)

01/01/2002

Abstract

Recent research efforts in parallel processing on non-dedicated clusters have focused on high execution performance, parallelism management, transparent access to resources, and ease of use. However, because clusters are collections of independent computers shared by multiple users, they are susceptible to failure. This paper presents the development of a coordinated checkpointing facility for the GENESIS cluster operating system, built by exploiting existing operating system services. High performance and low overheads are achieved by allowing the processes of a parallel application to continue executing while checkpoints are being created, and demands on cluster resources are kept low by using coordinated checkpointing.

Identifier

http://hdl.handle.net/10536/DRO/DU:30004682

Language(s)

eng

Publisher

IEEE Xplore

Relation

http://dro.deakin.edu.au/eserv/DU:30004682/goscinski-exploitingoperatingsystem-2002.pdf

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1173584

Rights

2002, IEEE

Type

Conference Paper