Fault detection service architecture for grid computing systems


Author(s): Abawajy, Jemal
Date(s)

01/01/2004

Abstract

The ability to tolerate failures while effectively exploiting grid computing resources in a scalable and transparent manner must be an integral part of grid computing infrastructure. Hence, a fault-detection service is a necessary prerequisite to fault tolerance and fault recovery in grid computing. To this end, we present a scalable fault-detection service architecture. The proposed fault-detection system provides services that monitor user applications, grid middleware, and the dynamically changing state of a collection of distributed resources. It reports summaries of this information to the appropriate agents on demand or instantaneously in the event of failures.

Identifier

http://hdl.handle.net/10536/DRO/DU:30002371

Language(s)

eng

Publisher

Springer-Verlag

Relation

http://dro.deakin.edu.au/eserv/DU:30002371/abawajy-faultdetection-2004.pdf

http://www.springerlink.com/content/t9jrblggtkh6qht4/fulltext.pdf

Rights

2004, Springer-Verlag

Keywords

#fault-tolerance #grid computing #fault-detection #grid scheduler #reconfigurable infrastructure

Type

Journal Article