An online data access prediction and optimization approach for distributed systems


Author(s): Ishii, Renato Porfirio; Mello, Rodrigo Fernandes de
Contributor(s)

UNIVERSIDADE DE SÃO PAULO

Date(s)

07/11/2013

2012

Abstract

Current scientific applications produce large amounts of data. Processing, handling, and analyzing such data require large-scale computing infrastructures such as clusters and grids. In this area, studies aim at improving the performance of data-intensive applications by optimizing data accesses. To achieve this goal, distributed storage systems have employed techniques such as data replication, migration, distribution, and access parallelism. However, the main drawback of those studies is that they do not take application behavior into account when optimizing data access. This limitation motivated this paper, which applies strategies to support the online prediction of application behavior in order to optimize data access operations on distributed systems, without requiring any information on past executions. To accomplish this goal, the approach organizes application behaviors as time series and then analyzes and classifies those series according to their properties. Based on these properties, the approach selects modeling techniques to represent the series and perform predictions, which are later used to optimize data access operations. The approach was implemented and evaluated using the OptorSim simulator, sponsored by the LHC-CERN project and widely employed by the scientific community. Experiments confirm that the approach reduces application execution time by about 50 percent, especially when handling large amounts of data.
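
The pipeline summarized in the abstract (record behavior as a time series, classify the series by its properties, select a model, predict the next value) can be illustrated with a minimal Python sketch. It is not the authors' method: the record's keywords point to recurrence plots, whereas this sketch swaps in a simpler, hypothetical classification test (lag-1 autocorrelation compared against an assumed threshold of 0.5) and two stand-in models (a least-squares AR(1) fit and the series mean). All function names, the threshold, and the example series (for instance, bytes read per interval) are illustrative assumptions.

```python
from statistics import mean


def lag1_autocorrelation(series: list[float]) -> float:
    """Sample autocorrelation at lag 1; returns 0.0 for a constant series."""
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[t] - mu) * (series[t + 1] - mu) for t in range(len(series) - 1))
    return num / denom


def fit_ar1(series: list[float]) -> tuple[float, float]:
    """Least-squares fit of x[t+1] = c + phi * x[t]; returns (c, phi)."""
    xs, ys = series[:-1], series[1:]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    phi = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - phi * mx, phi


def predict_next(series: list[float], threshold: float = 0.5) -> tuple[str, float]:
    """Hypothetical classifier: strongly correlated series get an AR(1) model,
    the rest fall back to the mean. Returns (model_name, prediction)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    if abs(lag1_autocorrelation(series)) >= threshold:
        c, phi = fit_ar1(series)
        return "ar1", c + phi * series[-1]
    return "mean", mean(series)


if __name__ == "__main__":
    # Hypothetical access trace: bytes read per interval, growing steadily.
    print(predict_next([10, 20, 30, 40, 50, 60, 70, 80]))  # ('ar1', 90.0)
    # Hypothetical noisy trace with weak lag-1 correlation.
    print(predict_next([5, 7, 4, 6, 5, 3]))
```

In an optimizer, the predicted value could drive a decision such as replicating or prefetching a file before the next access; that policy is likewise an assumption and is not specified by this record.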

FAPESP (São Paulo Research Foundation), Brazil [2011/02655-9]

CNPq (National Council for Scientific and Technological Development), Brazilian research funding agency [304338/2008-7 and 470739/2008-8]

Identifier

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, LOS ALAMITOS, v. 23, n. 6, p. 1017-1029, JUN, 2012

1045-9219

http://www.producao.usp.br/handle/BDPI/43238

10.1109/TPDS.2011.256

http://dx.doi.org/10.1109/TPDS.2011.256

Language(s)

eng

Publisher

IEEE COMPUTER SOC

LOS ALAMITOS

Relation

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

Rights

restrictedAccess

Copyright IEEE COMPUTER SOC

Keywords #DISTRIBUTED COMPUTING #DISTRIBUTED FILE SYSTEM #DATA ACCESS OPTIMIZATION #TIME SERIES ANALYSIS #PREDICTION #TIME-SERIES #RECURRENCE PLOTS #PACKAGE #GRIDS #COMPUTER SCIENCE, THEORY & METHODS #ENGINEERING, ELECTRICAL & ELECTRONIC

Type

article

original article

publishedVersion