Task clustering on ETL systems – A pattern-oriented approach


Author(s): Oliveira, Bruno Moisés Teixeira; Belo, O.
Date(s)

20/07/2015

Abstract

Usually, data warehouse population processes are data-oriented workflows composed of dozens of granular tasks that are responsible for integrating data coming from different data sources. Specific subsets of these tasks can be grouped into collections, together with their relationships, to form higher-level constructs. Increasing task granularity allows processes to be generalized, simplifying their views and providing a means to carry expertise over to new applications. Well-proven practices can be used to describe general solutions based on basic skeletons that are configured and instantiated according to a set of specific integration requirements. Patterns can be applied to ETL processes not only to simplify their conceptual representation but also to reduce the gap that often exists between two design perspectives. In this paper, we demonstrate the feasibility and effectiveness of a pattern-based ETL approach using task clustering, analyzing a real-world ETL scenario through the definition of two commonly used clusters of tasks: a data lookup cluster and a data conciliation and integration cluster.
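The abstract describes ETL patterns as skeletons that are configured and instantiated for specific integration requirements, naming a data lookup cluster as one example. The Python sketch below illustrates that general idea under our own assumptions only: the class `DataLookupPattern`, its parameters, and the reject/default policy for unmatched records are hypothetical and are not the pattern definitions from the paper (whose keywords point to BPMN and Kettle rather than Python).

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class DataLookupPattern:
    """Hypothetical skeleton for a 'data lookup' task cluster.

    The configuration fields are illustrative assumptions, not the
    parameters defined in the paper.
    """
    source_key: str                # field in each incoming record used as the lookup key
    target_field: str              # field that receives the looked-up value
    lookup_table: Dict[Any, Any]   # e.g. natural key -> surrogate key
    default: Any = None            # value assigned to misses when reject_misses is False
    reject_misses: bool = True     # if True, unmatched records go to a reject list

    def run(self, records: List[Record]) -> Tuple[List[Record], List[Record]]:
        """Apply the configured lookup; return (output records, rejected records)."""
        matched: List[Record] = []
        rejected: List[Record] = []
        for rec in records:
            key = rec.get(self.source_key)
            if key in self.lookup_table:
                out = dict(rec)  # copy so the input record is not mutated
                out[self.target_field] = self.lookup_table[key]
                matched.append(out)
            elif self.reject_misses:
                rejected.append(rec)
            else:
                out = dict(rec)
                out[self.target_field] = self.default
                matched.append(out)
        return matched, rejected


if __name__ == "__main__":
    # Instantiate the same skeleton for one (hypothetical) integration requirement.
    lookup = DataLookupPattern("customer_code", "customer_sk", {"C01": 1, "C02": 2})
    ok, bad = lookup.run([
        {"customer_code": "C01", "amount": 10},
        {"customer_code": "C99", "amount": 5},
    ])
    print(ok)
    print(bad)
```

The point of the sketch is that the lookup logic is written once, while each concrete ETL process only supplies configuration (key fields, lookup source, miss-handling policy), which mirrors the "configure and instantiate" idea in the abstract.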

Identifier

http://hdl.handle.net/1822/38340

Language(s)

Portuguese (por)

Rights

info:eu-repo/semantics/restrictedAccess

Keywords

#Data Warehousing Systems #ETL Conceptual Modelling #Task Clustering #ETL Patterns #ETL Skeletons #BPMN #Kettle

Type

info:eu-repo/semantics/conferenceObject