858 results for ETL Processes
Abstract:
Modeling Extract-Transform-Load (ETL) processes of a Data Warehousing System has always been a challenge. The heterogeneity of the sources, the quality of the data obtained and the conciliation process are some of the issues that must be addressed in the design phase of this critical component. Commercial ETL tools often provide proprietary diagrammatic components and modeling languages that are not standard, thus not providing the ideal separation between a modeling platform and an execution platform. This separation in conjunction with the use of standard notations and languages is critical in a system that tends to evolve through time and which cannot be undermined by a normally expensive tool that becomes an unsatisfactory component. In this paper we demonstrate the application of Relational Algebra as a modeling language of an ETL system as an effort to standardize operations and provide a basis for uncommon ETL execution platforms.
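As a rough illustration of the idea of describing an ETL flow with relational algebra operators, here is a minimal Python sketch. It is not the paper's notation or implementation: the operator functions (`select`, `project`, `rename`, `natural_join`), the `etl` pipeline and the sample `orders` and `customers` relations are all hypothetical, chosen only to show how extraction cleaning, conciliation and loading steps can be composed from standard operators.

```python
from typing import Callable

# A relation is modeled as a list of rows; each row is a dict of attribute -> value.
Row = dict
Relation = list


def select(rel: Relation, pred: Callable[[Row], bool]) -> Relation:
    """Selection (sigma): keep only the rows that satisfy the predicate."""
    return [r for r in rel if pred(r)]


def project(rel: Relation, attrs: list) -> Relation:
    """Projection (pi): keep the listed attributes and drop duplicate rows."""
    seen = set()
    out = []
    for r in rel:
        key = tuple(r[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def rename(rel: Relation, mapping: dict) -> Relation:
    """Rename (rho): rename attributes according to the mapping."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rel]


def natural_join(left: Relation, right: Relation) -> Relation:
    """Natural join: combine rows that agree on all shared attributes."""
    out = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[k] == r[k] for k in common):
                out.append({**l, **r})
    return out


# Hypothetical source relations (illustrative data only).
orders = [
    {"order_id": 1, "cust_id": 10, "amount": 120.0},
    {"order_id": 2, "cust_id": 11, "amount": None},
    {"order_id": 3, "cust_id": 10, "amount": 80.0},
]
customers = [
    {"customer": 10, "country": "PT"},
    {"customer": 11, "country": "BR"},
]


def etl(orders: Relation, customers: Relation) -> Relation:
    """A toy ETL flow written as a composition of algebra operators."""
    clean = select(orders, lambda r: r["amount"] is not None)  # data quality filter
    cust = rename(customers, {"customer": "cust_id"})          # conciliate key names
    joined = natural_join(clean, cust)                         # integrate the sources
    return project(joined, ["order_id", "country", "amount"])  # shape the target rows


fact_rows = etl(orders, customers)
```

Because each step is a plain operator over relations, the same expression could in principle be translated to SQL or to another execution platform, which is the kind of separation between modeling and execution that the abstract argues for.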
Abstract:
Usually, data warehouse populating processes are data-oriented workflows composed of dozens of granular tasks that are responsible for integrating data coming from different data sources. Specific subsets of these tasks can be grouped into a collection, together with their relationships, to form higher-level constructs. Increasing task granularity allows for the generalization of processes, simplifying their views and providing methods to carry expertise over to new applications. Well-proven practices can be used to describe general solutions that use basic skeletons configured and instantiated according to a set of specific integration requirements. Patterns can be applied to ETL processes not only to simplify a possible conceptual representation but also to reduce the gap that often exists between two design perspectives. In this paper, we demonstrate the feasibility and effectiveness of an ETL pattern-based approach using task clustering, analyzing a real-world ETL scenario through the definitions of two commonly used clusters of tasks: a data lookup cluster and a data conciliation and integration cluster.
Abstract:
During the last few years, many research efforts have been made to improve the design of ETL (Extract-Transform-Load) systems. ETL systems are considered very time-consuming, error-prone and complex, involving several participants from different knowledge domains. ETL processes are among the most important components of a data warehousing system and are strongly influenced by the complexity of business requirements and by their change and evolution. These aspects influence not only the structure of a data warehouse but also the structures of the data sources involved. To minimize the negative impact of such variables, we propose the use of ETL patterns to build specific ETL packages. In this paper, we formalize this approach using BPMN (Business Process Model and Notation) for modelling more conceptual ETL workflows, mapping them to real execution primitives through a domain-specific language that allows for the generation of specific instances that can be executed in a commercial ETL tool.
Abstract:
This dissertation addresses the construction of a data warehouse for AdClick, a company operating in digital marketing. Digital marketing is a type of marketing that uses digital communication media for the same purpose as traditional marketing: promoting goods, businesses and services and attracting new customers. Several digital marketing strategies pursue these goals, most notably organic traffic and paid traffic. Organic traffic is characterized by marketing actions that involve no costs for promotion and/or for attracting potential customers. Paid traffic, in turn, requires investment in campaigns capable of driving and attracting new customers. The dissertation first reviews the state of the art in business intelligence and data warehousing and presents their main advantages for companies. Business intelligence systems are necessary because companies today hold large volumes of information-rich data that can only be properly exploited by using the capabilities of these systems. Accordingly, the first step in developing a business intelligence system is to concentrate all the data in a single integrated system capable of supporting decision-making. This is where the data warehouse comes in, as the single, ideal system for this type of requirement. In this dissertation, the data sources that will feed the data warehouse were surveyed and the contextualization of the company's existing business processes was begun. The data warehouse was then built: dimensions and fact tables were created, the processes for extracting and loading data into the data warehouse were defined, and the various views were created.
As for the impact of this dissertation, the highlights are the various business advantages that the partner company gains from the implementation of the data warehouse and of the ETL processes that load all the information sources. These advantages include centralized information, more flexibility for managers in how they access information, and data processed in a way that makes it possible to extract information from it.
Abstract:
This final degree project (TFC) consists of creating a data warehouse that automates the collection of data on the state of the reservoirs of the Confederació Hidrogràfica Nord-Est by means of ETL processes, and then processing that data with PL/SQL processes so that it can be exploited with Business Intelligence tools.
Abstract:
This thesis investigates how to speed up the development of the data warehouse loading process in a Microsoft SQL Server 2008 environment. The theoretical parts of the thesis are intended to support both its research and its practical parts. Related studies were reviewed to identify the best ways of reducing the time spent developing the loading process. Current research focuses on developing vendor-independent models and on creating a vendor-specific loading process based on these models. However, there is no general consensus on the best model. Based on the related studies, an architecture is presented that could considerably reduce the time spent developing the loading process in the future. A simplified version of this architecture was created, along with an application based on it, to speed up the development of the loading process with Microsoft's ETL tool.
Abstract:
Graduate Program in Computer Science - IBILCE
Abstract:
Today it is easy to find many tools for defining data migration schemas among different types of information systems. Data migration processes are usually implemented on a very diverse range of applications, from conventional operational systems to data warehousing platforms. The implementation of a data migration process often involves serious planning, including the development of conceptual migration schemas at early stages. Such schemas help architects and engineers to plan and discuss the most adequate way to migrate data between two different systems. In this paper we present and discuss a way of enriching data migration conceptual schemas in BPMN using a domain-specific language, demonstrating how to convert such enriched schemas to a first corresponding physical representation (a skeleton) in a conventional ETL implementation tool like Kettle.
Abstract:
The MAP-i Doctoral Programme in Informatics, of the Universities of Minho, Aveiro and Porto
Abstract:
Acid drainage influence on the water and sediment quality was investigated in a coal mining area (southern Brazil). Mine drainage showed pH between 3.2 and 4.6 and elevated concentrations of sulfate, As and metals, of which Fe, Mn and Zn exceeded the limits for the emission of effluents stated in the Brazilian legislation. Arsenic also exceeded the limit, but only slightly. Groundwater monitoring wells from active mines and tailings piles showed a pH interval and chemical concentrations similar to those of mine drainage. However, the river and groundwater samples of municipal public water supplies revealed a pH range from 7.2 to 7.5 and low chemical concentrations, although Cd concentration slightly exceeded the limit adopted by Brazilian legislation for groundwater. In general, surface waters showed a large pH range (6 to 10.8), and changes caused by acid drainage in the chemical composition of these waters were not very significant. Locally, acid drainage seemed to have dissolved carbonate rocks present in the local stratigraphic sequence, attenuating the dispersion of metals and As. Stream sediments presented anomalies of these elements, which were strongly dependent on the proximity of tailings piles and abandoned mines. We found that precipitation processes in sediments and the dilution of dissolved phases were responsible for the attenuation of the concentrations of the metals and As in the acid drainage and river water mixing zone. In general, a larger influence of mining activities on the chemical composition of the surface waters and sediments was observed when enrichment factors in relation to regional background levels were used.
Abstract:
The Centers for High Cost Medication (Centros de Medicação de Alto Custo, CEDMAC), Health Department, São Paulo, were instituted by a project in partnership with the Clinical Hospital of the Faculty of Medicine, USP, sponsored by the Foundation for Research Support of the State of São Paulo (Fundação de Amparo à Pesquisa do Estado de São Paulo, FAPESP), aimed at forming a statewide network for comprehensive care of patients referred for use of immunobiological agents in rheumatological diseases. The CEDMAC of Hospital de Clínicas, Universidade Estadual de Campinas (HC-Unicamp), implemented by the Division of Rheumatology, Faculty of Medical Sciences, identified the need to standardize the conduct of the multidisciplinary team, given the specificity of the care provided, and recognized the importance of describing its operational and technical processes in manual format. The aim of this study is to present the methodology applied to the elaboration of the CEDMAC/HC-Unicamp Manual as an institutional tool for offering the best care and administrative quality. In the methodology used for preparing manuals at HC-Unicamp since 2008, the premise was to obtain a document that is participatory, multidisciplinary, focused on work processes integrated with institutional rules, with objective and didactic descriptions, in a standardized format and with electronic dissemination. The CEDMAC/HC-Unicamp Manual was elaborated in 10 months, with the involvement of the entire multidisciplinary team, and has 19 chapters on work processes and techniques, in addition to those concerning the organizational structure and its annexes. Published in the electronic portal of HC Manuals in July 2012 as an e-Book (ISBN 978-85-63274-17-5), the manual has been a valuable instrument in guiding professionals in healthcare, teaching and research activities.
Abstract:
Oral squamous cell carcinoma is the most common type of cancer in the oral cavity, representing more than 90% of all oral cancers. The characterization of altered molecules in oral cancer is essential to understand the molecular mechanisms underlying tumor progression as well as to contribute to cancer biomarker and therapeutic target discovery. Proteoglycans are key molecular effectors of cell surface and pericellular microenvironments, performing multiple functions in cancer. Two of the major basement membrane proteoglycans, agrin and perlecan, were investigated in this study regarding their role in oral cancer. Using real time quantitative PCR (qRT-PCR), we showed that agrin and perlecan are highly expressed in oral squamous cell carcinoma. Interestingly, cell lines originating from distinct sites showed different expression of agrin and perlecan. When chondroitin sulfate modification was enzymatically targeted with chondroitinase, an oral squamous carcinoma cell line showed a reduced ability to adhere to extracellular matrix proteins and increased sensitivity to cisplatin. Additionally, knockdown of agrin and perlecan decreased cell migration and adhesion, as well as the resistance of cells to cisplatin. Our study showed, for the first time, a negative regulation of oral cancer-associated events by targeting either chondroitin sulfate content or agrin and perlecan levels.
Abstract:
Vaso-occlusion, responsible for much of the morbidity of sickle-cell disease, is a complex multicellular process, apparently triggered by leukocyte adhesion to the vessel wall. The microcirculation represents a major site of leukocyte-endothelial interactions and vaso-occlusive processes. We have developed a biochip with subdividing interconnecting microchannels that decrease in size (40 μm to 10 μm in width), for use in conjunction with a precise microfluidic device, to mimic cell flow and adhesion through channels of sizes that approach those of the microcirculation. The biochips were utilized to observe the dynamics of the passage of neutrophils and red blood cells, isolated from healthy and sickle-cell anemia (SCA) individuals, through laminin or endothelial adhesion molecule-coated microchannels at physiologically relevant rates of flow and shear stress. Obstruction of E-selectin/intercellular adhesion molecule 1-coated biochip microchannels by SCA neutrophils was significantly greater than that observed for healthy neutrophils, particularly in the microchannels of 40-15 μm in width. Whereas SCA red blood cells alone did not significantly adhere to, or obstruct, microchannels, mixed suspensions of SCA neutrophils and red blood cells significantly adhered to and obstructed laminin-coated channels. Results from this in vitro microfluidic model support a primary role for leukocytes in the initiation of SCA occlusive processes in the microcirculation. This assay represents an easy-to-use and reproducible in vitro technique for understanding molecular mechanisms and cellular interactions occurring in subdividing microchannels of widths approaching those observed in the microvasculature. The assay could hold potential for testing drugs developed to inhibit occlusive mechanisms such as those observed in SCA and thrombotic diseases.
Abstract:
Below-cloud scavenging processes have been investigated using a numerical simulation, local atmospheric conditions and particulate matter (PM) concentrations at different sites in Germany. The below-cloud scavenging model has been coupled with a bulk particulate matter counter TSI (Trust Portacounter) dataset, consisting of the variability prediction of the particulate air concentrations during chosen rain events. The TSI samples and meteorological parameters were obtained during three winter campaigns: Deuselbach, March 1994, comprising three different events; Sylt, April 1994; and Freiburg, March 1995. The results show good agreement between modeled and observed air concentrations, emphasizing the quality of the conceptual model used in the below-cloud scavenging numerical modeling. The comparison between modeled and observed data also yielded high and significant squared Pearson correlation coefficients, above 0.7, except for the Freiburg campaign event. The differences between the numerical simulations and the observed dataset are explained by wind direction changes and, perhaps, by the absence of mass advection terms in the model. These results validate previous works based on the same conceptual model.
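For readers unfamiliar with the agreement metric mentioned above, the following Python sketch shows how a squared Pearson correlation coefficient between modeled and observed concentrations can be computed. It is only an illustration: the function names and the sample values are hypothetical and are not taken from the study; only the 0.7 threshold comes from the abstract.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """Squared Pearson coefficient: share of variance explained by a linear fit."""
    return pearson_r(x, y) ** 2


# Hypothetical observed and modeled PM concentrations during one rain event.
observed = [10.0, 8.0, 6.0, 5.0]
modeled = [9.5, 8.4, 6.2, 4.6]

# Compare against the 0.7 level quoted in the abstract.
agrees = r_squared(observed, modeled) > 0.7
```

A high squared coefficient only indicates a strong linear relationship; testing its statistical significance, as the abstract reports, would need an additional step that this sketch does not cover.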
Abstract:
The Pantanal of Nhecolândia, the world's largest and most diversified field of tropical lakes, comprises approximately 10,000 lakes, which cover an area of 24,000 km² and vary greatly in salinity, pH, alkalinity, colour, physiography and biological activity. The hyposaline lakes have variable pHs, low alkalinity, macrophytes and low phytoplankton densities. The saline lakes have pHs above 9 or 10, high alkalinity, a high density of phytoplankton and sand beaches. The cause of the diversity of these lakes has been an open question, which we have addressed in our research. Here we propose a hybrid process, both geochemical and biological, as the main cause, including (1) a climate with an important water deficit and poverty in Ca2+ in both superficial and phreatic waters; and (2) an elevation of pH during cyanobacteria blooms. These two aspects destabilise the general tendency of Earth's surface waters towards a neutral pH. This imbalance results in an increase in the pH and dissolution of previously precipitated amorphous silica and quartzose sand. During extreme droughts, amorphous silica precipitates in the inter-granular spaces of the lake bottom sediment, increasing the isolation of the lake from the phreatic level. This paper discusses this biogeochemical problem in the light of physicochemical, chemical, altimetric and phytoplankton data.