958 resultados para Data Warehouse
Resumo:
In this paper, the authors introduce a novel mechanism for data management in a middleware for smart home control, where a relational database and semantic ontology storage are used at the same time in a Data Warehouse. An annotation system has been designed for instructing the storage format and location, registering new ontology concepts and most importantly, guaranteeing the Data Consistency between the two storage methods. For easing the data persistence process, the Data Access Object (DAO) pattern is applied and optimized to enhance the Data Consistency assurance. Finally, this novel mechanism provides an easy manner for the development of applications and their integration with BATMP. Finally, an application named "Parameter Monitoring Service" is given as an example for assessing the feasibility of the system.
Resumo:
Secure access to patient data is becoming of increasing importance, as medical informatics grows in significance, to both assist with population health studies, and patient specific medicine in support of treatment. However, assembling the many different types of data emanating from the clinic is in itself a difficulty, and doing so across national borders compounds the problem. In this paper we present our solution: an easy to use distributed informatics platform embedding a state of the art data warehouse incorporating a secure pseudonymisation system protecting access to personal healthcare data. Using this system, a whole range of patient derived data, from genomics to imaging to clinical records, can be assembled and linked, and then connected with analytics tools that help us to understand the data. Research performed in this environment will have immediate clinical impact for personalised patient healthcare.
Resumo:
Nowadays, data mining is based on low-level specications of the employed techniques typically bounded to a specic analysis platform. Therefore, data mining lacks a modelling architecture that allows analysts to consider it as a truly software-engineering process. Here, we propose a model-driven approach based on (i) a conceptual modelling framework for data mining, and (ii) a set of model transformations to automatically generate both the data under analysis (via data-warehousing technology) and the analysis models for data mining (tailored to a specic platform). Thus, analysts can concentrate on the analysis problem via conceptual data-mining models instead of low-level programming tasks related to the underlying-platform technical details. These tasks are now entrusted to the model-transformations scaffolding.
Resumo:
Data mining is one of the most important analysis techniques to automatically extract knowledge from large amount of data. Nowadays, data mining is based on low-level specifications of the employed techniques typically bounded to a specific analysis platform. Therefore, data mining lacks a modelling architecture that allows analysts to consider it as a truly software-engineering process. Bearing in mind this situation, we propose a model-driven approach which is based on (i) a conceptual modelling framework for data mining, and (ii) a set of model transformations to automatically generate both the data under analysis (that is deployed via data-warehousing technology) and the analysis models for data mining (tailored to a specific platform). Thus, analysts can concentrate on understanding the analysis problem via conceptual data-mining models instead of wasting efforts on low-level programming tasks related to the underlying-platform technical details. These time consuming tasks are now entrusted to the model-transformations scaffolding. The feasibility of our approach is shown by means of a hypothetical data-mining scenario where a time series analysis is required.
Resumo:
This thesis makes a contribution to the Change Data Capture (CDC) field by providing an empirical evaluation on the performance of CDC architectures in the context of realtime data warehousing. CDC is a mechanism for providing data warehouse architectures with fresh data from Online Transaction Processing (OLTP) databases. There are two types of CDC architectures, pull architectures and push architectures. There is exiguous data on the performance of CDC architectures in a real-time environment. Performance data is required to determine the real-time viability of the two architectures. We propose that push CDC architectures are optimal for real-time CDC. However, push CDC architectures are seldom implemented because they are highly intrusive towards existing systems and arduous to maintain. As part of our contribution, we pragmatically develop a service based push CDC solution, which addresses the issues of intrusiveness and maintainability. Our solution uses Data Access Services (DAS) to decouple CDC logic from the applications. A requirement for the DAS is to place minimal overhead on a transaction in an OLTP environment. We synthesize DAS literature and pragmatically develop DAS that eciently execute transactions in an OLTP environment. Essentially we develop effeicient RESTful DAS, which expose Transactions As A Resource (TAAR). We evaluate the TAAR solution and three pull CDC mechanisms in a real-time environment, using the industry recognised TPC-C benchmark. The optimal CDC mechanism in a real-time environment, will capture change data with minimal latency and will have a negligible affect on the database's transactional throughput. Capture latency is the time it takes a CDC mechanism to capture a data change that has been applied to an OLTP database. A standard definition for capture latency and how to measure it does not exist in the field. We create this definition and extend the TPC-C benchmark to make the capture latency measurement. The results from our evaluation show that pull CDC is capable of real-time CDC at low levels of user concurrency. However, as the level of user concurrency scales upwards, pull CDC has a significant impact on the database's transaction rate, which affirms the theory that pull CDC architectures are not viable in a real-time architecture. TAAR CDC on the other hand is capable of real-time CDC, and places a minimal overhead on the transaction rate, although this performance is at the expense of CPU resources.
Resumo:
Il lavoro presentato in questo elaborato tratterà lo sviluppo di un sistema di alerting che consenta di monitorare proattivamente una o più sorgenti dati aziendali, segnalando le eventuali condizioni di irregolarità rilevate; questo verrà incluso all'interno di sistemi già esistenti dedicati all'analisi dei dati e alla pianificazione, ovvero i cosiddetti Decision Support Systems. Un sistema di supporto alle decisioni è in grado di fornire chiare informazioni per tutta la gestione dell'impresa, misurandone le performance e fornendo proiezioni sugli andamenti futuri. Questi sistemi vengono catalogati all'interno del più ampio ambito della Business Intelligence, che sottintende l'insieme di metodologie in grado di trasformare i dati di business in informazioni utili al processo decisionale. L'intero lavoro di tesi è stato svolto durante un periodo di tirocinio svolto presso Iconsulting S.p.A., IT System Integrator bolognese specializzato principalmente nello sviluppo di progetti di Business Intelligence, Enterprise Data Warehouse e Corporate Performance Management. Il software che verrà illustrato in questo elaborato è stato realizzato per essere collocato all'interno di un contesto più ampio, per rispondere ai requisiti di un cliente multinazionale leader nel settore della telefonia mobile e fissa.
Resumo:
During the SINOPS project, an optimal state of the art simulation of the marine silicon cycle is attempted employing a biogeochemical ocean general circulation model (BOGCM) through three particular time steps relevant for global (paleo-) climate. In order to tune the model optimally, results of the simulations are compared to a comprehensive data set of 'real' observations. SINOPS' scientific data management ensures that data structure becomes homogeneous throughout the project. Practical work routine comprises systematic progress from data acquisition, through preparation, processing, quality check and archiving, up to the presentation of data to the scientific community. Meta-information and analytical data are mapped by an n-dimensional catalogue in order to itemize the analytical value and to serve as an unambiguous identifier. In practice, data management is carried out by means of the online-accessible information system PANGAEA, which offers a tool set comprising a data warehouse, Graphical Information System (GIS), 2-D plot, cross-section plot, etc. and whose multidimensional data model promotes scientific data mining. Besides scientific and technical aspects, this alliance between scientific project team and data management crew serves to integrate the participants and allows them to gain mutual respect and appreciation.
Resumo:
Pretende-se desenvolver um Data Warehouse para um grupo empresarial constituído por quatro empresas, tendo como objectivo primordial a consolidação de informação. A consolidação da informação é de extrema utilidade, uma vez que as empresas podem ter dados comuns, tais como, produtos ou clientes. O principal objectivo dos sistemas analíticos é permitir analisar os dados dos sistemas transacionais da organização, fazendo com que os utilizadores que nada percebem destes sistemas consigam ter apoio nas tomadas decisão de uma forma simples e eficaz. A utilização do Data Warehouse é útil no apoio a decisões, uma vez que torna os utilizadores autónomos na realização de análises. Os utilizadores deixam de estar dependentes de especialistas em informática para efectuar as suas consultas e passam a ser eles próprios a realizá-las. Por conseguinte, o tempo de execução de uma consulta através do Data Warehouse é de poucos segundos, ao contrário das consultas criadas anteriormente pelos especialistas que por vezes demoravam horas a ser executadas. __ ABSTRACT: lt is intended to develop a Data Warehouse for a business related group of four companies, having by main goal the information consolidation. This information consolidation is of extreme usefulness since the companies can have common data, such as products or customers. The main goal of the analytical systems is to allow analyze data from the organization transactional systems, making that the users that do not understand anything of these systems may have support in a simple and effective way in every process of taking decisions. Using the Data Warehouse is useful to support decisions, once it will allow users to become autonomous in carrying out analysis. Users will no longer depend on computer experts to make their own queries and they can do it themselves. Therefore, the time of a query through the Data Warehouse takes only a few seconds, unlike the earlier queries created previously by experts that sometimes took hours to run.
Resumo:
Data quality has become a major concern for organisations. The rapid growth in the size and technology of a databases and data warehouses has brought significant advantages in accessing, storing, and retrieving information. At the same time, great challenges arise with rapid data throughput and heterogeneous accesses in terms of maintaining high data quality. Yet, despite the importance of data quality, literature has usually condensed data quality into detecting and correcting poor data such as outliers, incomplete or inaccurate values. As a result, organisations are unable to efficiently and effectively assess data quality. Having an accurate and proper data quality assessment method will enable users to benchmark their systems and monitor their improvement. This paper introduces a granules mining for measuring the random degree of error data which will enable decision makers to conduct accurate quality assessment and allocate the most severe data, thereby providing an accurate estimation of human and financial resources for conducting quality improvement tasks.
Resumo:
Data warehouse projects, today, are in an ambivalent situation. On the one hand, data warehouses are critical for a company’s success and various methodological and technological tools are sophisticatedly developed to implement them. On the other hand, a significant amount of data warehouse projects fails due to non-technical reasons such as insufficient management support or in-corporative employees. But management support and user participation can be increased dramatically with specification methods that are understandable to these user groups. This paper aims at overcoming possible non-technical failure reasons by introducing a user-adequate specification approach within the field of management information systems.
Resumo:
Online business or Electronic Commerce (EC) is getting popular among customers today, as a result large number of product reviews have been posted online by the customers. This information is very valuable not only for prospective customers to make decision on buying product but also for companies to gather information of customers’ satisfaction about their products. Opinion mining is used to capture customer reviews and separated this review into subjective expressions (sentiment word) and objective expressions (no sentiment word). This paper proposes a novel, multi-dimensional model for opinion mining, which integrates customers’ characteristics and their opinion about any products. The model captures subjective expression from product reviews and transfers to fact table before representing in multi-dimensions named as customers, products, time and location. Data warehouse techniques such as OLAP and Data Cubes were used to analyze opinionated sentences. A comprehensive way to calculate customers’ orientation on products’ features and attributes are presented in this paper.
Resumo:
Since 2007, close collaboration between the Learning and Teaching Unit’s Academic Quality and Standards team and the Department of Reporting and Analysis’ Business Objects team resulted in a generational approach to reporting where QUT established a place of trust. This place of trust is where data owners are confident in date storage, data integrity, reported and shared. While the role of the Department of Reporting and Analysis focused on the data warehouse, data security and publication of reports, the Academic Quality and Standards team focused on the application of learning analytics to solve academic research questions and improve student learning. Addressing questions such as: • Are all students who leave course ABC academically challenged? • Do the students who leave course XYZ stay within the faculty, university or leave? • When students withdraw from a unit do they stay enrolled on full or part load or leave? • If students enter through a particular pathway, what is their experience in comparison to other pathways? • With five years historic reporting, can a two-year predictive forecast provide any insight? In answering these questions, the Academic Quality and Standards team then developed prototype data visualisation through curriculum conversations with academic staff. Where these enquiries were applicable more broadly this information would be brought into the standardised reporting for the benefit of the whole institution. At QUT an annual report to the executive committees allows all stakeholders to record the performance and outcomes of all courses in a snapshot in time or use this live report at any point during the year. This approach to learning analytics was awarded the Awarded 2014 ATEM/Campus Review Best Practice Awards in Tertiary Education Management for The Unipromo Award for Excellence in Information Technology Management.
Resumo:
El propóosito del proyecto aquíı descrito radica en, por una parte, sentar una base de un sistema de Business Inteligence adaptable a diversos casos de negocio, y por otra, diseñar e implementar una solución completa para una empresa especíıfica fácilmente adaptable a otro caso, incluyendo desde los procesos de Extracción, Transformación y Carga, pasando por el data warehouse hasta el Business Analysis y la Minería de Datos.
Resumo:
No Brasil, o início do processo de convergência às normas internacionais de contabilidade no setor público ocorre desde 2007 na União, nos Estados e nos Municípios, o que acaba gerando muitas mudanças e também muitos desafios na adoção dos novos procedimentos. Um dos novos procedimentos envolve a avaliação e depreciação do Ativo Imobilizado. Nota técnica divulgada recentemente pela STN descreve que os Entes estão encontrando dificuldades em adotar as novas regras. Nesse contexto, este estudo se propõe a responder a seguinte questão de pesquisa: como superar os desafios na implantação dos procedimentos contábeis sobre avaliação e depreciação do Ativo Imobilizado no Governo do Estado do Rio de Janeiro? Tem como objetivo geral identificar os desafios na implantação dos procedimentos contábeis sobre avaliação e depreciação do Ativo Imobilizado no Governo do Estado do Rio de Janeiro e como objetivo específico investigar e analisar a estrutura contábil e patrimonial, assim como propor soluções básicas e essenciais para a aplicação dos procedimentos contábeis. Quanto aos fins, foi realizada pesquisa descritiva e quanto aos meios, foi realizada pesquisa bibliográfica, documental e o estudo de caso, com a realização de entrevistas com os responsáveis de patrimônio e almoxarifado de 23 órgãos da Administração Direta do Estado do Rio de Janeiro. A análise dos dados coletados revela que não há integração entre o setor contábil, o setor de patrimônio e o setor de almoxarifado nestes órgãos. Os setores possuem baixo quantitativo de funcionários e estes são pouco valorizados, não existindo padronização dos procedimentos sobre gestão patrimonial. O desafio de adotar esses procedimentos ultrapassa a competência do setor de contabilidade e exige a integração dos setores de patrimônio, almoxarifado e contábil. Assim, o estudo propõe a aquisição ou desenvolvimento de um sistema integrado de controle de bens, em que a contabilidade, o patrimônio e o almoxarifado acessem os mesmos dados e possuam uma ferramenta de comunicação confiável, que possibilite a elaboração de relatórios que gerem informações úteis ao gestor e aos demais interessados. Propõe também a regulamentação dos novos procedimentos, o fortalecimento da carreira dos funcionários que atuam no patrimônio e no almoxarifado e orienta sobre a adoção de procedimentos iniciais, para o período de transição.
Resumo:
ETL过程是一个从分布数据源(包括数据库、应用系统、文件系统等)抽取数据,进行转换、集成和传输,并最终加载到目标系统的过程。传统的ETL过程主要服务于数据仓库(Data Warehouse),属于企业决策支持系统的一部分。随着数据集成技术的发展和轻量级的数据集成中间件的出现,ETL过程广泛应用于企业数据集成与数据交换系统。在ETL过程中,数据质量控制是一个极为重要的基本组件和功能,它对集成中的数据进行检测、转换、清洗,以防止“脏”数据进入目标系统。在ETL过程中如果缺少对数据质量的有效控制,就会导致数据集成项目无法圆满实现目标或彻底失败。 针对ETL过程中存在的数据质量问题,设计并实现面向ETL过程的数据质量控制系统,是本文研究的重点。论文通过对ETL过程中各阶段可能产生的数据质量问题进行了分类,并对质量控制需求建模,提出一个面向ETL过程的数据质量控制框架,该框架通过对源端数据的分析来指导ETL的设计,通过灵活、可配置、可扩展的数据处理机制实现数据的过滤、转换与清洗,并支持对数据质量处理全过程进行监控。在该框架基础上,论文特别在灵活的数据处理机制、数据分析、数据过滤和数据清洗四个方面进行了探讨。在数据处理机制方面,提出了基于插件元模型的数据处理机制,该机制可以满足用户对数据过滤、数据转换与数据清洗等功能的各种定制需求,并具有较强的可扩展性;在数据分析方面,根据字段类型对数据进行分类统计,并针对大数据量统计分析问题,提出了可自动配置的不同数据统计策略;在数据过滤方面,通过将抽取数据的SQL语句重写的方式,过滤不满足完整性约束的元组;在数据清洗方法方面给出了一种利用统计信息动态确定属性相似度权重的方法,对基于字段的相似记录检测算法的领域无关算法进行了改进,提高了数据检测的准确性。在上述工作基础上,在数据集成中间件OnceDI中设计并实现了数据质量控制系统,并在设计中通过设计模式的应用增强系统的可扩展性。