Biblioteca Digital

991 resultados para data warehouse

Data management as a part of performance management: Case study about production reporting development project

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Because of the increased availability of different kind of business intelligence technologies and tools it can be easy to fall in illusion that new technologies will automatically solve the problems of data management and reporting of the company. The management is not only about management of technology but also the management of processes and people. This thesis is focusing more into traditional data management and performance management of production processes which both can be seen as a requirement for long lasting development. Also some of the operative BI solutions are considered in the ideal state of reporting system. The objectives of this study are to examine what requirements effective performance management of production processes have for data management and reporting of the company and to see how they are effecting on the efficiency of it. The research is executed as a theoretical literary research about the subjects and as a qualitative case study about reporting development project of Finnsugar Ltd. The case study is examined through theoretical frameworks and by the active participant observation. To get a better picture about the ideal state of reporting system simple investment calculations are performed. According to the results of the research, requirements for effective performance management of production processes are automation in the collection of data, integration of operative databases, usage of efficient data management technologies like ETL (Extract, Transform, Load) processes, data warehouse (DW) and Online Analytical Processing (OLAP) and efficient management of processes, data and roles.

Towards data warehousing and mining of protein unfolding simulation data

Relevância:

70.00% 70.00%

Publicador:

Resumo:

OBJECTIVES: The prediction of protein structure and the precise understanding of protein folding and unfolding processes remains one of the greatest challenges in structural biology and bioinformatics. Computer simulations based on molecular dynamics (MD) are at the forefront of the effort to gain a deeper understanding of these complex processes. Currently, these MD simulations are usually on the order of tens of nanoseconds, generate a large amount of conformational data and are computationally expensive. More and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. METHODS: To adequately organize, manage, and analyze the data generated by unfolding simulation studies, we designed a data warehouse system that is embedded in a grid environment to facilitate the seamless sharing of available computer resources and thus enable many groups to share complex molecular dynamics simulations on a more regular basis. RESULTS: To gain insight into the conformational fluctuations and stability of the monomeric forms of the amyloidogenic protein transthyretin (TTR), molecular dynamics unfolding simulations of the monomer of human TTR have been conducted. Trajectory data and meta-data of the wild-type (WT) protein and the highly amyloidogenic variant L55P-TTR represent the test case for the data warehouse. CONCLUSIONS: Web and grid services, especially pre-defined data mining services that can run on or 'near' the data repository of the data warehouse, are likely to play a pivotal role in the analysis of molecular dynamics unfolding data.

Grid warehousing of molecular dynamics protein unfolding data

Relevância:

70.00% 70.00%

Publicador:

Resumo:

With the increasing awareness of protein folding disorders, the explosion of genomic information, and the need for efficient ways to predict protein structure, protein folding and unfolding has become a central issue in molecular sciences research. Molecular dynamics computer simulations are increasingly employed to understand the folding and unfolding of proteins. Running protein unfolding simulations is computationally expensive and finding ways to enhance performance is a grid issue on its own. However, more and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. This paper describes efforts to provide a grid-enabled data warehouse for protein unfolding data. We outline the challenge and present first results in the design and implementation of the data warehouse.

P-found: grid-enabling distributed repositories of protein folding and unfolding simulations for data mining

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data that is used to populate the second component, and a data warehouse that contains important molecular properties. These properties may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: firstly, how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories — this is an important and challenging aspect of P-found, due to the large data volumes involved and the desire of scientists to maintain control of their own data. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.

Data mining : aplicação voltada a geração de informações para tomada de decisão na Secretaria de Estado da Educação de São Paulo

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Trata da aplicação de ferramentas de Data Mining e do conceito de Data Warehouse à coleta e análise de dados obtidos a partir das ações da Secretaria de Estado da Educação de São Paulo. A variável dependente considerada na análise é o resultado do rendimento das escolas estaduais obtido através das notas de avaliação do SARESP (prova realizada no estado de São Paulo). O data warehouse possui ainda dados operacionais e de ações já realizadas, possibilitando análise de influência nos resultados

Modelo de gerenciamento de versões para evolução de Data Warehouses

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Sistemas de tomada de decisão baseados em Data Warehouse (DW) estão sendo cada dia mais utilizados por grandes empresas e organizações. O modelo multidimensional de organização dos dados utilizado por estes sistemas, juntamente com as técnicas de processamento analítico on-line (OLAP), permitem análises complexas sobre o histórico dos negócios através de uma simples e intuitiva interface de consulta. Apesar dos DWs armazenarem dados históricos por natureza, as estruturas de organização e classificação destes dados, chamadas de dimensões, não possuem a rigor uma representação temporal, refletindo somente a estrutura corrente. Para um sistema destinado à análise de dados, a falta do histórico das dimensões impossibilita consultas sobre o ambiente real de contextualização dos dados passados. Além disso, as alterações dos esquemas multidimensionais precisam ser assistidas e gerenciadas por um modelo de evolução, de forma a garantir a consistência e integridade do modelo multidimensional sem a perda de informações relevantes. Neste trabalho são apresentadas dezessete operações de alteração de esquema e sete operações de alteração de instâncias para modelos multidimensionais de DW. Um modelo de versões, baseado na associação de intervalos de validade aos esquemas e instâncias, é proposto para o gerenciamento dessas operações. Todo o histórico de definições e de dados do DW é mantido por esse modelo, permitindo análises completas dos dados passados e da evolução do DW. Além de suportar consultas históricas sobre as definições e as instâncias do DW, o modelo também permite a manutenção de mais de um esquema ativo simultaneamente. Isto é, dois ou mais esquemas podem continuar a ter seus dados atualizados periodicamente, permitindo assim que as aplicações possam consultar dados recentes utilizando diferentes versões de esquema.

The SB-index and the HSB-index: efficient indices for spatial data warehouses

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Spatial data warehouses (SDWs) allow for spatial analysis together with analytical multidimensional queries over huge volumes of data. The challenge is to retrieve data related to ad hoc spatial query windows according to spatial predicates, avoiding the high cost of joining large tables. Therefore, mechanisms to provide efficient query processing over SDWs are essential. In this paper, we propose two efficient indices for SDW: the SB-index and the HSB-index. The proposed indices share the following characteristics. They enable multidimensional queries with spatial predicate for SDW and also support predefined spatial hierarchies. Furthermore, they compute the spatial predicate and transform it into a conventional one, which can be evaluated together with other conventional predicates by accessing a star-join Bitmap index. While the SB-index has a sequential data structure, the HSB-index uses a hierarchical data structure to enable spatial objects clustering and a specialized buffer-pool to decrease the number of disk accesses. The advantages of the SB-index and the HSB-index over the DBMS resources for SDW indexing (i.e. star-join computation and materialized views) were investigated through performance tests, which issued roll-up operations extended with containment and intersection range queries. The performance results showed that improvements ranged from 68% up to 99% over both the star-join computation and the materialized view. Furthermore, the proposed indices proved to be very compact, adding only less than 1% to the storage requirements. Therefore, both the SB-index and the HSB-index are excellent choices for SDW indexing. Choosing between the SB-index and the HSB-index mainly depends on the query selectivity of spatial predicates. While low query selectivity benefits the HSB-index, the SB-index provides better performance for higher query selectivity.

Integrating structured and unstructured data in a business-intelligence-system

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The recent liberalization of the German energy market has forced the energy industry to develop and install new information systems to support agents on the energy trading floors in their analytical tasks. Besides classical approaches of building a data warehouse giving insight into the time series to understand market and pricing mechanisms, it is crucial to provide a variety of external data from the web. Weather information as well as political news or market rumors are relevant to give the appropriate interpretation to the variables of a volatile energy market. Starting from a multidimensional data model and a collection of buy and sell transactions a data warehouse is built that gives analytical support to the agents. Following the idea of web farming we harvest the web, match the external information sources after a filtering and evaluation process to the data warehouse objects, and present this qualified information on a user interface where market values are correlated with those external sources over the time axis.

Justification of Data Warehousing Projects

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Project justification is regarded as one of the major methodological deficits in Data Warehousing practice. As reasons for applying inappropriate methods, performing incomplete evaluations, or even entirely omitting justifications, the special nature of Data Warehousing benefits and the large portion of infrastructure-related activities are stated. In this paper, the economic justification of Data Warehousing projects is analyzed, and first results from a large academiaindustry collaboration project in the field of non-technical issues of Data Warehousing are presented. As conceptual foundations, the role of the Data Warehouse system in corporate application architectures is analyzed, and the specific properties of Data Warehousing projects are discussed. Based on an applicability analysis of traditional approaches to economic IT project justification, basic steps and responsibilities for the justification of Data Warehousing projects are derived.

Multibeam Converter - a tool to create Ocean Data View import files from multibeam files

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The software Multibeam Converter is a tool to convert files or folders of files (ascii/tab-separated data files with or without metaheader), downloaded from PANGAEA via the search engine or the data warehouse to the ODV import format, e.g. for visualization or further processing. MultibeamConverter is distributed as freeware for the operating systems Microsoft Windows, Apple OS X and Linux.

Data consistency management in an open smart home management platform

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper, the authors introduce a novel mechanism for data management in a middleware for smart home control, where a relational database and semantic ontology storage are used at the same time in a Data Warehouse. An annotation system has been designed for instructing the storage format and location, registering new ontology concepts and most importantly, guaranteeing the Data Consistency between the two storage methods. For easing the data persistence process, the Data Access Object (DAO) pattern is applied and optimized to enhance the Data Consistency assurance. Finally, this novel mechanism provides an easy manner for the development of applications and their integration with BATMP. Finally, an application named "Parameter Monitoring Service" is given as an example for assessing the feasibility of the system.

p-medicine: a medical informatics platform for integrated large scale heterogeneous patient data

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Secure access to patient data is becoming of increasing importance, as medical informatics grows in significance, to both assist with population health studies, and patient specific medicine in support of treatment. However, assembling the many different types of data emanating from the clinic is in itself a difficulty, and doing so across national borders compounds the problem. In this paper we present our solution: an easy to use distributed informatics platform embedding a state of the art data warehouse incorporating a secure pseudonymisation system protecting access to personal healthcare data. Using this system, a whole range of patient derived data, from genomics to imaging to clinical records, can be assembled and linked, and then connected with analytics tools that help us to understand the data. Research performed in this environment will have immediate clinical impact for personalised patient healthcare.

Towards a model-driven engineering approach of data mining

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Nowadays, data mining is based on low-level specications of the employed techniques typically bounded to a specic analysis platform. Therefore, data mining lacks a modelling architecture that allows analysts to consider it as a truly software-engineering process. Here, we propose a model-driven approach based on (i) a conceptual modelling framework for data mining, and (ii) a set of model transformations to automatically generate both the data under analysis (via data-warehousing technology) and the analysis models for data mining (tailored to a specic platform). Thus, analysts can concentrate on the analysis problem via conceptual data-mining models instead of low-level programming tasks related to the underlying-platform technical details. These tasks are now entrusted to the model-transformations scaffolding.

Integrating the development of data mining and data warehouses via model-driven engineering

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Data mining is one of the most important analysis techniques to automatically extract knowledge from large amount of data. Nowadays, data mining is based on low-level specifications of the employed techniques typically bounded to a specific analysis platform. Therefore, data mining lacks a modelling architecture that allows analysts to consider it as a truly software-engineering process. Bearing in mind this situation, we propose a model-driven approach which is based on (i) a conceptual modelling framework for data mining, and (ii) a set of model transformations to automatically generate both the data under analysis (that is deployed via data-warehousing technology) and the analysis models for data mining (tailored to a specific platform). Thus, analysts can concentrate on understanding the analysis problem via conceptual data-mining models instead of wasting efforts on low-level programming tasks related to the underlying-platform technical details. These time consuming tasks are now entrusted to the model-transformations scaffolding. The feasibility of our approach is shown by means of a hypothetical data-mining scenario where a time series analysis is required.

Pragmatic development of service based real-time change data capture

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This thesis makes a contribution to the Change Data Capture (CDC) field by providing an empirical evaluation on the performance of CDC architectures in the context of realtime data warehousing. CDC is a mechanism for providing data warehouse architectures with fresh data from Online Transaction Processing (OLTP) databases. There are two types of CDC architectures, pull architectures and push architectures. There is exiguous data on the performance of CDC architectures in a real-time environment. Performance data is required to determine the real-time viability of the two architectures. We propose that push CDC architectures are optimal for real-time CDC. However, push CDC architectures are seldom implemented because they are highly intrusive towards existing systems and arduous to maintain. As part of our contribution, we pragmatically develop a service based push CDC solution, which addresses the issues of intrusiveness and maintainability. Our solution uses Data Access Services (DAS) to decouple CDC logic from the applications. A requirement for the DAS is to place minimal overhead on a transaction in an OLTP environment. We synthesize DAS literature and pragmatically develop DAS that eciently execute transactions in an OLTP environment. Essentially we develop effeicient RESTful DAS, which expose Transactions As A Resource (TAAR). We evaluate the TAAR solution and three pull CDC mechanisms in a real-time environment, using the industry recognised TPC-C benchmark. The optimal CDC mechanism in a real-time environment, will capture change data with minimal latency and will have a negligible affect on the database's transactional throughput. Capture latency is the time it takes a CDC mechanism to capture a data change that has been applied to an OLTP database. A standard definition for capture latency and how to measure it does not exist in the field. We create this definition and extend the TPC-C benchmark to make the capture latency measurement. The results from our evaluation show that pull CDC is capable of real-time CDC at low levels of user concurrency. However, as the level of user concurrency scales upwards, pull CDC has a significant impact on the database's transaction rate, which affirms the theory that pull CDC architectures are not viable in a real-time architecture. TAAR CDC on the other hand is capable of real-time CDC, and places a minimal overhead on the transaction rate, although this performance is at the expense of CPU resources.

«
1
2
3
4
5
6
7
8
...
66
67
»