994 results for Data Warehouses
Abstract:
Decision-making systems based on Data Warehouses (DW) are increasingly used by large companies and organizations. The multidimensional model of data organization employed by these systems, together with on-line analytical processing (OLAP) techniques, allows complex analyses of business history through a simple and intuitive query interface. Although DWs store historical data by nature, the structures used to organize and classify these data, called dimensions, have, strictly speaking, no temporal representation, reflecting only the current structure. For a system intended for data analysis, the lack of dimension history makes it impossible to query the real context in which past data were produced. Moreover, changes to multidimensional schemas need to be assisted and managed by an evolution model, so as to guarantee the consistency and integrity of the multidimensional model without losing relevant information. This work presents seventeen schema-change operations and seven instance-change operations for multidimensional DW models. A versioning model, based on associating validity intervals with schemas and instances, is proposed for managing these operations. The entire history of DW definitions and data is kept by this model, allowing complete analyses of past data and of the DW's evolution. Besides supporting historical queries over the DW's definitions and instances, the model also allows more than one schema to be kept active simultaneously. That is, two or more schemas can continue to have their data updated periodically, thus allowing applications to query recent data using different schema versions.
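As a hedged illustration of the versioning idea described above (a minimal sketch, not the work's actual model), the following Python fragment associates validity intervals with schema versions and supports both historical lookups and multiple simultaneously active schemas; all names are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class SchemaVersion:
    version_id: int
    dimensions: dict[str, list[str]]   # dimension name -> attribute list
    valid_from: date
    valid_to: Optional[date] = None    # None = still active

class VersionCatalog:
    """Toy catalog of schema versions with validity intervals."""
    def __init__(self) -> None:
        self.versions: list[SchemaVersion] = []

    def evolve(self, new_dims: dict[str, list[str]], when: date) -> SchemaVersion:
        # Deliberately does not retire older versions: several schemas
        # may stay active at once, as the model above allows.
        v = SchemaVersion(len(self.versions) + 1, new_dims, valid_from=when)
        self.versions.append(v)
        return v

    def active_at(self, when: date) -> list[SchemaVersion]:
        """All schema versions whose validity interval covers `when`."""
        return [v for v in self.versions
                if v.valid_from <= when and (v.valid_to is None or when < v.valid_to)]

cat = VersionCatalog()
cat.evolve({"product": ["id", "name"]}, date(2004, 1, 1))
cat.evolve({"product": ["id", "name", "category"]}, date(2005, 6, 1))
print([v.version_id for v in cat.active_at(date(2006, 1, 1))])  # [1, 2]
```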
Abstract:
Spatial data warehouses (SDWs) allow for spatial analysis together with analytical multidimensional queries over huge volumes of data. The challenge is to retrieve data related to ad hoc spatial query windows according to spatial predicates, avoiding the high cost of joining large tables. Therefore, mechanisms to provide efficient query processing over SDWs are essential. In this paper, we propose two efficient indices for SDWs: the SB-index and the HSB-index. The proposed indices share the following characteristics. They enable multidimensional queries with spatial predicates over SDWs and also support predefined spatial hierarchies. Furthermore, they compute the spatial predicate and transform it into a conventional one, which can be evaluated together with other conventional predicates by accessing a star-join Bitmap index. While the SB-index has a sequential data structure, the HSB-index uses a hierarchical data structure to enable the clustering of spatial objects and a specialized buffer-pool to decrease the number of disk accesses. The advantages of the SB-index and the HSB-index over the DBMS resources for SDW indexing (i.e. star-join computation and materialized views) were investigated through performance tests, which issued roll-up operations extended with containment and intersection range queries. The performance results showed improvements ranging from 68% up to 99% over both the star-join computation and the materialized view. Furthermore, the proposed indices proved to be very compact, adding less than 1% to the storage requirements. Therefore, both the SB-index and the HSB-index are excellent choices for SDW indexing. Choosing between the SB-index and the HSB-index mainly depends on the query selectivity of the spatial predicates: low query selectivity benefits the HSB-index, while the SB-index provides better performance for higher query selectivity.
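The two-step idea the indices share (evaluate the spatial predicate once over the small set of dimension members, then rewrite it as a conventional key predicate answered by the bitmap index) can be sketched as follows. This is a minimal illustration, with bounding boxes standing in for real geometries and Python sets standing in for a true star-join Bitmap index; it is not the authors' implementation.

```python
Box = tuple[float, float, float, float]          # (xmin, ymin, xmax, ymax)

def intersects(a: Box, b: Box) -> bool:
    """Axis-aligned bounding-box intersection test."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def spatial_rollup(window: Box,
                   members: dict[int, Box],      # surrogate key -> geometry
                   bitmap: dict[int, set[int]]   # surrogate key -> fact row ids
                   ) -> set[int]:
    # Step 1: spatial predicate, computed once over the dimension members
    # (a sequential scan, loosely in the spirit of the SB-index).
    keys = [k for k, geom in members.items() if intersects(geom, window)]
    # Step 2: the predicate is now conventional (a set of keys) and is
    # answered by probing the bitmap index.
    rows: set[int] = set()
    for k in keys:
        rows |= bitmap.get(k, set())
    return rows

members = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
bitmap = {1: {100, 101}, 2: {102}}
print(spatial_rollup((5, 5, 25, 25), members, bitmap))  # {100, 101, 102}
```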
Abstract:
Data mining is one of the most important analysis techniques for automatically extracting knowledge from large amounts of data. Nowadays, data mining is based on low-level specifications of the employed techniques, typically bound to a specific analysis platform. Therefore, data mining lacks a modelling architecture that allows analysts to treat it as a true software-engineering process. Bearing this situation in mind, we propose a model-driven approach based on (i) a conceptual modelling framework for data mining, and (ii) a set of model transformations to automatically generate both the data under analysis (deployed via data-warehousing technology) and the analysis models for data mining (tailored to a specific platform). Thus, analysts can concentrate on understanding the analysis problem via conceptual data-mining models instead of wasting effort on low-level programming tasks related to the underlying platform's technical details. These time-consuming tasks are now entrusted to the model-transformation scaffolding. The feasibility of our approach is shown by means of a hypothetical data-mining scenario where a time series analysis is required.
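To make the model-driven flow concrete, here is a toy model-to-text transformation in the spirit of the approach: a platform-independent mining model is turned into platform-specific code. Both the conceptual model and the target script syntax are invented for illustration; the paper's transformations are far richer.

```python
# A hypothetical platform-independent model for a time-series task.
conceptual_model = {
    "task": "time_series",
    "measure": "sales_amount",
    "time_granularity": "month",
    "horizon": 6,
}

def to_platform_script(model: dict) -> str:
    """Generate a (made-up) analysis-platform script from the model."""
    if model["task"] != "time_series":
        raise ValueError("only the time-series task is sketched here")
    return (
        f"LOAD measure {model['measure']} GROUP BY {model['time_granularity']};\n"
        f"FORECAST horizon={model['horizon']};"
    )

print(to_platform_script(conceptual_model))
```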
Abstract:
An approach for organizing the information in data warehouses is presented in the paper. The possibilities of numbered information spaces for building data warehouses are discussed, and an application is outlined.
Abstract:
The skyrocketing rise of social media on the Internet is greatly altering analytical Customer Relationship Management (CRM). Against this backdrop, the purpose of this paper is to advance the conceptual design of Business Intelligence (BI) systems with data identified from social networks. We develop an integrated social network data model, based on an in-depth analysis of Facebook. The data model can inform the design of data warehouses in order to offer new opportunities for CRM analyses, leading to a more consistent and richer picture of customers' characteristics, needs, wants, and demands. Four major contributions are offered. First, Social CRM and Social BI are introduced as emerging fields of research. Second, we develop a conceptual data model to identify and systematize the data available on online social networks. Third, based on the identified data, we design a multidimensional data model as an early contribution to the conceptual design of Social BI systems and demonstrate its application by developing management reports in a retail scenario. Fourth, intellectual challenges for advancing Social CRM and Social BI are discussed.
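As a hedged sketch of what such a multidimensional (star) model over social network data could look like, the fragment below keeps a fact table of interactions with user, content, and time dimensions, and derives a small management report. All field names are illustrative, not the data model developed in the paper.

```python
from collections import defaultdict

# Toy dimension and fact tables (surrogate keys -> attributes).
dim_user = {1: {"age_group": "18-25", "country": "DE"},
            2: {"age_group": "26-35", "country": "DE"}}
dim_content = {10: {"type": "post", "brand": "retailer_x"}}
fact_interaction = [   # (user_key, content_key, month, likes, comments)
    (1, 10, "2024-01", 3, 1),
    (2, 10, "2024-01", 5, 0),
]

# A management-report style aggregation: engagement per age group and month.
report = defaultdict(lambda: [0, 0])
for user_key, _content_key, month, likes, comments in fact_interaction:
    group = dim_user[user_key]["age_group"]
    report[(group, month)][0] += likes
    report[(group, month)][1] += comments

for (group, month), (likes, comments) in sorted(report.items()):
    print(f"{month} {group}: {likes} likes, {comments} comments")
```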
Abstract:
Huge amounts of data are generated from a variety of information sources in healthcare, and these sources originate from a variety of clinical information systems and corporate data warehouses. The data derived from the above sources are used for analysis and trending purposes, thus playing an influential role as a real-time decision-making tool. The unstructured, narrative data provided by these sources qualify as healthcare big data, and researchers argue that the application of big data in healthcare might improve accountability and efficiency.
Abstract:
The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data that is used to populate the second component, and a data warehouse that contains important molecular properties. These properties may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: firstly, how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories — this is an important and challenging aspect of P-found, due to the large data volumes involved and the desire of scientists to maintain control of their own data. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.
Abstract:
Due to the advancement of information technology in general, and of databases in particular, data storage devices are becoming cheaper and data processing speed is increasing. As a result, organizations tend to store large volumes of data holding great potential information. Decision Support Systems (DSS) try to use the stored data to obtain valuable information for organizations. In this paper, we use both data models and use cases to represent the functionality of data processing in DSS following Software Engineering processes. We propose a methodology for developing DSS in the Analysis phase, with respect to the modeling of data processing. As a starting point, we have used a data model adapted to the semantics of multidimensional databases, or data warehouses (DW). We have also taken an algorithm that provides all the possible ways to automatically cross-check multidimensional model data, as sketched below. Using the aforementioned, we propose diagrams and descriptions of use cases, which can be considered patterns representing the DSS functionality with regard to processing the DW data on which DSS are based. We highlight the reusability and automation benefits that can be achieved, and we believe this study can serve as a guide in the development of DSS.
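One plausible reading of "all the possible ways to cross multidimensional data" is the set of group-by combinations in the cube lattice, i.e. every non-empty subset of dimensions. The short sketch below enumerates them; this is only our reading, not necessarily the algorithm the authors adopted.

```python
from itertools import combinations

def crossings(dimensions: list[str]):
    """Yield every non-empty combination of dimensions to group by."""
    for r in range(1, len(dimensions) + 1):
        yield from combinations(dimensions, r)

for combo in crossings(["time", "product", "store"]):
    print(combo)
# ('time',) ... ('time', 'product', 'store') -- 7 combinations in total
```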
Abstract:
Currently there is an overwhelming number of scientific publications in the Life Sciences, especially in Genetics and Biotechnology. This huge amount of information is structured in corporate Data Warehouses (DW) or in Biological Databases (e.g. UniProt, RCSB Protein Data Bank, CEREALAB or GenBank), whose main drawback is the cost of keeping them up to date, which quickly renders them obsolete. However, these Databases are the main tool for enterprises when they want to update their internal information, for example when a plant-breeding enterprise needs to enrich its genetic information (internal structured Database) with recently discovered genes related to specific phenotypic traits (external unstructured data) in order to choose the desired parentals for breeding programs. In this paper, we propose to complement the internal information with external data from the Web using Question Answering (QA) techniques. We go a step further by providing a complete framework for integrating unstructured and structured information, combining traditional Database and DW architectures with QA systems. The great advantage of our framework is that decision makers can instantaneously compare internal data with external data from competitors, thereby allowing quick strategic decisions based on richer data.
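A minimal sketch of the integration idea, assuming a hypothetical QA component: internal structured rows are paired at query time with answers a QA system retrieves from the Web, so both can be compared side by side. `external_qa` is a stub invented here; a real system would call an actual Question Answering service.

```python
def external_qa(question: str) -> list[str]:
    """Hypothetical QA stub returning candidate answers from the Web."""
    return ["gene_ABC1 linked to drought tolerance"]   # canned example

def enriched_view(internal_rows: list[dict], question: str) -> dict:
    """Pair internal DW rows with external QA evidence for comparison."""
    return {"internal": internal_rows, "external": external_qa(question)}

internal = [{"gene": "gene_XYZ9", "trait": "drought tolerance"}]
print(enriched_view(internal, "Which genes relate to drought tolerance?"))
```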
Abstract:
The use of secondary data in health care research has become a very important issue over the past few years. Data from the treatment context are being used for evaluation of medical data for external quality assurance, as well as to answer medical questions in the form of registers and research databases. Additionally, the establishment of electronic clinical systems like data warehouses provides new opportunities for the secondary use of clinical data. Because health data is among the most sensitive information about an individual, the data must be safeguarded from disclosure.
Abstract:
The process of building Data Warehouses (DW) is well known, with well-defined stages, but at the same time it is mostly carried out manually by IT people in conjunction with business people. Web Warehouses (WW) are DWs whose data sources are taken from the web. We define a flexible WW that can be configured for different domains through the selection of web sources and the definition of data-processing characteristics. A Business Process Management (BPM) system allows modeling and executing Business Processes (BPs), providing support for process automation. To support the process of building flexible WWs, we propose two levels of BPs: a configuration process that supports the selection of web sources and the definition of schemas and mappings, and a feeding process that takes the defined configuration and loads the data into the WW. In this paper we present a proof of concept of both processes, focusing on the configuration process and the data it defines.
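A hedged sketch of the two-level idea, with all names invented: the configuration process fixes the web sources, schema, and mappings for a domain, and the feeding process then runs against that configuration to load rows into the warehouse.

```python
def configuration_process(domain: str) -> dict:
    """Select web sources and define schema/mappings for a domain."""
    return {
        "domain": domain,
        "sources": ["https://example.org/feed"],        # placeholder source
        "schema": {"fact_items": ["source", "title"]},
        "mapping": lambda record: {"source": record["url"],
                                   "title": record["title"]},
    }

def feeding_process(config: dict, fetched: list[dict]) -> list[dict]:
    """Apply the configured mapping to fetched records, yielding WW rows."""
    return [config["mapping"](rec) for rec in fetched]

cfg = configuration_process("news")
rows = feeding_process(cfg, [{"url": "https://example.org/a", "title": "A"}])
print(rows)   # [{'source': 'https://example.org/a', 'title': 'A'}]
```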
Abstract:
Data quality has become a major concern for organisations. The rapid growth in the size and technology of databases and data warehouses has brought significant advantages in accessing, storing, and retrieving information. At the same time, great challenges arise in maintaining high data quality under rapid data throughput and heterogeneous access. Yet, despite the importance of data quality, the literature has usually reduced data quality to detecting and correcting poor data such as outliers and incomplete or inaccurate values. As a result, organisations are unable to assess data quality efficiently and effectively. An accurate and proper data-quality assessment method will enable users to benchmark their systems and monitor their improvement. This paper introduces a granule-mining approach for measuring the degree of randomness of erroneous data, which will enable decision makers to conduct accurate quality assessments and locate the most severely affected data, thereby providing an accurate estimation of the human and financial resources needed for quality-improvement tasks.
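The paper's granule-mining method is not specified here; as one hedged illustration of "measuring how randomly errors are spread", the sketch below partitions records into granules and computes the normalized entropy of the error distribution across them. A near-uniform spread (entropy close to 1) suggests random errors, while concentration in few granules (entropy close to 0) suggests a systematic cause worth targeting first.

```python
import math

def error_spread_entropy(error_counts: list[int]) -> float:
    """Normalized entropy of error counts across granules, in [0, 1]."""
    total = sum(error_counts)
    if total == 0 or len(error_counts) < 2:
        return 0.0
    probs = [c / total for c in error_counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(len(error_counts))

print(error_spread_entropy([5, 4, 6, 5]))   # ~0.99: errors look random
print(error_spread_entropy([20, 0, 0, 0]))  # 0.0: errors are concentrated
```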