999 resultados para Data warehouses inductivas


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Doctoral Thesis in Information Systems and Technologies Area of Engineering and Manag ement Information Systems

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação apresentada como requisito parcial para obtenção do grau de Mestre em Estatística e Gestão de Informação

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Os recursos computacionais exigidos durante o processamento de grandes volumes de dados durante um processo de povoamento de um data warehouse faz com que a necessidade da procura de novas implementações tenha também em atenção a eficiência energética dos diversos componentes processuais que integram um qualquer sistema de povoamento. A lacuna de técnicas ou metodologias para categorizar e avaliar o consumo de energia em sistemas de povoamento de data warehouses é claramente notória. O acesso a esse tipo de informação possibilitaria a construção de sistemas de povoamento de data warehouses com níveis de consumo de energia mais baixos e, portanto, mais eficientes. Partindo da adaptação de técnicas aplicadas a sistemas de gestão de base de dados para a obtenção dos consumos energéticos da execução de interrogações, desenhámos e implementámos uma nova técnica que nos permite obter os consumos de energia para um qualquer processo de povoamento de um data warehouse, através da avaliação do consumo de cada um dos componentes utilizados na sua implementação utilizando uma ferramenta convencional. Neste artigo apresentamos a forma como fazemos tal avaliação, utilizando na demonstração da viabilidade da nossa proposta um processo de povoamento bastante típico em data warehouses – substituição encadeada de chaves operacionais -, que foi implementado através da ferramenta Kettle.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Magdeburg, Univ., Fak. für Informatik, Diss., 2013

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sistemas de tomada de decisão baseados em Data Warehouse (DW) estão sendo cada dia mais utilizados por grandes empresas e organizações. O modelo multidimensional de organização dos dados utilizado por estes sistemas, juntamente com as técnicas de processamento analítico on-line (OLAP), permitem análises complexas sobre o histórico dos negócios através de uma simples e intuitiva interface de consulta. Apesar dos DWs armazenarem dados históricos por natureza, as estruturas de organização e classificação destes dados, chamadas de dimensões, não possuem a rigor uma representação temporal, refletindo somente a estrutura corrente. Para um sistema destinado à análise de dados, a falta do histórico das dimensões impossibilita consultas sobre o ambiente real de contextualização dos dados passados. Além disso, as alterações dos esquemas multidimensionais precisam ser assistidas e gerenciadas por um modelo de evolução, de forma a garantir a consistência e integridade do modelo multidimensional sem a perda de informações relevantes. Neste trabalho são apresentadas dezessete operações de alteração de esquema e sete operações de alteração de instâncias para modelos multidimensionais de DW. Um modelo de versões, baseado na associação de intervalos de validade aos esquemas e instâncias, é proposto para o gerenciamento dessas operações. Todo o histórico de definições e de dados do DW é mantido por esse modelo, permitindo análises completas dos dados passados e da evolução do DW. Além de suportar consultas históricas sobre as definições e as instâncias do DW, o modelo também permite a manutenção de mais de um esquema ativo simultaneamente. Isto é, dois ou mais esquemas podem continuar a ter seus dados atualizados periodicamente, permitindo assim que as aplicações possam consultar dados recentes utilizando diferentes versões de esquema.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Spatial data warehouses (SDWs) allow for spatial analysis together with analytical multidimensional queries over huge volumes of data. The challenge is to retrieve data related to ad hoc spatial query windows according to spatial predicates, avoiding the high cost of joining large tables. Therefore, mechanisms to provide efficient query processing over SDWs are essential. In this paper, we propose two efficient indices for SDW: the SB-index and the HSB-index. The proposed indices share the following characteristics. They enable multidimensional queries with spatial predicate for SDW and also support predefined spatial hierarchies. Furthermore, they compute the spatial predicate and transform it into a conventional one, which can be evaluated together with other conventional predicates by accessing a star-join Bitmap index. While the SB-index has a sequential data structure, the HSB-index uses a hierarchical data structure to enable spatial objects clustering and a specialized buffer-pool to decrease the number of disk accesses. The advantages of the SB-index and the HSB-index over the DBMS resources for SDW indexing (i.e. star-join computation and materialized views) were investigated through performance tests, which issued roll-up operations extended with containment and intersection range queries. The performance results showed that improvements ranged from 68% up to 99% over both the star-join computation and the materialized view. Furthermore, the proposed indices proved to be very compact, adding only less than 1% to the storage requirements. Therefore, both the SB-index and the HSB-index are excellent choices for SDW indexing. Choosing between the SB-index and the HSB-index mainly depends on the query selectivity of spatial predicates. While low query selectivity benefits the HSB-index, the SB-index provides better performance for higher query selectivity.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining is one of the most important analysis techniques to automatically extract knowledge from large amount of data. Nowadays, data mining is based on low-level specifications of the employed techniques typically bounded to a specific analysis platform. Therefore, data mining lacks a modelling architecture that allows analysts to consider it as a truly software-engineering process. Bearing in mind this situation, we propose a model-driven approach which is based on (i) a conceptual modelling framework for data mining, and (ii) a set of model transformations to automatically generate both the data under analysis (that is deployed via data-warehousing technology) and the analysis models for data mining (tailored to a specific platform). Thus, analysts can concentrate on understanding the analysis problem via conceptual data-mining models instead of wasting efforts on low-level programming tasks related to the underlying-platform technical details. These time consuming tasks are now entrusted to the model-transformations scaffolding. The feasibility of our approach is shown by means of a hypothetical data-mining scenario where a time series analysis is required.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An approach for organizing the information in the data warehouses is presented in the paper. The possibilities of the numbered information spaces for building data warehouses are discussed. An application is outlined in the paper.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Geographic Data Warehouses (GDW) are one of the main technologies used in decision-making processes and spatial analysis, and the literature proposes several conceptual and logical data models for GDW. However, little effort has been focused on studying how spatial data redundancy affects SOLAP (Spatial On-Line Analytical Processing) query performance over GDW. In this paper, we investigate this issue. Firstly, we compare redundant and non-redundant GDW schemas and conclude that redundancy is related to high performance losses. We also analyze the issue of indexing, aiming at improving SOLAP query performance on a redundant GDW. Comparisons of the SB-index approach, the star-join aided by R-tree and the star-join aided by GiST indicate that the SB-index significantly improves the elapsed time in query processing from 25% up to 99% with regard to SOLAP queries defined over the spatial predicates of intersection, enclosure and containment and applied to roll-up and drill-down operations. We also investigate the impact of the increase in data volume on the performance. The increase did not impair the performance of the SB-index, which highly improved the elapsed time in query processing. Performance tests also show that the SB-index is far more compact than the star-join, requiring only a small fraction of at most 0.20% of the volume. Moreover, we propose a specific enhancement of the SB-index to deal with spatial data redundancy. This enhancement improved performance from 80 to 91% for redundant GDW schemas.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Mestrado em Engenharia Informática - Área de Especialização em Tecnologias do Conhecimento e Decisão

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data that is used to populate the second component, and a data warehouse that contains important molecular properties. These properties may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: firstly, how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories — this is an important and challenging aspect of P-found, due to the large data volumes involved and the desire of scientists to maintain control of their own data. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Due to the advancement of both, information technology in general, and databases in particular; data storage devices are becoming cheaper and data processing speed is increasing. As result of this, organizations tend to store large volumes of data holding great potential information. Decision Support Systems, DSS try to use the stored data to obtain valuable information for organizations. In this paper, we use both data models and use cases to represent the functionality of data processing in DSS following Software Engineering processes. We propose a methodology to develop DSS in the Analysis phase, respective of data processing modeling. We have used, as a starting point, a data model adapted to the semantics involved in multidimensional databases or data warehouses, DW. Also, we have taken an algorithm that provides us with all the possible ways to automatically cross check multidimensional model data. Using the aforementioned, we propose diagrams and descriptions of use cases, which can be considered as patterns representing the DSS functionality, in regard to DW data processing, DW on which DSS are based. We highlight the reusability and automation benefits that this can be achieved, and we think this study can serve as a guide in the development of DSS.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Currently there are an overwhelming number of scientific publications in Life Sciences, especially in Genetics and Biotechnology. This huge amount of information is structured in corporate Data Warehouses (DW) or in Biological Databases (e.g. UniProt, RCSB Protein Data Bank, CEREALAB or GenBank), whose main drawback is its cost of updating that makes it obsolete easily. However, these Databases are the main tool for enterprises when they want to update their internal information, for example when a plant breeder enterprise needs to enrich its genetic information (internal structured Database) with recently discovered genes related to specific phenotypic traits (external unstructured data) in order to choose the desired parentals for breeding programs. In this paper, we propose to complement the internal information with external data from the Web using Question Answering (QA) techniques. We go a step further by providing a complete framework for integrating unstructured and structured information by combining traditional Databases and DW architectures with QA systems. The great advantage of our framework is that decision makers can compare instantaneously internal data with external data from competitors, thereby allowing taking quick strategic decisions based on richer data.