994 resultados para Data Warehouses


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Data warehouse projects, today, are in an ambivalent situation. On the one hand, data warehouses are critical for a company’s success and various methodological and technological tools are sophisticatedly developed to implement them. On the other hand, a significant amount of data warehouse projects fails due to non-technical reasons such as insufficient management support or in-corporative employees. But management support and user participation can be increased dramatically with specification methods that are understandable to these user groups. This paper aims at overcoming possible non-technical failure reasons by introducing a user-adequate specification approach within the field of management information systems.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Association rule mining is an indispensable tool for discovering
insights from large databases and data warehouses.
The data in a warehouse being multi-dimensional, it is often
useful to mine rules over subsets of data defined by selections
over the dimensions. Such interactive rule mining
over multi-dimensional query windows is difficult since rule
mining is computationally expensive. Current methods using
pre-computation of frequent itemsets require counting
of some itemsets by revisiting the transaction database at
query time, which is very expensive. We develop a method
(RMW) that identifies the minimal set of itemsets to compute
and store for each cell, so that rule mining over any
query window may be performed without going back to the
transaction database. We give formal proofs that the set of
itemsets chosen by RMW is sufficient to answer any query
and also prove that it is the optimal set to be computed
for 1 dimensional queries. We demonstrate through an extensive
empirical evaluation that RMW achieves extremely
fast query response time compared to existing methods, with
only moderate overhead in pre-computation and storage

Relevância:

60.00% 60.00%

Publicador:

Resumo:

É possível assistir nos dias de hoje, a um processo tecnológico evolutivo acentuado por toda a parte do globo. No caso das empresas, quer as pequenas, médias ou de grandes dimensões, estão cada vez mais dependentes dos sistemas informatizados para realizar os seus processos de negócio, e consequentemente à geração de informação referente aos negócios e onde, muitas das vezes, os dados não têm qualquer relacionamento entre si. A maioria dos sistemas convencionais informáticos não são projetados para gerir e armazenar informações estratégicas, impossibilitando assim que esta sirva de apoio como recurso estratégico. Portanto, as decisões são tomadas com base na experiência dos administradores, quando poderiam serem baseadas em factos históricos armazenados pelos diversos sistemas. Genericamente, as organizações possuem muitos dados, mas na maioria dos casos extraem pouca informação, o que é um problema em termos de mercados competitivos. Como as organizações procuram evoluir e superar a concorrência nas tomadas de decisão, surge neste contexto o termo Business Intelligence(BI). A GisGeo Information Systems é uma empresa que desenvolve software baseado em SIG (sistemas de informação geográfica) recorrendo a uma filosofia de ferramentas open-source. O seu principal produto baseia-se na localização geográfica dos vários tipos de viaturas, na recolha de dados, e consequentemente a sua análise (quilómetros percorridos, duração de uma viagem entre dois pontos definidos, consumo de combustível, etc.). Neste âmbito surge o tema deste projeto que tem objetivo de dar uma perspetiva diferente aos dados existentes, cruzando os conceitos BI com o sistema implementado na empresa de acordo com a sua filosofia. Neste projeto são abordados alguns dos conceitos mais importantes adjacentes a BI como, por exemplo, modelo dimensional, data Warehouse, o processo ETL e OLAP, seguindo a metodologia de Ralph Kimball. São também estudadas algumas das principais ferramentas open-source existentes no mercado, assim como quais as suas vantagens/desvantagens relativamente entre elas. Em conclusão, é então apresentada a solução desenvolvida de acordo com os critérios enumerados pela empresa como prova de conceito da aplicabilidade da área Business Intelligence ao ramo de Sistemas de informação Geográfica (SIG), recorrendo a uma ferramenta open-source que suporte visualização dos dados através de dashboards.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data and a data warehouse. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular we look at two aspects, first how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories --- this is an important and challenging aspect of P-found because the data volumes involved are too large to be centralised. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling new scientific discoveries.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Cada vez mais o tempo acaba sendo o diferencial de uma empresa para outra. As empresas, para serem bem sucedidas, precisam da informação certa, no momento certo e para as pessoas certas. Os dados outrora considerados importantes para a sobrevivência das empresas hoje precisam estar em formato de informações para serem utilizados. Essa é a função das ferramentas de “Business Intelligence”, cuja finalidade é modelar os dados para obter informações, de forma que diferencie as ações das empresas e essas consigam ser mais promissoras que as demais. “Business Intelligence” é um processo de coleta, análise e distribuição de dados para melhorar a decisão de negócios, que leva a informação a um número bem maior de usuários dentro da corporação. Existem vários tipos de ferramentas que se propõe a essa finalidade. Esse trabalho tem como objetivo comparar ferramentas através do estudo das técnicas de modelagem dimensional, fundamentais nos projetos de estruturas informacionais, suporte a “Data Warehouses”, “Data Marts”, “Data Mining” e outros, bem como o mercado, suas vantagens e desvantagens e a arquitetura tecnológica utilizada por estes produtos. Assim sendo, foram selecionados os conjuntos de ferramentas de “Business Intelligence” das empresas Microsoft Corporation e Oracle Corporation, visto as suas magnitudes no mundo da informática.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Many topics related to association mining have received attention in the research community, especially the ones focused on the discovery of interesting knowledge. A promising approach, related to this topic, is the application of clustering in the pre-processing step to aid the user to find the relevant associative patterns of the domain. In this paper, we propose nine metrics to support the evaluation of this kind of approach. The metrics are important since they provide criteria to: (a) analyze the methodologies, (b) identify their positive and negative aspects, (c) carry out comparisons among them and, therefore, (d) help the users to select the most suitable solution for their problems. Some experiments were done in order to present how the metrics can be used and their usefulness. © 2013 Springer-Verlag GmbH.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

La minería de datos es un campo de las ciencias de la computación referido al proceso que intenta descubrir patrones en grandes volúmenes de datos. La minería de datos busca generar información similar a la que podría producir un experto humano. Además es el proceso de descubrir conocimientos interesantes, como patrones, asociaciones, cambios, anomalías y estructuras significativas a partir de grandes cantidades de datos almacenadas en bases de datos, data warehouses o cualquier otro medio de almacenamiento de información. El aprendizaje automático o aprendizaje de máquinas es una rama de la Inteligencia artificial cuyo objetivo es desarrollar técnicas que permitan a las computadoras aprender. De forma más concreta, se trata de crear programas capaces de generalizar comportamientos a partir de una información no estructurada suministrada en forma de ejemplos. La minería de datos utiliza métodos de aprendizaje automático para descubrir y enumerar patrones presentes en los datos. En los últimos años se han aplicado las técnicas de clasificación y aprendizaje automático en un número elevado de ámbitos como el sanitario, comercial o de seguridad. Un ejemplo muy actual es la detección de comportamientos y transacciones fraudulentas en bancos. Una aplicación de interés es el uso de las técnicas desarrolladas para la detección de comportamientos fraudulentos en la identificación de usuarios existentes en el interior de entornos inteligentes sin necesidad de realizar un proceso de autenticación. Para comprobar que estas técnicas son efectivas durante la fase de análisis de una determinada solución, es necesario crear una plataforma que de soporte al desarrollo, validación y evaluación de algoritmos de aprendizaje y clasificación en los entornos de aplicación bajo estudio. El proyecto planteado está definido para la creación de una plataforma que permita evaluar algoritmos de aprendizaje automático como mecanismos de identificación en espacios inteligentes. Se estudiarán tanto los algoritmos propios de este tipo de técnicas como las plataformas actuales existentes para definir un conjunto de requisitos específicos de la plataforma a desarrollar. Tras el análisis se desarrollará parcialmente la plataforma. Tras el desarrollo se validará con pruebas de concepto y finalmente se verificará en un entorno de investigación a definir. ABSTRACT. The data mining is a field of the sciences of the computation referred to the process that it tries to discover patterns in big volumes of information. The data mining seeks to generate information similar to the one that a human expert might produce. In addition it is the process of discovering interesting knowledge, as patterns, associations, changes, abnormalities and significant structures from big quantities of information stored in databases, data warehouses or any other way of storage of information. The machine learning is a branch of the artificial Intelligence which aim is to develop technologies that they allow the computers to learn. More specifically, it is a question of creating programs capable of generalizing behaviors from not structured information supplied in the form of examples. The data mining uses methods of machine learning to discover and to enumerate present patterns in the information. In the last years there have been applied classification and machine learning techniques in a high number of areas such as healthcare, commercial or security. A very current example is the detection of behaviors and fraudulent transactions in banks. An application of interest is the use of the techniques developed for the detection of fraudulent behaviors in the identification of existing Users inside intelligent environments without need to realize a process of authentication. To verify these techniques are effective during the phase of analysis of a certain solution, it is necessary to create a platform that support the development, validation and evaluation of algorithms of learning and classification in the environments of application under study. The project proposed is defined for the creation of a platform that allows evaluating algorithms of machine learning as mechanisms of identification in intelligent spaces. There will be studied both the own algorithms of this type of technologies and the current existing platforms to define a set of specific requirements of the platform to develop. After the analysis the platform will develop partially. After the development it will be validated by prove of concept and finally verified in an environment of investigation that would be define.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Decision support systems (DSS) support business or organizational decision-making activities, which require the access to information that is internally stored in databases or data warehouses, and externally in the Web accessed by Information Retrieval (IR) or Question Answering (QA) systems. Graphical interfaces to query these sources of information ease to constrain dynamically query formulation based on user selections, but they present a lack of flexibility in query formulation, since the expressivity power is reduced to the user interface design. Natural language interfaces (NLI) are expected as the optimal solution. However, especially for non-expert users, a real natural communication is the most difficult to realize effectively. In this paper, we propose an NLI that improves the interaction between the user and the DSS by means of referencing previous questions or their answers (i.e. anaphora such as the pronoun reference in “What traits are affected by them?”), or by eliding parts of the question (i.e. ellipsis such as “And to glume colour?” after the question “Tell me the QTLs related to awn colour in wheat”). Moreover, in order to overcome one of the main problems of NLIs about the difficulty to adapt an NLI to a new domain, our proposal is based on ontologies that are obtained semi-automatically from a framework that allows the integration of internal and external, structured and unstructured information. Therefore, our proposal can interface with databases, data warehouses, QA and IR systems. Because of the high NL ambiguity of the resolution process, our proposal is presented as an authoring tool that helps the user to query efficiently in natural language. Finally, our proposal is tested on a DSS case scenario about Biotechnology and Agriculture, whose knowledge base is the CEREALAB database as internal structured data, and the Web (e.g. PubMed) as external unstructured information.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The majority of the organizations store their historical business information in data warehouses which are queried to make strategic decisions by using online analytical processing (OLAP) tools. This information has to be correctly assured against unauthorized accesses, but nevertheless there are a great amount of legacy OLAP applications that have been developed without considering security aspects or these have been incorporated once the system was implemented. This work defines a reverse engineering process that allows us to obtain the conceptual model corresponding to a legacy OLAP application, and also analyses and represents the security aspects that could have established. This process has been aligned with a model-driven architecture for developing secure OLAP applications by defining the transformations needed to automatically apply it. Once the conceptual model has been extracted, it can be easily modified and improved with security, and automatically transformed to generate the new implementation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Nos dias de hoje o acesso à informação por parte das empresas é vital para o bom desempenho das suas funções. As empresas de telecomunicações não fogem à regra, a sua posição no mercado está dependente das decisões que são tomadas com base na avaliação dessa informação. Para suportar os processos de apoio à decisão é coerente recorrer-se a Data Warehouses que permitem integrar informação de diversas fontes, verificando a sua qualidade, actualização e coerência, organizando-a para um fácil acesso e consulta de vários pontos de vista. Numa empresa de telecomunicações móvel, um Data Mart geográfico baseado na informação de tráfego da companhia que pode identificar as localizações preferenciais dos utilizadores na rede é muito importante porque fornece indicadores muito úteis para o departamento de marketing e negócio da empresa de maneira a que se saiba onde e como actuar para permitir que esta se desenvolva e ganhe vantagem no mercado. ABSTRACT: Today the access to information by enterprises is vital for the company’s performance. Telecommunications companies are no exception. Their position in the market is dependent on the decisions that are taken based on the evaluation of such information. To support the decision making process Data Warehouse is today an extremely useful tool; it integrates information from different sources, checking on its validity, quality and update, coherence, organizing it for an easy access and search from various perspectives. ln a mobile telecommunications company a geographical Data Mart-based traffic information that can identify the preferential locations of users on the network is very important It provides useful indicators to the Department of Marketing and Business there by allowing you to know where and how to act and boosting the development of the company.

Relevância:

20.00% 20.00%

Publicador: