797 results for Técnicas de Data Mining


Relevance: 80.00%

Publisher:

Abstract:

Multivariate analysis with MDS
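This record carries only a title, but the technique it names is compact enough to sketch. Below is a minimal classical (Torgerson) MDS in NumPy: square and double-center the pairwise distance matrix, then read point coordinates off its top eigenpairs. The points and dimensions are invented for illustration, not taken from the record.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n points in k dimensions so that
    their pairwise distances approximate the given distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]         # keep the top-k eigenpairs
    L = np.sqrt(np.maximum(vals[idx], 0.0))  # clip tiny negatives
    return vecs[:, idx] * L                  # n x k coordinates

# three points in the plane; MDS should recover their distances exactly
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
X = classical_mds(D, k=2)
D2 = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
print(np.allclose(D, D2))  # → True (recovery is exact up to rotation/reflection)
```

Because the toy points genuinely live in two dimensions, the embedding reproduces the distance matrix exactly; with real high-dimensional dissimilarities the top-k eigenpairs give only the best k-dimensional approximation.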


Multivariate analysis with Principal Component Analysis (PCA)
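This PCA record likewise has no abstract. As a hedged illustration, principal components can be obtained from the eigendecomposition of the sample covariance matrix; all data below is synthetic.

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components and report the
    fraction of total variance carried by each component."""
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]           # descending by variance
    components = vecs[:, order[:k]]          # top-k directions
    return Xc @ components, vals[order] / vals.sum()

rng = np.random.default_rng(0)
# synthetic data stretched along the first axis: one component dominates
X = rng.normal(size=(200, 3)) * np.array([10.0, 1.0, 0.1])
scores, explained = pca(X, k=1)
print(explained[0] > 0.9)  # → True: the first component explains ~99% of variance
```

The explained-variance ratios are the standard diagnostic for choosing k: here the deliberately stretched first axis absorbs almost all the variance.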


Topic 6. Text Mining with Topic Modeling.
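The syllabus entry names topic modeling but no specific method; LDA is the usual choice. As a dependency-free stand-in, the sketch below uses non-negative matrix factorization (a related topic-extraction technique, not necessarily the one taught) on a toy document-term matrix with invented word counts.

```python
import numpy as np

# toy document-term counts: docs 0-1 use "sports" words, docs 2-3 "finance" words
#            ball game goal bank loan rate
V = np.array([[3, 2, 1, 0, 0, 0],
              [2, 3, 2, 0, 0, 0],
              [0, 0, 0, 3, 2, 2],
              [0, 0, 0, 1, 3, 2]], dtype=float)

rng = np.random.default_rng(1)
k = 2
W = rng.random((V.shape[0], k)) + 0.1   # document-topic weights
H = rng.random((k, V.shape[1])) + 0.1   # topic-term weights
eps = 1e-9
for _ in range(500):                    # Lee-Seung multiplicative updates
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

# each topic's three heaviest terms; expected to split along the two themes
for t in range(k):
    print(sorted(np.argsort(H[t])[::-1][:3]))
```

With a block-structured matrix like this one, the factorization is expected to assign one topic per vocabulary block; on real corpora the topics are noisier and k must be tuned.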


This paper analyzes new trends in the creation and management of geographic information for building inductive models based exclusively on geographic databases. Such models can integrate large volumes of heterogeneous data, which entails considerable technical and methodological complexity. We propose a methodology for characterizing in detail the distribution of natural water resources across a territory and for deriving numerous information layers that can feed these data-hungry models. The study area chosen to apply the methodology is the Marina Baja district (Alicante), for which a spatial water balance is computed using statistical and geostatistical tools and Geographic Information Systems. Finally, all 84 generated information layers have been validated, and their creation has been shown to admit a degree of automation that will allow them to be incorporated into broader Data Mining analyses.
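As a hedged illustration of deriving one such information layer, the sketch below interpolates rainfall from scattered gauges with inverse-distance weighting, one of the simpler tools in the statistical/geostatistical toolbox the abstract mentions. The gauge coordinates and values are invented and unrelated to the Marina Baja data.

```python
import numpy as np

def idw(obs_xy, obs_val, grid_xy, power=2.0):
    """Inverse-distance-weighted interpolation: estimate a value at each
    grid point as a weighted mean of scattered observations."""
    d = np.linalg.norm(grid_xy[:, None, :] - obs_xy[None, :, :], axis=-1)
    w = 1.0 / np.maximum(d, 1e-12) ** power   # nearer gauges weigh more
    return (w * obs_val).sum(axis=1) / w.sum(axis=1)

# three hypothetical rain gauges (km coordinates, annual mm of rainfall)
gauges = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
rain = np.array([400.0, 600.0, 500.0])
grid = np.array([[5.0, 5.0], [1.0, 1.0]])     # two query cells
est = idw(gauges, rain, grid)
print(est)  # estimates always fall between the gauge min and max
```

Repeating this over a full raster grid yields one continuous layer per variable, which is the kind of derivation the abstract describes automating.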


Mode of access: Internet.


As with all new ideas, the concept of Open Innovation requires extensive empirical investigation, testing, and development. This paper analyzes Procter and Gamble's 'Connect and Develop' strategy as a case study of the major organizational and technological changes associated with open innovation. It argues that although some of the organizational changes accompanying open innovation are beginning to be described in the literature, more analysis is warranted into the ways technological changes have facilitated open innovation strategies, particularly in new product development. Information and communications technologies enable the exchange of distributed sources of information in the open innovation process. The case study further shows that a suite of new technologies for data mining, simulation, prototyping, and visual representation, which we call 'innovation technology', helps to support open innovation at Procter and Gamble. The paper concludes with a suggested research agenda for furthering understanding of the role played by, and the consequences of, this technology.


Sharing data among organizations often leads to mutual benefit. Recent technology in data mining has enabled efficient extraction of knowledge from large databases. This, however, increases the risk of disclosing sensitive knowledge when the database is released to other parties. To address this privacy issue, one may sanitize the original database so that the sensitive knowledge is hidden. The challenge is to minimize the side effect on the quality of the sanitized database so that non-sensitive knowledge can still be mined. In this paper, we study such a problem in the context of hiding sensitive frequent itemsets by judiciously modifying the transactions in the database. To preserve the non-sensitive frequent itemsets, we propose a border-based approach to efficiently evaluate the impact of any modification to the database during the hiding process. The quality of the database can be well maintained by greedily selecting the modifications with minimal side effect. Experimental results are also reported to show the effectiveness of the proposed approach. © 2005 IEEE
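The border-based evaluation itself is not reproduced in the abstract. As a simplified sketch of the hiding step only, the code below greedily deletes items from supporting transactions until the sensitive itemset's support falls below the threshold, using raw item frequency as a crude stand-in for the paper's border-based side-effect measure. All transactions are invented.

```python
def support(db, itemset):
    """Number of transactions containing every item of `itemset`."""
    s = set(itemset)
    return sum(1 for t in db if s <= t)

def hide_itemset(db, sensitive, min_sup):
    """Sanitize `db` until `sensitive` is infrequent: each round, pick a
    supporting transaction and delete the rarest sensitive item from it,
    a frequency heuristic meant to limit damage to non-sensitive itemsets."""
    db = [set(t) for t in db]
    sens = set(sensitive)
    while support(db, sens) >= min_sup:
        t = next(tr for tr in db if sens <= tr)   # a supporting transaction
        victim = min(sens, key=lambda i: sum(1 for tr in db if i in tr))
        t.discard(victim)                         # support drops by one
    return db

db = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'b', 'd'}, {'b', 'c'}]
clean = hide_itemset(db, {'a', 'b'}, min_sup=2)
print(support(clean, {'a', 'b'}), support(clean, {'b', 'c'}))  # → 1 2
```

Note that {'b', 'c'} keeps its original support of 2: the heuristic removed 'a' rather than 'b', illustrating the side-effect-minimization goal.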


The design, development, and use of complex systems models raise a unique class of challenges and potential pitfalls, many of which are commonly recurring problems. Over time, researchers gain experience in this form of modeling, choosing algorithms, techniques, and frameworks that improve the quality, confidence level, and speed of development of their models. This increasing collective experience of complex systems modelers is a resource that should be captured. Fields such as software engineering and architecture have benefited from the development of generic solutions to recurring problems, called patterns. Using pattern development techniques from these fields, insights from communities such as learning and information processing, data mining, bioinformatics, and agent-based modeling can be identified and captured. Collections of such 'pattern languages' would allow knowledge gained through experience to be readily accessible to less-experienced practitioners and to other domains. This paper proposes a methodology for capturing the wisdom of computational modelers by introducing example visualization patterns, and a pattern classification system for analyzing the relationship between micro and macro behavior in complex systems models. We anticipate that a new field of complex systems patterns will provide an invaluable resource for both practicing and future generations of modelers.


Spatial data mining has recently emerged from a number of real applications, such as real-estate marketing, urban planning, weather forecasting, medical image analysis, and road traffic accident analysis. It demands efficient solutions to many new, expensive, and complicated problems. In this paper, we investigate the problem of evaluating the top k distinguished “features” for a “cluster” based on weighted proximity relationships between the cluster and the features. We measure proximity in an average fashion to accommodate possibly nonuniform data distribution within a cluster. Combining a standard multi-step paradigm with new lower and upper proximity bounds, we present an efficient algorithm to solve the problem. The algorithm is implemented in several different modes. Our experimental results not only compare these modes but also illustrate the efficiency of the algorithm.
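The multi-step paradigm and its proximity bounds are not spelled out in the abstract; the sketch below implements only the underlying ranking it describes: score each feature by its weighted average distance to the cluster's members and return the k best. Points and weights are invented.

```python
import numpy as np

def top_k_features(cluster_pts, feature_pts, weights, k):
    """Rank spatial features by weighted average proximity to a cluster:
    a smaller mean distance (scaled by the feature's weight) ranks higher."""
    d = np.linalg.norm(cluster_pts[:, None, :] - feature_pts[None, :, :],
                       axis=-1)            # |cluster| x |features| distances
    score = weights * d.mean(axis=0)       # average over cluster members
    return np.argsort(score)[:k]           # indices of the k best features

cluster = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
features = np.array([[0.5, 0.5], [5.0, 5.0], [1.0, 1.0]])
w = np.ones(3)                             # equal feature weights
ranked = top_k_features(cluster, features, w, k=2)
print(ranked)  # → [0 2]: the two nearby features, the distant one excluded
```

Averaging over all cluster members, rather than using a single representative point, is what handles the nonuniform distributions mentioned in the abstract; the paper's contribution is avoiding the full distance computation via bounds.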


A major task of traditional temporal event sequence mining is to find all frequent event patterns in a long temporal sequence. In many real applications, however, events are often grouped into different types, and not all types are of equal importance. In this paper, we consider the problem of efficiently mining temporal event sequences that lead to an instance of a specific type of event. Temporal constraints are used to ensure that the mining results are meaningful. We first generalise and formalise the problem of event-oriented temporal sequence data mining. After discussing some unique issues in this new problem, we give a set of criteria, adapted from traditional data mining techniques, to measure the quality of the patterns to be discovered. Finally, we present an algorithm to discover potentially interesting patterns.
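As a hedged sketch of the event-oriented setting described above (not the paper's algorithm), the code below looks back over a fixed time window before each occurrence of a target event type and counts which sets of preceding event types recur often enough to report.

```python
from collections import Counter

def event_oriented_patterns(seq, target, window, min_sup):
    """For each occurrence of `target` in a (timestamp, event) sequence,
    collect the set of event types seen in the preceding time window;
    return the sets occurring at least `min_sup` times."""
    pats = Counter()
    for t, ev in seq:
        if ev == target:
            preceding = frozenset(e for s, e in seq
                                  if t - window <= s < t and e != target)
            if preceding:
                pats[preceding] += 1
    return {p: c for p, c in pats.items() if c >= min_sup}

# invented (timestamp, event) pairs; 'alarm' is the target event type
seq = [(1, 'a'), (2, 'b'), (3, 'alarm'),
       (6, 'a'), (7, 'b'), (8, 'alarm'),
       (11, 'c'), (12, 'alarm')]
res = event_oriented_patterns(seq, 'alarm', window=3, min_sup=2)
print(res)  # the pair {a, b} precedes two of the three alarms
```

The window length is the temporal constraint: without it, everything before an alarm would count as a "cause" and the patterns would be meaningless.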


Pattern discovery in temporal event sequences is of great importance in many application domains, such as telecommunication network fault analysis. In reality, not every type of event has an accurate timestamp: some, defined as inaccurate events, may only have an interval as their possible time of occurrence. The existence of inaccurate events causes uncertainty in event ordering. The traditional support model cannot deal with this uncertainty, so some interesting patterns would be missed. A new concept, precise support, is introduced to evaluate the probability that a pattern is contained in a sequence. Based on this new metric, we define an uncertainty model and present an algorithm to discover interesting patterns in a sequence database that has one type of inaccurate event. The number of types of inaccurate events in our model can readily be extended to k, at the cost of increased computational complexity.
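As an illustration of the idea (not the paper's model), the sketch below computes a precise-support-style score for the pattern "A before B" when A's occurrence time is known only as an interval, here assumed uniform. Summing the per-sequence probabilities generalizes the usual 0/1 support count.

```python
def prob_before(interval, t):
    """P(X < t) when X is uniform on [l, u] — an inaccurate event whose
    exact time is unknown within the interval."""
    l, u = interval
    if t <= l:
        return 0.0
    if t >= u:
        return 1.0
    return (t - l) / (u - l)

def precise_support(sequences):
    """Precise support of 'A before B' summed over sequences, where A's
    timestamp is an interval and B's is exact."""
    return sum(prob_before(a_iv, b_t) for a_iv, b_t in sequences)

# three invented sequences: A's time as (l, u), B's time as a number
data = [((0, 4), 2),   # P(A < B) = 0.5: B falls mid-interval
        ((0, 2), 3),   # P = 1.0: B occurs after A's interval ends
        ((5, 9), 6)]   # P = 0.25
print(precise_support(data))  # → 1.75
```

Under the traditional 0/1 model only the second sequence would count, giving support 1 and possibly dropping the pattern; the probabilistic score keeps the partial evidence from the other two sequences.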


Purpose – Academic writing is often considered to be a weakness in contemporary students, while good reporting and writing skills are highly valued by graduate employers. A number of universities have introduced writing centres aimed at addressing this problem; however, the evaluation of such centres is usually qualitative. The paper seeks to consider the efficacy of a writing centre by looking at the impact of attendance on two “real world” quantitative outcomes – achievement and progression.

Design/methodology/approach – Data mining was used to obtain records of 806 first-year students, of whom 45 had attended the writing centre and 761 had not.

Findings – A highly significant association between writing centre attendance and achievement was found. Progression to year two was also significantly associated with writing centre attendance.

Originality/value – Further quantitative evaluation of writing centres is advocated, using random allocation to a comparison condition to control for potential confounds such as motivation.
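An association test of this kind can be illustrated with a 2x2 chi-square. The totals of 806 students split 45/761 come from the abstract, but the progression counts below are invented for demonstration, not the study's data.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table:
    sum of (observed - expected)^2 / expected over the four cells."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row = [a + b, c + d]                    # row totals
    col = [a + c, b + d]                    # column totals
    stat = 0.0
    for i, cells in enumerate([(a, b), (c, d)]):
        for j, obs in enumerate(cells):
            exp = row[i] * col[j] / n       # expected count under independence
            stat += (obs - exp) ** 2 / exp
    return stat

# rows: attended / did not attend; columns: progressed / did not progress
# (hypothetical split of the abstract's 45 attendees and 761 non-attendees)
table = [[40, 5], [500, 261]]
stat = chi_square_2x2(table)
print(round(stat, 2))  # ≈ 10.33; above the 3.84 critical value at p = .05
```

With 1 degree of freedom the critical value at the 5% level is 3.84, so these hypothetical counts would indeed show a significant association, which is the shape of result the abstract reports.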


In the last two decades there have been substantial developments in the mathematical theory of inverse optimization problems, and their applications have expanded greatly. In parallel, time series analysis and forecasting have become increasingly important in various fields of research such as data mining, economics, business, engineering, medicine, and politics. Despite the widespread use of linear programming in forecasting models, not a single application of inverse optimization has been reported in the forecasting literature where time series data are available. The goal of this paper is therefore to introduce inverse optimization into the forecasting field and to provide a streamlined approach to time series analysis and forecasting using inverse linear programming. An application demonstrates the use of the inverse forecasting approach developed in this study. © 2007 Elsevier Ltd. All rights reserved.