802 resultados para outlier detection, data mining, gpgpu, gpu computing, supercomputing
Resumo:
Comunicación presentada en las XVI Jornadas de Ingeniería del Software y Bases de Datos, JISBD 2011, A Coruña, 5-7 septiembre 2011.
Resumo:
Em época de crise financeira, as ferramentas open source de data mining representam uma nova tendência na investigação, educação e nas aplicações industriais, especialmente para as pequenas e médias empresas. Com o software open source, estas podem facilmente iniciar um projeto de data mining usando as tecnologias mais recentes, sem se preocuparem com os custos de aquisição das mesmas, podendo apostar na aprendizagem dos seus colaboradores. Os sistemas open source proporcionam o acesso ao código, facilitando aos colaboradores a compreensão dos sistemas e algoritmos e permitindo que estes o adaptem às necessidades dos seus projetos. No entanto, existem algumas questões inerentes ao uso deste tipo de ferramenta. Uma das mais importantes é a diversidade, e descobrir, tardiamente, que a ferramenta escolhida é inapropriada para os objetivos do nosso negócio pode ser um problema grave. Como o número de ferramentas de data mining continua a crescer, a escolha sobre aquela que é realmente mais apropriada ao nosso negócio torna-se cada vez mais difícil. O presente estudo aborda um conjunto de ferramentas de data mining, de acordo com as suas características e funcionalidades. As ferramentas abordadas provém da listagem do KDnuggets referente a Software Suites de Data Mining. Posteriormente, são identificadas as que reúnem melhores condições de trabalho, que por sua vez são as mais populares nas comunidades, e é feito um teste prático com datasets reais. Os testes pretendem identificar como reagem as ferramentas a cenários diferentes do tipo: performance no processamento de grandes volumes de dados; precisão de resultados; etc. Nos tempos que correm, as ferramentas de data mining open source representam uma oportunidade para os seus utilizadores, principalmente para as pequenas e médias empresas, deste modo, os resultados deste estudo pretendem ajudar no processo de tomada de decisão relativamente às mesmas.
Resumo:
This paper proposes a novel application of fuzzy logic to web data mining for two basic problems of a website: popularity and satisfaction. Popularity means that people will visit the website while satisfaction refers to the usefulness of the site. We will illustrate that the popularity of a website is a fuzzy logic problem. It is an important characteristic of a website in order to survive in Internet commerce. The satisfaction of a website is also a fuzzy logic problem that represents the degree of success in the application of information technology to the business. We propose a framework of fuzzy logic for the representation of these two problems based on web data mining techniques to fuzzify the attributes of a website.
Resumo:
Electricity market price forecast is a changeling yet very important task for electricity market managers and participants. Due to the complexity and uncertainties in the power grid, electricity prices are highly volatile and normally carry with spikes. which may be (ens or even hundreds of times higher than the normal price. Such electricity spikes are very difficult to be predicted. So far. most of the research on electricity price forecast is based on the normal range electricity prices. This paper proposes a data mining based electricity price forecast framework, which can predict the normal price as well as the price spikes. The normal price can be, predicted by a previously proposed wavelet and neural network based forecast model, while the spikes are forecasted based on a data mining approach. This paper focuses on the spike prediction and explores the reasons for price spikes based on the measurement of a proposed composite supply-demand balance index (SDI) and relative demand index (RDI). These indices are able to reflect the relationship among electricity demand, electricity supply and electricity reserve capacity. The proposed model is based on a mining database including market clearing price, trading hour. electricity), demand, electricity supply and reserve. Bayesian classification and similarity searching techniques are used to mine the database to find out the internal relationships between electricity price spikes and these proposed. The mining results are used to form the price spike forecast model. This proposed model is able to generate forecasted price spike, level of spike and associated forecast confidence level. The model is tested with the Queensland electricity market data with promising results. Crown Copyright (C) 2004 Published by Elsevier B.V. All rights reserved.
Resumo:
Fuzzy data has grown to be an important factor in data mining. Whenever uncertainty exists, simulation can be used as a model. Simulation is very flexible, although it can involve significant levels of computation. This article discusses fuzzy decision-making using the grey related analysis method. Fuzzy models are expected to better reflect decision-making uncertainty, at some cost in accuracy relative to crisp models. Monte Carlo simulation is used to incorporate experimental levels of uncertainty into the data and to measure the impact of fuzzy decision tree models using categorical data. Results are compared with decision tree models based on crisp continuous data.
Resumo:
Today, the data available to tackle many scientific challenges is vast in quantity and diverse in nature. The exploration of heterogeneous information spaces requires suitable mining algorithms as well as effective visual interfaces. Most existing systems concentrate either on mining algorithms or on visualization techniques. Though visual methods developed in information visualization have been helpful, for improved understanding of a complex large high-dimensional dataset, there is a need for an effective projection of such a dataset onto a lower-dimension (2D or 3D) manifold. This paper introduces a flexible visual data mining framework which combines advanced projection algorithms developed in the machine learning domain and visual techniques developed in the information visualization domain. The framework follows Shneiderman’s mantra to provide an effective user interface. The advantage of such an interface is that the user is directly involved in the data mining process. We integrate principled projection methods, such as Generative Topographic Mapping (GTM) and Hierarchical GTM (HGTM), with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates, billboarding, and user interaction facilities, to provide an integrated visual data mining framework. Results on a real life high-dimensional dataset from the chemoinformatics domain are also reported and discussed. Projection results of GTM are analytically compared with the projection results from other traditional projection methods, and it is also shown that the HGTM algorithm provides additional value for large datasets. The computational complexity of these algorithms is discussed to demonstrate their suitability for the visual data mining framework.
Resumo:
We introduce a flexible visual data mining framework which combines advanced projection algorithms from the machine learning domain and visual techniques developed in the information visualization domain. The advantage of such an interface is that the user is directly involved in the data mining process. We integrate principled projection algorithms, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates and billboarding, to provide a visual data mining framework. Results on a real-life chemoinformatics dataset using GTM are promising and have been analytically compared with the results from the traditional projection methods. It is also shown that the HGTM algorithm provides additional value for large datasets. The computational complexity of these algorithms is discussed to demonstrate their suitability for the visual data mining framework. Copyright 2006 ACM.
Resumo:
The purpose of this paper is to explain the notion of clustering and a concrete clustering method- agglomerative hierarchical clustering algorithm. It shows how a data mining method like clustering can be applied to the analysis of stocks, traded on the Bulgarian Stock Exchange in order to identify similar temporal behavior of the traded stocks. This problem is solved with the aid of a data mining tool that is called XLMiner™ for Microsoft Excel Office.
Resumo:
* The work is partially supported by Grant no. NIP917 of the Ministry of Science and Education – Republic of Bulgaria.
Resumo:
Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems occurring because of “the curse of dimensionality”. Three different eigenvector-based feature extraction approaches are discussed and three different kinds of applications with respect to classification tasks are considered. The summary of obtained results concerning the accuracy of classification schemes is presented with the conclusion about the search for the most appropriate feature extraction method. The problem how to discover knowledge needed to integrate the feature extraction and classification processes is stated. A decision support system to aid in the integration of the feature extraction and classification processes is proposed. The goals and requirements set for the decision support system and its basic structure are defined. The means of knowledge acquisition needed to build up the proposed system are considered.
Resumo:
Categorising visitors based on their interaction with a website is a key problem in Web content usage. The clickstreams generated by various users often follow distinct patterns, the knowledge of which may help in providing customised content. This paper proposes an approach to clustering weblog data, based on ART2 neural networks. Due to the characteristics of the ART2 neural network model, the proposed approach can be used for unsupervised and self-learning data mining, which makes it adaptable to dynamically changing websites.