844 results for Data mining and knowledge discovery
Abstract:
Knowledge discovery from data is nowadays a major asset for companies. CardMobili currently has no data mining system, and having one would add value to its daily marketing operations, namely when issuing coupons to a restricted group of customers who are highly likely to use them. To this end, the application's database was analysed to extract as much data as possible, and the transformations needed for subsequent processing by the data mining algorithms were applied. During the data mining stage, association and classification techniques were applied, and the best results were obtained with association techniques. The aim is for these results to support the decision maker in making decisions.
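The association technique mentioned above can be illustrated with a minimal sketch of the support, confidence and lift measures used to rank rules. This is not CardMobili's system. The item names and the helpers `rule_metrics` and `mine_pair_rules` are hypothetical. The search is also a plain exhaustive pass over item pairs, not Apriori or any specific algorithm from the work above.

```python
from itertools import combinations

def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = frozenset(antecedent), frozenset(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_ac = sum(1 for t in transactions if (a | c) <= t)
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    # Lift > 1 means the antecedent makes the consequent more likely than its base rate.
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift

def mine_pair_rules(transactions, min_support, min_confidence):
    """Exhaustively test every single-item rule x -> y (hypothetical helper)."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in combinations(items, 2):
        for ante, cons in ((x, y), (y, x)):
            s, conf, lift = rule_metrics(transactions, {ante}, {cons})
            if s >= min_support and conf >= min_confidence:
                rules.append((ante, cons, s, conf, lift))
    return rules

# Hypothetical coupon-redemption histories, one set of merchant categories per customer.
history = [frozenset({"cafe", "bakery"}), frozenset({"cafe", "bakery", "cinema"}),
           frozenset({"cafe"}), frozenset({"bakery", "cinema"})]
print(mine_pair_rules(history, 0.5, 0.9))
```

In a coupon campaign, a rule such as `cinema -> bakery` with high confidence would suggest offering a bakery coupon to customers who have redeemed cinema coupons.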
Abstract:
Doctoral Thesis in Information Systems and Technologies, Area of Engineering and Management Information Systems
Abstract:
This paper presents a framework for characterizing electricity medium voltage (MV) customers, supported by knowledge discovery in databases (KDD). The main idea is to identify typical load profiles (TLP) of MV consumers and to develop a rule set for the automatic classification of new consumers. To achieve this goal, a methodology is proposed that consists of several steps: data pre-processing; application of several clustering algorithms to segment the daily load profiles; selection of the best partition, corresponding to the best consumer segmentation, based on the assessments of several clustering validity indices; and finally, construction of a classification model based on the resulting clusters. To validate the proposed framework, a case study using a real database of MV consumers is performed.
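The clustering and partition-selection steps above can be sketched under simplifying assumptions. The sketch uses only one clustering algorithm (k-means) and one validity index (mean silhouette width), whereas the paper compares several of each. It normalizes each profile by its peak, which is an assumed pre-processing choice. Nearest-centroid assignment stands in for the paper's rule-based classification model. All function names and the four-point profiles are hypothetical.

```python
import math
import random

def normalize(profile):
    """Divide a daily load profile by its peak so consumers of different sizes are comparable."""
    peak = max(profile)
    return [v / peak for v in profile] if peak else list(profile)

def dist(a, b):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return centroids, labels

def silhouette(points, labels):
    """Mean silhouette width, one possible clustering validity index; higher is better."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)

def best_partition(points, ks=(2, 3, 4)):
    """Run k-means for each k and keep the partition with the best validity index."""
    best = None
    for k in ks:
        centroids, labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, centroids, labels)
    return best

def classify(profile, centroids):
    """Assign a new consumer to the nearest typical load profile (centroid)."""
    p = normalize(profile)
    return min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
```

For example, three consumers whose daily profiles peak at midday and three whose profiles peak in the evening should yield `k == 2`, with a new midday-peaking consumer assigned to the first group.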
Abstract:
This paper presents the Realistic Scenarios Generator (RealScen), a tool that processes data from real electricity markets to generate realistic scenarios that enable the modeling of electricity market players' characteristics and strategic behavior. The proposed tool provides significant advantages to the decision-making process in an electricity market environment, especially when coupled with a multi-agent electricity markets simulator. The generation of realistic scenarios is performed using mechanisms for intelligent data analysis, which are based on artificial intelligence and data mining algorithms. These techniques allow the study of realistic scenarios, adapted to the existing markets, and improve the representation of market entities as software agents, enabling a detailed modeling of their profiles and strategies. This work contributes significantly to the understanding of the interactions between the entities acting in electricity markets by increasing the capability and realism of market simulations.
Abstract:
Complex systems, i.e., systems composed of a large set of elements interacting in a non-linear way, are constantly found all around us. In the last decades, different approaches have been proposed toward their understanding, one of the most interesting being the Complex Network perspective. This legacy of the 18th-century mathematical concepts proposed by Leonhard Euler is still current, and increasingly relevant to real-world problems. In recent years, it has been demonstrated that network-based representations can yield relevant knowledge about complex systems. In spite of that, several problems have been detected, mainly related to the degree of subjectivity involved in the creation and evaluation of such network structures. In this Thesis, we propose addressing these problems by means of different data mining techniques, thus obtaining a novel hybrid approach intermingling complex networks and data mining. Results indicate that such techniques can be effectively used to i) enable the creation of novel network representations, ii) reduce the dimensionality of analyzed systems by pre-selecting the most important elements, iii) describe complex networks, and iv) assist in the analysis of different network topologies. The soundness of such an approach is validated through different case studies drawn from actual biomedical problems, e.g., the diagnosis of cancer from tissue analysis, or the study of the dynamics of the brain under different neurological disorders.
Abstract:
Special issue guest editorial, June 2015.
Abstract:
Football is nowadays considered one of the most popular sports. In the betting world it holds an outstanding position, moving millions of euros during a single football match. The lack of profitability of football bettors has been highlighted as a problem. This problem motivated this research proposal, which analyses whether users can be supported in increasing the profits of their bets. Data mining models were induced to help gamblers increase their profits in the medium/long term. Although the models can fail, the results achieved for four of the seven targets are encouraging and suggest that the system can help increase profits. All defined targets have two possible classes to predict, for example, whether there are more or fewer than 7.5 corners in a single game. The models for the targets more/fewer than 7.5 corners, 8.5 corners, 1.5 goals and 3.5 goals achieved the pre-defined thresholds. The models were implemented in a prototype, which is a pervasive decision support system. This system was developed to serve as an interface for any user, from expert users to users with no knowledge of football.
Abstract:
DNA microarray technology has arguably caught the attention of the worldwide life science community and is now systematically supporting major discoveries in many fields of study. Most of the initial technical challenges of conducting experiments are being resolved, only to be replaced with new informatics hurdles, including statistical analysis, data visualization, interpretation, and storage. Two systems of databases, one containing expression data and one containing annotation data, are quickly becoming essential knowledge repositories of the research community. This paper surveys several databases that are considered "pillars" of research and important nodes in the network. It focuses on a generalized workflow scheme typical for microarray experiments, using two examples related to cancer research. The workflow is used to reference appropriate databases and tools for each step in the process of array experimentation. Additionally, benefits and drawbacks of current array databases are addressed, and suggestions are made for their improvement.
Abstract:
Recent advances in machine learning increasingly enable the automatic construction of various types of computer-assisted methods that have been difficult or laborious for human experts to program. The tasks that require such tools arise in many areas, and this thesis focuses especially on bioinformatics and natural language processing. Machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight into the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms that incorporate this kind of prior knowledge of the task in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, prior knowledge is often incorporated by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take into account the positional information and the mutual similarities of words. It is shown that using this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suited to information retrieval and to more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of kernel-based learning algorithms, such as text categorization and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions.
We also design a fast cross-validation algorithm for regularized least-squares type learning algorithms. Further, we propose an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks. In summary, we demonstrate that incorporating prior knowledge is possible and beneficial, and that novel advanced kernels and cost functions can be used efficiently in learning algorithms.
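Fast cross-validation for regularized least squares can rely on a well-known closed-form shortcut. With dual coefficients $a = G y$, where $G = (K + \lambda I)^{-1}$, the leave-one-out residual for example $i$ equals $a_i / G_{ii}$. This gives all $n$ held-out errors from a single fit instead of $n$ refits. The sketch below is an illustration of that standard identity and not necessarily the algorithm developed in the thesis. The Gaussian kernel, the toy data and all function names are assumptions. It checks the shortcut against naive retraining.

```python
import math

def invert(m):
    """Gauss-Jordan inversion with partial pivoting (adequate for small, well-conditioned matrices)."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def gaussian_kernel(x, z, gamma=1.0):
    """Assumed kernel choice; any positive semi-definite kernel works."""
    return math.exp(-gamma * (x - z) ** 2)

def rls_fit(xs, ys, lam, kernel):
    """Return dual coefficients a = (K + lam*I)^-1 y and the inverse G itself."""
    n = len(xs)
    k = [[kernel(p, q) for q in xs] for p in xs]
    g = invert([[k[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)])
    a = [sum(g[i][j] * ys[j] for j in range(n)) for i in range(n)]
    return a, g

def rls_predict(xs, a, x, kernel):
    return sum(ai * kernel(xi, x) for ai, xi in zip(a, xs))

def loo_residuals_fast(xs, ys, lam, kernel):
    """Closed-form leave-one-out residuals y_i - f_{-i}(x_i) = a_i / G_ii from one fit."""
    a, g = rls_fit(xs, ys, lam, kernel)
    return [a[i] / g[i][i] for i in range(len(xs))]

def loo_residuals_naive(xs, ys, lam, kernel):
    """Reference implementation: retrain n times, holding out one example each time."""
    res = []
    for i in range(len(xs)):
        tx, ty = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        a, _ = rls_fit(tx, ty, lam, kernel)
        res.append(ys[i] - rls_predict(tx, a, xs[i], kernel))
    return res
```

The identity follows from $H = K G = I - \lambda G$, so $1 - H_{ii} = \lambda G_{ii}$ and $y - \hat{y} = \lambda a$. It assumes a model without an unpenalized bias term.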
Abstract:
Formal Concept Analysis is an unsupervised learning technique for conceptual clustering. We introduce the notion of iceberg concept lattices and show their use in Knowledge Discovery in Databases (KDD). Iceberg lattices are designed for analyzing very large databases. In particular they serve as a condensed representation of frequent patterns as known from association rule mining. In order to show the interplay between Formal Concept Analysis and association rule mining, we discuss the algorithm TITANIC. We show that iceberg concept lattices are a starting point for computing condensed sets of association rules without loss of information, and are a visualization method for the resulting rules.
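The idea of an iceberg concept lattice as a condensed representation can be shown on a toy example. The intents of its concepts are the frequent closed itemsets, that is, frequent itemsets equal to their own Galois closure. The sketch below enumerates them by brute force. It is not TITANIC, which computes them level-wise far more efficiently. The item names and helper functions are hypothetical. It also shows why no information is lost: the support of any frequent itemset can be read off the closed itemsets.

```python
from itertools import combinations

def closure(itemset, transactions):
    """Galois closure: the items shared by every transaction that contains itemset."""
    covering = [frozenset(t) for t in transactions if itemset <= t]
    if not covering:
        return frozenset().union(*transactions)  # closure of an itemset with empty extent
    return frozenset.intersection(*covering)

def iceberg(transactions, min_support):
    """Frequent closed itemsets mapped to their supports (exhaustive; tiny data only)."""
    n = len(transactions)
    items = sorted(frozenset().union(*transactions))
    result = {}
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            s = frozenset(combo)
            support = sum(1 for t in transactions if s <= t) / n
            if support >= min_support:
                # An itemset and its closure are contained in exactly the same transactions.
                result[closure(s, transactions)] = support
    return result

def support_from_iceberg(itemset, ice):
    """Support of a frequent itemset = largest support among its closed supersets;
    returns 0.0 when the itemset is not frequent."""
    return max((s for c, s in ice.items() if itemset <= c), default=0.0)

tx = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
ice = iceberg(tx, 0.5)
print(ice)
```

Here six frequent itemsets (counting the empty set) collapse into three closed ones: `{a}`, `{a, b}` and `{a, c}`.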
Abstract:
In a world of almost permanent and rapidly increasing electronic data availability, techniques for filtering, compressing, and interpreting these data to transform them into valuable and easily comprehensible information are of utmost importance. One key topic in this area is the capability to deduce future system behavior from a given data input. This book brings together for the first time the complete theory of data-based neurofuzzy modelling and the linguistic attributes of fuzzy logic in a single cohesive mathematical framework. After introducing the basic theory of data-based modelling, new concepts including extended additive and multiplicative submodels are developed and their extensions to state estimation and data fusion are derived. All these algorithms are illustrated with benchmark and real-life examples to demonstrate their efficiency. Chris Harris and his group have carried out pioneering work which has tied together the fields of neural networks and linguistic rule-based algorithms. This book is aimed at researchers and scientists in time series modeling, empirical data modeling, knowledge discovery, data mining, and data fusion.
Abstract:
Pocket Data Mining (PDM) describes the full process of analysing data streams in mobile ad hoc distributed environments. Advances in mobile devices like smart phones and tablet computers have made it possible for a wide range of applications to run in such an environment. In this paper, we propose the adoption of data stream classification techniques for PDM. A thorough experimental study shows that running heterogeneous/different or homogeneous/similar data stream classification techniques over vertically partitioned data (data partitioned according to the feature space) results in performance comparable to batch and centralised learning techniques.
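Running stream classifiers over vertically partitioned data can be sketched with local agents. Each agent sees only its own subset of features, learns incrementally from the stream, and the agents' predictions are combined by majority vote. The sketch is illustrative only. The incremental categorical Naive Bayes learner, the simple voting scheme, and the names `StreamingNaiveBayes` and `vote` are assumptions, not the classifiers or combination method evaluated in the paper.

```python
import math
from collections import Counter, defaultdict

class StreamingNaiveBayes:
    """Incremental categorical Naive Bayes that only looks at its own feature indices."""

    def __init__(self, features):
        self.features = features
        self.class_counts = Counter()
        self.value_counts = defaultdict(Counter)  # (feature, class) -> value counts
        self.values = defaultdict(set)            # feature -> values seen so far

    def learn_one(self, x, y):
        """Update the counts with one labelled instance from the stream."""
        self.class_counts[y] += 1
        for f in self.features:
            self.value_counts[(f, y)][x[f]] += 1
            self.values[f].add(x[f])

    def predict_one(self, x):
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / total)
            for f in self.features:
                # Laplace smoothing over the seen values plus one slot for an unseen value.
                count = self.value_counts[(f, c)][x[f]]
                score += math.log((count + 1) / (n_c + len(self.values[f]) + 1))
            if score > best_score:
                best, best_score = c, score
        return best

def vote(agents, x):
    """Combine the agents' local predictions by simple majority."""
    return Counter(agent.predict_one(x) for agent in agents).most_common(1)[0][0]

# Hypothetical stream: features 0 and 1 carry the class signal, feature 2 is noise.
stream = [((i % 2, i % 2, (i // 2) % 2), "yes" if i % 2 else "no") for i in range(40)]
agents = [StreamingNaiveBayes([0]), StreamingNaiveBayes([1]), StreamingNaiveBayes([2])]
for x, y in stream:
    for agent in agents:
        agent.learn_one(x, y)
print(vote(agents, (1, 1, 0)))
```

Only predictions travel between devices in this design, so each agent's feature subset and model can stay local.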
Abstract:
Advances in hardware and software technology enable us to collect, store and distribute large quantities of data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historic patient data to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use patterns extracted from customer data for future advertisement campaigns; finance experts are interested in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
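A common pattern behind parallel and distributed data mining is data-parallel map/reduce. Each worker computes a small summary of its own data partition (here, item counts), and only the summaries are merged. The sketch below is a minimal illustration of that pattern. The round-robin partitioning and the function names are assumptions. It uses a thread pool for simplicity. Because of Python's global interpreter lock, threads give no speed-up for this CPU-bound work, so a `ProcessPoolExecutor` or a cluster framework would be needed for real gains.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def local_counts(partition):
    """Map step: count in how many transactions of this partition each item occurs."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts

def split(data, n_parts):
    """Round-robin partitioning of the records across workers."""
    return [data[i::n_parts] for i in range(n_parts)]

def distributed_support(data, n_parts=4):
    parts = split(data, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(local_counts, parts))
    # Reduce step: only the small per-partition Counters are merged, never the raw data.
    total = reduce(lambda a, b: a + b, partials, Counter())
    return {item: count / len(data) for item, count in total.items()}

data = [["a", "b"], ["a"], ["b", "c"], ["a", "c"], ["a", "b", "c"]]
print(distributed_support(data, n_parts=2))
```

The result is identical to counting on a single machine, because item counts are additive across partitions.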
Abstract:
Human brain imaging techniques, such as Magnetic Resonance Imaging (MRI) or Diffusion Tensor Imaging (DTI), have been established as scientific and diagnostic tools, and their adoption is growing. Statistical methods, machine learning and data mining algorithms have successfully been adopted to extract predictive and descriptive models from neuroimage data. However, the knowledge discovery process typically also requires the adoption of pre-processing, post-processing and visualisation techniques in complex data workflows. Currently, a main problem for the integrated preprocessing and mining of MRI data is the lack of comprehensive platforms that avoid the manual invocation of preprocessing and mining tools, which leads to an error-prone and inefficient process. In this work we present K-Surfer, a novel plug-in of the Konstanz Information Miner (KNIME) workbench that automates the preprocessing of brain images and leverages the mining capabilities of KNIME in an integrated way. K-Surfer supports the importing, filtering, merging and pre-processing of neuroimage data from FreeSurfer, a tool for human brain MRI feature extraction and interpretation. K-Surfer automates the steps for importing FreeSurfer data, reducing time costs, eliminating human errors and enabling the design of complex analytics workflows for neuroimage data by leveraging the rich functionalities available in the KNIME workbench.
Abstract:
The domain of Knowledge Discovery (KD) and Data Mining (DM) is of growing importance at a time when more and more data is produced and knowledge is one of the most precious assets. Having explored the existing underlying theory, the results of ongoing academic research and industry practices in the domain of KD and DM, we have found that this domain still lacks some systematization. We also found that this systematization exists to a greater degree in the Software Engineering and Requirements Engineering domains, probably because they are more mature areas. We believe that it is possible to improve and facilitate the participation of enterprise stakeholders in the requirements engineering for KD projects by systematizing the requirements engineering process for such projects. This will, in turn, result in more projects that end successfully, that is, with satisfied stakeholders, including in terms of time and budget constraints. With this in mind and based on all the information found in the state of the art, we propose SysPRE, a Systematized Process for Requirements Engineering in KD projects. We begin by proposing an encompassing generic description of the KD process, where the main focus is on the Requirements Engineering activities. This description is then used as a base for the application of the Design and Engineering Methodology for Organizations (DEMO) so that we can specify a formal ontology for this process. The resulting SysPRE ontology can serve as a base not only to make enterprises aware of their own KD process and requirements engineering process in KD projects, but also to improve such processes in practice, namely in terms of success rate.