54 resultados para Multi-relational data mining
Resumo:
O aumento de tecnologias disponíveis na Web favoreceu o aparecimento de diversas formas de informação, recursos e serviços. Este aumento aliado à constante necessidade de formação e evolução das pessoas, quer a nível pessoal como profissional, incentivou o desenvolvimento área de sistemas de hipermédia adaptativa educacional - SHAE. Estes sistemas têm a capacidade de adaptar o ensino consoante o modelo do aluno, características pessoais, necessidades, entre outros aspetos. Os SHAE permitiram introduzir mudanças relativamente à forma de ensino, passando do ensino tradicional que se restringia apenas ao uso de livros escolares até à utilização de ferramentas informáticas que através do acesso à internet disponibilizam material didático, privilegiando o ensino individualizado. Os SHAE geram grande volume de dados, informação contida no modelo do aluno e todos os dados relativos ao processo de aprendizagem de cada aluno. Facilmente estes dados são ignorados e não se procede a uma análise cuidada que permita melhorar o conhecimento do comportamento dos alunos durante o processo de ensino, alterando a forma de aprendizagem de acordo com o aluno e favorecendo a melhoria dos resultados obtidos. O objetivo deste trabalho foi selecionar e aplicar algumas técnicas de Data Mining a um SHAE, PCMAT - Mathematics Collaborative Educational System. A aplicação destas técnicas deram origem a modelos de dados que transformaram os dados em informações úteis e compreensíveis, essenciais para a geração de novos perfis de alunos, padrões de comportamento de alunos, regras de adaptação e pedagógicas. Neste trabalho foram criados alguns modelos de dados recorrendo à técnica de Data Mining de classificação, abordando diferentes algoritmos. Os resultados obtidos permitirão definir novas regras de adaptação e padrões de comportamento dos alunos, poderá melhorar o processo de aprendizagem disponível num SHAE.
Resumo:
More than ever, there is an increase of the number of decision support methods and computer aided diagnostic systems applied to various areas of medicine. In breast cancer research, many works have been done in order to reduce false-positives when used as a double reading method. In this study, we aimed to present a set of data mining techniques that were applied to approach a decision support system in the area of breast cancer diagnosis. This method is geared to assist clinical practice in identifying mammographic findings such as microcalcifications, masses and even normal tissues, in order to avoid misdiagnosis. In this work a reliable database was used, with 410 images from about 115 patients, containing previous reviews performed by radiologists as microcalcifications, masses and also normal tissue findings. Throughout this work, two feature extraction techniques were used: the gray level co-occurrence matrix and the gray level run length matrix. For classification purposes, we considered various scenarios according to different distinct patterns of injuries and several classifiers in order to distinguish the best performance in each case described. The many classifiers used were Naïve Bayes, Support Vector Machines, k-nearest Neighbors and Decision Trees (J48 and Random Forests). The results in distinguishing mammographic findings revealed great percentages of PPV and very good accuracy values. Furthermore, it also presented other related results of classification of breast density and BI-RADS® scale. The best predictive method found for all tested groups was the Random Forest classifier, and the best performance has been achieved through the distinction of microcalcifications. The conclusions based on the several tested scenarios represent a new perspective in breast cancer diagnosis using data mining techniques.
Resumo:
This paper consists in the characterization of medium voltage (MV) electric power consumers based on a data clustering approach. It is intended to identify typical load profiles by selecting the best partition of a power consumption database among a pool of data partitions produced by several clustering algorithms. The best partition is selected using several cluster validity indices. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers’ behavior. The data-mining-based methodology presented throughout the paper consists in several steps, namely the pre-processing data phase, clustering algorithms application and the evaluation of the quality of the partitions. To validate our approach, a case study with a real database of 1.022 MV consumers was used.
Resumo:
This paper presents an electricity medium voltage (MV) customer characterization framework supportedby knowledge discovery in database (KDD). The main idea is to identify typical load profiles (TLP) of MVconsumers and to develop a rule set for the automatic classification of new consumers. To achieve ourgoal a methodology is proposed consisting of several steps: data pre-processing; application of severalclustering algorithms to segment the daily load profiles; selection of the best partition, corresponding tothe best consumers’ segmentation, based on the assessments of several clustering validity indices; andfinally, a classification model is built based on the resulting clusters. To validate the proposed framework,a case study which includes a real database of MV consumers is performed.
Resumo:
Harnessing idle PCs CPU cycles, storage space and other resources of networked computers to collaborative are mainly fixated on for all major grid computing research projects. Most of the university computers labs are occupied with the high puissant desktop PC nowadays. It is plausible to notice that most of the time machines are lying idle or wasting their computing power without utilizing in felicitous ways. However, for intricate quandaries and for analyzing astronomically immense amounts of data, sizably voluminous computational resources are required. For such quandaries, one may run the analysis algorithms in very puissant and expensive computers, which reduces the number of users that can afford such data analysis tasks. Instead of utilizing single expensive machines, distributed computing systems, offers the possibility of utilizing a set of much less expensive machines to do the same task. BOINC and Condor projects have been prosperously utilized for solving authentic scientific research works around the world at a low cost. In this work the main goal is to explore both distributed computing to implement, Condor and BOINC, and utilize their potency to harness the ideal PCs resources for the academic researchers to utilize in their research work. In this thesis, Data mining tasks have been performed in implementation of several machine learning algorithms on the distributed computing environment.
Resumo:
Data Mining (DM) methods are being increasingly used in prediction with time series data, in addition to traditional statistical approaches. This paper presents a literature review of the use of DM with time series data, focusing on short- time stocks prediction. This is an area that has been attracting a great deal of attention from researchers in the field. The main contribution of this paper is to provide an outline of the use of DM with time series data, using mainly examples related with short-term stocks prediction. This is important to a better understanding of the field. Some of the main trends and open issues will also be introduced.
Resumo:
This document presents a tool able to automatically gather data provided by real energy markets and to generate scenarios, capture and improve market players’ profiles and strategies by using knowledge discovery processes in databases supported by artificial intelligence techniques, data mining algorithms and machine learning methods. It provides the means for generating scenarios with different dimensions and characteristics, ensuring the representation of real and adapted markets, and their participating entities. The scenarios generator module enhances the MASCEM (Multi-Agent Simulator of Competitive Electricity Markets) simulator, endowing a more effective tool for decision support. The achievements from the implementation of the proposed module enables researchers and electricity markets’ participating entities to analyze data, create real scenarios and make experiments with them. On the other hand, applying knowledge discovery techniques to real data also allows the improvement of MASCEM agents’ profiles and strategies resulting in a better representation of real market players’ behavior. This work aims to improve the comprehension of electricity markets and the interactions among the involved entities through adequate multi-agent simulation.
Resumo:
This paper presents a Multi-Agent Market simulator designed for developing new agent market strategies based on a complete understanding of buyer and seller behaviors, preference models and pricing algorithms, considering user risk preferences and game theory for scenario analysis. This tool studies negotiations based on different market mechanisms and, time and behavior dependent strategies. The results of the negotiations between agents are analyzed by data mining algorithms in order to extract rules that give agents feedback to improve their strategies. The system also includes agents that are capable of improving their performance with their own experience, by adapting to the market conditions, and capable of considering other agent reactions.
Resumo:
Mestrado em Engenharia Informática, Área de Especialização em Tecnologias do Conhecimento e da Decisão
Resumo:
A classificação automática de sons urbanos é importante para o monitoramento ambiental. Este trabalho apresenta uma nova metodologia para classificar sons urbanos, que se baseia na descoberta de padrões frequentes (motifs) nos sinais sonoros e utiliza-los como atributos para a classificação. Para extrair os motifs é utilizado um método de descoberta multi-resolução baseada em SAX. Para a classificação são usadas árvores de decisão e SVMs. Esta nova metodologia é comparada com outra bastante utilizada baseada em MFCC. Para a realização de experiências foi utilizado o dataset UrbanSound disponível publicamente. Realizadas as experiências, foi possível concluir que os atributos motif são melhores que os MFCC a discriminar sons com timbres semelhantes e que os melhores resultados são conseguidos com ambos os tipos de atributos combinados. Neste trabalho foi também desenvolvida uma aplicação móvel para Android que permite utilizar os métodos de classificação desenvolvidos num contexto de vida real e expandir o dataset.
Resumo:
C3S2E '16 Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering
Resumo:
In the last years there has been a huge growth and consolidation of the Data Mining field. Some efforts are being done that seek the establishment of standards in the area. Included on these efforts there can be enumerated SEMMA and CRISP-DM. Both grow as industrial standards and define a set of sequential steps that pretends to guide the implementation of data mining applications. The question of the existence of substantial differences between them and the traditional KDD process arose. In this paper, is pretended to establish a parallel between these and the KDD process as well as an understanding of the similarities between them.
Resumo:
In the last years there has been a huge growth and consolidation of the Data Mining field. Some efforts are being done that seek the establishment of standards in the area. Included on these efforts there can be enumerated SEMMA and CRISP-DM. Both grow as industrial standards and define a set of sequential steps that pretends to guide the implementation of data mining applications. The question of the existence of substantial differences between them and the traditional KDD process arose. In this paper, is pretended to establish a parallel between these and the KDD process as well as an understanding of the similarities between them.
Resumo:
The present research paper presents five different clustering methods to identify typical load profiles of medium voltage (MV) electricity consumers. These methods are intended to be used in a smart grid environment to extract useful knowledge about customer’s behaviour. The obtained knowledge can be used to support a decision tool, not only for utilities but also for consumers. Load profiles can be used by the utilities to identify the aspects that cause system load peaks and enable the development of specific contracts with their customers. The framework presented throughout the paper consists in several steps, namely the pre-processing data phase, clustering algorithms application and the evaluation of the quality of the partition, which is supported by cluster validity indices. The process ends with the analysis of the discovered knowledge. To validate the proposed framework, a case study with a real database of 208 MV consumers is used.
Resumo:
With the electricity market liberalization, distribution and retail companies are looking for better market strategies based on adequate information upon the consumption patterns of its electricity customers. In this environment all consumers are free to choose their electricity supplier. A fair insight on the customer´s behaviour will permit the definition of specific contract aspects based on the different consumption patterns. In this paper Data Mining (DM) techniques are applied to electricity consumption data from a utility client’s database. To form the different customer´s classes, and find a set of representative consumption patterns, we have used the Two-Step algorithm which is a hierarchical clustering algorithm. Each consumer class will be represented by its load profile resulting from the clustering operation. Next, to characterize each consumer class a classification model will be constructed with the C5.0 classification algorithm.