43 results for KDD


Relevance:

20.00%

Abstract:

In recent years there has been huge growth and consolidation in the data mining field. Efforts are under way to establish standards in the area, among them SEMMA and CRISP-DM. Both are growing as industry standards and define a set of sequential steps intended to guide the implementation of data mining applications. This raises the question of whether there are substantial differences between them and the traditional KDD process. This paper draws a parallel between these two standards and the KDD process and seeks to understand the similarities between them.

Relevance:

20.00%

Abstract:

To be competitive in contemporary turbulent environments, firms must be capable of processing huge amounts of information and converting it effectively into actionable knowledge. This is particularly the case in the marketing context, where problems are also usually highly complex, unstructured and ill-defined. In recent years, the development of marketing management support systems has paralleled this evolution in the informational problems faced by managers, leading to a growth in the study (and use) of artificial intelligence and soft computing methodologies. Here, we present and implement a novel intelligent system that incorporates fuzzy logic and genetic algorithms to operate in an unsupervised manner. This approach allows the discovery of interesting association rules, which can be linguistically interpreted, in large-scale databases (KDD, or Knowledge Discovery in Databases). We then demonstrate its application to a distribution channel problem. It is shown how the proposed system is able to return a number of novel and potentially interesting associations among variables. Thus, it is argued that our method has significant potential to improve the analysis of marketing and business databases in practice, especially in non-programmed decisional scenarios, as well as to assist scholarly researchers in their exploratory analysis. © 2013 Elsevier Inc.

Relevance:

20.00%

Abstract:

Alzheimer’s disease and other dementias are among the most challenging illnesses confronting countries with ageing populations. Treatment options for dementia are limited, and the costs are significant. There is a growing need to develop new treatments for dementia, especially for the elderly. There is also growing evidence that centrally acting angiotensin converting enzyme (ACE) inhibitors, which cross the blood-brain barrier, are associated with a reduced rate of cognitive and functional decline in dementia, especially in Alzheimer’s disease (AD). The aim of this research is to investigate the effects of centrally acting ACE inhibitors (CACE-Is) on the rate of cognitive and functional decline in dementia, using a three-phase KDD process. KDD, as a scientific way to process and analyse clinical data, is used to find useful insights in a variety of clinical databases. The data come from three clinical databases: the Geriatric Assessment Tool (GAT), the Doxycycline and Rifampin for Alzheimer’s Disease (DARAD), and the Qmci validation databases, which were derived from several different geriatric clinics in Canada. This research involves only patients diagnosed with AD, vascular dementia or mixed dementia. Patients were included if baseline and end-point (at least six months apart) Standardised Mini-Mental State Examination (SMMSE), Quick Mild Cognitive Impairment (Qmci) or Activities of Daily Living (ADL) scores were available. The rates of change are compared between patients taking CACE-Is and those not currently treated with CACE-Is. The results suggest that there is a statistically significant difference in the rate of decline in cognitive and functional scores between CACE-I and NoCACE-I patients. This research also validates the Qmci, a new short assessment test, as a potential replacement for the currently popular cognitive screening tests in the clinic and in clinical trials.

Relevance:

10.00%

Abstract:

The new technologies for Knowledge Discovery from Databases (KDD) and data mining promise to bring new insights into a voluminous and growing amount of biological data. KDD technology is complementary to laboratory experimentation and helps speed up biological research. This article contains an introduction to KDD, a review of data mining tools, and a survey of their biological applications. We discuss the domain concepts related to biological data and databases, as well as current KDD and data mining developments in biology.

Relevance:

10.00%

Abstract:

This paper establishes a methodology for characterizing the electric power profiles of medium voltage (MV) consumers. The characterization is supported by the knowledge discovery in databases (KDD) process. Data mining techniques are used to obtain typical load profiles of MV customers and specific knowledge of their consumption habits. To form the different customer classes and to find a set of representative consumption patterns, a hierarchical clustering algorithm and a clustering ensemble combination approach (WEACS) are used. Taking into account the typical consumption profile of the class to which each customer belongs, new tariff options were defined and new energy coefficient prices were proposed. Finally, based on the results obtained, the consequences for the interaction between customers and electric power suppliers are analyzed.

Relevance:

10.00%

Abstract:

This paper presents a methodology, supported by the knowledge discovery in databases (KDD) process, for finding the failure probability of electrical equipment belonging to a real high-voltage electrical network. Data Mining (DM) techniques are used to discover a set of failure probabilities and, from them, to extract knowledge about the unavailability of electrical equipment such as power transformers and high-voltage power lines. The framework includes several steps: analysis of the real database, data pre-processing, application of DM algorithms and, finally, interpretation of the discovered knowledge. To validate the proposed methodology, a case study that includes real databases is used. These data carry heavy uncertainty due to climate conditions; for this reason, fuzzy logic was used to determine the set of failure probabilities of the electrical components in order to reestablish the service. The results reflect an interesting potential of this approach and encourage further research on the topic.

Relevance:

10.00%

Abstract:

Project work submitted for the degree of Master in Informatics and Computer Engineering

Relevance:

10.00%

Abstract:

This paper presents an electricity medium voltage (MV) customer characterization framework supported by knowledge discovery in databases (KDD). The main idea is to identify typical load profiles (TLP) of MV consumers and to develop a rule set for the automatic classification of new consumers. To achieve our goal, a methodology is proposed consisting of several steps: data pre-processing; application of several clustering algorithms to segment the daily load profiles; selection of the best partition, corresponding to the best consumers’ segmentation, based on the assessments of several clustering validity indices; and finally, a classification model is built based on the resulting clusters. To validate the proposed framework, a case study which includes a real database of MV consumers is performed.
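The clustering, partition-selection and classification steps in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single algorithm (k-means with a deterministic farthest-first seeding) run at several cluster counts in place of "several clustering algorithms", the mean silhouette width as the only validity index, and a nearest-centroid rule as a stand-in for the rule set. The load profiles are invented four-reading shapes, and every function name is hypothetical.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point that is farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=20):
    """Lloyd's k-means; returns (labels, centers)."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each profile joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

def silhouette(points, labels):
    """Mean silhouette width (validity index); higher is better."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical normalized daily profiles: (night, morning, afternoon, evening).
profiles = [
    (0.20, 0.90, 1.00, 0.40), (0.25, 0.85, 0.95, 0.45), (0.15, 0.95, 1.00, 0.35),  # day-peaking
    (1.00, 0.30, 0.20, 0.90), (0.95, 0.35, 0.25, 0.85), (0.90, 0.25, 0.20, 0.95),  # night-peaking
    (0.60, 0.60, 0.60, 0.60), (0.65, 0.60, 0.55, 0.60), (0.55, 0.60, 0.65, 0.60),  # flat
]

# Build a pool of candidate partitions and keep the one with the best index.
results = {}
for k in (2, 3, 4):
    labels, centers = kmeans(profiles, k)
    results[k] = (silhouette(profiles, labels), labels, centers)
best_k = max(results, key=lambda k: results[k][0])
_, best_labels, best_centers = results[best_k]

def classify(profile):
    """Assign a new consumer to the cluster with the nearest typical profile."""
    return min(range(best_k), key=lambda c: math.dist(profile, best_centers[c]))

print(best_k, best_labels, classify((0.22, 0.88, 0.97, 0.42)))
```

The centers of the selected partition play the role of typical load profiles here; a real study would compare several algorithms and indices before fixing the segmentation.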

Relevance:

10.00%

Abstract:

This paper presents the characterization of high voltage (HV) electric power consumers based on a data clustering approach. The typical load profiles (TLP) are obtained by selecting the best partition of a power consumption database from a pool of data partitions produced by several clustering algorithms. The choice of the best partition is supported by several cluster validity indices. The proposed data-mining (DM) based methodology, which includes all the steps of the knowledge discovery in databases (KDD) process, provides an automatic data treatment application that preprocesses the initial database, saving time and improving accuracy during this phase. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers’ consumption behavior. To validate our approach, a case study with a real database of 185 HV consumers was used.

Relevance:

10.00%

Abstract:

The project aims to develop and evaluate a model that makes digital content, in particular educational content and learning objects, easier to access for deaf or hearing-impaired people, creating conditions for their greater social inclusion. The intention is to create a bidirectional model: it allows people with hearing impairments to communicate with others through the translation of Portuguese Sign Language (Língua Gestual Portuguesa, LGP) into Portuguese (Língua Portuguesa, LP), and it allows people without any hearing impairment to communicate in turn with deaf or hearing-impaired people through the translation of LP into LGP. There is a set of techniques we could draw on to develop the model and implement the API for translating LGP into LP. Many studies rely on hidden Markov models (HMM) to perform the recognition. Recently, studies have been moving toward techniques such as Dynamic Time Warping (DTW), which has been more successful than other techniques in terms of performance and accuracy. In this project we chose to develop the API and the model based on the Support Vector Machines (SVM) learning technique, because it is simple to implement and has demonstrated good results in pattern recognition. The results obtained with this learning technique were very good, as described in Chapter 4, even though we used two devices to capture the data describing each gesture. This thesis is part of the scientific research project under way in the GILT research group, coordinated by Professor Paula Escudeiro and supported by the Fundação para Ciência e Tecnologia (FCT).

Relevance:

10.00%

Abstract:

This project carries out research both on finding predictors via clustering techniques and on reviewing free data mining software. The research is based on a case study in which, in addition to the free KDD software used by the scientific community, a new free tool for pre-processing the data is presented. The predictors are intended for the e-learning domain, since the data from which they must be inferred are student qualifications from different e-learning environments. Through our case study, not only are clustering algorithms tested, but additional goals are also proposed.

Relevance:

10.00%

Abstract:

This work, titled "Data and Text Mining Techniques for the Annotation of a Digital Archive", aims to test the feasibility of using automatic text processing techniques to annotate the sessions of the parliamentary debates of the Assembleia da República of Portugal. Throughout the work, concepts were addressed such as knowledge discovery technologies (KDD), the process of knowledge discovery in text, the characterization of the various stages of text processing, and the description of some open source tools for text mining. The methodology was based on experimenting with several text processing techniques using the open source R/tm. The results presented are the influence of pre-processing, document size and corpus size on the outcome of processing with the knnflex algorithm.

Relevance:

10.00%

Abstract:

Companies require information in order to gain an improved understanding of their customers. Data concerning customers, their interests and their behavior are collected through different loyalty programs. The amount of data stored in company databases has increased exponentially over the years and has become difficult to handle. This research area is the subject of much current interest, not only in academia but also in practice, as is shown by several magazines and blogs covering topics such as how to get to know your customers, Big Data, information visualization, and data warehousing. In this Ph.D. thesis, the Self-Organizing Map and two extensions of it, the Weighted Self-Organizing Map (WSOM) and the Self-Organizing Time Map (SOTM), are used as data mining methods for extracting information from large amounts of customer data. The thesis focuses on how data mining methods can be used to model and analyze customer data in order to gain an overview of the customer base, as well as to analyze niche markets. The thesis uses real-world customer data to create models for customer profiling. The models are evaluated by CRM experts from the retailing industry. The experts considered the information gained with the help of the models to be valuable and useful for decision making and for strategic planning.

Relevance:

10.00%

Abstract:

Formal Concept Analysis is an unsupervised learning technique for conceptual clustering. We introduce the notion of iceberg concept lattices and show their use in Knowledge Discovery in Databases (KDD). Iceberg lattices are designed for analyzing very large databases. In particular they serve as a condensed representation of frequent patterns as known from association rule mining. In order to show the interplay between Formal Concept Analysis and association rule mining, we discuss the algorithm TITANIC. We show that iceberg concept lattices are a starting point for computing condensed sets of association rules without loss of information, and are a visualization method for the resulting rules.
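To make the idea of an iceberg concept lattice concrete, here is a minimal brute-force sketch. It is not the TITANIC algorithm discussed in the paper: it simply enumerates every attribute subset, keeps the frequent ones, and records their closures (the concept intents, i.e. the frequent closed itemsets), which takes exponential time and is only workable on toy data. The context, the threshold and all function names are invented for illustration.

```python
from itertools import combinations

def extent(context, items):
    """Objects that have every attribute in `items`."""
    return {obj for obj, attrs in context.items() if items <= attrs}

def closure(context, all_items, items):
    """Largest attribute set shared by all objects in the extent of `items`."""
    objs = extent(context, items)
    if not objs:
        return frozenset(all_items)  # empty extent closes to the full attribute set
    return frozenset(set.intersection(*(context[o] for o in objs)))

def iceberg_intents(context, min_count):
    """Map each frequent closed attribute set (concept intent) to its support count."""
    all_items = sorted(set().union(*context.values()))
    found = {}
    for size in range(len(all_items) + 1):
        for combo in combinations(all_items, size):
            items = set(combo)
            support = len(extent(context, items))
            if support >= min_count:
                # A set and its closure share the same extent, hence the same support.
                found[closure(context, all_items, items)] = support
    return found

# Hypothetical formal context: object -> attributes.
context = {
    "o1": {"a", "b"},
    "o2": {"a", "b", "c"},
    "o3": {"a", "b", "c"},
    "o4": {"c"},
    "o5": {"c", "d"},
}

iceberg = iceberg_intents(context, min_count=2)
for intent, support in sorted(iceberg.items(), key=lambda kv: -kv[1]):
    print(sorted(intent), support)
```

In this toy context the attributes a and b always occur together, so neither {a} nor {b} is an intent on its own; both close to {a, b}. That is the sense in which the closed sets are a condensed representation of the frequent patterns.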