916 resultados para Knowledge Discovery in Databases


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a new hyper-heuristic method using Case-Based Reasoning (CBR) for solving course timetabling problems. The term Hyper-heuristics has recently been employed to refer to 'heuristics that choose heuristics' rather than heuristics that operate directly on given problems. One of the overriding motivations of hyper-heuristic methods is the attempt to develop techniques that can operate with greater generality than is currently possible. The basic idea behind this is that we maintain a case base of information about the most successful heuristics for a range of previous timetabling problems to predict the best heuristic for the new problem in hand using the previous knowledge. Knowledge discovery techniques are used to carry out the training on the CBR system to improve the system performance on the prediction. Initial results presented in this paper are good and we conclude by discussing the con-siderable promise for future work in this area.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes an application of Social Network Analysis methods for identification of knowledge demands in public organisations. Affiliation networks established in a postgraduate programme were analysed. The course was executed in a distance education mode and its students worked on public agencies. Relations established among course participants were mediated through a virtual learning environment using Moodle. Data available in Moodle may be extracted using knowledge discovery in databases techniques. Potential degrees of closeness existing among different organisations and among researched subjects were assessed. This suggests how organisations could cooperate for knowledge management and also how to identify their common interests. The study points out that closeness among organisations and research topics may be assessed through affiliation networks. This opens up opportunities for applying knowledge management between organisations and creating communities of practice. Concepts of knowledge management and social network analysis provide the theoretical and methodological basis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract With the phenomenal growth of electronic data and information, there are many demands for the development of efficient and effective systems (tools) to perform the issue of data mining tasks on multidimensional databases. Association rules describe associations between items in the same transactions (intra) or in different transactions (inter). Association mining attempts to find interesting or useful association rules in databases: this is the crucial issue for the application of data mining in the real world. Association mining can be used in many application areas, such as the discovery of associations between customers’ locations and shopping behaviours in market basket analysis. Association mining includes two phases. The first phase, called pattern mining, is the discovery of frequent patterns. The second phase, called rule generation, is the discovery of interesting and useful association rules in the discovered patterns. The first phase, however, often takes a long time to find all frequent patterns; these also include much noise. The second phase is also a time consuming activity that can generate many redundant rules. To improve the quality of association mining in databases, this thesis provides an alternative technique, granule-based association mining, for knowledge discovery in databases, where a granule refers to a predicate that describes common features of a group of transactions. The new technique first transfers transaction databases into basic decision tables, then uses multi-tier structures to integrate pattern mining and rule generation in one phase for both intra and inter transaction association rule mining. To evaluate the proposed new technique, this research defines the concept of meaningless rules by considering the co-relations between data-dimensions for intratransaction-association rule mining. It also uses precision to evaluate the effectiveness of intertransaction association rules. The experimental results show that the proposed technique is promising.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It is a big challenge to find useful associations in databases for user specific needs. The essential issue is how to provide efficient methods for describing meaningful associations and pruning false discoveries or meaningless ones. One major obstacle is the overwhelmingly large volume of discovered patterns. This paper discusses an alternative approach called multi-tier granule mining to improve frequent association mining. Rather than using patterns, it uses granules to represent knowledge implicitly contained in databases. It also uses multi-tier structures and association mappings to represent association rules in terms of granules. Consequently, association rules can be quickly accessed and meaningless association rules can be justified according to the association mappings. Moreover, the proposed structure is also an precise compression of patterns which can restore the original supports. The experimental results shows that the proposed approach is promising.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a lattice-based visual metaphor for knowledge discovery in electronic mail. It allows a user to navigate email using a visual lattice metaphor rather than a tree structure. By using such a conceptual multi-hierarchy, the content and shape of the lattice can be varied to accommodate any number of queries against the email collection. The system provides more flexibility in retrieving stored emails and can be generalised to any electronic documents. The paper presents the underlying mathematical structures, and a number of examples of the lattice and multi-hierarchy working with a prototypical email collection.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents load profiles of electricity customers, using the knowledge discovery in databases (KDD) procedure, a data mining technique, to determine the load profiles for different types of customers. In this paper, the current load profiling methods are compared using data mining techniques, by analysing and evaluating these classification techniques. The objective of this study is to determine the best load profiling methods and data mining techniques to classify, detect and predict non-technical losses in the distribution sector, due to faulty metering and billing errors, as well as to gather knowledge on customer behaviour and preferences so as to gain a competitive advantage in the deregulated market. This paper focuses mainly on the comparative analysis of the classification techniques selected; a forthcoming paper will focus on the detection and prediction methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data are used appropriately and effectively, knowledge discovery can be better achieved than what is possible from only a single source. ^ Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources; representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain. ^ The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools, SVM and KNN, were used to successfully distinguish between several soil samples. ^ The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to have a better performance than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. ^ The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm. ^ The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. ^ The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Postprint

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent research on multiple kernel learning has lead to a number of approaches for combining kernels in regularized risk minimization. The proposed approaches include different formulations of objectives and varying regularization strategies. In this paper we present a unifying optimization criterion for multiple kernel learning and show how existing formulations are subsumed as special cases. We also derive the criterion’s dual representation, which is suitable for general smooth optimization algorithms. Finally, we evaluate multiple kernel learning in this framework analytically using a Rademacher complexity bound on the generalization error and empirically in a set of experiments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we consider the task of prototype selection whose primary goal is to reduce the storage and computational requirements of the Nearest Neighbor classifier while achieving better classification accuracies. We propose a solution to the prototype selection problem using techniques from cooperative game theory and show its efficacy experimentally.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

No presente trabalho foram utilizados modelos de classificação para minerar dados relacionados à aprendizagem de Matemática e ao perfil de professores do ensino fundamental. Mais especificamente, foram abordados os fatores referentes aos educadores do Estado do Rio de Janeiro que influenciam positivamente e negativamente no desempenho dos alunos do 9 ano do ensino básico nas provas de Matemática. Os dados utilizados para extrair estas informações são disponibilizados pelo Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira que avalia o sistema educacional brasileiro em diversos níveis e modalidades de ensino, incluindo a Educação Básica, cuja avaliação, que foi foco deste estudo, é realizada pela Prova Brasil. A partir desta base, foi aplicado o processo de Descoberta de Conhecimento em Bancos de Dados (KDD - Knowledge Discovery in Databases), composto das etapas de preparação, mineração e pós-processamento dos dados. Os padrões foram extraídos dos modelos de classificação gerados pelas técnicas árvore de decisão, indução de regras e classificadores Bayesianos, cujos algoritmos estão implementados no software Weka (Waikato Environment for Knowledge Analysis). Além disso, foram aplicados métodos de grupos e uma metodologia para tornar as classes uniformemente distribuídas, afim de melhorar a precisão dos modelos obtidos. Os resultados apresentaram importantes fatores que contribuem para o ensino-aprendizagem de Matemática, assim como evidenciaram aspectos que comprometem negativamente o desempenho dos discentes. Por fim, os resultados extraídos fornecem ao educador e elaborador de políticas públicas fatores para uma análise que os auxiliem em posteriores tomadas de decisão.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nos dias atuais, a maioria das operações feitas por empresas e organizações é armazenada em bancos de dados que podem ser explorados por pesquisadores com o objetivo de se obter informações úteis para auxílio da tomada de decisão. Devido ao grande volume envolvido, a extração e análise dos dados não é uma tarefa simples. O processo geral de conversão de dados brutos em informações úteis chama-se Descoberta de Conhecimento em Bancos de Dados (KDD - Knowledge Discovery in Databases). Uma das etapas deste processo é a Mineração de Dados (Data Mining), que consiste na aplicação de algoritmos e técnicas estatísticas para explorar informações contidas implicitamente em grandes bancos de dados. Muitas áreas utilizam o processo KDD para facilitar o reconhecimento de padrões ou modelos em suas bases de informações. Este trabalho apresenta uma aplicação prática do processo KDD utilizando a base de dados de alunos do 9 ano do ensino básico do Estado do Rio de Janeiro, disponibilizada no site do INEP, com o objetivo de descobrir padrões interessantes entre o perfil socioeconômico do aluno e seu desempenho obtido em Matemática na Prova Brasil 2011. Neste trabalho, utilizando-se da ferramenta chamada Weka (Waikato Environment for Knowledge Analysis), foi aplicada a tarefa de mineração de dados conhecida como associação, onde se extraiu regras por intermédio do algoritmo Apriori. Neste estudo foi possível descobrir, por exemplo, que alunos que já foram reprovados uma vez tendem a tirar uma nota inferior na prova de matemática, assim como alunos que nunca foram reprovados tiveram um melhor desempenho. Outros fatores, como a sua pretensão futura, a escolaridade dos pais, a preferência de matemática, o grupo étnico o qual o aluno pertence, se o aluno lê sites frequentemente, também influenciam positivamente ou negativamente no aprendizado do discente. Também foi feita uma análise de acordo com a infraestrutura da escola onde o aluno estuda e com isso, pôde-se afirmar que os padrões descobertos ocorrem independentemente se estes alunos estudam em escolas que possuem infraestrutura boa ou ruim. Os resultados obtidos podem ser utilizados para traçar perfis de estudantes que tem um melhor ou um pior desempenho em matemática e para a elaboração de políticas públicas na área de educação, voltadas ao ensino fundamental.