40 results for Knowledge Discovery in Databases
in University of Queensland eSpace - Australia
Abstract:
New technologies for Knowledge Discovery from Databases (KDD) and data mining promise to bring new insights into a voluminous and growing amount of biological data. KDD technology is complementary to laboratory experimentation and helps speed up biological research. This article contains an introduction to KDD, a review of data mining tools, and their biological applications. We discuss the domain concepts related to biological data and databases, as well as current KDD and data mining developments in biology.
Abstract:
Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information in large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured, and can take the form of text, categorical values or numerical values. One of the important characteristics of data mining is its ability to deal with data that are large in volume, distributed, time-variant, noisy, and high-dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rule mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for data mining applications in engineering fields. Together with regression, classification is mainly used for predictive modelling. A number of classification algorithms are in practical use. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule-based approaches such as C4.5 (Quinlan, 1996); probabilistic methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten, 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973), and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al., 1998) and Ensemble Classification (Tumer, 1996).
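Of the classifiers listed in this abstract, k-nearest neighbors is simple enough to sketch in a few lines. The following is a minimal illustration only, not code from the cited work: the function name `knn_predict`, the Euclidean distance, the majority vote, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs. Euclidean distance
    is an assumption of this sketch; other metrics are equally valid.
    """
    # Keep the k training examples closest to the query point.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of those k examples.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2-D points.
train = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
    ((4.8, 5.1), "high"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.1, 5.0)))
```

The method is called example-based because there is no training step: the stored examples are the model, and all the work happens at query time.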
Abstract:
This paper presents load profiles of electricity customers, determined for different types of customers using the knowledge discovery in databases (KDD) procedure, a data mining technique. Current load profiling methods are compared by analysing and evaluating the classification techniques they rely on. The objective of this study is to determine the best load profiling methods and data mining techniques to classify, detect and predict non-technical losses in the distribution sector, due to faulty metering and billing errors, as well as to gather knowledge on customer behaviour and preferences so as to gain a competitive advantage in the deregulated market. This paper focuses mainly on the comparative analysis of the classification techniques selected; a forthcoming paper will focus on the detection and prediction methods.
Abstract:
Pattern discovery in temporal event sequences is of great importance in many application domains, such as telecommunication network fault analysis. In reality, not every type of event has an accurate timestamp. Some events, defined as inaccurate events, may only have an interval as their possible time of occurrence. The existence of inaccurate events may cause uncertainty in event ordering. The traditional support model cannot deal with this uncertainty, which would cause some interesting patterns to be missed. A new concept, precise support, is introduced to evaluate the probability that a pattern is contained in a sequence. Based on this new metric, we define the uncertainty model and present an algorithm to discover interesting patterns in a sequence database that has one type of inaccurate event. In our model, the number of types of inaccurate events can readily be extended to k, although at the cost of increased computational complexity.
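The abstract does not define precise support, so the sketch below is not the paper's metric. It only illustrates the underlying idea of ordering uncertainty, under a loud assumption: the inaccurate event's true time is uniformly distributed over its interval. The names `prob_before` and `average_order_probability` and the sample values are hypothetical.

```python
def prob_before(t_exact, lo, hi):
    """Probability that an event at exact time `t_exact` precedes an
    inaccurate event whose time is ASSUMED uniform on [lo, hi]."""
    if hi <= lo:
        raise ValueError("interval must have positive length")
    # Length of the part of the interval that lies after t_exact.
    later = hi - max(lo, t_exact)
    # Fraction of the interval after t_exact, clipped to [0, 1].
    return max(0.0, min(1.0, later / (hi - lo)))


def average_order_probability(sequences):
    """Average, over sequences, of the probability that the pattern
    'exact event then inaccurate event' holds in each sequence."""
    return sum(prob_before(t, lo, hi) for t, lo, hi in sequences) / len(sequences)


# Hypothetical sequences: (exact timestamp, interval start, interval end).
sequences = [
    (5, 10, 20),   # exact event is certainly first
    (12, 10, 20),  # ordering is uncertain
    (25, 10, 20),  # exact event is certainly last
]

for t, lo, hi in sequences:
    print(t, (lo, hi), prob_before(t, lo, hi))
print(average_order_probability(sequences))
```

A traditional support count would have to treat the middle sequence as either a full match or no match; a probabilistic count lets it contribute fractionally.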
Abstract:
This paper discusses a document discovery tool based on Conceptual Clustering by Formal Concept Analysis. The program allows users to navigate e-mail using a visual lattice metaphor rather than a tree. It implements a virtual file structure over e-mail where files and entire directories can appear in multiple positions. The content and shape of the lattice formed by the conceptual ontology can assist in e-mail discovery. The system described provides more flexibility in retrieving stored e-mails than what is normally available in e-mail clients. The paper discusses how conceptual ontologies can leverage traditional document retrieval systems and aid knowledge discovery in document collections.
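To make the lattice idea concrete, here is a minimal sketch of Formal Concept Analysis on a toy e-mail collection. It is not the tool described in the abstract: the brute-force enumeration, the function name `formal_concepts`, and the e-mail identifiers and attributes are all assumptions for illustration, and the approach is only practical for very small inputs.

```python
from itertools import combinations


def formal_concepts(context):
    """Return all formal concepts (extent, intent) of a binary context.

    `context` maps each object to the set of attributes it has.
    """
    objects = list(context)
    all_attrs = set().union(*context.values())

    def intent(objs):
        # Attributes shared by every object in objs (all attributes if empty).
        shared = set(all_attrs)
        for obj in objs:
            shared &= context[obj]
        return frozenset(shared)

    def extent(attrs):
        # Objects that carry every attribute in attrs.
        return frozenset(obj for obj in objects if attrs <= context[obj])

    concepts = set()
    # Brute force: close every subset of objects; duplicates collapse in the set.
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            attrs = intent(subset)
            concepts.add((extent(attrs), attrs))
    return concepts


# Hypothetical e-mails and the attributes attached to each.
emails = {
    "mail1": {"from:alice", "project"},
    "mail2": {"from:alice", "invoice"},
    "mail3": {"from:bob", "project"},
}

concepts = formal_concepts(emails)
for ext, itt in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(ext), sorted(itt))
```

In this toy context `mail1` belongs to both the `from:alice` concept and the `project` concept, which is the sense in which an item can appear in multiple positions of a lattice but only one position of a tree.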
Abstract:
At a broad level, it has been shown that different institutional contexts, policy regimes and business systems affect the kinds of activities in which a nation specialises. This paper is concerned with the way in which different national business systems affect the nature of a nation's participation in the knowledge economy. The paper seeks to explain cross-national variations in the knowledge economy in Australia, Denmark and Sweden with reference to dominant characteristics of the business system. Although Australia, Denmark and Sweden are all small wealthy countries, they each have quite distinctive business systems. Australia has been regarded as a variant of the competitive business system and has generally been described as an entrepreneurial economy with a large small-firm population. In contrast, Sweden has a coordinated business system that has favoured large industrial firms. The Danish variant of the coordinated model, with its well-developed vocational training system, is distinguishable by its large population of networked small and medium-sized enterprises. The three countries also differ significantly on two dimensions of participation in the knowledge economy. First, there is cross-national variation in patterns of specialisation in knowledge-intensive industries and services. Second, the institutional infrastructure of the knowledge economy (or the existing stock of knowledge and competence in the economy, the potential for generation and diffusion of new knowledge and the capacity for commercialisation of new ideas) differs across the three countries. This paper seeks to explain variations in these two dimensions of the knowledge economy with reference to characteristics of the business system in the three countries.
Abstract:
This paper highlights the importance of design expertise, including subjective judgments and professional experience, in designing liquid retaining structures. The design of liquid retaining structures has special features that distinguish it from other structures. Being more vulnerable to corrosion problems, they have stringent requirements against the serviceability limit state of cracking. The premise of the study is to transfer expert knowledge into a computerized blackboard system. Hybrid knowledge representation schemes, including production rules, object-oriented programming, and procedural methods, are employed to express engineering heuristics and standard design knowledge during the development of the knowledge-based system (KBS) for design of liquid retaining structures. This approach makes it possible to take advantage of the characteristics of each method. The system can provide the user with advice on preliminary design, loading specification, optimized configuration selection and detailed design analysis of liquid retaining structures. It would be beneficial to the field of retaining structure design by focusing on the acquisition and organization of expert knowledge through the development of recent artificial intelligence technology. (C) 2003 Elsevier Ltd. All rights reserved.
Abstract:
Pattern discovery in a long temporal event sequence is of great importance in many application domains. Most previous work focuses on identifying positive associations among time-stamped event types. In this paper, we introduce the problem of defining and discovering negative associations that, like positive rules, may also serve as a source of knowledge discovery. In general, an event-oriented pattern is a pattern that is associated with a selected type of event, called a target event. As a counterpart to previous research, we identify patterns that have a negative relationship with the target events. A set of criteria is defined to evaluate the interestingness of patterns associated with such negative relationships. For counting the frequency of a pattern, we propose a new approach, called unique minimal occurrence, which guarantees that the Apriori property holds for all patterns in a long sequence. Based on the interestingness measures, algorithms are proposed to discover potentially interesting patterns for this negative rule problem. Finally, experiments are conducted on a real application.