956 results for knowledge discovery


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Knowledge discovery support environments include data mining tools alongside classical data analysis tools. Supporting both kinds of tools requires a unified knowledge representation. We show that concept lattices, which are used as the knowledge representation in Conceptual Information Systems, can also be used to structure the results of mining association rules. Conversely, we use ideas from association rules to reduce the complexity of the visualization of Conceptual Information Systems.
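As a rough illustration of how a concept lattice and association rules can share one representation, the sketch below enumerates the formal concepts of a tiny context and reads rule support and confidence off the same extents. The context (`c1` to `c4` and their attributes) and all function names are hypothetical and chosen only for this example; the brute-force enumeration is a toy and is not the method used in the paper.

```python
from itertools import combinations

# Hypothetical toy context: objects (customers) and the attributes they have.
context = {
    "c1": {"bread", "milk"},
    "c2": {"bread", "milk", "butter"},
    "c3": {"milk"},
    "c4": {"bread", "butter"},
}
attributes = set().union(*context.values())


def extent(attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(objs):
    """Attributes shared by every object in objs."""
    if not objs:
        return frozenset(attributes)
    return frozenset(set.intersection(*(context[o] for o in objs)))


def formal_concepts():
    """Enumerate (extent, intent) pairs by closing every attribute subset.

    Exponential in the number of attributes, so only suitable for toy data.
    """
    concepts = set()
    for r in range(len(attributes) + 1):
        for subset in combinations(sorted(attributes), r):
            e = extent(set(subset))
            concepts.add((e, intent(e)))
    return concepts


def rule_stats(premise, conclusion):
    """Support and confidence of premise -> conclusion, computed from extents."""
    both = len(extent(premise | conclusion))
    return both / len(context), both / len(extent(premise))


for ext, itt in sorted(formal_concepts(), key=lambda c: len(c[1])):
    print(sorted(ext), sorted(itt))
print(rule_stats({"butter"}, {"bread"}))
```

In this toy context every object with `butter` also has `bread`, so the rule has confidence 1.0, which corresponds to the concept whose intent contains both attributes.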

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, we discuss Conceptual Knowledge Discovery in Databases (CKDD) in its connection with Data Analysis. Our approach is based on Formal Concept Analysis, a mathematical theory which has been developed and proven useful during the last 20 years. Formal Concept Analysis has led to a theory of conceptual information systems which has been applied by using the management system TOSCANA in a wide range of domains. In this paper, we use such an application in database marketing to demonstrate how methods and procedures of CKDD can be applied in Data Analysis. In particular, we show the interplay and integration of data mining and data analysis techniques based on Formal Concept Analysis. The main concern of this paper is to explain how the transition from data to knowledge can be supported by a TOSCANA system. To clarify the transition steps we discuss their correspondence to the five levels of knowledge representation established by R. Brachman and to the steps of empirically grounded theory building proposed by A. Strauss and J. Corbin.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In the past decade, the analysis of data has faced the challenge of dealing with very large and complex datasets and the real-time generation of data. Technologies to store and access these large, complex datasets are in place. However, robust and scalable analysis technologies are needed to extract meaningful information from them. The research field of Information Visualization and Visual Data Analytics addresses this need. Information visualization and data mining are often used as complements to each other. Their common goal is the extraction of meaningful information from complex and possibly large data. However, whereas data mining focuses on the use of silicon hardware, visualization techniques also aim to exploit the powerful image-processing capabilities of the human brain. This article highlights research on data visualization and visual analytics techniques. Furthermore, we highlight existing visual analytics techniques, systems, and applications, including a perspective on the field from the chemical process industry.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Over the past decade, advances in the Internet and media technology have brought people closer than ever before. Traditional sociological definitions of a community have become outmoded, as community now extends far beyond the geographical boundaries assumed by those definitions (Wellman & Gulia, 1999). The term virtual or online community was defined in this context to describe various forms of computer-mediated communication (CMC). Although virtual communities do not necessarily arise from the Internet, the overwhelming popularity of the Internet is one of the main reasons that virtual communities receive so much attention (Rheingold, 1999). The beginning of virtual communities is attributed to scientists who exchanged information and cooperatively conducted research during the 1970s. Participants in a virtual community have four needs: member interest, social interaction, imagination, and transaction (Hagel & Armstrong, 1997). The first two focus on information exchange and knowledge discovery; imagination is for entertainment; and transaction is for commerce strategy. In this article, we investigate the function of information exchange and knowledge discovery in virtual communities. There are two important inherent properties embedded in virtual communities (Wellman, 2001):

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Paediatric asthma represents a significant public health problem. To date, clinical data sets have typically been examined using traditional data analysis techniques. While such traditional statistical methods are widespread, large volumes of data may overwhelm them. The new generation of knowledge discovery techniques may therefore be a more appropriate means of analysis. The primary purpose of this study was to investigate an asthma data set by applying various data mining techniques for knowledge discovery. The study uses data from an asthma data set (n ≈ 17000). The findings revealed a number of factors and patterns of interest.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Multi-database mining is an urgent task. This thesis solves four key problems in multi-database mining: application-independent database classification, a local instance analysis model, useful pattern discovery, and pattern synthesis.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Knowing what to do with the massive amount of data collected has long been an issue for many organizations. While data mining has been touted as the solution, it has failed to deliver the expected impact despite its successes in many areas. One reason is that data mining algorithms were not designed for the real world, i.e., they usually assume a static view of the data and a stable execution environment where resources are abundant. The reality, however, is that data are constantly changing and the execution environment is dynamic. Hence, it becomes difficult for data mining to deliver truly timely and relevant results. Recently, the processing of stream data has received much attention. Interestingly, the methodology used to design stream-based algorithms may well be the solution to the above problem. In this entry, we discuss this issue and present an overview of recent work.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In knowledge discovery in single sequences, different results could be discovered from the same sequence when different frequency measures are adopted. It is natural to raise such questions as (1) do these frequency measures reflect actual frequencies accurately? (2) what impacts do frequency measures have on discovered knowledge? (3) are discovered results accurate and reliable? and (4) which measures are appropriate for reflecting frequencies accurately? In this paper, taking three major factors (anti-monotonicity, maximum-frequency and window-width restriction) into account, we identify inaccuracies inherent in seven existing frequency measures, and investigate their impacts on the soundness and completeness of two kinds of knowledge, frequent episodes and episode rules, discovered from single sequences. In order to obtain more accurate frequencies and knowledge, we provide three recommendations for defining appropriate frequency measures. Following the recommendations, we introduce a more appropriate frequency measure. Empirical evaluation reveals the inaccuracies and verifies our findings. 
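To make the dependence on the frequency measure concrete, the sketch below counts one serial episode in one sequence under two simplified schemes: the number of fixed-width sliding windows that contain the episode, and a greedy count of non-overlapping occurrences. Both functions and the sample sequence are assumptions made for illustration; they are not the seven measures analysed in the paper, nor the measure it proposes.

```python
def occurs_in_order(window, episode):
    """True if the episode's events appear in the window as an ordered subsequence."""
    it = iter(window)
    # `event in it` consumes the iterator up to the match, which enforces order.
    return all(event in it for event in episode)


def window_count(sequence, episode, width):
    """Number of sliding windows of length `width` that contain the episode."""
    return sum(
        occurs_in_order(sequence[i:i + width], episode)
        for i in range(len(sequence) - width + 1)
    )


def non_overlapping_count(sequence, episode):
    """Greedy left-to-right count of occurrences that share no events."""
    count, pos = 0, 0
    for event in sequence:
        if event == episode[pos]:
            pos += 1
            if pos == len(episode):
                count, pos = count + 1, 0
    return count


# Hypothetical single sequence and serial episode A -> B.
sequence = list("ABABXAB")
episode = ("A", "B")

print(window_count(sequence, episode, 2))        # windows of width 2
print(window_count(sequence, episode, 3))        # windows of width 3
print(non_overlapping_count(sequence, episode))  # non-overlapping occurrences
```

On this sequence the width-3 window count is 4 while the other two counts are 3, so whether the episode passes a frequency threshold of 4 depends on the measure chosen.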

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Subsequence frequency measurement is a basic and essential problem in knowledge discovery in single sequences. Frequency-based knowledge discovery in single sequences tends to be unreliable, since different result sets may be obtained from the same sequence when different frequency metrics are adopted. In this chapter, we investigate subsequence frequency measurement and its impact on the reliability of knowledge discovery in single sequences. We analyse seven previous frequency metrics, identify their inherent inaccuracies, and explore their impacts on two kinds of knowledge discovered from single sequences: frequent episodes and episode rules. We further give three suggestions for frequency metrics and introduce a new frequency metric in order to improve reliability. Empirical evaluation reveals the inaccuracies and verifies our findings.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The web is a rich resource for information discovery, and as a result web mining is a hot topic. However, a reliable mining result depends on the reliability of the data set. Every second, the web generates a huge amount of data, such as web page requests and file transfers. The data reflect human behavior in cyberspace and are therefore valuable for analysis in various disciplines, e.g. social science and network security. How to store the data is a challenge. A usual strategy is to save a summary of the data, for example by using aggregation functions to preserve the features of the original data in much less space. A key problem, however, is that such information can be distorted by the presence of illegitimate traffic, e.g. botnet recruitment scanning and DDoS attack traffic. An important consideration in web-related knowledge discovery, then, is the robustness of the aggregation method, which in turn may be affected by the reliability of the network traffic data. In this chapter, we first present the methods of aggregation functions, and then we employ information distances to filter out anomalous data as a preparation for web data mining.
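The filtering idea can be sketched as follows: each traffic window is aggregated into a distribution over request types and compared with a baseline distribution, and windows that lie too far from the baseline are dropped before mining. The abstract does not say which information distance is used; the Jensen-Shannon divergence, the request categories, the sample windows, and the threshold below are all assumptions made for this example.

```python
import math
from collections import Counter


def distribution(events, categories):
    """Aggregate raw events into relative frequencies over fixed categories."""
    counts = Counter(events)
    return [counts[c] / len(events) for c in categories]


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded by 1 when using log2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def filter_windows(windows, baseline, categories, threshold):
    """Keep only windows whose distribution stays close to the baseline."""
    base = distribution(baseline, categories)
    return [
        w for w in windows
        if js_divergence(distribution(w, categories), base) <= threshold
    ]


# Hypothetical request types and traffic windows.
categories = ["GET", "POST", "SCAN"]
baseline = ["GET"] * 8 + ["POST"] * 2
normal = ["GET"] * 7 + ["POST"] * 3
attack = ["SCAN"] * 9 + ["GET"]

kept = filter_windows([normal, attack], baseline, categories, threshold=0.1)
print(len(kept))  # number of windows that survive the filter
```

With these sample windows the scan-dominated window is far from the baseline and is filtered out, while the ordinary window is kept. In practice the threshold would have to be tuned to the traffic being analysed.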

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a Minimal Causal Model Inducer that can be used for reliable knowledge discovery. The minimal-model semantics of causal discovery is an essential concept for identifying a best-fitting model, in the sense of being satisfactorily consistent with the given data while being the simpler, less expressive model. Consistency is one of the major measures of reliability in knowledge discovery. Developing an algorithm that can derive a minimal model is therefore an interesting topic in the area of reliable knowledge discovery. The various causal induction algorithms and tools developed so far cannot guarantee that the derived model is minimal and consistent. It has been proved that the MML induction approach introduced by Wallace, Keven and Honghua Dai is a minimal causal model learner. In this paper, we further prove that the developed minimal causal model learner is reliable in the sense of satisfactory consistency. The experimental results obtained from tests on a number of both artificial and real models, provided in this paper, confirm this theoretical result.