161 results for contrast mining


Relevance: 20.00%

Abstract:

Current data mining techniques may not help organizations such as nuclear power plants and earthquake bureaus, which have only small databases. These organizations nevertheless expect to apply data mining techniques to extract useful patterns from their databases to support their decisions. However, the data in these databases, such as the accident database of a nuclear power plant or the earthquake database of an earthquake bureau, may not be large enough to form any patterns. To meet these needs, this paper presents a new mining model based on collecting knowledge from external sources such as the Web, journals, and newspapers.

Relevance: 20.00%

Abstract:

Data collection is necessary for some organizations, such as nuclear power plants and earthquake bureaus, which have very small databases. Traditional data collection obtains the necessary data from internal and external data sources and joins all of it together to create one huge, homogeneous database. Because collected data may be untrustworthy, it can disguise genuinely useful patterns in the data. In this paper, breaking away from the traditional data collection mode that treats internal and external data equally, we argue that the first step in utilizing external data is to identify quality data in the data sources for the given mining tasks. Pre- and post-analysis techniques are thus advocated for generating quality data.

Relevance: 20.00%

Abstract:

This paper introduces an incremental FP-Growth approach for Web-content-based data mining and its application to a real-world problem. The problem is solved as follows. First, we obtain semi-structured data from Web pages on the Chinese car market, structure it, and save it in a local database. Second, we use an incremental FP-Growth algorithm for mining association rules to discover Chinese consumers' car consumption preferences. To find more general regularities, an attribute-oriented induction method is also used to find customers' consumption preferences across a range of car categories. Experimental results reveal some interesting consumption preferences that are useful to decision makers in setting policy to encourage and guide car consumption. Although the data we used may not be the best representation of the actual market, it is still good enough for decision-making purposes, in that it reflects the real situation of car consumption preferences under the two assumptions in the context.
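To make the notion of an association rule concrete, the following is a minimal sketch that computes support and confidence by brute-force counting over itemsets of size one and two. It is not FP-Growth and not the paper's incremental algorithm; the function names, thresholds, and the toy transactions are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets of size 1 and 2.

    Brute-force counting, used here only to illustrate support;
    FP-Growth reaches the same frequent itemsets more efficiently.
    """
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in (1, 2):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {iset: c / n for iset, c in counts.items() if c / n >= min_support}


def rules(supports, min_confidence):
    """Derive rules lhs -> rhs from frequent pairs.

    confidence(lhs -> rhs) = support(lhs, rhs) / support(lhs)
    """
    found = []
    for iset, support in supports.items():
        if len(iset) != 2:
            continue
        a, b = iset
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / supports[(lhs,)]
            if confidence >= min_confidence:
                found.append((lhs, rhs, support, confidence))
    return found


# Hypothetical toy transactions (not data from the paper).
transactions = [
    {"sedan", "petrol"},
    {"sedan", "petrol"},
    {"suv", "diesel"},
    {"sedan", "diesel"},
]
supports = frequent_itemsets(transactions, min_support=0.5)
found_rules = rules(supports, min_confidence=0.8)
```

On this toy data the only rule that passes both thresholds is petrol -> sedan: the pair appears in half of the transactions, and every transaction containing petrol also contains sedan.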

Relevance: 20.00%

Abstract:

Automating software engineering has been the dream of software engineers for decades, and data mining can play an important role in making this dream come true. Our recent research has shown that, to increase productivity and reduce the cost of software development, it is essential to have an effective and efficient mechanism to store, manage, and utilize existing software resources, and thus to automate software analysis, testing, and evaluation and to reuse existing software for new problems. This paper first provides a brief overview of traditional data mining, followed by a presentation of data mining in a broader sense. Second, it presents the idea and technology of the software warehouse, an innovative approach to managing software resources that borrows the idea of the data warehouse: software assets are systematically accumulated, deposited, retrieved, packaged, managed, and utilized, driven by data mining and OLAP technologies. Third, it presents the concepts, technology, and applications of data mining and data matrix, including the software warehouse, to software engineering. The role of the software warehouse and software mining in modern software development is also addressed. We expect that the results will lead to a streamlined, highly efficient software development process and enhance productivity in response to the modern challenges of designing and developing software applications.

Relevance: 20.00%

Abstract:

The approaches proposed in the past for discovering sequential patterns mainly focused on single sequential data. In the real world, however, some sequential patterns are hidden among multi-sequential event data. Knowledge discovery with user-specified constraints, templates, or skeletons is receiving wide attention because it is more efficient and avoids the tedious selection of useful patterns from mass-produced results. This paper presents a novel pattern in correlated multi-sequential event data, together with an approach for mining it. We call this pattern a sequential causal pattern. A group of skeletons of sequential causal patterns, which may be specified by the user or generated by the program, are verified or mined by embedding them into the mining engine. Experiments show that this method, when applied to discovering the occurrence regularities of a crop pest in a region, is successful in mining sequential causal patterns with user-specified skeletons in multi-sequential event data.

Relevance: 20.00%

Abstract:

The development of the Internet has boosted the prosperity of the World Wide Web, which is now a huge information source. Because of the characteristics of the Web, traditional database-based technologies are in most cases no longer suitable for Web information retrieval and management. To manage Web information effectively, it is necessary to reveal the intrinsic relationships and structures among Web information objects by eliminating noise factors. This paper proposes a mechanism that could be widely used in information processing, including Web information processing, to eliminate noise factors and expose more of these intrinsic relationships. As an application of this mechanism, a relevant-page-finding algorithm is proposed that uncovers intrinsic relationships among Web pages from their hyperlink patterns and finds more semantically relevant Web pages. The experimental evaluation shows the feasibility and effectiveness of the algorithm and demonstrates the potential of the proposed mechanism in Web applications.

Relevance: 20.00%

Abstract:

Data mining refers to extracting or "mining" knowledge from large amounts of data. It is an increasingly popular field that uses statistical, visualization, machine learning, and other data manipulation and knowledge extraction techniques to gain insight into the relationships and patterns hidden in the data. The availability of digital data within picture archiving and communication systems raises the possibility of enhancing health care and research through the manipulation, processing, and handling of data by computers. That is the basis for the development of computer-assisted radiology. Its further development is associated with the use of new intelligent capabilities, such as multimedia support and data mining, to discover knowledge relevant to diagnosis. It is very useful if the results of data mining can be communicated to humans in an understandable way. In this paper, we present our work on data mining in medical image archiving systems. We investigate the use of a very efficient data mining technique, the decision tree, to learn knowledge for computer-assisted image analysis. We apply our method to the classification of X-ray images for lung cancer diagnosis. The proposed technique is based on an inductive decision tree learning algorithm that has low complexity with high transparency and accuracy. The results show that the proposed algorithm is robust, accurate, and fast, and that it produces a comprehensible structure summarizing the knowledge it induces.
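The abstract does not say which splitting criterion the inductive decision tree learner uses. As an illustration only, the sketch below assumes an ID3-style criterion, where the attribute with the highest information gain is chosen for the next split. The attribute names, labels, and rows are hypothetical toy values, not image features or results from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy examples: each row is one image described by two attributes.
rows = [
    {"opacity": "high", "side": "left"},
    {"opacity": "high", "side": "right"},
    {"opacity": "low", "side": "left"},
    {"opacity": "low", "side": "right"},
]
labels = ["abnormal", "abnormal", "normal", "normal"]

gain_opacity = information_gain(rows, labels, "opacity")
gain_side = information_gain(rows, labels, "side")
```

In this toy set, splitting on `opacity` separates the two classes completely while `side` carries no information, so an ID3-style learner would split on `opacity` first. The resulting tree reads as a chain of attribute tests, which is the kind of transparency the abstract refers to.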

Relevance: 20.00%

Abstract:

This paper follows How (2000), who examined 130 Australian mining and energy initial public offerings (IPOs) from 1979 to 1990 and reported an average underpricing return of 107.18% for those IPOs. This study updates that report by investigating 127 Australian mining and energy IPOs from 1994 to 2001 and finds a substantially lower average first-day return of 17.93%. These updated findings have implications both for new companies seeking to float and for subscribers wishing to invest in these new listings.
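For readers unfamiliar with the measure, the sketch below uses the common raw definition of the first-day (underpricing) return: the percentage change from the offer price to the first-day closing price, averaged across IPOs with equal weights. This is an assumption; the abstract does not state the exact definition used, and the study may, for example, adjust for market movements. The prices are hypothetical.

```python
def first_day_return(offer_price, first_day_close):
    """Raw first-day return in percent, relative to the offer price."""
    return (first_day_close - offer_price) / offer_price * 100.0


def average_underpricing(ipos):
    """Equal-weighted mean first-day return over (offer, close) pairs."""
    returns = [first_day_return(offer, close) for offer, close in ipos]
    return sum(returns) / len(returns)


# Hypothetical prices, not data from either study.
sample_ipos = [(1.00, 1.50), (2.00, 2.00)]
sample_average = average_underpricing(sample_ipos)
```

Here the first IPO returns 50% and the second 0%, so the equal-weighted average is 25%.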

Relevance: 20.00%

Abstract:

The mining and energy sectors are particularly sensitive and subject to a high degree of public scrutiny. Evan and Freeman (1993) suggest that the need for such scrutiny may be better met by having direct public stakeholder representation on the board of directors. Similarly, Bilimoria (2000) argues a strong commercial case for engaging women on boards. This paper investigates the number and proportion of non-equity-holding public stakeholder directors and the number and proportion of women directors on the boards of Australian mining and energy company initial public offerings (IPOs). It reports a paucity of public stakeholder directors and a low proportion of women on such IPO boards.

Relevance: 20.00%

Abstract:

In text categorization applications, class imbalance, an uneven data distribution in which one class is represented by far fewer instances than the others, is a commonly encountered problem. In such a situation, conventional classifiers tend to have a strong performance bias, which results in a high accuracy rate on the majority class but a very low rate on the minority classes. An extreme strategy for unbalanced learning is to discard the majority instances and apply one-class classification to the minority class. However, this can easily cause another type of bias, which increases the accuracy rate on the minority classes at the expense of the majority. This paper investigates approaches that reduce these two types of performance bias and improve the reliability of discovered classification rules. Experimental results show that the inexact field learning method and parameter-optimized one-class classifiers achieve more balanced performance than the standard approaches.
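The first kind of bias can be seen with a small calculation. The sketch below assumes a hypothetical 95/5 class split and a trivial classifier that always predicts the majority class; neither comes from the paper. It compares overall accuracy with per-class recall.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its instances predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical imbalanced test set: 95 majority and 5 minority documents.
y_true = ["majority"] * 95 + ["minority"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["majority"] * 100

overall = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
```

Overall accuracy is 0.95 even though no minority document is ever found, which is why per-class measures are more informative than overall accuracy on imbalanced data.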

Relevance: 20.00%

Abstract:

Automated adversarial detection systems can fail when under attack by adversaries. As part of a resilient data stream mining system intended to reduce the possibility of such failure, adaptive spike detection performs attribute ranking and selection without class labels. The first part of adaptive spike detection weighs all attributes for spikiness in order to rank them. The second part filters out some attributes with extreme weights to choose the best ones for computing each example's suspicion score. Within an identity crime detection domain, adaptive spike detection is validated on a few million real credit applications with adversarial activity. The results are presented as F-measure curves for eleven experiments and a discussion of the relative weights in the best experiment. The results reinforce the effectiveness of adaptive spike detection for class-label-free attribute ranking and selection.

Relevance: 20.00%

Abstract:

Traditional approaches such as theorem proving and model checking have been successfully used to analyze security protocols. Ideally, they assume that data communication is reliable and require the user to predetermine authentication goals. However, missing and inconsistent data have been largely ignored, and increasingly complicated security protocols make it difficult to predefine such goals. This paper presents a novel approach to analyzing security protocols using association rule mining. It is able not only to validate the reliability of transactions but also to discover potential correlations between secure messages. The algorithm and experiments demonstrate that the approach is useful and promising.

Relevance: 20.00%

Abstract:

This paper tells a story of synergy between two cutting-edge technologies: agents and data mining. Integrating the two enhances the power of each. When agents are integrated into data mining systems, or data mining systems are constructed from an agent perspective, the flexibility of those systems can be greatly improved. New data mining techniques can be added to the systems dynamically in the form of agents, while out-of-date ones can be removed at run time. Equipped with data mining capabilities, agents become much smarter and more adaptable, and the performance of agent systems can be improved. A new way to integrate these two techniques, ontology-based integration, is also discussed. Case studies are given to demonstrate this mutual enhancement.