151 resultados para Incremental mining


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Automating Software Engineering is the dream of software Engineers for decades. To make this dream to come to true, data mining can play an important role. Our recent research has shown that to increase the productivity and to reduce the cost of software development, it is essential to have an effective and efficient mechanism to store, manage and utilize existing software resources, and thus to automate software analysis, testing, evaluation and to make use of existing software for new problems. This paper firstly provides a brief overview of traditional data mining followed by a presentation on data mining in broader sense. Secondly, it presents the idea and the technology of software warehouse as an innovative approach in managing software resources using the idea of data warehouse where software assets are systematically accumulated, deposited, retrieved, packaged, managed and utilized driven by data mining and OLAP technologies. Thirdly, we presented the concepts and technology and their applications of data mining and data matrix including software warehouse to software engineering. The perspectives of the role of software warehouse and software mining in modern software development are addressed. We expect that the results will lead to a streamlined high efficient software development process and enhance the productivity in response to modern challenges of the design and development of software applications.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The approaches proposed in the past for discovering sequential patterns mainly focused on single sequential data. In the real world, however, some sequential patterns hide their essences among multi-sequential event data. It has been noted that knowledge discovery with either user-specified constraints, or templates, or skeletons is receiving wide attention because it is more efficient and avoids the tedious selection of useful patterns from the mass-produced results. In this paper, a novel pattern in multi-sequential event data that are correlated and its mining approach are presented. We call this pattern sequential causal pattern. A group of skeletons of sequential causal patterns, which may be specified by the user or generated by the program, are verified or mined by embedding them into the mining engine. Experiments show that this method, when applied to discovering the occurring regularities of a crop pest in a region, is successful in mining sequential causal patterns with user-specified skeletons in multi-sequential event data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The development of the Internet has boosted prosperity of the World Wide Web, which is now a huge information source. Because of characteristics of the web, in most cases, traditional databasebased technologies are no longer suitable for web information retrieval and management. To effectively manage web information, it is necessary to reveal intrinsic relationships/structures among web information objects by eliminating noise factors. This paper proposes a mechanism that could be widely used in information processing, including web information processing and noise factor elimination for getting more intrinsic relationships. As an application case of this mechanism, one relevant web page finding algorithm is proposed to uncover intrinsic relationship among web pages from their hyperlink patterns, and find more semantic relevant web pages. The experimental evaluation shows the feasibility and effectiveness of the algorithm and demonstrates the potential of the proposed mechanism in web applications.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Data mining refers to extracting or "mining" knowledge from large amounts of data. It is an increasingly popular field that uses statistical, visualization, machine learning, and other data manipulation and knowledge extraction techniques aimed at gaining an insight into the relationships and patterns hidden in the data. Availability of digital data within picture archiving and communication systems raises a possibility of health care and research enhancement associated with manipulation, processing and handling of data by computers.That is the basis for computer-assisted radiology development. Further development of computer-assisted radiology is associated with the use of new intelligent capabilities such as multimedia support and data mining in order to discover the relevant knowledge for diagnosis. It is very useful if results of data mining can be communicated to humans in an understandable way. In this paper, we present our work on data mining in medical image archiving systems. We investigate the use of a very efficient data mining technique, a decision tree, in order to learn the knowledge for computer-assisted image analysis. We apply our method to the classification of x-ray images for lung cancer diagnosis. The proposed technique is based on an inductive decision tree learning algorithm that has low complexity with high transparency and accuracy. The results show that the proposed algorithm is robust, accurate, fast, and it produces a comprehensible structure, summarizing the knowledge it induces.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper follows How (2000) who examined 130 Australian mining and energy initial public offerings (IPOs) from 1979 to 1990 to report an average 107.18 % underpricing return by those IPOs. This study updates that report by investigating 127 Australian mining and energy IPOs from 1994 to 2001 to find a substantially lower 17.93 % average first day return. These updated findings have implications for both new companies seeking to float and also for the subscribers wishing to invest in these new listings.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The mining and energy sectors are particularly publicly sensitive sectors and subject to a high degree of public scrutiny. Evan and Freeman (1993) suggest that such public scrutiny needs may be better met by having direct public stakeholder representation on the board of directors. Similarly, Bilimoria (2000) argues a strong commercial case for engaging women on boards. This paper investigates the number and proportion of non equity holding public stakeholder directors and the number and proportion of women directors on the boards of Australian mining and energy company initial public offerings (IPOs) and reports a paucity of public stakeholder directors and also a low proportional female representation on such IPO boards.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In text categorization applications, class imbalance, which refers to an uneven data distribution where one class is represented by far more less instances than the others, is a commonly encountered problem. In such a situation, conventional classifiers tend to have a strong performance bias, which results in high accuracy rate on the majority class but very low rate on the minorities. An extreme strategy for unbalanced, learning is to discard the majority instances and apply one-class classification to the minority class. However, this could easily cause another type of bias, which increases the accuracy rate on minorities by sacrificing the majorities. This paper aims to investigate approaches that reduce these two types of performance bias and improve the reliability of discovered classification rules. Experimental results show that the inexact field learning method and parameter optimized one-class classifiers achieve more balanced performance than the standard approaches.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Automated adversarial detection systems can fail when under attack by adversaries. As part of a resilient data stream mining system to reduce the possibility of such failure, adaptive spike detection is attribute ranking and selection without class-labels. The first part of adaptive spike detection requires weighing all attributes for spiky-ness to rank them. The second part involves filtering some attributes with extreme weights to choose the best ones for computing each example’s suspicion score. Within an identity crime detection domain, adaptive spike detection is validated on a few million real credit applications with adversarial activity. The results are F-measure curves on eleven experiments and relative weights discussion on the best experiment. The results reinforce adaptive spike detection’s effectiveness for class-label-free attribute ranking and selection.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Traditional approaches such as theorem proving and model checking have been successfully used to analyze security protocols. Ideally, they assume the data communication is reliable and require the user to predetermine authentication goals. However, missing and inconsistent data have been greatly ignored, and the increasingly complicated security protocol makes it difficult to predefine such goals. This paper presents a novel approach to analyze security protocols using association rule mining. It is able to not only validate the reliability of transactions but also discover potential correlations between secure messages. The algorithm and experiment demonstrate that our approaches are useful and promising.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper tells a story of synergism of two cutting edge technologies — agents and data mining. By integrating these two technologies, the power for each of them is enhanced. Integrating agents into data mining systems, or constructing data mining systems from agent perspectives, the flexibility of data mining systems can be greatly improved. New data mining techniques can add to the systems dynamically in the form of agents, while the out-of-date ones can also be deleted from systems at run-time. Equipping agents with data mining capabilities, the agents are much smarter and more adaptable. In this way, the performance of these agent systems can be improved. A new way to integrate these two techniques –ontology-based integration is also discussed. Case studies will be given to demonstrate such mutual enhancement.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Athletes commonly attempt to enhance performance by training in normoxia but sleeping in hypoxia [live high and train low (LHTL)]. However, chronic hypoxia reduces muscle Na+-K+-ATPase content, whereas fatiguing contractions reduce Na+-K+-ATPase activity, which each may impair performance. We examined whether LHTL and intense exercise would decrease muscle Na+-K+-ATPase activity and whether these effects would be additive and sufficient to impair performance or plasma K+ regulation. Thirteen subjects were randomly assigned to two fitness-matched groups, LHTL (n = 6) or control (Con, n = 7). LHTL slept at simulated moderate altitude (3,000 m, inspired O2 fraction = 15.48%) for 23 nights and lived and trained by day under normoxic conditions in Canberra (altitude ~600 m). Con lived, trained, and slept in normoxia. A standardized incremental exercise test was conducted before and after LHTL. A vastus lateralis muscle biopsy was taken at rest and after exercise, before and after LHTL or Con, and analyzed for maximal Na+-K+-ATPase activity [K+-stimulated 3-O-methylfluorescein phosphatase (3-O-MFPase)] and Na+-K+-ATPase content ([3H]ouabain binding sites). 3-O-MFPase activity was decreased by –2.9 ± 2.6% in LHTL (P < 0.05) and was depressed immediately after exercise (P < 0.05) similarly in Con and LHTL (–13.0 ± 3.2 and –11.8 ± 1.5%, respectively). Plasma K+ concentration during exercise was unchanged by LHTL; [3H]ouabain binding was unchanged with LHTL or exercise. Peak oxygen consumption was reduced in LHTL (P < 0.05) but not in Con, whereas exercise work was unchanged in either group. Thus LHTL had a minor effect on, and incremental exercise reduced, Na+-K+-ATPase activity. However, the small LHTL-induced depression of 3-O-MFPase activity was insufficient to adversely affect either K+ regulation or total work performed.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In data stream applications, a good approximation obtained in a timely  manner is often better than the exact answer that’s delayed beyond the window of opportunity. Of course, the quality of the approximate is as important as its timely delivery. Unfortunately, algorithms capable of online processing do not conform strictly to a precise error guarantee. Since online processing is essential and so is the precision of the error, it is necessary that stream algorithms meet both criteria. Yet, this is not the case for mining frequent sets in data streams. We present EStream, a novel algorithm that allows online processing while producing results strictly within the error bound. Our theoretical and experimental results show that EStream is a better candidate for finding frequent sets in data streams, when both constraints need to be satisfied.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Most algorithms that focus on discovering frequent patterns from data streams assumed that the machinery is capable of managing all the incoming transactions without any delay; or without the need to drop transactions. However, this assumption is often impractical due to the inherent characteristics of data stream environments. Especially under high load conditions, there is often a shortage of system resources to process the incoming transactions. This causes unwanted latencies that in turn, affects the applicability of the data mining models produced – which often has a small window of opportunity. We propose a load shedding algorithm to address this issue. The algorithm adaptively detects overload situations and drops transactions from data streams using a probabilistic model. We tested our algorithm on both synthetic and real-life datasets to verify the feasibility of our algorithm.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we propose a model for discovering frequent sequential patterns, phrases, which can be used as profile descriptors of documents. It is indubitable that we can obtain numerous phrases using data mining algorithms. However, it is difficult to use these phrases effectively for answering what users want. Therefore, we present a pattern taxonomy extraction model which performs the task of extracting descriptive frequent sequential patterns by pruning the meaningless ones. The model then is extended and tested by applying it to the information filtering system. The results of the experiment show that pattern-based methods outperform the keyword-based methods. The results also indicate that removal of meaningless patterns not only reduces the cost of computation but also improves the effectiveness of the system.