998 results for stream mining


Relevance:

20.00%

Abstract:

The approaches proposed in the past for discovering sequential patterns have mainly focused on single-sequence data. In the real world, however, some sequential patterns hide their essence among multi-sequential event data. Knowledge discovery guided by user-specified constraints, templates, or skeletons is receiving wide attention because it is more efficient and avoids the tedious selection of useful patterns from mass-produced results. In this paper, we present a novel type of pattern in correlated multi-sequential event data, together with an approach for mining it. We call this pattern a sequential causal pattern. A group of skeletons of sequential causal patterns, which may be specified by the user or generated by the program, are verified or mined by embedding them into the mining engine. Experiments show that this method, when applied to discovering the occurrence regularities of a crop pest in a region, successfully mines sequential causal patterns with user-specified skeletons in multi-sequential event data.
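The abstract does not spell out how a skeleton is checked against the data. As a hedged illustration only, the sketch below assumes a skeleton has the hypothetical form "event X in sequence A is followed by event Y in sequence B within `max_lag` time steps" and verifies it by measuring the fraction of X occurrences that are followed by Y. The `Skeleton` class, `skeleton_confidence`, `verify`, and the toy weather/pest events are invented for illustration and are not the paper's definitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Skeleton:
    # Hypothetical skeleton: a cause event in one sequence followed by an
    # effect event in another sequence within max_lag time steps.
    cause_seq: str
    cause_event: str
    effect_seq: str
    effect_event: str
    max_lag: int

def skeleton_confidence(data, sk):
    """Fraction of cause occurrences followed by the effect within max_lag.

    data maps a sequence name to a list of (time, event) pairs.
    """
    causes = [t for t, e in data[sk.cause_seq] if e == sk.cause_event]
    effects = [t for t, e in data[sk.effect_seq] if e == sk.effect_event]
    if not causes:
        return 0.0
    hits = sum(1 for t in causes if any(0 < u - t <= sk.max_lag for u in effects))
    return hits / len(causes)

def verify(data, sk, min_confidence):
    # Assumed rule: a skeleton is "verified" if its confidence reaches the threshold.
    return skeleton_confidence(data, sk) >= min_confidence

# Toy multi-sequential event data (invented, not from the paper).
data = {
    "weather": [(1, "warm_rain"), (5, "dry"), (9, "warm_rain")],
    "pest": [(3, "outbreak"), (12, "outbreak")],
}
sk = Skeleton("weather", "warm_rain", "pest", "outbreak", max_lag=3)
print(skeleton_confidence(data, sk))  # 1.0
print(verify(data, sk, 0.8))          # True
```

Tightening `max_lag` to 2 would drop the second cause occurrence (its effect arrives 3 steps later), halving the confidence.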

Relevance:

20.00%

Abstract:

The development of the Internet has boosted the growth of the World Wide Web, which is now a huge information source. Because of the characteristics of the web, traditional database-based technologies are, in most cases, no longer suitable for web information retrieval and management. To manage web information effectively, it is necessary to reveal the intrinsic relationships and structures among web information objects by eliminating noise factors. This paper proposes a mechanism that could be widely used in information processing, including web information processing, to eliminate noise factors and obtain more intrinsic relationships. As an application of this mechanism, a relevant-web-page finding algorithm is proposed that uncovers intrinsic relationships among web pages from their hyperlink patterns and finds more semantically relevant web pages. The experimental evaluation shows the feasibility and effectiveness of the algorithm and demonstrates the potential of the proposed mechanism in web applications.

Relevance:

20.00%

Abstract:

Data mining refers to extracting or "mining" knowledge from large amounts of data. It is an increasingly popular field that uses statistical, visualization, machine learning, and other data manipulation and knowledge extraction techniques to gain insight into the relationships and patterns hidden in the data. The availability of digital data within picture archiving and communication systems raises the possibility of enhancing health care and research through the computer-based manipulation, processing, and handling of data. That is the basis for the development of computer-assisted radiology. Further development of computer-assisted radiology is associated with the use of new intelligent capabilities, such as multimedia support and data mining, to discover knowledge relevant to diagnosis. It is very useful if the results of data mining can be communicated to humans in an understandable way. In this paper, we present our work on data mining in medical image archiving systems. We investigate the use of a very efficient data mining technique, the decision tree, to learn the knowledge for computer-assisted image analysis. We apply our method to the classification of x-ray images for lung cancer diagnosis. The proposed technique is based on an inductive decision tree learning algorithm that has low complexity with high transparency and accuracy. The results show that the proposed algorithm is robust, accurate, and fast, and that it produces a comprehensible structure summarizing the knowledge it induces.
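The abstract does not say which split criterion its inductive decision tree uses. As one common choice, assumed here for illustration, ID3-style learners pick the attribute with the highest information gain at each node; the sketch below computes it. The feature names (`opacity`, `edge`) and class labels are hypothetical image descriptors, not the paper's.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting rows on a categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_feature(rows, labels, features):
    # The feature with the highest gain becomes the next tree node.
    return max(features, key=lambda f: information_gain(rows, labels, f))

# Hypothetical, hand-made image descriptors.
rows = [
    {"opacity": "high", "edge": "smooth"},
    {"opacity": "high", "edge": "irregular"},
    {"opacity": "low", "edge": "smooth"},
    {"opacity": "low", "edge": "irregular"},
]
labels = ["abnormal", "abnormal", "normal", "normal"]
print(best_feature(rows, labels, ["opacity", "edge"]))  # opacity
```

Here `opacity` separates the classes perfectly (gain 1 bit) while `edge` leaves each branch evenly mixed (gain 0), so a tree built this way splits on `opacity` first.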

Relevance:

20.00%

Abstract:

This paper follows How (2000), who examined 130 Australian mining and energy initial public offerings (IPOs) from 1979 to 1990 and reported an average underpricing return of 107.18% for those IPOs. This study updates that report by investigating 127 Australian mining and energy IPOs from 1994 to 2001, and finds a substantially lower average first-day return of 17.93%. These updated findings have implications both for new companies seeking to float and for subscribers wishing to invest in these new listings.
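For readers unfamiliar with the measure, the sketch below assumes the conventional raw (not market-adjusted) first-day return, R = (P_close - P_offer) / P_offer, averaged over the sample. The abstract does not state whether How (2000) or this study adjusts for market movements, and the prices used here are invented.

```python
def first_day_return(offer_price, first_close):
    """Raw first-day return in percent: (close - offer) / offer * 100."""
    if offer_price <= 0:
        raise ValueError("offer price must be positive")
    return (first_close - offer_price) / offer_price * 100.0

def average_first_day_return(ipos):
    """Mean raw first-day return over (offer_price, first_close) pairs."""
    returns = [first_day_return(offer, close) for offer, close in ipos]
    return sum(returns) / len(returns)

# Invented prices, not data from either study.
sample = [(2.00, 3.00), (1.00, 1.00)]
print(average_first_day_return(sample))  # 25.0
```

A positive average indicates underpricing: on average, shares closed their first trading day above the offer price.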

Relevance:

20.00%

Abstract:

The mining and energy sectors are particularly sensitive in the public eye and are subject to a high degree of public scrutiny. Evan and Freeman (1993) suggest that the needs arising from such public scrutiny may be better met by direct public stakeholder representation on the board of directors. Similarly, Bilimoria (2000) argues a strong commercial case for including women on boards. This paper investigates the number and proportion of non-equity-holding public stakeholder directors, and the number and proportion of women directors, on the boards of Australian mining and energy company initial public offerings (IPOs). It reports a paucity of public stakeholder directors and a low proportion of female representation on such IPO boards.

Relevance:

20.00%

Abstract:

In text categorization applications, class imbalance, an uneven data distribution in which one class is represented by far fewer instances than the others, is a commonly encountered problem. In such a situation, conventional classifiers tend to have a strong performance bias, resulting in a high accuracy rate on the majority class but a very low rate on the minority classes. An extreme strategy for unbalanced learning is to discard the majority instances and apply one-class classification to the minority class. However, this can easily cause another type of bias, increasing the accuracy rate on the minority classes at the expense of the majority. This paper investigates approaches that reduce these two types of performance bias and improve the reliability of discovered classification rules. Experimental results show that the inexact field learning method and parameter-optimized one-class classifiers achieve more balanced performance than the standard approaches.
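The performance bias described above shows up clearly once accuracy is broken down per class. The sketch below uses balanced accuracy (the mean of per-class recalls) as an assumed yardstick, not necessarily the paper's own measure, to show how a classifier that always predicts the majority class scores 95% overall accuracy on a hypothetical 95:5 split while achieving 0% recall on the minority class.

```python
from collections import defaultdict

def per_class_recall(y_true, y_pred):
    """Recall (per-class accuracy) for every class present in y_true."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in total}

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls; insensitive to class frequencies."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical 95:5 imbalance and a classifier biased toward the majority.
y_true = ["major"] * 95 + ["minor"] * 5
y_pred = ["major"] * 100
print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The opposite bias (a one-class model that labels everything as the minority) would give per-class recalls of 0.0 and 1.0, again a balanced accuracy of 0.5 despite a very different confusion pattern.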

Relevance:

20.00%

Abstract:

The Glenelg-Hopkins area is a large regional watershed (2.6 million ha) in southwest Victoria that has been extensively cleared for agriculture. In-stream electrical conductivity (EC) is examined in relation to remnant native vegetation from the headwaters to the upper extent of the estuary of the Glenelg River. Five water quality gauging stations were selected, and their contributing subcatchments represent a continuum of disturbance. Proportions of native vegetation ranged from ∼100% at the headwaters of the river to ∼30% at the furthest downstream gauging station. The relationship between remnant vegetation and in-stream EC was examined using aggregated and non-aggregated land use statistics from three land use maps spanning a period of 22 years. Increased proportions of native vegetation were significantly negatively correlated with in-stream EC, and this relationship was consistent across all scenarios investigated.
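The abstract reports a significant negative correlation but does not name the coefficient. As an assumption, the sketch below computes Pearson's r between the percentage of native vegetation in each subcatchment and mean in-stream EC. The five values are invented placeholders, not the study's measurements, and no significance test is shown.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented placeholder values: % native vegetation vs. mean EC (uS/cm).
native_veg = [100, 85, 60, 45, 30]
mean_ec = [150, 400, 900, 1600, 2100]
r = pearson_r(native_veg, mean_ec)
print(r < 0)  # True: more native vegetation, lower EC in this toy data
```

A rank-based coefficient such as Spearman's rho would be a reasonable alternative if the vegetation/EC relationship were monotonic but not linear.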

Relevance:

20.00%

Abstract:

Data streams are usually generated online and are characterized by huge volume, rapid and unpredictable arrival rates, and fast-changing data characteristics. It has hence been recognized that mining over streaming data requires the problem of limited computational resources to be adequately addressed. Since the arrival rate of a data stream can increase significantly and exceed the CPU capacity, the mining machinery must adapt to this change to guarantee the timeliness of the results. We present an online algorithm that approximates a set of frequent patterns over a sliding window of the underlying data stream, given an a priori CPU capacity. The algorithm automatically detects overload situations and can adaptively shed unprocessed data to guarantee timely results. Using probabilistic and deterministic techniques, we theoretically prove that the error on the output results is bounded within a pre-specified threshold. The empirical results on various datasets also confirm the feasibility of our proposal.
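The overload-and-shed idea can be illustrated, in a much simplified form, as follows. The sketch tracks single-item supports over a count-based sliding window and, when a batch exceeds an assumed per-tick capacity, keeps each transaction with probability capacity / batch size. The class name, the uniform random shedding rule, and the restriction to single items are all assumptions; the paper mines frequent patterns with proven error bounds, which this sketch does not reproduce.

```python
import random
from collections import Counter, deque

class SheddingSlidingWindow:
    """Approximate item supports over the last `window_size` retained transactions."""

    def __init__(self, window_size, capacity, seed=0):
        self.window_size = window_size
        self.capacity = capacity  # transactions per tick the CPU can handle (assumed known a priori)
        self.window = deque()
        self.counts = Counter()
        self.rng = random.Random(seed)

    def process_batch(self, batch):
        """Process one tick's arrivals; shed at random if the batch overloads the CPU."""
        keep_prob = min(1.0, self.capacity / len(batch)) if batch else 1.0
        for txn in batch:
            if self.rng.random() < keep_prob:  # always true when keep_prob == 1.0
                self._insert(frozenset(txn))
        return keep_prob

    def _insert(self, txn):
        self.window.append(txn)
        self.counts.update(txn)
        if len(self.window) > self.window_size:
            self.counts.subtract(self.window.popleft())  # expire the oldest transaction

    def frequent_items(self, min_support):
        """Items whose relative support in the current window is >= min_support."""
        n = len(self.window)
        if n == 0:
            return set()
        return {item for item, c in self.counts.items() if c > 0 and c / n >= min_support}

w = SheddingSlidingWindow(window_size=3, capacity=10)
w.process_batch([{"a", "b"}, {"a"}, {"c"}, {"a", "c"}])
print(sorted(w.frequent_items(0.6)))  # ['a', 'c']
```

Because shedding samples transactions uniformly, relative supports inside the window stay approximately unbiased; what is lost under overload is precision, not timeliness.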

Relevance:

20.00%

Abstract:

Traditional approaches such as theorem proving and model checking have been successfully used to analyze security protocols. Ideally, they assume that data communication is reliable, and they require the user to predetermine authentication goals. However, missing and inconsistent data have been largely ignored, and increasingly complicated security protocols make it difficult to predefine such goals. This paper presents a novel approach to analyzing security protocols using association rule mining. It can not only validate the reliability of transactions but also discover potential correlations between secure messages. The algorithm and experiments demonstrate that our approach is useful and promising.
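To make the idea concrete, the sketch below assumes each observed protocol run is encoded as a transaction of message labels (a hypothetical encoding; the abstract does not give one) and derives pairwise rules with support and confidence. A rule such as m2 → m1 with confidence 1.0 says that whenever message m2 was observed, m1 was observed too, so a run containing m2 but missing m1 could be flagged as unreliable. Only single-item rules are generated, a simplification of full Apriori.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def pairwise_rules(transactions, min_support, min_confidence):
    """Rules lhs -> rhs between single items (a simplification of full Apriori)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(transactions, {lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

# Hypothetical protocol runs: which messages were observed in each run.
runs = [
    frozenset({"m1", "m2", "m3"}),
    frozenset({"m1", "m2"}),
    frozenset({"m1", "m3"}),
    frozenset({"m1", "m2", "m3"}),
]
for lhs, rhs, s, c in pairwise_rules(runs, min_support=0.5, min_confidence=0.9):
    print(f"{lhs} -> {rhs}  support={s:.2f} confidence={c:.2f}")
```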

Relevance:

20.00%

Abstract:

This paper tells a story of the synergy between two cutting-edge technologies: agents and data mining. By integrating these two technologies, the power of each is enhanced. By integrating agents into data mining systems, or by constructing data mining systems from an agent perspective, the flexibility of data mining systems can be greatly improved. New data mining techniques can be added to the systems dynamically in the form of agents, while out-of-date ones can be removed from the systems at run time. When agents are equipped with data mining capabilities, they become much smarter and more adaptable, and the performance of these agent systems can be improved. A new way to integrate these two technologies, ontology-based integration, is also discussed. Case studies are given to demonstrate this mutual enhancement.
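The run-time plug-in idea can be sketched under the assumption that an "agent" is simply a named callable that mines a dataset. Agents are then held in a registry to which they can be added, or from which they can be removed, while the system is running. `MiningAgentRegistry` and the two toy agents are hypothetical and say nothing about the ontology-based integration the paper discusses.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Agent = Callable[[List[List[str]]], Any]

class MiningAgentRegistry:
    """Holds named mining agents that can be plugged in or out at run time."""

    def __init__(self) -> None:
        self._agents: Dict[str, Agent] = {}

    def register(self, name: str, agent: Agent) -> None:
        self._agents[name] = agent  # a new technique joins without a restart

    def unregister(self, name: str) -> None:
        self._agents.pop(name, None)  # retire an out-of-date technique

    def run_all(self, data: List[List[str]]) -> Dict[str, Any]:
        return {name: agent(data) for name, agent in self._agents.items()}

registry = MiningAgentRegistry()
registry.register("count", lambda data: len(data))
registry.register(
    "top_item",
    lambda data: Counter(i for row in data for i in row).most_common(1)[0][0],
)
data = [["a", "b"], ["a"], ["c", "a"]]
print(registry.run_all(data))  # {'count': 3, 'top_item': 'a'}
registry.unregister("count")
print(registry.run_all(data))  # {'top_item': 'a'}
```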

Relevance:

20.00%

Abstract:

We present an algebraic attack approach to a family of irregularly clock-controlled bit-based linear feedback shift register systems. In the general set-up, we assume that the output bit of one shift register controls the clocking of other registers in the system and produces a family of equations relating the output bits to the internal state bits. We then apply this general theory to four specific stream ciphers: the (strengthened) stop-and-go generator, the alternating step generator, the self-decimated generator and the step1/step2 generator. In the case of the strengthened stop-and-go generator and of the self-decimated generator, we obtain the initial state of the registers significantly faster than any other known attack. In the other two cases, we do as well as or better than all attacks except the correlation attack. In all cases, we demonstrate that the degree of a functional relationship between the registers can be bounded by two. Finally, we determine the effective key length of all four systems.
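For orientation, the sketch below implements one of the generators being attacked, not the attack itself. It contains a Fibonacci-style LFSR and one common formulation of the alternating step generator, in which a control register decides whether register A or register B is clocked and the output is the XOR of A and B. The register lengths, tap positions, initial states, and the clock-then-output ordering are illustrative assumptions.

```python
from typing import List

class LFSR:
    """Fibonacci LFSR: state[0] is the output bit; the XOR of the tapped
    positions is shifted in at the other end on each clock."""

    def __init__(self, state: List[int], taps: List[int]) -> None:
        self.state = list(state)
        self.taps = taps

    def output(self) -> int:
        return self.state[0]

    def clock(self) -> None:
        feedback = 0
        for t in self.taps:
            feedback ^= self.state[t]
        self.state = self.state[1:] + [feedback]

def alternating_step(control: LFSR, a: LFSR, b: LFSR, n: int) -> List[int]:
    """Irregular clocking: the control bit selects which register steps."""
    keystream = []
    for _ in range(n):
        if control.output() == 1:
            a.clock()
        else:
            b.clock()
        control.clock()
        keystream.append(a.output() ^ b.output())
    return keystream

# Recurrence s[k+3] = s[k] ^ s[k+1] (polynomial x^3 + x + 1), period 7.
r = LFSR([1, 0, 0], taps=[0, 1])
bits = []
for _ in range(7):
    bits.append(r.output())
    r.clock()
print(bits)  # [1, 0, 0, 1, 0, 1, 1]

ks = alternating_step(LFSR([1, 0, 0], [0, 1]), LFSR([0, 1, 0, 0], [0, 1]),
                      LFSR([1, 1, 0, 0, 0], [0, 2]), 16)
print(ks)
```

The irregular clocking is what makes the keystream a nonlinear function of the initial states; an algebraic attack instead writes equations that relate the output bits to those state bits.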

Relevance:

20.00%

Abstract:

In this paper, we propose a model for discovering frequent sequential patterns (phrases), which can be used as profile descriptors of documents. Undoubtedly, numerous phrases can be obtained using data mining algorithms. However, it is difficult to use these phrases effectively to answer what users want. Therefore, we present a pattern taxonomy extraction model that extracts descriptive frequent sequential patterns by pruning the meaningless ones. The model is then extended and tested by applying it to an information filtering system. The experimental results show that pattern-based methods outperform keyword-based methods. The results also indicate that removing meaningless patterns not only reduces the cost of computation but also improves the effectiveness of the system.
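One plausible reading of "pruning the meaningless ones", assumed here rather than taken from the paper, is to drop a frequent phrase whenever a longer frequent phrase that contains it has exactly the same support, i.e. keep only closed patterns. The sketch below applies that rule to contiguous word sequences counted per paragraph. Restricting patterns to contiguous n-grams is a further simplification, and the function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, max_len):
    """All distinct contiguous word sequences of length 1..max_len."""
    return {tuple(tokens[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(tokens) - n + 1)}

def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))

def closed_frequent_phrases(paragraphs, min_support, max_len=3):
    """Frequent phrases not subsumed by a longer phrase with equal support."""
    support = Counter()
    for p in paragraphs:
        support.update(ngrams(p, max_len))  # support = number of paragraphs
    frequent = {g: s for g, s in support.items() if s >= min_support}
    return {
        g: s for g, s in frequent.items()
        if not any(len(h) > len(g) and frequent[h] == s and contains(h, g)
                   for h in frequent)
    }

paras = [["data", "mining", "rules"], ["data", "mining"], ["stream", "data", "mining"]]
print(closed_frequent_phrases(paras, min_support=2))  # {('data', 'mining'): 3}
```

Here "data" and "mining" are pruned because the phrase "data mining" occurs in exactly the same three paragraphs, so the single words add no descriptive information.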

Relevance:

20.00%

Abstract:

Data mining plays an important role in decision making for business activities and government administration. Since many organizations or their divisions do not possess the in-house expertise and infrastructure for data mining, it is beneficial to delegate data mining tasks to external service providers. However, the organizations or divisions may lose private information during the delegation process. In this paper, we present a Bloom filter based solution that enables organizations or their divisions to delegate the task of mining association rules while protecting data privacy. Our approach can achieve high precision in data mining by trading off only storage requirements, rather than the level of privacy protection.
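The abstract does not detail the encoding, so the sketch below only illustrates the general Bloom filter mechanism under stated assumptions. The data owner hashes each transaction's items with a keyed hash (HMAC-SHA256 with a hypothetical secret key) into an m-bit filter. The service provider then counts a candidate itemset's support by checking that all of the candidate's bits are set in a transaction's filter. Bloom filters give no false negatives, so the estimated support can only overstate the true support, and enlarging m (more storage) reduces that error.

```python
import hashlib
import hmac

M_BITS = 256                  # filter size in bits (assumed)
K_HASHES = 3                  # number of hash functions (assumed)
SECRET_KEY = b"owner-secret"  # hypothetical key, never shared with the provider

def bit_positions(item):
    """k bit positions for an item, derived from keyed hashes."""
    for i in range(K_HASHES):
        digest = hmac.new(SECRET_KEY, f"{i}:{item}".encode(), hashlib.sha256).digest()
        yield int.from_bytes(digest[:8], "big") % M_BITS

def to_bloom(items):
    """Encode a set of items as an integer bitmask Bloom filter."""
    mask = 0
    for item in items:
        for pos in bit_positions(item):
            mask |= 1 << pos
    return mask

def bloom_support(encoded_transactions, candidate_mask):
    """Provider side: fraction of filters containing all bits of the candidate."""
    hits = sum(1 for t in encoded_transactions if (t & candidate_mask) == candidate_mask)
    return hits / len(encoded_transactions)

# Owner side: encode transactions and a candidate itemset before delegation.
transactions = [{"milk", "bread"}, {"milk"}, {"bread", "eggs"}]
encoded = [to_bloom(t) for t in transactions]
candidate = to_bloom({"milk"})
print(bloom_support(encoded, candidate))  # at least 2/3; more only on a false positive
```

The provider sees only bitmasks, never item names, which is the storage-for-precision trade-off the abstract describes: a larger `M_BITS` lowers the false-positive rate at the cost of bigger filters.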