994 results for Mega-mining


Relevance:

20.00%

Publisher:

Abstract:

Class imbalance in textual data is an important factor affecting the reliability of text mining. On imbalanced textual data, conventional classifiers tend to show a strong performance bias: accuracy is high on the majority class but very low on the minority classes. An extreme strategy for imbalanced learning is to discard the majority instances and apply one-class classification to the minority class. However, this can easily cause the opposite bias, raising accuracy on the minority classes at the expense of the majority class. This chapter investigates approaches that reduce both types of performance bias and improve the reliability of the discovered classification rules. Experimental results show that the inexact field learning method and parameter-optimized one-class classifiers achieve more balanced performance than the standard approaches.
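
The performance bias described in this abstract can be illustrated with a minimal sketch. The data, labels, and function names below are hypothetical and are not taken from the chapter; the "classifier" simply predicts the majority class every time, which is the degenerate case of the bias.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all instances that are labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Fraction of each class's instances that are labelled correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical imbalanced corpus: 95 majority documents, 5 minority documents.
y_true = ["majority"] * 95 + ["minority"] * 5

# Degenerate classifier that always predicts the majority class.
y_pred = ["majority"] * len(y_true)

overall = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)

# Averaging the per-class recalls weights every class equally.
balanced = sum(recalls.values()) / len(recalls)

print(overall, recalls, balanced)
```

Overall accuracy looks strong here even though no minority document is ever recognised, which is why a per-class or balanced measure is a more informative way to compare the approaches the abstract mentions.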

Relevance:

20.00%

Publisher:

Abstract:

Video event detection is an effective way to automatically understand the semantic content of a video. However, because of the mismatch between low-level visual features and high-level semantics, research on video event detection faces a number of challenges, such as how to extract suitable information from the video, how to represent an event, and how to build a reasoning mechanism that infers the event from video information. In this paper, we propose a novel event detection method. The method detects video events based on the semantic trajectory, which is a high-level semantic description of a moving object’s trajectory in the video. The proposed method consists of three phases that transform low-level visual features into middle-level raw trajectory information and then into high-level semantic trajectory information. Event reasoning is then carried out with the assistance of semantic trajectory information and background knowledge. Additionally, to relieve users of the burden of manual event definition, a further method is proposed to automatically discover event-related semantic trajectory patterns from sample semantic trajectories. Furthermore, to make effective use of the discovered semantic trajectory patterns, an associative classification-based event detection framework is adopted to identify events that may have occurred. Empirical studies show that our methods can detect video events effectively and efficiently.

Relevance:

20.00%

Publisher:

Abstract:

The problem of extracting infrequent patterns from streams and building associations between these patterns is becoming increasingly relevant, as many events of interest, such as attacks in network data or unusual stories in news data, occur rarely. The problem is compounded when a system must deal with data from multiple streams. To address these problems, we present a framework that combines time-based association mining with a pyramidal structure that allows a rolling analysis of the stream and maintains a synopsis of the data without requiring increasing memory resources. We apply the algorithms and show the usefulness of the techniques. © 2007 Crown Copyright.
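
The abstract does not specify how its pyramidal structure is built. The sketch below shows one generic way such a structure can keep recent snapshots densely and older snapshots sparsely; the class name, the `base` and `per_level` parameters, and the snapshot contents are all assumptions made for illustration, not the authors' design.

```python
from collections import deque


class PyramidalSynopsis:
    """Hypothetical pyramidal snapshot store for a data stream."""

    def __init__(self, base=2, per_level=2):
        if base < 2 or per_level < 1:
            raise ValueError("base must be >= 2 and per_level >= 1")
        self.base = base
        self.per_level = per_level
        self.levels = {}  # level -> bounded deque of (timestamp, summary)

    def add(self, t, summary):
        """Store a snapshot taken at integer time t (assumed to start at 1)."""
        if t < 1:
            raise ValueError("timestamps are assumed to start at 1")
        # A snapshot goes to the highest level i such that base**i divides t.
        level = 0
        while t % (self.base ** (level + 1)) == 0:
            level += 1
        # Each level keeps only its most recent per_level snapshots.
        bucket = self.levels.setdefault(level, deque(maxlen=self.per_level))
        bucket.append((t, summary))

    def timestamps(self):
        """Timestamps of all snapshots currently retained, oldest first."""
        return sorted(t for bucket in self.levels.values() for t, _ in bucket)


synopsis = PyramidalSynopsis(base=2, per_level=2)
for t in range(1, 17):
    synopsis.add(t, summary={"events_seen": t})

kept = synopsis.timestamps()
print(kept)
```

In this sketch the number of levels grows only logarithmically with the length of the stream and each level is bounded, so the retained synopsis stays small while still covering both recent and older parts of the stream.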

Relevance:

20.00%

Publisher:

Abstract:

Many previous approaches to frequent episode discovery only accept simple sequences. Although a recent approach has been able to find frequent episodes from complex sequences, the discovered sets are neither condensed nor accurate. This paper investigates the discovery of condensed sets of frequent episodes from complex sequences. We adopt a novel anti-monotonic frequency measure based on non-redundant occurrences, and define a condensed set, nDaCF (the set of non-derivable approximately closed frequent episodes), within a given maximal error bound of support. We then introduce a series of effective pruning strategies, and develop a method, nDaCF-Miner, for discovering nDaCF sets. Experimental results show that, when the error bound is somewhat high, the discovered nDaCF sets are two orders of magnitude smaller than complete sets, and nDaCF-Miner is more efficient than previous mining approaches. In addition, the nDaCF sets are more accurate than the sets found by previous approaches.
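
The paper's frequency measure based on non-redundant occurrences is not defined in this abstract, so it is not reproduced here. As a simpler stand-in, the sketch below counts non-overlapping occurrences of a serial episode in a simple sequence; the function name and the example sequence are hypothetical.

```python
def count_non_overlapping(sequence, episode):
    """Count non-overlapping occurrences of a serial episode.

    A serial episode is an ordered tuple of event types. The scan restarts
    after each completed occurrence, so counted occurrences never interleave.
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")
    count = 0
    pos = 0  # index of the next event type still needed
    for event in sequence:
        if event == episode[pos]:
            pos += 1
            if pos == len(episode):
                count += 1
                pos = 0
    return count


# Hypothetical simple sequence with one event per time point.
events = list("ABACBCABC")

freq_abc = count_non_overlapping(events, ("A", "B", "C"))
freq_ab = count_non_overlapping(events, ("A", "B"))
print(freq_abc, freq_ab)
```

In this example the shorter episode (A, B) is counted at least as often as its extension (A, B, C), which is the kind of anti-monotonic behaviour that lets a miner prune extensions of infrequent episodes.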

Relevance:

20.00%

Publisher:

Abstract:

The development and application of computational data mining techniques in financial fraud detection and business failure prediction has recently become a popular cross-disciplinary research area involving financial economists, forensic accountants and computational modellers. Some of the computational techniques popularly used in financial fraud detection and business failure prediction can also be applied effectively to the detection of fraudulent insurance claims, and can therefore be of immense practical value to the insurance industry. We provide a comparative analysis of the prediction performance of a battery of data mining techniques using real-life automotive insurance fraud data. While the data used in our paper is US-based, the computational techniques we have tested can be adapted and applied to detect similar insurance fraud in other countries where an organized automotive insurance industry exists.

Relevance:

20.00%

Publisher:

Abstract:

The thesis investigates a set of critical problems in data mining and proposes four advanced pattern mining algorithms for discovering the most interesting and useful patterns, those highly relevant to the user’s application targets, from data represented in complex structures.