998 results for stream mining


Relevance:

30.00%

Publisher:

Abstract:

In this paper, we study the challenging problem of mining data-generating rules and state-transforming rules (i.e., semantics) underlying multiple correlated time series streams. We propose a novel Correlation field-based Semantics Learning Framework (CfSLF) to learn these semantics. In the framework, we use a Hidden Markov Random Field (HMRF) method to model the relationship between latent states and observations in multiple correlated time series and thereby learn the data-generating rules. The state-transforming rules are learned from the corresponding latent state sequences of the multiple time series, based on the Markov chain property. The reusable semantics learned by CfSLF can be fed into various analysis tools, such as prediction or anomaly detection. Moreover, we present two algorithms based on the semantics, which can be applied to next-n-step prediction and anomaly detection. Experiments on real-world data sets demonstrate the efficiency and effectiveness of the proposed method. © Springer-Verlag 2013.
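As a rough illustration of the state-transforming step only (not the authors' CfSLF implementation), the sketch below estimates a first-order Markov transition matrix from a latent state sequence and propagates a state distribution n steps ahead. The function names, the toy sequence, and the add-one smoothing are assumptions made for this example; the HMRF step that would produce the latent states is not shown.

```python
def estimate_transition_matrix(states, n_states, smoothing=1.0):
    """Estimate a row-stochastic transition matrix from a state sequence.

    `smoothing` is an additive pseudo-count (an assumption of this sketch).
    With smoothing=0, a state that never occurs as a predecessor would give
    an all-zero row and a division by zero.
    """
    counts = [[smoothing] * n_states for _ in range(n_states)]
    for prev, nxt in zip(states, states[1:]):
        counts[prev][nxt] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def predict_n_steps(transition, current_state, n):
    """Return the state distribution n steps after `current_state`."""
    k = len(transition)
    dist = [0.0] * k
    dist[current_state] = 1.0
    for _ in range(n):
        # One step of the chain: dist_next[j] = sum_i dist[i] * P[i][j]
        dist = [sum(dist[i] * transition[i][j] for i in range(k))
                for j in range(k)]
    return dist


# Hypothetical latent state sequence (integers 0..2), for illustration only.
latent_states = [0, 0, 1, 2, 0, 1, 2, 2, 0, 1]
P = estimate_transition_matrix(latent_states, n_states=3)
forecast = predict_n_steps(P, current_state=latent_states[-1], n=3)
most_likely_state = max(range(len(forecast)), key=forecast.__getitem__)
```

A low-probability transition under the estimated matrix could similarly serve as a simple anomaly score, though how the paper's anomaly detection algorithm actually works is not stated in the abstract.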

Relevance:

30.00%

Publisher:

Abstract:

Ensemble stream modeling and data cleaning are sensor information processing systems that have different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events so as to eliminate uncorrelated noise and to choose the most likely model without overfitting, thus obtaining higher model confidence. Higher-quality streams can be realized by combining many short streams into an ensemble that has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for an event such as a bush or natural forest fire, we take the burnt area (BA*), a sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing, for two reasons: first, the histogram of fire activity is highly skewed; second, the measured sensor parameters are highly correlated. Since using non-descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory, and conceptual knowledge is learned from the sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use the F-measure score, which combines precision and recall, to determine the false alarm rate of fire events. The multi-target data-cleaning trees use the information purity of the target leaf nodes to learn higher-order features. A sensitive variance measure such as the F-test is performed at each node's split to select the best attribute. The ensemble stream model approach proved to improve when complicated features were used with a simpler tree classifier.
The ensemble framework for data cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensors led to the formation of streams for sensor-enabled applications. This further motivates the novelty of stream quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.
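
For reference, the F-measure mentioned above is conventionally the harmonic mean of precision and recall. The sketch below computes it from event counts; the function name and the counts are hypothetical and are not taken from the study.

```python
def f_measure(true_pos, false_pos, false_neg, beta=1.0):
    """F-beta score from confusion counts; beta=1 gives the usual F1."""
    if true_pos == 0:
        # No correctly detected events: precision and recall are both zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)  # share of alarms that were real
    recall = true_pos / (true_pos + false_neg)     # share of real events detected
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 fire events detected, 2 false alarms, 4 missed events.
score = f_measure(true_pos=8, false_pos=2, false_neg=4)
```

A larger number of false alarms lowers precision and therefore the score, which is why the measure is a natural fit for judging false alarm behaviour.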
