23 resultados para frequent episodes

em Indian Institute of Science - Bangalore - Índia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Frequent episode discovery is a popular framework for mining data available as a long sequence of events. An episode is essentially a short ordered sequence of event types and the frequency of an episode is some suitable measure of how often the episode occurs in the data sequence. Recently,we proposed a new frequency measure for episodes based on the notion of non-overlapped occurrences of episodes in the event sequence, and showed that, such a definition, in addition to yielding computationally efficient algorithms, has some important theoretical properties in connecting frequent episode discovery with HMM learning. This paper presents some new algorithms for frequent episode discovery under this non-overlapped occurrences-based frequency definition. The algorithms presented here are better (by a factor of N, where N denotes the size of episodes being discovered) in terms of both time and space complexities when compared to existing methods for frequent episode discovery. We show through some simulation experiments, that our algorithms are very efficient. The new algorithms presented here have arguably the least possible orders of spaceand time complexities for the task of frequent episode discovery.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Discovering patterns in temporal data is an important task in Data Mining. A successful method for this was proposed by Mannila et al. [1] in 1997. In their framework, mining for temporal patterns in a database of sequences of events is done by discovering the so called frequent episodes. These episodes characterize interesting collections of events occurring relatively close to each other in some partial order. However, in this framework(and in many others for finding patterns in event sequences), the ordering of events in an event sequence is the only allowed temporal information. But there are many applications where the events are not instantaneous; they have time durations. Interesting episodesthat we want to discover may need to contain information regarding event durations etc. In this paper we extend Mannila et al.’s framework to tackle such issues. In our generalized formulation, episodes are defined so that much more temporal information about events can be incorporated into the structure of an episode. This significantly enhances the expressive capability of the rules that can be discovered in the frequent episode framework. We also present algorithms for discovering such generalized frequent episodes.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Frequent episode discovery is a popular framework for temporal pattern discovery in event streams. An episode is a partially ordered set of nodes with each node associated with an event type. Currently algorithms exist for episode discovery only when the associated partial order is total order (serial episode) or trivial (parallel episode). In this paper, we propose efficient algorithms for discovering frequent episodes with unrestricted partial orders when the associated event-types are unique. These algorithms can be easily specialized to discover only serial or parallel episodes. Also, the algorithms are flexible enough to be specialized for mining in the space of certain interesting subclasses of partial orders. We point out that frequency alone is not a sufficient measure of interestingness in the context of partial order mining. We propose a new interestingness measure for episodes with unrestricted partial orders which, when used along with frequency, results in an efficient scheme of data mining. Simulations are presented to demonstrate the effectiveness of our algorithms.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper we consider the process of discovering frequent episodes in event sequences. The most computationally intensive part of this process is that of counting the frequencies of a set of candidate episodes. We present two new frequency counting algorithms for speeding up this part. These, referred to as non-overlapping and non-inteleaved frequency counts, are based on directly counting suitable subsets of the occurrences of an episode. Hence they are different from the frequency counts of Mannila et al [1], where they count the number of windows in which the episode occurs. Our new frequency counts offer a speed-up factor of 7 or more on real and synthetic datasets. We also show how the new frequency counts can be used when the events in episodes have time-durations as well.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Most pattern mining methods yield a large number of frequent patterns, and isolating a small relevant subset of patterns is a challenging problem of current interest. In this paper, we address this problem in the context of discovering frequent episodes from symbolic time-series data. Motivated by the Minimum Description Length principle, we formulate the problem of selecting relevant subset of patterns as one of searching for a subset of patterns that achieves best data compression. We present algorithms for discovering small sets of relevant non-redundant episodes that achieve good data compression. The algorithms employ a novel encoding scheme and use serial episodes with inter-event constraints as the patterns. We present extensive simulation studies with both synthetic and real data, comparing our method with the existing schemes such as GoKrimp and SQS. We also demonstrate the effectiveness of these algorithms on event sequences from a composable conveyor system; this system represents a new application area where use of frequent patterns for compressing the event sequence is likely to be important for decision support and control.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Understanding the functioning of a neural system in terms of its underlying circuitry is an important problem in neuroscience. Recent d evelopments in electrophysiology and imaging allow one to simultaneously record activities of hundreds of neurons. Inferring the underlying neuronal connectivity patterns from such multi-neuronal spike train data streams is a challenging statistical and computational problem. This task involves finding significant temporal patterns from vast amounts of symbolic time series data. In this paper we show that the frequent episode mining methods from the field of temporal data mining can be very useful in this context. In the frequent episode discovery framework, the data is viewed as a sequence of events, each of which is characterized by an event type and its time of occurrence and episodes are certain types of temporal patterns in such data. Here we show that, using the set of discovered frequent episodes from multi-neuronal data, one can infer different types of connectivity patterns in the neural system that generated it. For this purpose, we introduce the notion of mining for frequent episodes under certain temporal constraints; the structure of these temporal constraints is motivated by the application. We present algorithms for discovering serial and parallel episodes under these temporal constraints. Through extensive simulation studies we demonstrate that these methods are useful for unearthing patterns of neuronal network connectivity.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Frequent episode discovery framework is a popular framework in temporal data mining with many applications. Over the years, many different notions of frequencies of episodes have been proposed along with different algorithms for episode discovery. In this paper, we present a unified view of all the apriori-based discoverymethods for serial episodes under these different notions of frequencies. Specifically, we present a unified view of the various frequency counting algorithms. We propose a generic counting algorithm such that all current algorithms are special cases of it. This unified view allows one to gain insights into different frequencies, and we present quantitative relationships among different frequencies.Our unified view also helps in obtaining correctness proofs for various counting algorithms as we show here. It also aids in understanding and obtaining the anti-monotonicity properties satisfied by the various frequencies, the properties exploited by the candidate generation step of any apriori-based method. We also point out how our unified view of counting helps to consider generalization of the algorithm to count episodes with general partial orders.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Frequent episode discovery framework is a popular framework in temporal data mining with many applications. Over the years, many different notions of frequencies of episodes have been proposed along with different algorithms for episode discovery. In this paper, we present a unified view of all the apriori-based discovery methods for serial episodes under these different notions of frequencies. Specifically, we present a unified view of the various frequency counting algorithms. We propose a generic counting algorithm such that all current algorithms are special cases of it. This unified view allows one to gain insights into different frequencies, and we present quantitative relationships among different frequencies. Our unified view also helps in obtaining correctness proofs for various counting algorithms as we show here. It also aids in understanding and obtaining the anti-monotonicity properties satisfied by the various frequencies, the properties exploited by the candidate generation step of any apriori-based method. We also point out how our unified view of counting helps to consider generalization of the algorithm to count episodes with general partial orders.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Frequent episode discovery is a popular framework for pattern discovery from sequential data. It has found many applications in domains like alarm management in telecommunication networks, fault analysis in the manufacturing plants, predicting user behavior in web click streams and so on. In this paper, we address the discovery of serial episodes. In the episodes context, there have been multiple ways to quantify the frequency of an episode. Most of the current algorithms for episode discovery under various frequencies are apriori-based level-wise methods. These methods essentially perform a breadth-first search of the pattern space. However currently there are no depth-first based methods of pattern discovery in the frequent episode framework under many of the frequency definitions. In this paper, we try to bridge this gap. We provide new depth-first based algorithms for serial episode discovery under non-overlapped and total frequencies. Under non-overlapped frequency, we present algorithms that can take care of span constraint and gap constraint on episode occurrences. Under total frequency we present an algorithm that can handle span constraint. We provide proofs of correctness for the proposed algorithms. We demonstrate the effectiveness of the proposed algorithms by extensive simulations. We also give detailed run-time comparisons with the existing apriori-based methods and illustrate scenarios under which the proposed pattern-growth algorithms perform better than their apriori counterparts. (C) 2013 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Frequent episode discovery is one of the methods used for temporal pattern discovery in sequential data. An episode is a partially ordered set of nodes with each node associated with an event type. For more than a decade, algorithms existed for episode discovery only when the associated partial order is total (serial episode) or trivial (parallel episode). Recently, the literature has seen algorithms for discovering episodes with general partial orders. In frequent pattern mining, the threshold beyond which a pattern is inferred to be interesting is typically user-defined and arbitrary. One way of addressing this issue in the pattern mining literature has been based on the framework of statistical hypothesis testing. This paper presents a method of assessing statistical significance of episode patterns with general partial orders. A method is proposed to calculate thresholds, on the non-overlapped frequency, beyond which an episode pattern would be inferred to be statistically significant. The method is first explained for the case of injective episodes with general partial orders. An injective episode is one where event-types are not allowed to repeat. Later it is pointed out how the method can be extended to the class of all episodes. The significance threshold calculations for general partial order episodes proposed here also generalize the existing significance results for serial episodes. Through simulations studies, the usefulness of these statistical thresholds in pruning uninteresting patterns is illustrated. (C) 2014 Elsevier Inc. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

With the emergence of large-volume and high-speed streaming data, the recent techniques for stream mining of CFIpsilas (closed frequent itemsets) will become inefficient. When concept drift occurs at a slow rate in high speed data streams, the rate of change of information across different sliding windows will be negligible. So, the user wonpsilat be devoid of change in information if we slide window by multiple transactions at a time. Therefore, we propose a novel approach for mining CFIpsilas cumulatively by making sliding width(ges1) over high speed data streams. However, it is nontrivial to mine CFIpsilas cumulatively over stream, because such growth may lead to the generation of exponential number of candidates for closure checking. In this study, we develop an efficient algorithm, stream-close, for mining CFIpsilas over stream by exploring some interesting properties. Our performance study reveals that stream-close achieves good scalability and has promising results.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The region around Waclakkancheri, in the province of Kerala, India, which lies in the vicinity of Palghat-Cauvery ;hear zone (within the Precambrian crystalline terrain), has been a site of microseismic activity since 1989. Earlier studies had identified a prominent WNW-ESE structure overprinting on the E-W trending lineaments associated with Palghat-Cauvery shear zone. We have mapped this structure, located in a chamockite quarry near Desamangalam, Waclakkancheri, which we identify as a ca. 30 km-long south dipping reverse fault. This article presents the characteristics of this fault zone exposed on the exhumed crystalline basement and discusses its significance in understanding the earthquake potential of the region. This brittle deformation zone consists of fracture sets with small-scale displacement and slip planes with embedded fault gouges. The macroscopic as well as the microscopic studies of this fault zone indicate that it evolved through different episodes of faulting in the presence of fluids. The distinct zones within consolidated gouge and the cross cutting relationship of fractures indicate episodic fault activity. At least four faulting episodes can be recognized based on the sequential development of different structural elements in the fault rocks. The repeated ruptures are evident along this shear zone and the cyclic behavior of this fault consists of co-seismic ruptures alternating with inter-seismic periods, which is characterized by the sealed fractures and consolidated gouge. The fault zone shows a minimum accumulated dip/oblique slip of 2.1 m in the reverse direction with a possible characteristic slip of 52 cm (for each event). The ESR dating of fault gouge indicates that the deformation zone records a major event in the Middle Quaternary. The empirical relationships between fault length and slip show that this fault may generate events M >= 6. The above factors suggest that this fault may be characterized as potentially active. Our study offers some new pointers that can be used in other slow deforming cratonic hinterlands in exploring the discrete active faults.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The nonlinear singular integral equation of transonic flow is examined in the free-stream Mach number range where only solutions with shocks are known to exist. It is shown that, by the addition of an artificial viscosity term to the integral equation, even the direct iterative scheme, with the linear solution as the initial iterate, leads to convergence. Detailed tables indicating how the solution varies with changes in the parameters of the artificial viscosity term are also given. In the best cases (when the artificial viscosity is smallest), the solutions compare well with known results, their characteristic feature being the representation of the shock by steep gradients rather than by abrupt discontinuities. However, 'sharp-shock solutions' have also been obtained by the implementation of a quadratic iterative scheme with the 'artificial viscosity solution' as the initial iterate; the converged solution with a sharp shock is obtained with only a few more iterates. Finally, a review is given of various shock-capturing and shock-fitting schemes for the transonic flow equations in general, and for the transonic integral equation in particular, frequent comparisons being made with the approach of this paper.