61 results for Frequent itemset

at Indian Institute of Science - Bangalore - India


Relevance:

20.00% 20.00%

Publisher:

Abstract:

With the emergence of large-volume, high-speed streaming data, recent techniques for stream mining of CFIs (closed frequent itemsets) will become inefficient. When concept drift occurs at a slow rate in high-speed data streams, the rate of change of information across different sliding windows is negligible. So the user will not miss changes in information if we slide the window by multiple transactions at a time. Therefore, we propose a novel approach for mining CFIs cumulatively by allowing a sliding width (≥ 1) over high-speed data streams. However, it is nontrivial to mine CFIs cumulatively over a stream, because such growth may lead to the generation of an exponential number of candidates for closure checking. In this study, we develop an efficient algorithm, stream-close, for mining CFIs over a stream by exploring some interesting properties. Our performance study reveals that stream-close achieves good scalability and has promising results.
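To make the setting concrete, the following is a minimal brute-force sketch of closed frequent itemsets computed over a window that advances by several transactions at a time. It is not the stream-close algorithm, whose details the abstract does not give; the function names and the parameters `window_size`, `slide`, and `min_support` are assumptions chosen for illustration.

```python
from itertools import combinations


def closed_frequent_itemsets(window, min_support):
    """Brute-force closed frequent itemsets of one window (a list of sets)."""
    items = sorted(set().union(*window))
    support = {}
    # Enumerate every candidate itemset; exponential, so only for tiny examples.
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in window if set(candidate) <= t)
            if count >= min_support:
                support[frozenset(candidate)] = count
    # An itemset is closed if no proper superset has the same support.
    return {
        itemset: count
        for itemset, count in support.items()
        if not any(itemset < other and count == support[other] for other in support)
    }


def mine_sliding(stream, window_size, slide, min_support):
    """Yield the closed frequent itemsets of each window, advancing by `slide`."""
    for start in range(0, len(stream) - window_size + 1, slide):
        yield closed_frequent_itemsets(stream[start:start + window_size], min_support)
```

Each window is recomputed from scratch here, which is exactly the cost an incremental method would try to avoid; a larger `slide` simply means fewer windows are evaluated.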

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Frequent episode discovery is a popular framework for mining data available as a long sequence of events. An episode is essentially a short ordered sequence of event types, and the frequency of an episode is some suitable measure of how often the episode occurs in the data sequence. Recently, we proposed a new frequency measure for episodes based on the notion of non-overlapped occurrences of episodes in the event sequence, and showed that such a definition, in addition to yielding computationally efficient algorithms, has some important theoretical properties in connecting frequent episode discovery with HMM learning. This paper presents some new algorithms for frequent episode discovery under this non-overlapped occurrences-based frequency definition. The algorithms presented here are better (by a factor of N, where N denotes the size of episodes being discovered) in terms of both time and space complexities when compared to existing methods for frequent episode discovery. We show through some simulation experiments that our algorithms are very efficient. The new algorithms presented here have arguably the least possible orders of space and time complexities for the task of frequent episode discovery.
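As an illustration of the non-overlapped frequency, the sketch below counts occurrences of a single serial episode with one greedy left-to-right scan: it waits for the episode's event types in order and, once the last one is seen, records an occurrence and starts over. This is a simplified example under the assumption that there are no time constraints on occurrences; it is not claimed to be the paper's algorithm, and the function name is hypothetical.

```python
def count_non_overlapped(events, episode):
    """Count non-overlapped occurrences of a serial episode.

    events  : iterable of event types in time order
    episode : sequence of event types that must appear in this order
    """
    if not episode:
        return 0
    count = 0
    position = 0  # index of the next episode event type we are waiting for
    for event in events:
        if event == episode[position]:
            position += 1
            if position == len(episode):
                count += 1      # one full occurrence completed
                position = 0    # look for the next, disjoint occurrence
    return count
```

For example, `count_non_overlapped("ABABCABC", "ABC")` counts one occurrence ending at the first `C` and a second ending at the last `C`. The scan keeps only one integer of state per episode and reads the sequence once.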

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Frequent episode discovery framework is a popular framework in temporal data mining with many applications. Over the years, many different notions of frequencies of episodes have been proposed along with different algorithms for episode discovery. In this paper, we present a unified view of all the apriori-based discovery methods for serial episodes under these different notions of frequencies. Specifically, we present a unified view of the various frequency counting algorithms. We propose a generic counting algorithm such that all current algorithms are special cases of it. This unified view allows one to gain insights into different frequencies, and we present quantitative relationships among different frequencies. Our unified view also helps in obtaining correctness proofs for various counting algorithms as we show here. It also aids in understanding and obtaining the anti-monotonicity properties satisfied by the various frequencies, the properties exploited by the candidate generation step of any apriori-based method. We also point out how our unified view of counting helps to consider generalization of the algorithm to count episodes with general partial orders.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper we consider the process of discovering frequent episodes in event sequences. The most computationally intensive part of this process is that of counting the frequencies of a set of candidate episodes. We present two new frequency counting algorithms for speeding up this part. These, referred to as non-overlapping and non-interleaved frequency counts, are based on directly counting suitable subsets of the occurrences of an episode. Hence they are different from the frequency counts of Mannila et al. [1], who count the number of windows in which the episode occurs. Our new frequency counts offer a speed-up factor of 7 or more on real and synthetic datasets. We also show how the new frequency counts can be used when the events in episodes have time-durations as well.
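For contrast with occurrence-based counts, the window-based idea attributed to Mannila et al. can be sketched as follows: slide a fixed-width window over the timeline and count the windows that contain the episode's event types in order. This is a naive illustration, not the original counting algorithm; it assumes integer, distinct timestamps and one window per time step, including windows that only partly overlap the ends of the sequence. All names are hypothetical.

```python
def occurs_in_order(types, episode):
    """True if `episode` is a subsequence of `types`."""
    it = iter(types)
    return all(any(t == e for t in it) for e in episode)


def window_frequency(events, episode, width):
    """Count windows of `width` time units that contain the serial episode.

    events : list of (event_type, integer_time) pairs sorted by time
    """
    if not events:
        return 0
    first, last = events[0][1], events[-1][1]
    count = 0
    # Slide the window [start, start + width) one time unit at a time,
    # including windows that only partly overlap the sequence at either end.
    for start in range(first - width + 1, last + 1):
        inside = [etype for etype, t in events if start <= t < start + width]
        if occurs_in_order(inside, episode):
            count += 1
    return count
```

A single occurrence of an episode can fall inside several consecutive windows, so this count grows with the window width, whereas an occurrence-based count does not depend on a window parameter.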

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Discovering patterns in temporal data is an important task in Data Mining. A successful method for this was proposed by Mannila et al. [1] in 1997. In their framework, mining for temporal patterns in a database of sequences of events is done by discovering the so-called frequent episodes. These episodes characterize interesting collections of events occurring relatively close to each other in some partial order. However, in this framework (and in many others for finding patterns in event sequences), the ordering of events in an event sequence is the only allowed temporal information. But there are many applications where the events are not instantaneous; they have time durations. Interesting episodes that we want to discover may need to contain information regarding event durations etc. In this paper we extend Mannila et al.’s framework to tackle such issues. In our generalized formulation, episodes are defined so that much more temporal information about events can be incorporated into the structure of an episode. This significantly enhances the expressive capability of the rules that can be discovered in the frequent episode framework. We also present algorithms for discovering such generalized frequent episodes.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Frequent episode discovery is a popular framework for pattern discovery from sequential data. It has found many applications in domains like alarm management in telecommunication networks, fault analysis in manufacturing plants, predicting user behavior in web click streams and so on. In this paper, we address the discovery of serial episodes. In the episodes context, there have been multiple ways to quantify the frequency of an episode. Most of the current algorithms for episode discovery under various frequencies are apriori-based level-wise methods. These methods essentially perform a breadth-first search of the pattern space. However, there are currently no depth-first methods of pattern discovery in the frequent episode framework under many of the frequency definitions. In this paper, we try to bridge this gap. We provide new depth-first algorithms for serial episode discovery under non-overlapped and total frequencies. Under non-overlapped frequency, we present algorithms that can handle span and gap constraints on episode occurrences. Under total frequency, we present an algorithm that can handle a span constraint. We provide proofs of correctness for the proposed algorithms. We demonstrate the effectiveness of the proposed algorithms by extensive simulations. We also give detailed run-time comparisons with the existing apriori-based methods and illustrate scenarios under which the proposed pattern-growth algorithms perform better than their apriori counterparts. (C) 2013 Elsevier B.V. All rights reserved.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The nonlinear singular integral equation of transonic flow is examined in the free-stream Mach number range where only solutions with shocks are known to exist. It is shown that, by the addition of an artificial viscosity term to the integral equation, even the direct iterative scheme, with the linear solution as the initial iterate, leads to convergence. Detailed tables indicating how the solution varies with changes in the parameters of the artificial viscosity term are also given. In the best cases (when the artificial viscosity is smallest), the solutions compare well with known results, their characteristic feature being the representation of the shock by steep gradients rather than by abrupt discontinuities. However, 'sharp-shock solutions' have also been obtained by the implementation of a quadratic iterative scheme with the 'artificial viscosity solution' as the initial iterate; the converged solution with a sharp shock is obtained with only a few more iterates. Finally, a review is given of various shock-capturing and shock-fitting schemes for the transonic flow equations in general, and for the transonic integral equation in particular, frequent comparisons being made with the approach of this paper.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A simple triggered vacuum gap has previously been described by the authors in this journal (see ibid., vol.5, 415, 1972). Further studies have resulted in improvement of the performance with regard to sensitivity and consistency of the trigger characteristics and immunity from bridging due to metal particles eroded from the arc. The earlier design suffered from rather frequent bridging of the auxiliary gap and showed rather wide scatter in its trigger characteristics. In the present design, thermally stable materials like fused quartz, machinable ceramic 'Supramica 500' (Mycalex Corporation of America), lead titanate, barium titanate (LCC HTD) and silicon carbide have been used to insulate the trigger electrode from the cathode. Consistent triggering, free from bridging, has been obtained at relatively low voltages of 200-400 V.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Understanding the functioning of a neural system in terms of its underlying circuitry is an important problem in neuroscience. Recent developments in electrophysiology and imaging allow one to simultaneously record activities of hundreds of neurons. Inferring the underlying neuronal connectivity patterns from such multi-neuronal spike train data streams is a challenging statistical and computational problem. This task involves finding significant temporal patterns from vast amounts of symbolic time series data. In this paper we show that the frequent episode mining methods from the field of temporal data mining can be very useful in this context. In the frequent episode discovery framework, the data is viewed as a sequence of events, each of which is characterized by an event type and its time of occurrence, and episodes are certain types of temporal patterns in such data. Here we show that, using the set of discovered frequent episodes from multi-neuronal data, one can infer different types of connectivity patterns in the neural system that generated it. For this purpose, we introduce the notion of mining for frequent episodes under certain temporal constraints; the structure of these temporal constraints is motivated by the application. We present algorithms for discovering serial and parallel episodes under these temporal constraints. Through extensive simulation studies we demonstrate that these methods are useful for unearthing patterns of neuronal network connectivity.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The southern Western Ghats tropical montane cloud forest sites (Gavi, Periyar, High wavys and Venniyar), which are characterized by frequent or seasonal cloud cover at the vegetation level, are considered one of the most threatened ecosystems in India and the world. Three out of four montane cloud forest sites studied in the southern Western Ghats experienced diminishing trends of seasonal average and total rainfall, especially during the summer monsoon season. The highest level of reduction for the summer monsoon season was observed at Gavi rainforest station (>20 mm/14 years) in Kerala, followed by the Venniyar site (>20 mm/20 years) in Tamil Nadu. Average annual and total precipitation increased during the study period irrespective of the seasons over the Periyar area, and the greatest values were recorded for season 2 (>25 mm/28 years). Positive trends for winter monsoon rainfall have been observed for three stations (Periyar, High wavys and Venniyar) but not for Gavi, and the trend was positive and significant (90%) for Periyar and High wavys. An increase in summer monsoon rainfall was observed for the Periyar site, and the trend was found to be significant (95%).

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Tropical forest ruminants disperse several plants; yet, their effectiveness as seed dispersers is not systematically quantified. Information on frequency and extent of frugivory by ruminants is lacking. Techniques such as tree watches or fruit traps adapted from avian frugivore studies are not suitable to study terrestrial frugivores, and conventional camera traps provide little quantitative information. We used a novel time-delay camera-trap technique to assess the effectiveness of ruminants as seed dispersers for Phyllanthus emblica at Mudumalai, southern India. After being triggered by animal movement, cameras were programmed to take pictures every 2 min for the next 6 min, yielding a sequence of four pictures. Actual frugivores were differentiated from mere visitors, who did not consume fruit, by comparing the number of fruit remaining across the time-delay photograph sequence. During a 2-year study using this technique, we found that six terrestrial mammals consumed fallen P. emblica fruit. Additionally, seven mammals and one bird species visited fruiting trees but did not consume fallen fruit. Two ruminants, the Indian chevrotain Moschiola indica and chital Axis axis, were P. emblica's most frequent frugivores and they accounted for over 95% of fruit removal, while murid rodents accounted for less than 1%. Plants like P. emblica that are dispersed mainly by large mammalian frugivores are likely to have limited ability to migrate across fragmented landscapes in response to rapidly changing climates. We hope that more quantitative information on ruminant frugivory will become available with a wider application of our time-delay camera-trap technique.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Pricing is an effective tool to control congestion and achieve quality of service (QoS) provisioning for multiple differentiated levels of service. In this paper, we consider the problem of pricing for congestion control in the case of a network of nodes under a single service class and multiple queues, and present a multi-layered pricing scheme. We propose an algorithm for finding the optimal state dependent price levels for individual queues, at each node. The pricing policy used depends on a weighted average queue length at each node. This helps in reducing frequent price variations and is in the spirit of the random early detection (RED) mechanism used in TCP/IP networks. We observe in our numerical results a considerable improvement in performance using our scheme over that of a recently proposed related scheme in terms of both throughput and delay performance. In particular, our approach exhibits a throughput improvement in the range of 34 to 69 percent in all cases studied (over all routes) over the above scheme.
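The role of the weighted average queue length can be illustrated with a small sketch: an exponentially weighted moving average of the queue length, as used by RED, is mapped to a price through fixed bands. The abstract does not give the paper's averaging rule or how its optimal price levels are computed, so the class name, the `weight` parameter, and the `thresholds` and `prices` values are all hypothetical.

```python
from bisect import bisect_right


class QueuePricer:
    """Price a queue from an exponentially weighted average of its length."""

    def __init__(self, weight, thresholds, prices):
        # `prices` needs one more entry than `thresholds` (one price per band).
        assert len(prices) == len(thresholds) + 1
        self.weight = weight
        self.thresholds = thresholds
        self.prices = prices
        self.average = 0.0

    def update(self, queue_length):
        """Fold a new queue-length sample into the average and return the price."""
        self.average = (1 - self.weight) * self.average + self.weight * queue_length
        band = bisect_right(self.thresholds, self.average)
        return self.prices[band]
```

With a small `weight`, a short burst in queue length moves the average only slightly, so the price stays in its current band; only sustained congestion pushes it into a higher one. That is the sense in which averaging reduces frequent price variations.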