962 results for Emerging pattern mining
Abstract:
This thesis presents a promising boundary setting method for solving challenging issues in text classification to produce an effective text classifier. A classifier must identify the boundary between classes optimally. However, after the features are selected, the boundary is still unclear with regard to mixed positive and negative documents. A classifier combination method to boost the effectiveness of the classification model is also presented. The experiments carried out in the study demonstrate that the proposed classifier is promising.
Abstract:
Frequent episode discovery is one of the methods used for temporal pattern discovery in sequential data. An episode is a partially ordered set of nodes with each node associated with an event type. For more than a decade, algorithms existed for episode discovery only when the associated partial order is total (serial episode) or trivial (parallel episode). Recently, the literature has seen algorithms for discovering episodes with general partial orders. In frequent pattern mining, the threshold beyond which a pattern is inferred to be interesting is typically user-defined and arbitrary. One way of addressing this issue in the pattern mining literature has been based on the framework of statistical hypothesis testing. This paper presents a method of assessing the statistical significance of episode patterns with general partial orders. A method is proposed to calculate thresholds, on the non-overlapped frequency, beyond which an episode pattern would be inferred to be statistically significant. The method is first explained for the case of injective episodes with general partial orders. An injective episode is one where event types are not allowed to repeat. Later it is pointed out how the method can be extended to the class of all episodes. The significance threshold calculations for general partial order episodes proposed here also generalize the existing significance results for serial episodes. Through simulation studies, the usefulness of these statistical thresholds in pruning uninteresting patterns is illustrated. (C) 2014 Elsevier Inc. All rights reserved.
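As a rough illustration of the non-overlapped frequency mentioned in the abstract above, the sketch below counts non-overlapped occurrences of a serial (totally ordered), injective episode by greedy left-to-right matching. The function name `count_non_overlapped` and the single-character event encoding are assumptions made for this example. The paper's general partial-order case and its significance thresholds are not reproduced here.

```python
def count_non_overlapped(events, episode):
    """Count non-overlapped occurrences of a serial episode.

    events:  sequence of event types, in time order.
    episode: non-empty sequence of distinct event types (a serial,
             injective episode), in the required order.
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")
    count = 0
    pos = 0  # index of the episode event type we are waiting for
    for event in events:
        if event == episode[pos]:
            pos += 1
            if pos == len(episode):
                # One full occurrence has completed; start looking
                # for the next one strictly after this point.
                count += 1
                pos = 0
    return count


stream = list("ABCABACBC")
frequency = count_non_overlapped(stream, ["A", "B", "C"])
print(frequency)
```

A pattern would then be kept only if this count exceeds a threshold. In the abstract that threshold comes from a statistical test rather than a user-chosen constant.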
Abstract:
Most pattern mining methods yield a large number of frequent patterns, and isolating a small relevant subset of patterns is a challenging problem of current interest. In this paper, we address this problem in the context of discovering frequent episodes from symbolic time-series data. Motivated by the Minimum Description Length principle, we formulate the problem of selecting a relevant subset of patterns as one of searching for a subset of patterns that achieves the best data compression. We present algorithms for discovering small sets of relevant non-redundant episodes that achieve good data compression. The algorithms employ a novel encoding scheme and use serial episodes with inter-event constraints as the patterns. We present extensive simulation studies with both synthetic and real data, comparing our method with existing schemes such as GoKrimp and SQS. We also demonstrate the effectiveness of these algorithms on event sequences from a composable conveyor system; this system represents a new application area where the use of frequent patterns for compressing the event sequence is likely to be important for decision support and control.
Abstract:
A distinct, 1- to 2-cm-thick flood deposit found in Santa Barbara Basin with a varve-date of 1605 AD ± 5 years testifies to an intensity of precipitation that remains unmatched for later periods when historical or instrumental records can be compared against the varve record. The 1605 AD ± 5 event correlates well with Enzel's (1992) finding of a Silver Lake playa perennial lake at the terminus of the Mojave River (carbon-14-dated 1560 AD ± 90 years), in relative proximity to the rainfall catchment area draining into Santa Barbara Basin. According to Enzel, such a persistent flooding of the Silver Lake playa occurred only once during the last 3,500 years and required a sequence of floods, each comparable in magnitude to the largest floods in the modern record. To gain confidence in dating of the 1605 AD ± 5 event, we compare Southern California's sedimentary evidence against historical reports and multi-proxy time-series that indicate unusual climatic events or are sensitive to changes in large-scale atmospheric circulation patterns. The emerging pattern supports previous suggestions that the first decade of the 17th century was marked by a rapid cooling of the Northern Hemisphere, with some indications for global coverage. A burst of volcanism and the occurrence of El Niño seem to have contributed to the severity of the events. The synopsis of the 1605 AD ± 5 years flood deposit in Santa Barbara Basin, the substantial freshwater body at Silver Lake playa, and much additional global paleoclimatic evidence testifies to an equatorward shift of global wind patterns as the world experienced an interval of rapid, intense, and widespread cooling.
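Abstract:
Based on sequential frequent pattern mining, a method for detecting macroscopic network traffic anomalies is proposed and implemented. A new frequent pattern and a corresponding notion of anomaly degree are defined. Experiments on nationwide traffic data provided by the 863-917 network security monitoring platform lead to the conclusion that traffic was severely anomalous in early August 2006, corresponding to "Orange August". Comparison with other related traditional algorithms, such as an algorithm that uses absolute traffic volume and an algorithm that simply uses the traffic ranking of different hours, further demonstrates the practicality of sequential frequent patterns for network traffic analysis.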
Abstract:
In this paper, moving flock patterns are mined from spatio-temporal datasets by incorporating a clustering algorithm. A flock is defined as a set of data that move together for a certain continuous amount of time. Finding moving flock patterns using clustering algorithms is a potential method for finding frequent patterns of movement in large trajectory datasets. In this approach, the SPatial clusteRing algoRithm thrOugh sWarm intelligence (SPARROW) is the clustering algorithm used. The advantage of the SPARROW algorithm is that it can effectively discover clusters of widely varying sizes and shapes from large databases. Variations of the proposed method are addressed, and the experimental results show that the problems of scalability and duplicate pattern formation are addressed. This method also reduces the number of patterns produced.
Abstract:
Herbivore-induced plant volatiles (HIPVs) are important host finding cues for larval parasitoids, and similarly, insect oviposition might elicit the release of plant volatiles functioning as host finding cues for egg parasitoids. We hypothesized that egg parasitoids also might utilize HIPVs of emerging larvae to locate plants with host eggs. We, therefore, assessed the olfactory response of two egg parasitoids, a generalist, Trichogramma pretiosum (Trichogrammatidae), and a specialist, Telenomus remus (Scelionidae), to HIPVs. We used a Y-tube olfactometer to test the wasps’ responses to volatiles released by young maize plants that were treated with regurgitant from caterpillars of the moth Spodoptera frugiperda (Noctuidae) or were directly attacked by the caterpillars. The results show that the generalist egg parasitoid Tr. pretiosum is innately attracted by volatiles from freshly-damaged plants 0–1 and 2–3 h after regurgitant treatment. During this interval, the volatile blend consisted of green leaf volatiles (GLVs) and a blend of aromatic compounds, mono- and homoterpenes, respectively. Behavioral assays with synthetic GLVs confirmed their attractiveness to Tr. pretiosum. The generalist learned the more complex volatile blends released 6–7 h after induction, which consisted mainly of sesquiterpenes. The specialist T. remus, on the other hand, was attracted only to volatiles emitted from fresh and old damage after associating these volatiles with oviposition. Taken together, these results strengthen the emerging pattern that egg and larval parasitoids behave in a similar way in that generalists can respond innately to HIPVs, while specialists seem to rely more on associative learning.
Abstract:
Web transaction data between Web visitors and Web functionalities usually convey task-oriented user behavior patterns. Mining such click-stream data captures usage pattern information. Web usage mining has become one of the most widely used methods for Web recommendation, which customizes Web content to a user-preferred style. Traditional techniques of Web usage mining, such as Web user session or Web page clustering, association rule mining, and frequent navigational path mining, can only discover usage patterns explicitly. They cannot, however, reveal the underlying navigational activities or identify the latent relationships associated with the patterns among Web users and Web pages. In this work, we propose a Web recommendation framework that incorporates a Web usage mining technique based on the Probabilistic Latent Semantic Analysis (PLSA) model. The main advantage of this method is that it not only discovers usage-based access patterns but also reveals the underlying latent factors. With the discovered user access patterns, we then present users with content of greater interest via collaborative recommendation. To validate the effectiveness of the proposed approach, we conduct experiments on real-world datasets and make comparisons with some existing traditional techniques. The preliminary experimental results demonstrate the usability of the proposed approach.
Abstract:
This paper extends existing understandings of how actors' constructions of ambiguity shape the emergent process of strategic action. We theoretically elaborate the role of rhetoric in exploiting strategic ambiguity, based on analysis of a longitudinal case study of an internationalization strategy within a business school. Our data show that actors use rhetoric to construct three types of strategic ambiguity: protective ambiguity that appeals to common values in order to protect particular interests, invitational ambiguity that appeals to common values in order to invite participation in particular actions, and adaptive ambiguity that enables the temporary adoption of specific values in order to appeal to a particular audience at one point in time. These rhetorical constructions of ambiguity follow a processual pattern that shapes the emergent process of strategic action. Our findings show that (1) the strategic actions that emerge are shaped by the way actors construct and exploit ambiguity, (2) the ambiguity intrinsic to the action is analytically distinct from ambiguity that is constructed and exploited by actors, and (3) ambiguity construction shifts over time to accommodate the emerging pattern of actions.
Abstract:
Sequential pattern mining is an important subject in data mining with broad applications in many different areas. However, previous sequential mining algorithms mostly aimed to calculate the number of occurrences (the support) without regard to the degree of importance of different data items. In this paper, we propose to explore the search space of subsequences with normalized weights. We are interested not only in the number of occurrences of the sequences (their supports) but also in the importance of the sequences (their weights). When generating subsequence candidates, we use both the support and the weight of the candidates while maintaining the downward closure property of these patterns, which allows the candidate generation process to be accelerated.
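The abstract above does not state how support and weight are combined, so the following is only a minimal sketch under an assumed formulation: a pattern's weight is the mean of its item weights (each already normalized to the range 0 to 1), and its weighted support is that mean times its relative support. The names `is_subsequence`, `support`, and `weighted_support`, and the toy data, are hypothetical. The sketch does not attempt to reproduce the paper's pruning or its downward closure guarantee.

```python
def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match,
    # so later items must be found after earlier ones.
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Fraction of sequences in the database that contain the pattern."""
    hits = sum(1 for seq in database if is_subsequence(pattern, seq))
    return hits / len(database)


def weighted_support(pattern, database, weights):
    """Assumed formulation: relative support times mean item weight."""
    mean_weight = sum(weights[item] for item in pattern) / len(pattern)
    return support(pattern, database) * mean_weight


# Toy data, invented for illustration only.
database = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
weights = {"a": 1.0, "b": 0.5, "c": 0.25}
print(weighted_support(["a", "c"], database, weights))
```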
Abstract:
Clustering algorithms, pattern mining techniques and associated quality metrics have emerged as reliable methods for modeling learners’ performance, comprehension and interaction in given educational scenarios. The specificity of available data, such as missing values, extreme values or outliers, makes it challenging to extract user models that are significant from an educational perspective. In this paper we introduce a pattern detection mechanism within our data analytics tool based on k-means clustering and on the SSE, silhouette, Dunn index and Xie-Beni index quality metrics. Experiments performed on a dataset obtained from our online e-learning platform show that the extracted interaction patterns were representative in classifying learners. Furthermore, the performed monitoring activities created a strong basis for generating automatic feedback to learners in terms of their course participation, while relying on their previous performance. In addition, our analysis introduces automatic triggers that highlight learners who will potentially fail the course, enabling tutors to take timely actions.
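Two of the quality metrics named in the abstract above, SSE and the Dunn index, can be sketched for an already computed clustering as follows. This is an illustrative pure-Python version using Euclidean distance. The function names and the toy clusters are assumptions, not taken from the authors' tool. The Dunn index here uses the closest pair of points between clusters and the largest within-cluster diameter, which is one common variant, and it needs at least two clusters with at least one of them holding two or more points.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for points in clusters:
        center = centroid(points)
        total += sum(dist(p, center) ** 2 for p in points)
    return total


def dunn_index(clusters):
    """Smallest between-cluster distance over largest cluster diameter."""
    min_between = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        dist(p, q) for points in clusters for p, q in combinations(points, 2)
    )
    return min_between / max_diameter


# Toy clustering, invented for illustration only.
clusters = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
print(sse(clusters), dunn_index(clusters))
```

Lower SSE indicates tighter clusters, while a higher Dunn index indicates clusters that are compact relative to their separation.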
Abstract:
The XML Document Mining track was launched for exploring two main ideas: (1) identifying key problems and new challenges of the emerging field of mining semi-structured documents, and (2) studying and assessing the potential of Machine Learning (ML) techniques for dealing with generic ML tasks in the structured domain, i.e., classification and clustering of semi-structured documents. This track has run for six editions during INEX 2005, 2006, 2007, 2008, 2009 and 2010. The first five editions have been summarized in previous editions and we focus here on the 2010 edition. INEX 2010 included two tasks in the XML Mining track: (1) unsupervised clustering task and (2) semi-supervised classification task where documents are organized in a graph. The clustering task requires the participants to group the documents into clusters without any knowledge of category labels using an unsupervised learning algorithm. On the other hand, the classification task requires the participants to label the documents in the dataset into known categories using a supervised learning algorithm and a training set. This report gives the details of clustering and classification tasks.