964 results for frequent episodes


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Frequent episode discovery is a popular framework for mining data available as a long sequence of events. An episode is essentially a short ordered sequence of event types, and the frequency of an episode is some suitable measure of how often the episode occurs in the data sequence. Recently, we proposed a new frequency measure for episodes based on the notion of non-overlapped occurrences of episodes in the event sequence, and showed that such a definition, in addition to yielding computationally efficient algorithms, has some important theoretical properties in connecting frequent episode discovery with HMM learning. This paper presents some new algorithms for frequent episode discovery under this non-overlapped occurrences-based frequency definition. The algorithms presented here are better (by a factor of N, where N denotes the size of episodes being discovered) in terms of both time and space complexities when compared to existing methods for frequent episode discovery. We show through simulation experiments that our algorithms are very efficient. The new algorithms presented here have arguably the least possible orders of space and time complexities for the task of frequent episode discovery.
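To make the non-overlapped frequency concrete, here is a minimal Python sketch that counts non-overlapped occurrences of a single serial episode in one pass. It is an illustration only, not the algorithm from the paper: the function name `count_non_overlapped`, the `(event_type, time)` tuple layout, and the restriction to one serial episode at a time are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def count_non_overlapped(
    events: Sequence[Tuple[str, float]],
    episode: Sequence[str],
) -> int:
    """Count non-overlapped occurrences of a serial episode.

    `events` is assumed to be sorted by time with distinct timestamps.
    A single automaton waits for the next event type of the episode and
    resets once the whole episode has been seen, so counted occurrences
    never share an event and never interleave.
    """
    if not episode:
        return 0
    count = 0
    pos = 0  # index of the episode's event type we are waiting for
    for event_type, _time in events:
        if event_type == episode[pos]:
            pos += 1
            if pos == len(episode):
                count += 1  # one complete occurrence found
                pos = 0     # start looking for the next one
    return count


# Hypothetical event stream used only for illustration.
stream = [("A", 1), ("B", 2), ("A", 3), ("B", 4), ("C", 5),
          ("A", 6), ("C", 7), ("B", 8), ("C", 9)]
print(count_non_overlapped(stream, ["A", "B", "C"]))  # prints 2
```

The sketch needs only one integer of state per episode, which gives a feel for why counting under this definition can be cheap in both time and space.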

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Discovering patterns in temporal data is an important task in Data Mining. A successful method for this was proposed by Mannila et al. [1] in 1997. In their framework, mining for temporal patterns in a database of sequences of events is done by discovering the so-called frequent episodes. These episodes characterize interesting collections of events occurring relatively close to each other in some partial order. However, in this framework (and in many others for finding patterns in event sequences), the ordering of events in an event sequence is the only allowed temporal information. But there are many applications where the events are not instantaneous; they have time durations. Interesting episodes that we want to discover may need to contain information regarding event durations, etc. In this paper we extend Mannila et al.’s framework to tackle such issues. In our generalized formulation, episodes are defined so that much more temporal information about events can be incorporated into the structure of an episode. This significantly enhances the expressive capability of the rules that can be discovered in the frequent episode framework. We also present algorithms for discovering such generalized frequent episodes.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Many previous approaches to frequent episode discovery only accept simple sequences. Although a recent approach has been able to find frequent episodes from complex sequences, the discovered sets are neither condensed nor accurate. This paper investigates the discovery of condensed sets of frequent episodes from complex sequences. We adopt a novel anti-monotonic frequency measure based on non-redundant occurrences, and define a condensed set, nDaCF (the set of non-derivable approximately closed frequent episodes), within a given maximal error bound of support. We then introduce a series of effective pruning strategies, and develop a method, nDaCF-Miner, for discovering nDaCF sets. Experimental results show that, when the error bound is somewhat high, the discovered nDaCF sets are two orders of magnitude smaller than complete sets, and nDaCF-Miner is more efficient than previous mining approaches. In addition, the nDaCF sets are more accurate than the sets found by previous approaches.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The knowledge embedded in an online data stream is likely to change over time due to the dynamic evolution of the stream. Consequently, in frequent episode mining over an online stream, frequent episodes should be adaptively extracted from recently generated stream segments instead of the whole stream. However, almost all existing frequent episode mining approaches find episodes frequently occurring over the whole sequence. This paper proposes and investigates a new problem: online mining of recently frequent episodes over data streams. In order to meet strict requirements of stream mining such as one-scan, adaptive result update and instant result return, we choose a novel frequency metric and define a highly condensed set called the base of recently frequent episodes. We then introduce a one-pass method for mining bases of recently frequent episodes. Experimental results show that the proposed method is capable of finding bases of recently frequent episodes quickly and adaptively. The proposed method outperforms the previous approaches with the advantages of one-pass, instant result update and return, more condensed resulting sets and less space usage.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Frequent episode discovery is a popular framework for temporal pattern discovery in event streams. An episode is a partially ordered set of nodes with each node associated with an event type. Currently, algorithms exist for episode discovery only when the associated partial order is a total order (serial episode) or trivial (parallel episode). In this paper, we propose efficient algorithms for discovering frequent episodes with unrestricted partial orders when the associated event types are unique. These algorithms can be easily specialized to discover only serial or parallel episodes. Also, the algorithms are flexible enough to be specialized for mining in the space of certain interesting subclasses of partial orders. We point out that frequency alone is not a sufficient measure of interestingness in the context of partial order mining. We propose a new interestingness measure for episodes with unrestricted partial orders which, when used along with frequency, results in an efficient scheme of data mining. Simulations are presented to demonstrate the effectiveness of our algorithms.
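The notion of an episode as a partial order over unique event types can be illustrated with a small Python sketch that checks whether an episode occurs at least once in an event sequence. This is not the discovery algorithm from the paper: the function name `occurs`, the representation of the partial order as a dictionary mapping each event type to the set of types that must precede it, and the greedy earliest-time assignment are assumptions chosen for the example.

```python
import bisect
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Sequence, Set, Tuple


def occurs(
    events: Sequence[Tuple[str, float]],
    predecessors: Dict[str, Set[str]],
) -> bool:
    """Return True if the partial-order episode occurs in `events`.

    `events` is assumed to be sorted by time. `predecessors[x]` holds the
    event types that must occur strictly before x. Each event type is a
    distinct node of the episode (the unique-event-type setting).
    """
    # Collect the (sorted) occurrence times of every event type.
    times: Dict[str, List[float]] = {}
    for event_type, t in events:
        times.setdefault(event_type, []).append(t)

    assigned: Dict[str, float] = {}
    # Visit nodes so that predecessors are always handled first.
    for node in TopologicalSorter(predecessors).static_order():
        lower = max(
            (assigned[p] for p in predecessors.get(node, ())),
            default=float("-inf"),
        )
        candidates = times.get(node, [])
        # Earliest occurrence of this type strictly after all predecessors.
        idx = bisect.bisect_right(candidates, lower)
        if idx == len(candidates):
            return False
        assigned[node] = candidates[idx]
    return True


# Hypothetical data: A and B may come in any order, but both precede D.
stream = [("A", 1), ("C", 2), ("B", 3), ("D", 4)]
print(occurs(stream, {"D": {"A", "B"}}))
```

A serial episode corresponds to a chain of predecessors, and a parallel episode to a dictionary in which every predecessor set is empty, which mirrors how the two classical cases arise as special cases of the general partial order.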

Relevance:

70.00% 70.00%

Publisher:

Abstract:

In this paper we consider the process of discovering frequent episodes in event sequences. The most computationally intensive part of this process is that of counting the frequencies of a set of candidate episodes. We present two new frequency counting algorithms for speeding up this part. These, referred to as non-overlapping and non-interleaved frequency counts, are based on directly counting suitable subsets of the occurrences of an episode. Hence they are different from the frequency counts of Mannila et al. [1], who count the number of windows in which the episode occurs. Our new frequency counts offer a speed-up factor of 7 or more on real and synthetic datasets. We also show how the new frequency counts can be used when the events in episodes have time durations as well.
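For contrast with occurrence-based counts, the window-based frequency can be sketched as follows. This is a deliberately naive Python illustration rather than an efficient or authoritative implementation: the names `occurs_in_order` and `window_frequency`, the integer timestamps, the half-open window `[start, start + width)`, and the assumption of distinct timestamps are choices made for this example.

```python
from typing import Sequence, Tuple


def occurs_in_order(types: Sequence[str], episode: Sequence[str]) -> bool:
    """True if `episode` appears as a subsequence of `types`."""
    pos = 0
    for event_type in types:
        if pos < len(episode) and event_type == episode[pos]:
            pos += 1
    return pos == len(episode)


def window_frequency(
    events: Sequence[Tuple[str, int]],
    episode: Sequence[str],
    width: int,
) -> float:
    """Fraction of sliding windows of the given width containing the episode.

    `events` is assumed sorted by integer time with distinct timestamps.
    Windows are [start, start + width) and slide one time unit at a time,
    covering every window that contains at least one event.
    """
    if not events:
        return 0.0
    first, last = events[0][1], events[-1][1]
    hits = 0
    total = 0
    for start in range(first - width + 1, last + 1):
        in_window = [e for e, t in events if start <= t < start + width]
        total += 1
        if occurs_in_order(in_window, episode):
            hits += 1
    return hits / total


# Hypothetical three-event sequence.
stream = [("A", 1), ("B", 2), ("C", 3)]
print(window_frequency(stream, ["A", "B"], width=2))  # prints 0.25
```

Every window is rescanned from scratch here, which makes the cost of window-based counting easy to see; counting occurrences directly avoids iterating over windows altogether.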

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Most pattern mining methods yield a large number of frequent patterns, and isolating a small relevant subset of patterns is a challenging problem of current interest. In this paper, we address this problem in the context of discovering frequent episodes from symbolic time-series data. Motivated by the Minimum Description Length principle, we formulate the problem of selecting a relevant subset of patterns as one of searching for a subset of patterns that achieves the best data compression. We present algorithms for discovering small sets of relevant non-redundant episodes that achieve good data compression. The algorithms employ a novel encoding scheme and use serial episodes with inter-event constraints as the patterns. We present extensive simulation studies with both synthetic and real data, comparing our method with existing schemes such as GoKrimp and SQS. We also demonstrate the effectiveness of these algorithms on event sequences from a composable conveyor system; this system represents a new application area where use of frequent patterns for compressing the event sequence is likely to be important for decision support and control.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Understanding the functioning of a neural system in terms of its underlying circuitry is an important problem in neuroscience. Recent developments in electrophysiology and imaging allow one to simultaneously record activities of hundreds of neurons. Inferring the underlying neuronal connectivity patterns from such multi-neuronal spike train data streams is a challenging statistical and computational problem. This task involves finding significant temporal patterns from vast amounts of symbolic time series data. In this paper we show that the frequent episode mining methods from the field of temporal data mining can be very useful in this context. In the frequent episode discovery framework, the data is viewed as a sequence of events, each of which is characterized by an event type and its time of occurrence, and episodes are certain types of temporal patterns in such data. Here we show that, using the set of discovered frequent episodes from multi-neuronal data, one can infer different types of connectivity patterns in the neural system that generated it. For this purpose, we introduce the notion of mining for frequent episodes under certain temporal constraints; the structure of these temporal constraints is motivated by the application. We present algorithms for discovering serial and parallel episodes under these temporal constraints. Through extensive simulation studies we demonstrate that these methods are useful for unearthing patterns of neuronal network connectivity.
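As a rough illustration of what a temporal constraint on a serial episode can look like, the Python sketch below finds the times at which a serial episode can be completed when consecutive events must be separated by at most a fixed delay. The abstract does not spell out the structure of the constraints, so the maximum inter-event gap `max_gap`, the function name `constrained_end_times`, and the neuron-style event labels are assumptions made purely for this example.

```python
from typing import List, Sequence, Set, Tuple


def constrained_end_times(
    events: Sequence[Tuple[str, float]],
    episode: Sequence[str],
    max_gap: float,
) -> List[float]:
    """Times at which `episode` can end under an inter-event gap limit.

    An occurrence is valid when each event follows the previous one by a
    delay d with 0 < d <= max_gap (the gap limit is an assumed constraint).
    The set of feasible end times is built up prefix by prefix.
    """
    if not episode:
        return []
    reachable: Set[float] = set()
    for i, wanted in enumerate(episode):
        times = [t for event_type, t in events if event_type == wanted]
        if i == 0:
            current = set(times)
        else:
            # Keep a time only if some feasible predecessor is close enough.
            current = {
                t for t in times
                if any(0 < t - p <= max_gap for p in reachable)
            }
        reachable = current
    return sorted(reachable)


# Hypothetical spike events: (neuron label, spike time in ms).
spikes = [("A", 1), ("B", 10), ("A", 12), ("B", 14), ("C", 15)]
print(constrained_end_times(spikes, ["A", "B", "C"], max_gap=3))  # prints [15]
```

With `max_gap=3`, only the chain A at 12, B at 14, C at 15 qualifies, whereas without the constraint the early A at time 1 would also start an occurrence. This shows how a delay limit can filter out patterns that are unlikely to reflect a direct connection.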

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Background: Otitis media (OM) is one of the most common childhood diseases. Approximately every third child suffers from recurrent acute otitis media (RAOM), and 5% of all children have persistent middle ear effusion for months during their childhood. Despite numerous studies on the prevention and treatment of OM during the past decades, its management remains challenging and controversial. In this study, the effect of adenoidectomy on the risk for OM, the potential risk factors influencing the development of OM and the frequency of asthma among otitis-prone children were investigated. Subjects and methods: One prospective randomized trial and two retrospective studies were conducted. In the prospective trial, 217 children with RAOM or chronic otitis media with effusion (COME) were randomized to have tympanostomy with or without adenoidectomy. The age of the children at recruitment was between 1 and 4 years. RAOM was defined as having at least 3 episodes of AOM during the last 6 months or at least 5 episodes of AOM during the last 12 months. COME was defined as having persistent middle ear effusion for 2-3 months. The children were followed up for one year. In the first retrospective study, the frequency of childhood infections and allergy was evaluated by a questionnaire among 819 individuals. In the second retrospective study, data of asthma diagnosis were analysed from hospital discharge records of 1616 children who underwent adenoidectomy or had probing of the nasolacrimal duct. Results: In the prospective randomized study, adenoidectomy had no beneficial effect on the prevention of subsequent episodes of AOM. Parental smoking was found to be a significant risk factor for OM even after the insertion of tympanostomy tubes. The frequencies of exposure to tobacco smoke and day-care attendance at the time of randomization were similar among children with RAOM and COME. 
However, the frequencies of allergy to animal dust and pollen and of parental asthma were lower among children with COME than among those with RAOM. The questionnaire survey and the hospital discharge data revealed that children who had frequent episodes of OM had an increased risk for asthma. Conclusions: The first surgical intervention to treat an otitis-prone child younger than 4 years should not include adenoidectomy. Interventions to stop parental smoking could significantly reduce the risk for childhood RAOM. Whether an otitis-prone child develops COME or RAOM seems to be influenced by genetic predisposition more strongly than by environmental risk factors. Children who suffer from repeated upper respiratory tract infections, like OM, may be at increased risk for developing asthma.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Heart failure (HF), a chronic disease characterized by impaired functioning of the heart muscle, causes symptoms such as shortness of breath, edema and fatigue. HF requires the adoption of self-care behaviours to prevent episodes of decompensation. The aim of this research is to evaluate a motivational nursing intervention based on the stages of change (MSSC) with respect to self-care behaviours in patients with HF. To guide the MSSC intervention, the situation-specific theory of self-care in HF patients by Riegel and Dickson (2008) was adopted, along with the intervention model of Bédard et al. (2006), which combines the transtheoretical model (Prochaska & DiClemente, 1984) and motivational interviewing (Miller & Rollnick, 2006). The study uses a randomized experimental design (pre- and post-test) with a control group (N = 15 per group). Patients in the control group received usual care, and patients in the intervention group (IG) received the MSSC intervention over three sessions. Outcome measures were collected one month after randomization by a research assistant blinded to the randomization. The effect of the intervention was assessed by analyses of covariance on five outcome measures: the performance and management (general and HF-specific) of self-care, confidence in self-care (general and HF-specific), and conviction. Acceptability and feasibility were also evaluated. The results indicate a significant effect on the measure of confidence in performing HF-specific self-care. The majority of IG participants progressed in their stages of change. These results highlight the potential of this approach to promote the adoption of self-care, but a larger-scale study is proposed to evaluate the effect of this approach in a randomized clinical trial.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Nuisance blooms of the benthic filamentous cyanobacterium Lyngbya wollei, which forms mats deposited on sediments, have increased in frequency over the last 30 years in the rivers, lakes and springs of North America. Lyngbya wollei produces neurotoxins and volatile organic compounds (geosmin, 2-methylisoborneol) that affect public health and have socio-economic impacts. This cyanobacterium is considered a poor-quality habitat and food source for invertebrates because of its robust sheath and its toxin production. Blooms of L. wollei were first observed in 2005 in the St. Lawrence River (SLR; Quebec, Canada). We considered it important to determine its distribution over a 250 km stretch in order to develop predictive models of its presence and biomass based on the chemical and physical characteristics of the water. Lyngbya wollei was generally observed downstream of the confluence of small tributaries that drain agricultural land. The flow of enriched waters through submerged vegetation resulted in a decrease in the concentration of dissolved inorganic nitrogen (DIN), while concentrations of dissolved organic carbon (DOC) and total dissolved phosphorus (TDP) remained high, producing a low DIN:TDP ratio. According to our models, DOC (positive effect), TP (negative effect) and DIN:TDP (negative effect) are the most important variables for explaining the distribution of this cyanobacterium. The probability that L. wollei is present in the SLR was predicted correctly in 72% to 92% of cases for an independent data set. We then examined whether hydrodynamic conditions, that is, the current generated by waves and by river flow, control the spatial and temporal variations of L. wollei biomass in a large river system.
We measured L. wollei biomass as well as chemical, physical and meteorological variables over three years at 10 sites along a gradient of exposure to current and wind in a large (148 km²) fluvial lake of the SLR. Wave exposure and current velocity controlled the spatial and temporal variations in biomass. Biomass increased from May to November and persisted through the winter. Interannual variations were controlled by river discharge (water level), with the spring flood dislodging the mats of the previous year. The drops in water level and the increase in storm intensity anticipated under climate change scenarios could increase the area colonized by L. wollei as well as its accumulation on shorelines. Next, we assessed the relative importance of L. wollei compared with macrophytes and epiphytes. We examined the structuring influence of spatial scale on environmental variables and on the biomass of these benthic primary producers (PP). We tested whether their biomass reflected the nature of habitat patches based on ecogeomorphology or rather the river continuum. To answer these two questions, we used a design with 3 spatial scales in the SLR: 1) along a 250 km stretch, 2) among the fluvial lakes located within this stretch, 3) within each fluvial lake. Environmental factors (conductivity and TP) and spatial structure explain 59% of the variation in biomass of the three benthic PP. Specifically, variations in biomass were best explained by conductivity (+) for macrophytes, by the DIN:TDP ratio (+) and the light extinction coefficient (+) for epiphytes, and by DOC (+) and NH4+ (-) for L. wollei.
Spatial structure within fluvial lakes was the most important spatial component for all benthic PP, suggesting that local effects such as enrichment by tributaries, rather than upstream-downstream gradients, determine benthic PP biomass. Thus, habitat patch dynamics provide an adequate general framework for explaining spatial variations and the wide variety of environmental conditions supporting aquatic organisms in large rivers. Finally, we studied the ecological role of L. wollei mats in aquatic ecosystems, in particular as a food source and refuge for the amphipod Gammarus fasciatus. In laboratory experiments, we offered amphipods a choice between L. wollei mats and either filamentous chlorophytes or an artificial mat of acrylic wool. We also reconstructed the in situ diet of the amphipods using a mixing model (δ13C and δ15N). Gammarus fasciatus chose the substrate offering the best refuge from light (Acrylic > Lyngbya = Rhizoclonium > Spirogyra). The presence of saxitoxins, the elemental composition of tissues and the abundance of epiphytes had no effect on substrate choice. Lyngbya wollei and its epiphytes made up 36% and 24% of the in situ diet of G. fasciatus, whereas chlorophytes, macrophytes and associated epiphytes represented a smaller fraction of its diet. Benthic cyanobacterial mats should be considered a good refuge and a food source for small omnivorous invertebrates such as amphipods.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Recognizing there have been few methodologically rigorous cross-national studies of youth alcohol and drug behaviour, state student samples were compared in Australia and the USA. Sampling methods were matched to recruit two independent, state-representative, cross-sectional samples of students in Grades 5, 7 and 9 in Washington State, USA, (n = 2866) and Victoria, Australia (n = 2864) in 2002. Of Washington students in Grade 5 (age 11), 10.3% (95% CI 7.2–14.7) of boys and 5.2% (95% CI 3.4–7.9) of girls reported alcohol use in the past year. Prevalence rates were markedly higher in Victoria (34.2%, 95% CI 28.8–40.1 boys; 21.0%, 95% CI 17.1–25.5 girls). Relative to Washington, the students in Victoria demonstrated a two to three times increased likelihood of reporting substance use (either alcohol, tobacco or illicit drug use), and by Grade 9, experiences of loss-of-control of alcohol use, binge drinking (frequent episodes of five or more alcoholic drinks), and injuries related to alcohol were two to four times higher. The high rates of early age alcohol use in Victoria were associated with frequent, heavy and harmful alcohol use and higher overall exposure to alcohol or other drug use. These findings reveal considerable variation in international rates of both adolescent alcohol misuse and co-occurring drug use and suggest the need for cross-national research to identify policies and practices that contribute to the lower rate of adolescent alcohol and drug use observed in the USA in this study.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

In knowledge discovery in single sequences, different results could be discovered from the same sequence when different frequency measures are adopted. It is natural to raise such questions as (1) do these frequency measures reflect actual frequencies accurately? (2) what impacts do frequency measures have on discovered knowledge? (3) are discovered results accurate and reliable? and (4) which measures are appropriate for reflecting frequencies accurately? In this paper, taking three major factors (anti-monotonicity, maximum-frequency and window-width restriction) into account, we identify inaccuracies inherent in seven existing frequency measures, and investigate their impacts on the soundness and completeness of two kinds of knowledge, frequent episodes and episode rules, discovered from single sequences. In order to obtain more accurate frequencies and knowledge, we provide three recommendations for defining appropriate frequency measures. Following the recommendations, we introduce a more appropriate frequency measure. Empirical evaluation reveals the inaccuracies and verifies our findings. 

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Subsequence frequency measurement is a basic and essential problem in knowledge discovery in single sequences. Frequency-based knowledge discovery in single sequences tends to be unreliable, since different resulting sets may be obtained from the same sequence when different frequency metrics are adopted. In this chapter, we investigate subsequence frequency measurement and its impact on the reliability of knowledge discovery in single sequences. We analyse seven previous frequency metrics, identify their inherent inaccuracies, and explore their impacts on two kinds of knowledge discovered from single sequences: frequent episodes and episode rules. We further give three suggestions for frequency metrics and introduce a new frequency metric in order to improve the reliability. Empirical evaluation reveals the inaccuracies and verifies our findings.