981 results for Frequent Sequential Patterns
Abstract:
Sequential pattern mining is an important subject in data mining with broad applications in many different areas. However, previous sequential mining algorithms mostly aimed to calculate the number of occurrences (the support) without regard to the degree of importance of different data items. In this paper, we propose to explore the search space of subsequences with normalized weights. We are interested not only in the number of occurrences of the sequences (supports of sequences), but also in the importance of sequences (weights). When generating subsequence candidates, we use both the support and the weight of the candidates while maintaining the downward closure property of these patterns, which accelerates candidate generation.
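One way to picture how support and weight can be combined without losing a downward-closure style pruning test is sketched below. It assumes each item carries a normalized weight in (0, 1], scores a candidate as its support times the mean weight of its items, and prunes with the bound "support times the largest weight in the database". The scoring rule, the bound, and all names (`weighted_support`, `may_extend`, the toy `database`) are illustrative assumptions, not the definitions used in the paper.

```python
from typing import Dict, List, Sequence

def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def support(pattern: Sequence[str], database: List[List[str]]) -> float:
    """Fraction of database sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)

def weighted_support(pattern, database, weights: Dict[str, float]) -> float:
    """Assumed score: support multiplied by the mean item weight of the pattern."""
    mean_weight = sum(weights[i] for i in pattern) / len(pattern)
    return support(pattern, database) * mean_weight

def may_extend(pattern, database, weights, min_wsup: float) -> bool:
    """Pruning test: support * max weight bounds the score of every super-sequence,
    because support can only shrink and no mean weight exceeds the maximum."""
    return support(pattern, database) * max(weights.values()) >= min_wsup

# Toy data (hypothetical)
database = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
weights = {"a": 1.0, "b": 0.5, "c": 0.8}
```

With this toy data, the pattern `["a", "c"]` occurs in two of four sequences, so its support is 0.5 and its assumed weighted support is 0.5 × 0.9 = 0.45. A candidate is dropped from further extension only when `may_extend` is false, which keeps the pruning safe under these assumptions.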
Abstract:
The goal of this thesis is to study a tool that can help analysts find sequential patterns, with a focus on financial markets. The thesis examines how new and relevant knowledge can be mined from real-life information, potentially giving investors, market analysts, and economists a new basis for making informed decisions. The Ramex Forum algorithm is used as the basis for the tool because of its ability to find sequential patterns in financial data. Relevant improvements to the algorithm are studied so that it better fits the needs of the thesis. Another important aspect of this algorithm is the way it displays the patterns found: even with good results, it is difficult to find relevant patterns among all the studied samples without a proper result visualization component. Accordingly, different combinations of parameterizations and ways of visualizing data are evaluated, and their influence on the analysis of those patterns is discussed. To properly evaluate the utility of the tool, case studies are performed as a final test. Real information is used to produce results, which are evaluated with regard to their accuracy, interest, and relevance.
Abstract:
Resuscitation and stabilization are key issues in Intensive Care Burn Units and early survival predictions help to decide the best clinical action during these phases. Current survival scores of burns focus on clinical variables such as age or the body surface area. However, the evolution of other parameters (e.g. diuresis or fluid balance) during the first days is also valuable knowledge. In this work we suggest a methodology and we propose a Temporal Data Mining algorithm to estimate the survival condition from the patient’s evolution. Experiments conducted on 480 patients show the improvement of survival prediction.
Abstract:
The theoretical context of this study is observational methodology applied to group games and sports, specifically handball. The study analyzes, in a qualitative dimension, the performance of the pivot player in the 2007 World Cup (Germany), the 2008 European Championship (Norway), and the 2008 Olympic Games (China). Our purpose was to obtain as much information as possible about the whole activity of the pivot player by identifying sequential patterns of behaviour or conduct of the player/game through sequential analysis. The observation instrument used to meet the main purpose of this work consists of a combination of format fields (FF) and systems of categories (SC). The codings were carried out on several handball games. We show that this instrument supports the purposes for which it was developed and allows further research into the offensive process of handball. It also makes it possible to analyze aspects of the game through perspective and contextual sequences, which we consider to be more accurate and a better fit to the "reality" of a game such as handball.
Abstract:
A major task of traditional temporal event sequence mining is to find all frequent event patterns in a long temporal sequence. In many real applications, however, events are often grouped into different types, and not all types are of equal importance. In this paper, we consider the problem of efficiently mining temporal event sequences that lead to an instance of a specific type of event. Temporal constraints are used to ensure that the mining results are sensible. We first generalise and formalise the problem of event-oriented temporal sequence data mining. After discussing some issues unique to this new problem, we give a set of criteria, adapted from traditional data mining techniques, to measure the quality of the patterns to be discovered. Finally, we present an algorithm to discover potentially interesting patterns.
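The idea of mining only the sequences that lead to a target event under a temporal constraint can be sketched roughly as follows. The sketch assumes the constraint is a fixed-width time window before each target occurrence and that a pattern is an ordered pair of event types counted once per window. Both choices, and the names `windows_before_target` and `frequent_precursors`, are assumptions for illustration and are not taken from the paper's algorithm.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Event = Tuple[float, str]  # (timestamp, event type)

def windows_before_target(events: List[Event], target: str, width: float) -> List[List[str]]:
    """For each target occurrence at time t, list the other event types
    seen in [t - width, t), in time order."""
    ordered = sorted(events)
    return [
        [kind for (u, kind) in ordered if t - width <= u < t and kind != target]
        for (t, target_kind) in ordered
        if target_kind == target
    ]

def frequent_precursors(events: List[Event], target: str, width: float,
                        min_count: int) -> Dict[Tuple[str, str], int]:
    """Count ordered pairs (a occurs before b) per window and keep the frequent ones."""
    counts: Counter = Counter()
    for window in windows_before_target(events, target, width):
        # combinations() preserves positional order, so (a, b) means a came first
        counts.update(set(combinations(window, 2)))
    return {pair: c for pair, c in counts.items() if c >= min_count}

# Toy event log (hypothetical): "X" is the target event type
events = [(1, "A"), (2, "B"), (3, "X"),
          (10, "B"), (11, "A"), (12, "X"),
          (20, "A"), (21, "B"), (22, "X")]
```

On this toy log with a window of 5 time units, the pair ("A", "B") precedes "X" in two of the three windows and ("B", "A") in one, so a minimum count of 2 keeps only ("A", "B").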
Abstract:
An approach based on a linear rate of increase in harvest index (HI) with time after anthesis has been used as a simple means to predict grain growth and yield in many crop simulation models. When applied to diverse situations, however, this approach has been found to introduce significant error in grain yield predictions. Accordingly, this study was undertaken to examine the stability of the HI approach for yield prediction in sorghum [Sorghum bicolor (L.) Moench]. Four field experiments were conducted under nonlimiting water and N conditions. The experiments were sown at times that ensured a broad range in temperature and radiation conditions. Treatments consisted of two population densities and three genotypes varying in maturity. Frequent sequential harvests were used to monitor crop growth, yield, and the dynamics of HI. Experiments varied greatly in yield and final HI. There was also a tendency for lower HI with later maturity. Harvest index dynamics also varied among experiments and, to a lesser extent, among treatments within experiments. The variation was associated mostly with the linear rate of increase in HI and the timing of cessation of that increase. The average rate of HI increase was 0.0198 d⁻¹, but this was reduced considerably (0.0147 d⁻¹) in one experiment that matured in cool conditions. The variations found in HI dynamics could be largely explained by differences in assimilation during grain filling and remobilization of preanthesis assimilate. We concluded that this level of variation in HI dynamics limited the general applicability of the HI approach in yield prediction and suggested a potential alternative for testing.
Abstract:
This document was written as part of the thesis for the Master's in Informatics Engineering, in the area of Knowledge and Decision Technologies, of the Department of Informatics Engineering at ISEP, on the topic of heart sound classification using motifs. This work presents a heart sound classification algorithm capable of identifying cardiac pathologies. Heart sound classification is challenging because it is difficult to separate heartbeats from ambient sounds (voices, breathing, contact of the microphone with surfaces such as skin or fabric) and from noise. The approach follows the methodology of discovering the most frequent SAX patterns (motifs) in time series and relating them to the systolic (S1) and diastolic (S2) events of the heart. The methodology proved effective at distinguishing normal sounds from sounds corresponding to pathology. The results were published at the international conference IDEAS’14 [Oliveira, 2014], in July of this year. In a subsequent phase, a mobile application was developed that captures heartbeats, processes them, and classifies them. The sounds are classified using the method described in the previous paragraph. After processing the sounds, the mobile application sends them to a server, where the classification program runs, and receives the classification result. The application architecture that was designed, its components, and the tools and technologies used are also described.
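As a rough illustration of what discovering frequent SAX motifs involves, the sketch below converts a numeric window into a SAX word (z-normalization, piecewise aggregate approximation, then symbols from equiprobable Gaussian breakpoints) and counts the most common words over sliding windows. The window length, segment count, alphabet, and the function names `sax_word` and `frequent_motifs` are assumptions chosen for the example; the thesis's actual parameters and its link to S1/S2 events are not reproduced here.

```python
from collections import Counter
from statistics import NormalDist, mean, pstdev
from typing import List, Sequence, Tuple

def sax_word(series: Sequence[float], n_segments: int, alphabet: str = "abcd") -> str:
    """Discretize a numeric window into a SAX word.
    Assumes len(series) is a multiple of n_segments."""
    mu, sigma = mean(series), pstdev(series)
    # z-normalize; a constant window maps to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise aggregate approximation: mean of each equal-length segment
    seg = len(z) // n_segments
    paa = [mean(z[i * seg:(i + 1) * seg]) for i in range(n_segments)]
    # Breakpoints that split the standard normal into equiprobable regions
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)

def frequent_motifs(series: Sequence[float], window: int, n_segments: int,
                    top: int = 3) -> List[Tuple[str, int]]:
    """Slide a window over the series and return the most common SAX words."""
    words = [sax_word(series[i:i + window], n_segments)
             for i in range(len(series) - window + 1)]
    return Counter(words).most_common(top)
```

For example, `sax_word([0, 0, 0, 0, 10, 10, 10, 10], 2)` yields a low symbol followed by a high one. In a real pipeline the input would be a preprocessed heart sound signal rather than a toy list.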
Abstract:
The main objective of this work was to evaluate the variability of the southern rust pathogen Puccinia polysora in Brazil, based on its virulence on a set of maize (Zea mays) cultivars. Sixty single-pustule isolates, from different areas of occurrence of southern rust, were evaluated for their virulence to 50 maize experimental hybrids. Six cultivars showed a clear distinction between susceptible and resistant reactions and were used to characterize the variability of the pathogen. Seventeen virulence patterns were identified among the 60 isolates tested. The most frequent virulence patterns were observed in all sampling locations, which suggests the absence of geographical differentiation among prevalent populations of P. polysora in Brazil.
Abstract:
Traditional dictionary learning algorithms find a sparse representation of high-dimensional data by transforming samples into a one-dimensional (1D) vector. This 1D model loses the inherent spatial structure of the data. An alternative is to employ tensor decomposition for dictionary learning on the data in its original structural form (a tensor) by learning multiple dictionaries along each mode and the corresponding sparse representation with respect to the Kronecker product of these dictionaries. To learn tensor dictionaries along each mode, all existing methods update each dictionary iteratively in an alternating manner. Although atoms from each mode dictionary jointly contribute to the sparsity of the tensor, existing works treat each mode dictionary independently and thus ignore atom correlations between different mode dictionaries. In this paper, we propose a joint multiple dictionary learning method for tensor sparse coding, which explores atom correlations for sparse representation and updates multiple atoms from each mode dictionary simultaneously. In this algorithm, the Frequent-Pattern Tree (FP-tree) mining algorithm is employed to exploit frequent atom patterns in the sparse representation. Inspired by the idea of K-SVD, we develop a new dictionary update method that jointly updates the elements in each pattern. Experimental results demonstrate that our method outperforms other tensor-based dictionary learning algorithms.
Abstract:
Reports on the clinical course of mycophenolic acid (MPA)-related colitis in kidney transplant recipients are scarce. This study aimed at assessing MPA-related colitis incidence, risk factors, and progression after kidney transplantation. All kidney transplant patients taking MPA who had colonic biopsies for persistent chronic diarrhea, between 2000 and 2012, at the Kidney Transplantation Unit of Botucatu Medical School Hospital, Brazil, were included. Cytomegalovirus (CMV) immunohistochemistry was performed in all biopsy specimens. Data on presenting symptoms, medications, immunosuppressive drugs, colonoscopic findings, and follow-up were obtained. Of 580 kidney transplant patients on MPA, 34 underwent colonoscopy. Colonoscopic findings were associated with MPA usage in 16 patients. The most frequent histologic patterns were non-specific colitis (31.3%), inflammatory bowel disease (IBD)-like colitis (25%), normal/near normal (18.8%), graft-versus-host disease-like (18.8%), and ischemia-like colitis (12.5%). All patients had persistent acute diarrhea and weight loss. Six of the 16 MPA-related diarrhea patients (37.5%) showed acute dehydration requiring hospitalization. Diarrhea resolved when MPA was switched to sirolimus (50%), discontinued (18.75%), switched to azathioprine (12.5%), or reduced by 50% (18.75%). No graft loss occurred. Four patients died during the study period. Late-onset MPA was more frequent, and no correlation with MPA dose or formulation was found.
Abstract:
The emergence and maintenance of maternal behavior are under the influence of environmental cues such as light and dark periods. This article discusses the characteristic neurobiology of the behavioral patterns of lactating rats. Specifically, the hormonal basis and neurocircuits that determine whether mother rats show typical sequential patterns of behavioral responses are discussed. During lactation, rats express a sequential pattern of behavioral parameters that may be determined by hormonal variations. Sensorial signals emitted by pups, as well as environmental cues, are suggested to serve as conditioned stimuli for these animals. Finally, the expression of maternal behavior is discussed under neuroeconomic and evolutionary perspectives.
Abstract:
Analysis and application of data mining processes to the information flow of real-time systems. Implementation and analysis of a self-adaptive algorithm for finding frequent patterns on automatic machines.
Abstract:
The thesis I carried out over the last six months was developed at the research laboratories of IMA S.p.a. IMA (Industria Macchine Automatiche) is an Italian company founded in Bologna in 1961 that today is a world leader in the production of automatic machines for pharmaceutical packaging. I would like to point out at once that in this application context the use of data mining algorithms is difficult because of the two environments involved. The first is that of automatic machines, which operate with real-time systems and therefore do not fully provide the resources such algorithms require. The second concerns pharmaceutical production, which is subject to very strict international regulation that requires tracking every event that occurs during packaging but does not allow these sensitive data to be seen by the outside world. The interest in using this information is immediately apparent: it could reveal events traceable to a machine problem or to some kind of error, with the aim of improving the effectiveness and efficiency of IMA products. The greatest effort in devising an application strategy went into understanding and interpreting the messages related to software aspects. Since the data are abundant and closed, and the machines have too few resources to run data mining algorithms properly, I adopted different approaches in different application contexts:
• An automatic error identification system, in order to reduce the time needed to correct errors.
• Modification of an algorithm from the literature for machine characterization.
The thesis is structured as follows:
• Chapter 1: describes the IMA Adapta automatic machine, for which the various log files were provided. Since it is the object of analysis in this work, the information flows it generates are also reported.
• Chapter 2: screenshots of the data in my possession are shown in order to interpret them through exploratory analysis and to formulate ideas and proposals applicable to the Machine Learning algorithms known in the literature.
• Chapter 3 (error identification): this chapter reports the application contexts I designed in order to implement an infrastructure that can meet the requirement named in the chapter title.
• Chapter 4 (machine characterization): I define the algorithm used, FP-Growth, and show the modifications made so that it can be used inside automatic machines while respecting strict limits on CPU time, memory, and I/O operations and, above all, the impossibility of having the whole dataset available, only sub-portions of it. In addition, datasets are generated for testing the modified FP-Growth algorithm.
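The constraint of seeing only sub-portions of the dataset can be illustrated with a small prefix tree that is updated chunk by chunk. This is not the modified FP-Growth described in the thesis: it is a simplified FP-tree-like structure that, as an assumption, orders items lexicographically instead of by global frequency (which would require a full pass over the data), and it only answers support queries rather than mining patterns. The names `Node`, `insert_chunk`, and `support` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

class Node:
    """Prefix-tree node: count = number of transactions passing through it."""
    def __init__(self, item: Optional[str] = None) -> None:
        self.item = item
        self.count = 0
        self.children: Dict[str, "Node"] = {}

def insert_chunk(root: Node, chunk: Iterable[List[str]]) -> None:
    """Add one sub-portion of the data; earlier chunks need not stay in memory."""
    for transaction in chunk:
        node = root
        # Fixed lexicographic order (assumption) keeps paths consistent across chunks
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, Node(item))
            node.count += 1

def support(node: Node, itemset: Tuple[str, ...]) -> int:
    """Number of inserted transactions containing every item of the sorted itemset."""
    if not itemset:
        return node.count
    total = 0
    for item, child in node.children.items():
        if item == itemset[0]:
            total += support(child, itemset[1:])
        elif item < itemset[0]:
            # Items on a path are ascending, so the match may still lie deeper
            total += support(child, itemset)
    return total

# Toy log split into two chunks (hypothetical)
root = Node()
insert_chunk(root, [["a", "b", "c"], ["a", "c"]])
insert_chunk(root, [["b", "c"], ["a", "b", "c"]])
```

After both chunks, `support(root, ("a", "c"))` and `support(root, ("b", "c"))` each count three transactions. Memory use depends on how many distinct prefixes occur, so a real deployment under tight limits would still need some form of pruning.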