976 results for Frequent Sequential Patterns
Abstract:
Sequential pattern mining is an important subject in data mining, with broad applications in many areas. However, previous sequential mining algorithms mostly count the number of occurrences (the support) without regard to the relative importance of different data items. In this paper, we propose exploring the search space of subsequences with normalized weights. We are interested not only in the number of occurrences of sequences (their supports) but also in their importance (their weights). When generating subsequence candidates, we use both the support and the weight of each candidate while maintaining the downward closure property of these patterns, which accelerates candidate generation.
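As a rough illustration of combining support with item weights, the following minimal sketch computes a weighted support and a pruning bound. The averaging formula, the `may_extend` bound, and all names and data are assumptions made for illustration; the abstract does not specify the paper's actual definitions.

```python
from typing import Dict, List, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], database: List[List[str]]) -> float:
    """Fraction of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


def weighted_support(pattern: Sequence[str], database: List[List[str]],
                     weights: Dict[str, float]) -> float:
    """Assumed definition: support scaled by the average item weight."""
    avg_weight = sum(weights[i] for i in pattern) / len(pattern)
    return support(pattern, database) * avg_weight


def may_extend(pattern: Sequence[str], database: List[List[str]],
               max_weight: float, min_wsup: float) -> bool:
    """Pruning bound: a super-pattern cannot have higher support than this
    pattern, and its average weight cannot exceed max_weight."""
    return support(pattern, database) * max_weight >= min_wsup


# Hypothetical toy data with weights normalized to [0, 1].
database = [list("abcd"), list("acd"), list("abd"), list("bcd")]
weights = {"a": 1.0, "b": 0.5, "c": 0.8, "d": 0.2}
max_weight = max(weights.values())
```

Pruning on `support * max_weight` rather than on the weighted support itself is one common way to keep an anti-monotone test, since the bound can only shrink as a pattern grows.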
Abstract:
In many applications, e.g., bioinformatics, web access traces, and system utilisation logs, the data is naturally in the form of sequences. There has been great interest in analysing sequential data and finding the inherent characteristics or relationships within it. Sequential association rule mining is one of the methods used to analyse this data. As conventional sequential association rule mining very often generates a huge number of association rules, many of which are redundant, it is desirable to eliminate the unnecessary rules. Because of the complexity and temporally ordered nature of sequential data, current research on sequential association rule mining is limited. Although several sequential association rule prediction models using either sequence constraints or temporal constraints have been proposed, none of them considers the redundancy problem in rule mining. The main contribution of this research is a non-redundant association rule mining method based on closed frequent sequences and minimal sequential generators. We also give a definition of non-redundant sequential rules, which are sequential rules with minimal antecedents but maximal consequents. A new algorithm called CSGM (closed sequential and generator mining) for generating closed sequences and minimal sequential generators is also introduced. Experiments compare the performance of generating non-redundant sequential rules with that of generating full sequential rules, and evaluate CSGM against other closed sequential pattern mining and generator mining algorithms. We also use the generated non-redundant sequential rules for query expansion in order to improve recommendations for infrequently purchased products.
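The two notions the method builds on can be checked by brute force from their standard definitions: a sequence is closed if no proper supersequence has the same support, and it is a minimal generator if no proper subsequence has the same support. The sketch below is not CSGM, whose details the abstract does not give; the function names and the toy support table are hypothetical.

```python
from typing import Dict, Set, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(short: Pattern, long: Pattern) -> bool:
    """Return True if short occurs in long in order (gaps allowed)."""
    it = iter(long)
    return all(item in it for item in short)


def closed_patterns(supports: Dict[Pattern, int]) -> Set[Pattern]:
    """Patterns with no proper supersequence of equal support."""
    return {
        p for p, s in supports.items()
        if not any(q != p and supports[q] == s and is_subsequence(p, q)
                   for q in supports)
    }


def minimal_generators(supports: Dict[Pattern, int]) -> Set[Pattern]:
    """Patterns with no proper subsequence of equal support."""
    return {
        p for p, s in supports.items()
        if not any(q != p and supports[q] == s and is_subsequence(q, p)
                   for q in supports)
    }


# Supports counted over the hypothetical database [a b], [a b], [a].
supports = {("a",): 3, ("b",): 2, ("a", "b"): 2}
closed = closed_patterns(supports)
generators = minimal_generators(supports)
```

Here `("b",)` is a generator whose closure is `("a", "b")`, which is the kind of pair a rule with a minimal antecedent and a maximal consequent would be built from.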
Abstract:
With the overwhelming increase in the amount of text on the web, it is almost impossible for people to keep abreast of up-to-date information. Text mining is a process by which interesting information is derived from text through the discovery of patterns and trends. Text mining algorithms are used to guarantee the quality of extracted knowledge. However, the patterns extracted by text or data mining algorithms are often noisy and inconsistent. Thus, different challenges arise, such as how to understand these patterns, whether the model that has been used is suitable, and whether all the extracted patterns are relevant. Furthermore, the research raises the question of how to give a correct weight to the extracted knowledge. To address these issues, this paper presents a text post-processing method that uses a pattern co-occurrence matrix to find the relations between extracted patterns in order to reduce noisy patterns. The main objective of this paper is not only to reduce the number of closed sequential patterns but also to improve the performance of pattern mining. The experimental results on the Reuters Corpus Volume 1 data collection and TREC filtering topics show that the proposed method is promising.
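A pattern co-occurrence matrix can be sketched as follows, assuming that entry (i, j) counts the documents in which patterns i and j both occur. That counting rule, the function name, and the sample data are assumptions; the abstract does not define the matrix.

```python
from itertools import combinations
from typing import Iterable, List, Set


def cooccurrence_matrix(doc_patterns: Iterable[Set[str]],
                        patterns: List[str]) -> List[List[int]]:
    """Count, for every pair of patterns, the documents containing both.

    The diagonal holds each pattern's document frequency.
    """
    index = {p: i for i, p in enumerate(patterns)}
    n = len(patterns)
    matrix = [[0] * n for _ in range(n)]
    for present in doc_patterns:
        ids = sorted(index[p] for p in set(present))
        for i in ids:
            matrix[i][i] += 1
        for i, j in combinations(ids, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


# Hypothetical patterns found in three documents.
patterns = ["p1", "p2", "p3"]
docs = [{"p1", "p2"}, {"p1", "p2", "p3"}, {"p3"}]
matrix = cooccurrence_matrix(docs, patterns)
```

A pattern whose off-diagonal counts are low relative to its diagonal entry rarely appears alongside the others, which is one plausible signal for treating it as noise.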
Abstract:
We consider the problem of detecting statistically significant sequential patterns in multineuronal spike trains. These patterns are characterized by ordered sequences of spikes from different neurons with specific delays between spikes. We have previously proposed a data-mining scheme to efficiently discover such patterns, which occur often enough in the data. Here we propose a method to determine the statistical significance of such repeating patterns. The novelty of our approach is that we use a compound null hypothesis that not only includes models of independent neurons but also models where neurons have weak dependencies. The strength of interaction among the neurons is represented in terms of certain pair-wise conditional probabilities. We specify our null hypothesis by putting an upper bound on all such conditional probabilities. We construct a probabilistic model that captures the counting process and use this to derive a test of significance for rejecting such a compound null hypothesis. The structure of our null hypothesis also allows us to rank-order different significant patterns. We illustrate the effectiveness of our approach using spike trains generated with a simulator.
Abstract:
Resuscitation and stabilization are key issues in Intensive Care Burn Units and early survival predictions help to decide the best clinical action during these phases. Current survival scores of burns focus on clinical variables such as age or the body surface area. However, the evolution of other parameters (e.g. diuresis or fluid balance) during the first days is also valuable knowledge. In this work we suggest a methodology and we propose a Temporal Data Mining algorithm to estimate the survival condition from the patient’s evolution. Experiments conducted on 480 patients show the improvement of survival prediction.
Abstract:
With the overwhelming increase in the amount of data on the web and in databases, many text mining techniques have been proposed for mining useful patterns in text documents. Extracting closed sequential patterns using the Pattern Taxonomy Model (PTM) is one of the pruning methods for removing noisy, inconsistent, and redundant patterns. However, PTM treats each extracted pattern as a whole without considering the terms it includes, which could affect the quality of the extracted patterns. This paper proposes an innovative and effective method that extends the random set to accurately weigh patterns based on their distribution in the documents and the distribution of their terms in patterns. The proposed approach then finds the specific closed sequential patterns (SCSP) based on the newly calculated weight. The experimental results on the Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms other state-of-the-art methods on several popular measures.
Abstract:
We can recognize and memorize objects while continuously receiving huge amounts of temporal information that includes redundancy and noise. This paper proposes a neural network model that extracts pre-recognized patterns from temporally sequential patterns containing redundancy and memorizes them temporarily. The model consists of an adaptive resonance system and a recurrent time-delay network. The extraction is executed by the matching mechanism of the adaptive resonance system, and the temporal information is processed and stored by the recurrent network. Simple simulations are examined to illustrate the extraction property.
Abstract:
A major task of traditional temporal event sequence mining is to find all frequent event patterns from a long temporal sequence. In many real applications, however, events are often grouped into different types, and not all types are of equal importance. In this paper, we consider the problem of efficient mining of temporal event sequences which lead to an instance of a specific type of event. Temporal constraints are used to ensure sensibility of the mining results. We will first generalise and formalise the problem of event-oriented temporal sequence data mining. After discussing some unique issues in this new problem, we give a set of criteria, which are adapted from traditional data mining techniques, to measure the quality of patterns to be discovered. Finally we present an algorithm to discover potentially interesting patterns.
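One simple way to picture event-oriented mining under a temporal constraint is to collect the events that fall inside a fixed window before each occurrence of the target event type, then keep the types that recur across those windows. The window rule, the threshold, and every name below are assumptions for illustration and not the algorithm the paper presents.

```python
from collections import Counter
from typing import List, Set, Tuple

Event = Tuple[float, str]  # (timestamp, event type)


def preceding_windows(events: List[Event], target: str,
                      width: float) -> List[List[str]]:
    """For each target event at time t, list the event types in [t - width, t)."""
    windows = []
    for t, typ in events:
        if typ == target:
            windows.append([e for (s, e) in events if t - width <= s < t])
    return windows


def frequent_preceding_types(events: List[Event], target: str, width: float,
                             min_fraction: float) -> Set[str]:
    """Event types present in at least min_fraction of the target's windows."""
    windows = preceding_windows(events, target, width)
    if not windows:
        return set()
    counts = Counter(typ for w in windows for typ in set(w))
    return {typ for typ, c in counts.items() if c / len(windows) >= min_fraction}


# Hypothetical log, sorted by time.
events = [(1, "warn"), (2, "retry"), (3, "fail"),
          (6, "warn"), (7, "fail"), (9, "retry")]
always = frequent_preceding_types(events, "fail", width=3, min_fraction=1.0)
often = frequent_preceding_types(events, "fail", width=3, min_fraction=0.5)
```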
Abstract:
It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. This research presents a promising method, Relevance Feature Discovery (RFD), for solving this challenging issue. It discovers both positive and negative patterns in text documents as high-level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the high-level features. The thesis also introduces an adaptive model (called ARFD) to enhance the flexibility of using RFD in adaptive environments. ARFD automatically updates the system's knowledge based on a sliding window over new incoming feedback documents. It can efficiently decide which incoming documents can bring new knowledge into the system. Substantial experiments using the proposed models on Reuters Corpus Volume 1 and TREC topics show that the proposed models significantly outperform both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and other pattern-based methods.
Abstract:
Nature has used the all-alpha-polypeptide backbone of proteins to create a remarkable diversity of folded structures. Sequential patterns of 20 distinct amino acids, which differ only in their side chains, determine the shape and form of proteins. Our understanding of these specific secondary structures is over half a century old and is based primarily on the fundamental elements: the Pauling alpha-helix and beta-sheet. Researchers can also generate structural diversity through the synthesis of polypeptide chains containing homologated (omega) amino acid residues, which contain a variable number of backbone atoms. However, incorporating amino acids with more atoms within the backbone introduces additional torsional freedom into the structure, which can complicate the structural analysis. Fortunately, gabapentin (Gpn), a readily available bulk drug, is an achiral beta,beta-disubstituted gamma amino acid residue that contains a cyclohexyl ring at the C-beta carbon atom, which dramatically limits the range of torsion angles that can be obtained about the flanking C-C bonds. Limiting conformational flexibility also has the desirable effect of increasing peptide crystallinity, which permits unambiguous structural characterization by X-ray diffraction methods. This Account describes studies carried out in our laboratory that establish Gpn as a valuable residue in the design of specifically folded hybrid peptide structures. The insertion of additional atoms into polypeptide backbones facilitates the formation of intramolecular hydrogen bonds whose directionality is opposite to that observed in canonical alpha-peptide helices. If hybrid structures mimic proteins and biologically active peptides, the proteolytic stability conferred by unusual backbones can be a major advantage in the area of medicinal chemistry.
We have demonstrated a variety of internally hydrogen-bonded structures in the solid state for Gpn-containing peptides, including the characterization of the C-7 and C-9 hydrogen bonds, which can lead to ribbons in homo-oligomeric sequences. In hybrid alpha gamma sequences, distinct C-12 hydrogen-bonded turn structures support formation of peptide helices and hairpins in longer sequences. Some peptides that include the Gpn residue have hydrogen-bond directionality that matches alpha-peptide helices, while others have the opposite directionality. We expect that expansion of the polypeptide backbone will lead to new classes of foldamer structures, which are thus far unknown to the world of alpha-polypeptides. The diversity of internally hydrogen-bonded structures observed in hybrid sequences containing Gpn shows promise for the rational design of novel peptide structures incorporating hybrid backbones.
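Abstract:
Servlet caching can effectively improve the throughput of a Servlet container and shorten the response time of user requests. However, the performance of a Servlet cache is affected by the cache replacement algorithm. Servlets in a Servlet container correspond to particular business functions, so mining the business associations among Servlets to guide the design of the cache replacement algorithm can raise the Servlet cache hit rate and thereby improve the performance of the Servlet container. However, common cache replacement algorithms such as LRU (least recently used), LFU (least frequently used), and GDSF (greedy dual size frequency) do not take this into account. The business associations among Servlets are defined as Servlet container sequential patterns, the concept of a k-steps cacheable transition probability graph is proposed to represent them, and a sequential pattern discovery algorithm, KCTPG_Discovery, is given. Finally, based on Servlet container sequential patterns, the cache replacement algorithms KP-LRU (k-steps prediction least recently used) and KP-GDSF (k-steps prediction greedy dual size frequency) are designed. Experimental results show that KP-LRU and KP-GDSF achieve higher cache hit rates than the corresponding LRU and GDSF algorithms and effectively improve the performance of the Servlet container.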
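The idea of predicting upcoming Servlet requests from a transition probability graph can be sketched as below. This is only an illustration under stated assumptions: transition probabilities are estimated from consecutive requests in hypothetical sessions, and the k-step distribution is a plain Markov propagation. It is not KCTPG_Discovery, KP-LRU, or KP-GDSF, whose details the abstract does not give.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def transition_probabilities(sessions: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next | current) from consecutive requests in each session."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for cur, nexts in counts.items()
    }


def k_step_distribution(probs: Dict[str, Dict[str, float]],
                        start: str, k: int) -> Dict[str, float]:
    """Probability of each Servlet being requested exactly k steps after start."""
    dist = {start: 1.0}
    for _ in range(k):
        nxt_dist: Dict[str, float] = defaultdict(float)
        for node, p in dist.items():
            for nxt, q in probs.get(node, {}).items():
                nxt_dist[nxt] += p * q
        dist = dict(nxt_dist)
    return dist


# Hypothetical request sessions (Servlet names are made up).
sessions = [["login", "cart", "pay"],
            ["login", "cart", "browse"],
            ["login", "browse"]]
probs = transition_probabilities(sessions)
two_steps = k_step_distribution(probs, "login", 2)
```

A replacement policy could then protect cached entries whose predicted probability is high, which is the general intuition behind combining prediction with LRU or GDSF.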
Abstract:
Traditional dictionary learning algorithms find a sparse representation of high-dimensional data by transforming samples into one-dimensional (1D) vectors. This 1D model loses the inherent spatial structure of the data. An alternative is to employ tensor decomposition for dictionary learning on the data in its original structural form (a tensor) by learning multiple dictionaries along each mode and the corresponding sparse representation with respect to the Kronecker product of these dictionaries. To learn tensor dictionaries along each mode, all existing methods update each dictionary iteratively in an alternating manner. Although atoms from each mode dictionary jointly contribute to the sparsity of the tensor, existing works ignore atom correlations between different mode dictionaries by treating each mode dictionary independently. In this paper, we propose a joint multiple dictionary learning method for tensor sparse coding, which explores atom correlations for sparse representation and updates multiple atoms from each mode dictionary simultaneously. In this algorithm, the Frequent-Pattern Tree (FP-tree) mining algorithm is employed to exploit frequent atom patterns in the sparse representation. Inspired by the idea of K-SVD, we develop a new dictionary update method that jointly updates elements in each pattern. Experimental results demonstrate that our method outperforms other tensor-based dictionary learning algorithms.
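The link between per-mode dictionaries and their Kronecker product can be written out with the standard vectorization identity for the Tucker model. The symbols are notation assumed here and do not come from the abstract: $\mathcal{X}$ is a data tensor, $\mathcal{B}$ its sparse coefficient tensor, $D_n$ the mode-$n$ dictionary, and $\operatorname{vec}$ stacks entries in column-major order.

```latex
\mathcal{X} \approx \mathcal{B} \times_1 D_1 \times_2 D_2 \cdots \times_N D_N
\quad\Longleftrightarrow\quad
\operatorname{vec}(\mathcal{X}) \approx
\left( D_N \otimes D_{N-1} \otimes \cdots \otimes D_1 \right) \operatorname{vec}(\mathcal{B})
```

Each column of the Kronecker product pairs one atom from every mode dictionary, which is why atoms from different modes contribute jointly to the sparsity of $\mathcal{B}$.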
Abstract:
Reports on the clinical course of mycophenolic acid (MPA)-related colitis in kidney transplant recipients are scarce. This study aimed at assessing MPA-related colitis incidence, risk factors, and progression after kidney transplantation. All kidney transplant patients taking MPA who had colonic biopsies for persistent chronic diarrhea, between 2000 and 2012, at the Kidney Transplantation Unit of Botucatu Medical School Hospital, Brazil, were included. Cytomegalovirus (CMV) immunohistochemistry was performed in all biopsy specimens. Data on presenting symptoms, medications, immunosuppressive drugs, colonoscopic findings, and follow-up were obtained. Of 580 kidney transplant patients on MPA, 34 underwent colonoscopy. Colonoscopic findings were associated with MPA usage in 16 patients. The most frequent histologic patterns were non-specific colitis (31.3%), inflammatory bowel disease (IBD)-like colitis (25%), normal/near normal (18.8%), graft-versus-host disease-like (18.8%), and ischemia-like colitis (12.5%). All patients had persistent acute diarrhea and weight loss. Six of the 16 MPA-related diarrhea patients (37.5%) showed acute dehydration requiring hospitalization. Diarrhea resolved when MPA was switched to sirolimus (50%), discontinued (18.75%), switched to azathioprine (12.5%), or reduced by 50% (18.75%). No graft loss occurred. Four patients died during the study period. Late-onset MPA-related colitis was more frequent, and no correlation with MPA dose or formulation was found.
Abstract:
The emergence and maintenance of maternal behavior are under the influence of environmental cues such as light and dark periods. This article discusses the characteristic neurobiology of the behavioral patterns of lactating rats. Specifically, the hormonal basis and neurocircuits that determine whether mother rats show typical sequential patterns of behavioral responses are discussed. During lactation, rats express a sequential pattern of behavioral parameters that may be determined by hormonal variations. Sensory signals emitted by pups, as well as environmental cues, are suggested to serve as conditioned stimuli for these animals. Finally, the expression of maternal behavior is discussed from neuroeconomic and evolutionary perspectives.