1000 results for Sequential patterns


Relevance: 100.00%

Abstract:

In many applications, such as bioinformatics, web access traces, and system utilisation logs, the data is naturally in the form of sequences. There has been great interest in analysing sequential data and finding its inherent characteristics or relationships. Sequential association rule mining is one of the methods used to analyse this data. Because conventional sequential association rule mining very often generates a huge number of association rules, many of which are redundant, it is desirable to eliminate the unnecessary rules. Because of the complexity and temporally ordered nature of sequential data, current research on sequential association rule mining is limited. Although several sequential association rule prediction models using either sequence constraints or temporal constraints have been proposed, none of them considers the redundancy problem in rule mining. The main contribution of this research is a non-redundant association rule mining method based on closed frequent sequences and minimal sequential generators. We also give a definition of non-redundant sequential rules, which are sequential rules with minimal antecedents but maximal consequents. A new algorithm called CSGM (closed sequential and generator mining) for generating closed sequences and minimal sequential generators is also introduced. A further experiment compares the performance of generating non-redundant sequential rules with that of generating full sequential rules, and the performance of CSGM is evaluated against other closed sequential pattern mining and generator mining algorithms. We also use the generated non-redundant sequential rules for query expansion in order to improve recommendations for infrequently purchased products.
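The notion of a closed frequent sequence that this abstract builds on can be illustrated with a minimal Python sketch. It is not the CSGM algorithm, whose details the abstract does not give; it only assumes the common definition that a pattern is closed when no proper super-sequence has the same support. The function names, the toy database, and the candidate list are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence[str], database: List[Pattern]) -> int:
    """Number of database sequences that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def closed_patterns(frequent: List[Pattern], database: List[Pattern]) -> List[Pattern]:
    """Keep only patterns that have no proper super-sequence with equal support."""
    sup: Dict[Pattern, int] = {p: support(p, database) for p in frequent}
    closed = []
    for p in frequent:
        absorbed = any(
            len(q) > len(p) and sup[q] == sup[p] and is_subsequence(p, q)
            for q in frequent
        )
        if not absorbed:
            closed.append(p)
    return closed


# Hypothetical toy data: four sequences and six candidate frequent patterns.
database = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "c")]
frequent = [("a",), ("b",), ("c",), ("a", "c"), ("b", "c"), ("a", "b", "c")]
print(closed_patterns(frequent, database))
```

In this toy data, `("a",)` is absorbed by `("a", "c")` because both occur in three sequences, so keeping only the longer pattern loses no support information. Pruning of this kind is what makes rule sets built from closed sequences smaller than the full rule set.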

Relevance: 100.00%

Abstract:

With the overwhelming increase in the amount of text on the web, it is almost impossible for people to keep abreast of up-to-date information. Text mining is a process by which interesting information is derived from text through the discovery of patterns and trends. Text mining algorithms are used to guarantee the quality of extracted knowledge. However, the patterns extracted using text or data mining algorithms are often noisy and inconsistent. Thus, different challenges arise, such as how to understand these patterns, whether the model that has been used is suitable, and whether all the extracted patterns are relevant. Furthermore, the research raises the question of how to give a correct weight to the extracted knowledge. To address these issues, this paper presents a text post-processing method, which uses a pattern co-occurrence matrix to find the relations between extracted patterns in order to reduce noisy patterns. The main objective of this paper is not only to reduce the number of closed sequential patterns but also to improve the performance of pattern mining. The experimental results on the Reuters Corpus Volume 1 data collection and TREC filtering topics show that the proposed method is promising.
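A pattern co-occurrence matrix of the kind mentioned in this abstract could be built as in the following Python sketch. The abstract does not say how the matrix is defined or how it is used to filter noise, so the sketch assumes the simplest reading: entry (i, j) counts the documents in which patterns i and j were both extracted, and the diagonal holds each pattern's document frequency. The function name and the sample patterns are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple


def cooccurrence_matrix(doc_patterns: List[Set[str]]) -> Tuple[List[str], List[List[int]]]:
    """Count, for every pair of patterns, the documents in which both appear.

    Returns the sorted pattern labels and a symmetric matrix whose diagonal
    holds the document frequency of each pattern.
    """
    labels = sorted(set().union(*doc_patterns))
    index = {pattern: i for i, pattern in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for patterns in doc_patterns:
        for p in patterns:
            matrix[index[p]][index[p]] += 1
        for p, q in combinations(sorted(patterns), 2):
            matrix[index[p]][index[q]] += 1
            matrix[index[q]][index[p]] += 1
    return labels, matrix


# Hypothetical patterns extracted from three documents.
docs = [
    {"oil price", "crude oil"},
    {"oil price", "crude oil", "stock market"},
    {"stock market"},
]
labels, matrix = cooccurrence_matrix(docs)
for label, row in zip(labels, matrix):
    print(f"{label:>12}: {row}")
```

A pattern whose off-diagonal counts are all low rarely appears together with other patterns. Treating such a pattern as a noise candidate is one possible use of the matrix, but that rule is an assumption here and is not taken from the paper.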

Relevance: 100.00%

Abstract:

We consider the problem of detecting statistically significant sequential patterns in multineuronal spike trains. These patterns are characterized by ordered sequences of spikes from different neurons with specific delays between spikes. We have previously proposed a data-mining scheme to efficiently discover such patterns, which occur often enough in the data. Here we propose a method to determine the statistical significance of such repeating patterns. The novelty of our approach is that we use a compound null hypothesis that not only includes models of independent neurons but also models where neurons have weak dependencies. The strength of interaction among the neurons is represented in terms of certain pair-wise conditional probabilities. We specify our null hypothesis by putting an upper bound on all such conditional probabilities. We construct a probabilistic model that captures the counting process and use this to derive a test of significance for rejecting such a compound null hypothesis. The structure of our null hypothesis also allows us to rank-order different significant patterns. We illustrate the effectiveness of our approach using spike trains generated with a simulator.

Relevance: 100.00%

Abstract:

In the mining and analysis of a single long sequence, one fundamental and important problem is obtaining accurate frequencies of sequential patterns over the sequence. However, we identify that five previous frequency measures suffer from inherent inaccuracies. To obtain more accurate frequencies, we introduce two basic principles for frequency measures, called strict anti-monotonicity and maximum-count. Under the two principles, a new frequency measure is presented. An algorithm is also devised to compute it. Both theoretical analysis and empirical evaluation show that more accurate frequencies can be obtained under the new measure.
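To see why the choice of frequency measure matters in a single long sequence, the Python sketch below compares two simple ways of counting a pattern. Neither is claimed to be one of the five measures the abstract examines, and neither is the new measure it proposes. Both are illustrative assumptions: one counts every distinct embedding of the pattern, the other counts occurrences that do not overlap.

```python
from typing import Sequence


def count_all_occurrences(pattern: Sequence[str], sequence: Sequence[str]) -> int:
    """Count every distinct way to embed `pattern` in `sequence` (gaps allowed)."""
    # ways[j] = number of embeddings of the first j pattern items seen so far.
    ways = [1] + [0] * len(pattern)
    for symbol in sequence:
        # Go right to left so each symbol extends only earlier partial matches.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == symbol:
                ways[j] += ways[j - 1]
    return ways[-1]


def count_non_overlapping(pattern: Sequence[str], sequence: Sequence[str]) -> int:
    """Count occurrences found by a greedy left-to-right scan, one after another."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    count, position = 0, 0
    for symbol in sequence:
        if symbol == pattern[position]:
            position += 1
            if position == len(pattern):
                count += 1
                position = 0  # start looking for the next occurrence
    return count


for text in ("AABB", "ABAB"):
    print(text, count_all_occurrences("AB", text), count_non_overlapping("AB", text))
```

For the sequence `AABB` and the pattern `AB`, the first count is 4 and the second is 1, although both sequences contain two `A`s and two `B`s. Gaps of this size between measures are the kind of inaccuracy that motivates principled requirements on a frequency measure.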

Relevance: 100.00%

Abstract:

Sequential pattern mining is an important subject in data mining with broad applications in many different areas. However, previous sequential mining algorithms mostly aimed to calculate the number of occurrences (the support) without regard to the degree of importance of different data items. In this paper, we propose to explore the search space of subsequences with normalized weights. We are interested not only in the number of occurrences of the sequences (their supports) but also in the importance of the sequences (their weights). When generating subsequence candidates, we use both the support and the weight of the candidates while maintaining the downward closure property of these patterns, which allows the process of candidate generation to be accelerated.
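The idea of combining support with item weights can be sketched in Python as follows. The abstract does not define its weighting formula, so this sketch makes two loudly labeled assumptions: the weight of a sequence is the mean of its items' normalized weights, and pruning uses the largest item weight times the support as an upper bound. That bound shrinks whenever the support shrinks, so it keeps the downward closure property even though the weighted support itself does not. All names and data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support_ratio(pattern: Sequence[str], database: List[Tuple[str, ...]]) -> float:
    """Fraction of database sequences that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database) / len(database)


def weighted_support(pattern: Sequence[str], database: List[Tuple[str, ...]],
                     weights: Dict[str, float]) -> float:
    """Assumed measure: mean normalized item weight times the support ratio."""
    mean_weight = sum(weights[item] for item in pattern) / len(pattern)
    return mean_weight * support_ratio(pattern, database)


def can_prune(pattern: Sequence[str], database: List[Tuple[str, ...]],
              weights: Dict[str, float], min_wsup: float) -> bool:
    """Prune only if even the best possible weight cannot reach the threshold.

    max weight * support is an upper bound for the pattern and for all of its
    super-sequences, because support never grows when a pattern is extended.
    """
    return max(weights.values()) * support_ratio(pattern, database) < min_wsup


# Hypothetical item weights, already normalized to the range [0, 1].
weights = {"a": 1.0, "b": 0.5, "c": 0.2}
database = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b")]

print(weighted_support(("a", "b"), database, weights))
print(can_prune(("c",), database, weights, min_wsup=0.3))
```

Here the low-weight pattern `("c",)` falls below the threshold on its own but is not pruned, because an extension such as `("a", "c")` has a higher mean weight and could still qualify.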

Relevance: 100.00%

Abstract:

Resuscitation and stabilization are key issues in Intensive Care Burn Units and early survival predictions help to decide the best clinical action during these phases. Current survival scores of burns focus on clinical variables such as age or the body surface area. However, the evolution of other parameters (e.g. diuresis or fluid balance) during the first days is also valuable knowledge. In this work we suggest a methodology and we propose a Temporal Data Mining algorithm to estimate the survival condition from the patient’s evolution. Experiments conducted on 480 patients show the improvement of survival prediction.

Relevance: 80.00%

Abstract:

The approaches proposed in the past for discovering sequential patterns mainly focused on single sequential data. In the real world, however, some sequential patterns are hidden among multi-sequential event data. It has been noted that knowledge discovery with user-specified constraints, templates, or skeletons is receiving wide attention because it is more efficient and avoids the tedious selection of useful patterns from the mass-produced results. In this paper, a novel pattern in correlated multi-sequential event data and an approach to mining it are presented. We call this pattern a sequential causal pattern. A group of skeletons of sequential causal patterns, which may be specified by the user or generated by the program, are verified or mined by embedding them into the mining engine. Experiments show that this method, when applied to discovering the occurrence regularities of a crop pest in a region, is successful in mining sequential causal patterns with user-specified skeletons in multi-sequential event data.

Relevance: 70.00%

Abstract:

With the overwhelming increase in the amount of data on the web and in databases, many text mining techniques have been proposed for mining useful patterns in text documents. Extracting closed sequential patterns using the Pattern Taxonomy Model (PTM) is one of the pruning methods used to remove noisy, inconsistent, and redundant patterns. However, PTM treats each extracted pattern as a whole without considering the terms it includes, which could affect the quality of extracted patterns. This paper proposes an innovative and effective method that extends the random set to accurately weigh patterns based on their distribution in the documents and the distribution of their terms in patterns. The proposed approach then finds the specific closed sequential patterns (SCSP) based on the newly calculated weight. The experimental results on the Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms other state-of-the-art methods on several popular measures.

Relevance: 70.00%

Abstract:

We can recognize and memorize objects while continuously receiving a huge amount of temporal information that includes redundancy and noise. This paper proposes a neural network model which extracts pre-recognized patterns from temporally sequential patterns that include redundancy, and memorizes the patterns temporarily. The model consists of an adaptive resonance system and a recurrent time-delay network. The extraction is executed by the matching mechanism of the adaptive resonance system, and the temporal information is processed and stored by the recurrent network. Simple simulations illustrate the extraction property.

Relevance: 60.00%

Abstract:

It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. This research presents a promising method, Relevance Feature Discovery (RFD), for solving this challenging issue. It discovers both positive and negative patterns in text documents as high-level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the high-level features. The thesis also introduces an adaptive model (called ARFD) to enhance the flexibility of using RFD in adaptive environments. ARFD automatically updates the system's knowledge based on a sliding window over new incoming feedback documents. It can efficiently decide which incoming documents can bring new knowledge into the system. Substantial experiments using the proposed models on Reuters Corpus Volume 1 and TREC topics show that the proposed models significantly outperform both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio, or Support Vector Machine and other pattern-based methods.

Relevance: 60.00%

Abstract:

Nature has used the all-alpha-polypeptide backbone of proteins to create a remarkable diversity of folded structures. Sequential patterns of 20 distinct amino acids, which differ only in their side chains, determine the shape and form of proteins. Our understanding of these specific secondary structures is over half a century old and is based primarily on the fundamental elements: the Pauling alpha-helix and beta-sheet. Researchers can also generate structural diversity through the synthesis of polypeptide chains containing homologated (omega) amino acid residues, which contain a variable number of backbone atoms. However, incorporating amino acids with more atoms within the backbone introduces additional torsional freedom into the structure, which can complicate the structural analysis. Fortunately, gabapentin (Gpn), a readily available bulk drug, is an achiral beta,beta-disubstituted gamma amino acid residue that contains a cyclohexyl ring at the C-beta carbon atom, which dramatically limits the range of torsion angles that can be obtained about the flanking C-C bonds. Limiting conformational flexibility also has the desirable effect of increasing peptide crystallinity, which permits unambiguous structural characterization by X-ray diffraction methods. This Account describes studies carried out in our laboratory that establish Gpn as a valuable residue in the design of specifically folded hybrid peptide structures. The insertion of additional atoms into polypeptide backbones facilitates the formation of intramolecular hydrogen bonds whose directionality is opposite to that observed in canonical alpha-peptide helices. If hybrid structures mimic proteins and biologically active peptides, the proteolytic stability conferred by unusual backbones can be a major advantage in the area of medicinal chemistry.
We have demonstrated a variety of internally hydrogen-bonded structures in the solid state for Gpn-containing peptides, including the characterization of the C-7 and C-9 hydrogen bonds, which can lead to ribbons in homo-oligomeric sequences. In hybrid alpha gamma sequences, distinct C-12 hydrogen-bonded turn structures support formation of peptide helices and hairpins in longer sequences. Some peptides that include the Gpn residue have hydrogen-bond directionality that matches alpha-peptide helices, while others have the opposite directionality. We expect that expansion of the polypeptide backbone will lead to new classes of foldamer structures, which are thus far unknown to the world of alpha-polypeptides. The diversity of internally hydrogen-bonded structures observed in hybrid sequences containing Gpn shows promise for the rational design of novel peptide structures incorporating hybrid backbones.

Relevance: 60.00%

Abstract:

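Servlet caching can effectively improve the throughput of a Servlet container and shorten the response time of user requests. However, the performance of a Servlet cache is affected by the cache replacement algorithm. The Servlets in a Servlet container correspond to particular business functions, and mining the business associations among Servlets to guide the design of the cache replacement algorithm can improve the hit rate of the Servlet cache and, in turn, the performance of the Servlet container. However, common cache replacement algorithms such as LRU (least recently used), LFU (least frequently used), and GDSF (greedy dual size frequency) do not take this into account. This paper defines the business associations among Servlets as Servlet container sequential patterns, proposes the concept of a k-step cacheable transition probability graph to represent them, and gives a sequential pattern discovery algorithm, KCTPG_Discovery. Finally, based on Servlet container sequential patterns, the cache replacement algorithms KP-LRU (k-steps prediction least recently used) and KP-GDSF (k-steps prediction greedy dual size frequency) are designed. Experimental results show that KP-LRU and KP-GDSF achieve higher cache hit rates than the corresponding LRU and GDSF algorithms and effectively improve the performance of the Servlet container.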
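The general idea of letting predicted accesses guide LRU eviction can be sketched in Python as below. This is not the KP-LRU or KCTPG_Discovery algorithm from the paper, whose details the abstract does not give. The sketch assumes that transition probabilities are estimated from consecutive requests in an access log, that an entry is "predicted" if it is reachable from the current request within k transitions of probability at least `min_prob`, and that predicted entries are skipped when choosing a victim. The class name, function names, parameters, and log are all hypothetical.

```python
from collections import OrderedDict, defaultdict
from typing import Dict, List, Set

Probs = Dict[str, Dict[str, float]]


def transition_probs(log: List[str]) -> Probs:
    """Estimate P(next | current) from consecutive requests in an access log."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for current, following in zip(log, log[1:]):
        counts[current][following] += 1
    return {
        current: {n: c / sum(nexts.values()) for n, c in nexts.items()}
        for current, nexts in counts.items()
    }


def predicted_within_k(probs: Probs, current: str, k: int, min_prob: float) -> Set[str]:
    """Items reachable from `current` in at most k sufficiently likely steps."""
    frontier, seen = {current}, set()
    for _ in range(k):
        frontier = {
            n for c in frontier for n, p in probs.get(c, {}).items() if p >= min_prob
        } - seen
        seen |= frontier
    return seen


class PredictiveLRU:
    """LRU cache that avoids evicting entries predicted to be requested soon."""

    def __init__(self, capacity: int, probs: Probs, k: int = 2, min_prob: float = 0.5):
        self.capacity, self.probs, self.k, self.min_prob = capacity, probs, k, min_prob
        self.entries: "OrderedDict[str, bool]" = OrderedDict()  # oldest first

    def access(self, key: str) -> bool:
        """Record a request for `key`; return True on a cache hit."""
        hit = key in self.entries
        if hit:
            self.entries.move_to_end(key)
        else:
            if len(self.entries) >= self.capacity:
                self._evict(key)
            self.entries[key] = True
        return hit

    def _evict(self, current: str) -> None:
        protected = predicted_within_k(self.probs, current, self.k, self.min_prob)
        # Oldest unprotected entry, or plain LRU if every entry is protected.
        victim = next((c for c in self.entries if c not in protected), None)
        if victim is None:
            self.entries.popitem(last=False)
        else:
            del self.entries[victim]


# Hypothetical access log of Servlet names.
log = ["login", "cart", "pay", "login", "cart", "pay", "login", "search"]
probs = transition_probs(log)
cache = PredictiveLRU(capacity=2, probs=probs, k=1, min_prob=0.5)
print([cache.access(name) for name in ["cart", "pay", "login", "cart"]])
```

In this run, the request for `login` finds the cache full. Plain LRU would evict `cart`, the oldest entry, but the log suggests that `cart` usually follows `login`, so `pay` is evicted instead and the next request for `cart` is a hit.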

Relevance: 60.00%

Abstract:

In this paper, we propose a model for discovering frequent sequential patterns, phrases, which can be used as profile descriptors of documents. It is indubitable that we can obtain numerous phrases using data mining algorithms. However, it is difficult to use these phrases effectively for answering what users want. Therefore, we present a pattern taxonomy extraction model which performs the task of extracting descriptive frequent sequential patterns by pruning the meaningless ones. The model then is extended and tested by applying it to the information filtering system. The results of the experiment show that pattern-based methods outperform the keyword-based methods. The results also indicate that removal of meaningless patterns not only reduces the cost of computation but also improves the effectiveness of the system.

Relevance: 60.00%

Abstract:

The emergence and maintenance of maternal behavior are under the influence of environmental cues such as light and dark periods. This article discusses the characteristic neurobiology of the behavioral patterns of lactating rats. Specifically, the hormonal basis and neurocircuits that determine whether mother rats show typical sequential patterns of behavioral responses are discussed. During lactation, rats express a sequential pattern of behavioral parameters that may be determined by hormonal variations. Sensorial signals emitted by pups, as well as environmental cues, are suggested to serve as conditioned stimuli for these animals. Finally, the expression of maternal behavior is discussed under neuroeconomic and evolutionary perspectives.