374 results for pruning


Abstract:

Recently, DTW (dynamic time warping) has been recognized as the most robust distance function for measuring the similarity between two time series, and this has spawned a flurry of research on the topic. Most indexing methods proposed for DTW are based on the R-tree structure. Because of high dimensionality and loose lower bounds for the time warping distance, the pruning power of these tree structures is quite weak, resulting in inefficient search. In this paper, we propose a dimensionality reduction method motivated by observations about the inherent character of each time series. A very compact index file is constructed. By scanning the index file, we obtain a very small candidate set, so that the number of page accesses is dramatically reduced. We demonstrate the effectiveness of our approach on real and synthetic datasets.
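To make the role of lower bounds in pruning concrete, here is a minimal Python sketch of DTW-based nearest-neighbor search. It is an illustration only and not the dimensionality reduction method of the abstract: the function names are hypothetical, the cost is the absolute difference, and the lower bound is a deliberately simple endpoint bound (every warping path must match the first points and the last points of both series).

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference cost."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for i in range(1, len(a) + 1):
        curr = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def endpoint_lower_bound(a, b):
    """Cheap lower bound: the path always contains the first and last cells."""
    first = abs(a[0] - b[0])
    if len(a) == 1 and len(b) == 1:
        return first  # first and last cell coincide
    return first + abs(a[-1] - b[-1])


def nearest_neighbor(query, candidates):
    """Return (best index, best distance, number of candidates pruned)."""
    best_index, best_dist, pruned = -1, float("inf"), 0
    for index, series in enumerate(candidates):
        # Skip the expensive DTW when the bound already rules the series out.
        if endpoint_lower_bound(query, series) >= best_dist:
            pruned += 1
            continue
        dist = dtw_distance(query, series)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist, pruned
```

The tighter the lower bound, the more candidates are skipped before the quadratic DTW computation. A loose bound such as this one prunes little on realistic data, which is the weakness the abstract attributes to tree-based indexes.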

Abstract:

The tree index structure is a traditional method for searching for similar data in large datasets. It is based on the presupposition that most sub-trees are pruned during the search, so that the number of page accesses is reduced. However, time-series datasets generally have very high dimensionality. Because of the so-called curse of dimensionality, pruning becomes less effective as dimensionality grows. Consequently, the tree index structure is not a suitable method for time-series datasets. In this paper, we propose a two-phase (filtering and refinement) method for searching time-series datasets. In the filtering step, a quantized time series is used to construct a compact file, which is scanned to filter out irrelevant data. A small set of candidates is passed to the second step for refinement. In this step, we introduce an effective index compression method named grid-based datawise dimensionality reduction (DRR), which attempts to preserve the characteristics of the time series. An experimental comparison with existing techniques demonstrates the utility of our approach.
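The filtering-and-refinement idea can be sketched generically in Python as follows. This is a hedged illustration under stated assumptions, not the authors' DRR method: it assumes equal-length series, Euclidean distance, a uniform grid of cell width `width`, and a range query with radius `radius`. All function names are hypothetical.

```python
import math


def quantize(series, width):
    """Compact approximation: the grid cell index of every value."""
    return [math.floor(x / width) for x in series]


def lower_bound(query, cells, width):
    """Smallest Euclidean distance from the query to any series in these cells."""
    total = 0.0
    for q, c in zip(query, cells):
        lo, hi = c * width, (c + 1) * width
        if q < lo:
            d = lo - q
        elif q > hi:
            d = q - hi
        else:
            d = 0.0
        total += d * d
    return math.sqrt(total)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_search(query, dataset, radius, width=1.0):
    """Return (candidate indices after filtering, true hits after refinement)."""
    # In practice the compact file is built once and stored, not per query.
    compact = [quantize(s, width) for s in dataset]
    # Phase 1 (filtering): sequentially scan the compact file only.
    candidates = [i for i, cells in enumerate(compact)
                  if lower_bound(query, cells, width) <= radius]
    # Phase 2 (refinement): touch the full data only for surviving candidates.
    hits = [i for i in candidates if euclidean(query, dataset[i]) <= radius]
    return candidates, hits
```

Because the cell-based distance never exceeds the true distance, the filter cannot discard a true hit; it can only let false candidates through, and these are removed in the refinement phase.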

Abstract:

Inducing general functions from specific training examples is a central problem in machine learning. Sets of if-then rules are the most expressive and readable way to represent such functions. To find if-then rules, many induction algorithms, such as ID3, AQ, CN2 and their variants, have been proposed. Sequential covering is their core technique. To avoid testing all possible selectors, ID3 uses entropy gain to select the best attribute, AQ introduces a constraint on the size of the star, and CN2 adopts beam search. These methods speed up induction, but many good selectors are filtered out. In this work, we introduce a new induction algorithm that is based on enumeration of all possible selectors. In contrast to previous work, we use pruning power to eliminate irrelevant selectors while guaranteeing that no good selectors are filtered out. Compared with other techniques, the experimental results demonstrate that the rules produced by our induction algorithm have high consistency and simplicity.
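As a small illustration of the attribute selection step attributed to ID3 above, the following Python sketch computes entropy-based information gain and picks the attribute with the highest gain. It is not the enumeration-based algorithm of the abstract. The data layout (rows as dictionaries) and the function names are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the examples on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Greedy choice: only the single best attribute is kept at each step."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))
```

The greedy choice is what makes the search fast, and it is also why selectors that would have been useful later can be discarded early.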

Abstract:

This paper describes the design and evaluation of a federated, peer-to-peer indexing system, which can be used to integrate the resources of local systems into a globally addressable index using a distributed hash table. The salient feature of the indexing system's design is the efficient dissemination of term-document indices using a combination of duplicate elimination, leaf set forwarding and conventional techniques such as aggressive index pruning, index compression, and batching. Together these indexing strategies help to reduce the number of RPC operations required to locate the nodes responsible for a section of the index, as well as the bandwidth utilization and the latency of the indexing service. Using empirical observation, we evaluate the performance benefits of these cumulative optimizations and show that these design trade-offs can significantly improve indexing performance when using a distributed hash table.

Abstract:

One of the main problems with Artificial Neural Networks (ANNs) is that their results are not intuitively clear. For example, commonly used hidden neurons with a sigmoid activation function can approximate any continuous function, including linear functions, but the coefficients (weights) of this approximation are rather meaningless. To address this problem, this paper presents a novel kind of neural network that uses transfer functions of various complexities, in contrast to the mono-transfer functions used in sigmoid and hyperbolic tangent networks. The presence of transfer functions of various complexities in a Mixed Transfer Functions Artificial Neural Network (MTFANN) allows easy conversion of the full model into a user-friendly equation format (similar to that of linear regression) without any pruning or simplification of the model. At the same time, MTFANN maintains generalization ability similar to that of mono-transfer function networks in a global optimization context. The performance and knowledge extraction of MTFANN were evaluated on a realistic simulation of the Puma 560 robot arm and compared to sigmoid, hyperbolic tangent, linear and sinusoidal networks.

Abstract:

Concept learning of text documents can be viewed as the problem of acquiring the definition of a general category of documents. To define the category of a text document, a conjunction of keywords is usually used. These keywords should be few and comprehensible. A naïve method is to enumerate all combinations of keywords and extract the suitable ones. However, because the number of keyword combinations is enormous, it is impossible to extract the most relevant keywords for describing document categories by exhaustive enumeration. Many heuristic methods have been proposed, such as GA-based and immune-based algorithms. In this work, we introduce a pruning power technique and propose a robust enumeration-based concept learning algorithm. Experimental results show that the rules produced by our approach are more comprehensible and simpler than those produced by other methods.

Abstract:

Data mining refers to extracting or "mining" knowledge from large amounts of data. It is also called a method of "knowledge presentation" where visualization and knowledge representation techniques are used to present the mined knowledge to the user. Efficient algorithms to mine frequent patterns are crucial to many tasks in data mining. Since the Apriori algorithm was proposed in 1994, several methods have been proposed to improve its performance. However, most still adopt its candidate set generation-and-test approach. In addition, many methods do not generate all frequent patterns, making them inadequate for deriving association rules. The Pattern Decomposition (PD) algorithm, which can significantly reduce the size of the dataset on each pass, makes it more efficient to mine all frequent patterns in a large dataset. This algorithm avoids the costly process of candidate set generation and, by evaluating support on reduced datasets, saves a large amount of counting time. In this paper, some existing frequent pattern generation algorithms are explored and compared. The results show that the PD algorithm outperforms an improved version of Apriori named Direct Count of candidates & Prune transactions (DCP) by one order of magnitude and is faster than an improved FP-tree named Predictive Item Pruning (PIP). Further, PD is also more scalable than both DCP and PIP.
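For reference, the candidate generation-and-test approach mentioned above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration of Apriori-style level-wise mining, not the PD, DCP, or PIP algorithms. The function name is hypothetical, and `min_support` is assumed to be an absolute transaction count.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frequent itemset: support count} by level-wise generate-and-test."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Test: one pass over the data to count every candidate of size k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Generate: join frequent itemsets into candidates of size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent
```

Each level requires a full pass over the transactions to count candidates, which is the cost that approaches avoiding candidate generation aim to reduce.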

Abstract:

In this paper, the impact of the size of the training set on the benefit from ensemble, i.e. the gains obtained by employing ensemble learning paradigms, is empirically studied. Experiments on Bagged/Boosted J4.8 decision trees with/without pruning show that enlarging the training set tends to improve the benefit from Boosting but does not significantly impact the benefit from Bagging. This phenomenon is then explained from the view of bias-variance reduction. Moreover, it is shown that even for Boosting, the benefit does not always increase consistently with the training set size, since single learners may sometimes learn relatively more from additional, randomly provided training data than ensembles do. Furthermore, it is observed that the benefit from an ensemble of unpruned decision trees is usually bigger than that from an ensemble of pruned decision trees. This phenomenon is then explained from the view of error-ambiguity balance.

Abstract:

In this paper, we propose a model for discovering frequent sequential patterns (phrases) that can be used as profile descriptors of documents. Data mining algorithms can undoubtedly produce numerous phrases. However, it is difficult to use these phrases effectively to answer what users want. Therefore, we present a pattern taxonomy extraction model which extracts descriptive frequent sequential patterns by pruning the meaningless ones. The model is then extended and tested by applying it to an information filtering system. The results of the experiment show that pattern-based methods outperform keyword-based methods. The results also indicate that removing meaningless patterns not only reduces the cost of computation but also improves the effectiveness of the system.

Abstract:

This paper describes the design and evaluation of a peer-to-peer indexing system to integrate the resources of local document database systems into a globally addressable index using a distributed hash table. The salient feature of the indexing system's design is the efficient dissemination of term-document indices using a combination of duplicate elimination, ring-based forwarding and conventional techniques such as aggressive index pruning and batching. Together these indexing strategies help to reduce the number of RPC operations required to locate the nodes responsible for a section of the index, the bandwidth utilization, and the latency of the indexing service.

Abstract:

Naringinase has attracted a great deal of attention in recent years due to its hydrolytic activities, which include the production of rhamnose and prunin and the debittering of citrus fruit juices. While this enzyme is widely distributed in fungi, its production from bacterial sources is less commonly known. Fungal naringinases are very important, as they are used industrially in large amounts and have been extensively studied during the past decade. In this article, the production of bacterial naringinase and its potential biotechnological applications are discussed. Bacterial rhamnosidases are exotype enzymes that hydrolyse terminal non-reducing α-l-rhamnosyl groups from α-l-rhamnose-containing polysaccharides and glycosides. Structurally, they are classified into family 78 of glycoside hydrolases and characterized by the presence of Asp567 and Glu841 in their active site. Optimization of fermentation conditions and enzyme engineering will allow the development of improved rhamnosidases for advancing the suggested industrial applications.

Abstract:

Many previous approaches to frequent episode discovery only accept simple sequences. Although a recent approach has been able to find frequent episodes from complex sequences, the discovered sets are neither condensed nor accurate. This paper investigates the discovery of condensed sets of frequent episodes from complex sequences. We adopt a novel anti-monotonic frequency measure based on non-redundant occurrences, and define a condensed set, nDaCF (the set of non-derivable approximately closed frequent episodes), within a given maximal error bound of support. We then introduce a series of effective pruning strategies, and develop a method, nDaCF-Miner, for discovering nDaCF sets. Experimental results show that, when the error bound is somewhat high, the discovered nDaCF sets are two orders of magnitude smaller than complete sets, and nDaCF-Miner is more efficient than previous mining approaches. In addition, the nDaCF sets are more accurate than the sets found by previous approaches.

Abstract:

This paper describes the application of an adaptive neural network, called Fuzzy ARTMAP (FAM), to handle fault prediction and condition monitoring problems in a power generation station. The FAM network, which is supplemented with a pruning algorithm, is used as a classifier to predict different machine conditions in an off-line learning mode. The process under scrutiny in the power plant is the Circulating Water (CW) system, with prime attention to monitoring the heat transfer efficiency of the condensers. Several phases of experiments were conducted to investigate the 'optimum' setting of a set of parameters of the FAM classifier for monitoring heat transfer conditions in the power plant.

Abstract:

Just-Noticeable-Differences (JND) as a dead-band in perceptual analysis has been widely used for more than a decade. This technique has been employed for data reduction in haptic data transmission systems by several researchers. In fact, researchers use two different JND coefficients, JNDV and JNDF, for velocity and force data respectively. For position data, they usually rely on the resolution of the haptic display device to omit data that are imperceptible to humans. This paper addresses the pruning of undesirable position data that are produced by vibration of the device or subject and/or by noise in the transmission line. It is shown that using inverse JNDV for position data can prune undesirable position data. The results of the proposed method are compared with several well-known filters and with some available methods proposed by other researchers. It is shown that combination of JNDV could provide lower error with desirable curve smoothness and minimal computational effort and complexity. It is also shown that this method removes much more data than forward-JNDV.
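The general dead-band principle behind JND-based data reduction can be sketched in Python as follows. This is a generic illustration under stated assumptions, not the inverse-JNDV method proposed in the abstract: a sample is transmitted only when it differs from the last transmitted sample by more than a fraction `jnd` of that sample's magnitude. The function name and parameter are hypothetical.

```python
def deadband_filter(samples, jnd):
    """Return the (index, value) pairs that would actually be transmitted."""
    kept = []
    last = None
    for index, value in enumerate(samples):
        # The first sample is always sent; later ones only if the change
        # relative to the last transmitted value exceeds the dead-band.
        if last is None or abs(value - last) > jnd * abs(last):
            kept.append((index, value))
            last = value
    return kept
```

For example, with `jnd=0.1` the sequence `[1.0, 1.05, 1.2, 1.25, 1.5]` is reduced to the samples at indices 0, 2 and 4, since the other two changes stay within 10% of the last transmitted value.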