904 resultados para Pruning algorithms


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Prism is a modular classification rule generation method based on the ‘separate and conquer’ approach that is alternative to the rule induction approach using decision trees also known as ‘divide and conquer’. Prism often achieves a similar level of classification accuracy compared with decision trees, but tends to produce a more compact noise tolerant set of classification rules. As with other classification rule generation methods, a principle problem arising with Prism is that of overfitting due to over-specialised rules. In addition, over-specialised rules increase the associated computational complexity. These problems can be solved by pruning methods. For the Prism method, two pruning algorithms have been introduced recently for reducing overfitting of classification rules - J-pruning and Jmax-pruning. Both algorithms are based on the J-measure, an information theoretic means for quantifying the theoretical information content of a rule. Jmax-pruning attempts to exploit the J-measure to its full potential because J-pruning does not actually achieve this and may even lead to underfitting. A series of experiments have proved that Jmax-pruning may outperform J-pruning in reducing overfitting. However, Jmax-pruning is computationally relatively expensive and may also lead to underfitting. This paper reviews the Prism method and the two existing pruning algorithms above. It also proposes a novel pruning algorithm called Jmid-pruning. The latter is based on the J-measure and it reduces overfitting to a similar level as the other two algorithms but is better in avoiding underfitting and unnecessary computational effort. The authors conduct an experimental study on the performance of the Jmid-pruning algorithm in terms of classification accuracy and computational efficiency. The algorithm is also evaluated comparatively with the J-pruning and Jmax-pruning algorithms.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Pattern recognition in large amount of data has been paramount in the last decade, since that is not straightforward to design interactive and real time classification systems. Very recently, the Optimum-Path Forest classifier was proposed to overcome such limitations, together with its training set pruning algorithm, which requires a parameter that has been empirically set up to date. In this paper, we propose a Harmony Search-based algorithm that can find near optimal values for that. The experimental results have showed that our algorithm is able to find proper values for the OPF pruning algorithm parameter. © 2011 IEEE.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This article deals with classification problems involving unequal probabilities in each class and discusses metrics to systems that use multilayer perceptrons neural networks (MLP) for the task of classifying new patterns. In addition we propose three new pruning methods that were compared to other seven existing methods in the literature for MLP networks. All pruning algorithms presented in this paper have been modified by the authors to do pruning of neurons, in order to produce fully connected MLP networks but being small in its intermediary layer. Experiments were carried out involving the E. coli unbalanced classification problem and ten pruning methods. The proposed methods had obtained good results, actually, better results than another pruning methods previously defined at the MLP neural network area. (C) 2014 Elsevier Ltd. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Nowadays, fraud detection is important to avoid nontechnical energy losses. Various electric companies around the world have been faced with such losses, mainly from industrial and commercial consumers. This problem has traditionally been dealt with using artificial intelligence techniques, although their use can result in difficulties such as a high computational burden in the training phase and problems with parameter optimization. A recently-developed pattern recognition technique called optimum-path forest (OPF), however, has been shown to be superior to state-of-the-art artificial intelligence techniques. In this paper, we proposed to use OPF for nontechnical losses detection, as well as to apply its learning and pruning algorithms to this purpose. Comparisons against neural networks and other techniques demonstrated the robustness of the OPF with respect to commercial losses automatic identification.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we would like to shed light the problem of efficiency and effectiveness of image classification in large datasets. As the amount of data to be processed and further classified has increased in the last years, there is a need for faster and more precise pattern recognition algorithms in order to perform online and offline training and classification procedures. We deal here with the problem of moist area classification in radar image in a fast manner. Experimental results using Optimum-Path Forest and its training set pruning algorithm also provided and discussed. © 2011 IEEE.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents the application of a variety of techniques to study jet substructure. The performance of various modified jet algorithms, or jet grooming techniques, for several jet types and event topologies is investigated for jets with transverse momentum larger than 300 GeV. Properties of jets subjected to the mass-drop filtering, trimming, and pruning algorithms are found to have reduced sensitivity to multiple proton-proton interactions, are more stable at high luminosity and improve the physics potential of searches for heavy boosted objects. Studies of the expected discrimination power of jet mass and jet substructure observables in searches for new physics are also presented. Event samples enriched in boosted W and Z bosons and top-quark pairs are used to study both the individual jet invariant mass scales and the efficacy of algorithms to tag boosted hadronic objects. The analyses presented use the full 2011 ATLAS dataset, corresponding to an integrated luminosity of 4.7 +/- 0.1 /fb from proton-proton collisions produced by the Large Hadron Collider at a center-of-mass energy of sqrt(s) = 7 TeV.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Computionally efficient sequential learning algorithms are developed for direct-link resource-allocating networks (DRANs). These are achieved by decomposing existing recursive training algorithms on a layer by layer and neuron by neuron basis. This allows network weights to be updated in an efficient parallel manner and facilitates the implementation of minimal update extensions that yield a significant reduction in computation load per iteration compared to existing sequential learning methods employed in resource-allocation network (RAN) and minimal RAN (MRAN) approaches. The new algorithms, which also incorporate a pruning strategy to control network growth, are evaluated on three different system identification benchmark problems and shown to outperform existing methods both in terms of training error convergence and computational efficiency. (c) 2005 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We introduce and describe the Multiple Gravity Assist problem, a global optimisation problem that is of great interest in the design of spacecraft and their trajectories. We discuss its formalization and we show, in one particular problem instance, the performance of selected state of the art heuristic global optimisation algorithms. A deterministic search space pruning algorithm is then developed and its polynomial time and space complexity derived. The algorithm is shown to achieve search space reductions of greater than six orders of magnitude, thus reducing significantly the complexity of the subsequent optimisation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In a world where data is captured on a large scale the major challenge for data mining algorithms is to be able to scale up to large datasets. There are two main approaches to inducing classification rules, one is the divide and conquer approach, also known as the top down induction of decision trees; the other approach is called the separate and conquer approach. A considerable amount of work has been done on scaling up the divide and conquer approach. However, very little work has been conducted on scaling up the separate and conquer approach.In this work we describe a parallel framework that allows the parallelisation of a certain family of separate and conquer algorithms, the Prism family. Parallelisation helps the Prism family of algorithms to harvest additional computer resources in a network of computers in order to make the induction of classification rules scale better on large datasets. Our framework also incorporates a pre-pruning facility for parallel Prism algorithms.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Prism family of algorithms induces modular classification rules which, in contrast to decision tree induction algorithms, do not necessarily fit together into a decision tree structure. Classifiers induced by Prism algorithms achieve a comparable accuracy compared with decision trees and in some cases even outperform decision trees. Both kinds of algorithms tend to overfit on large and noisy datasets and this has led to the development of pruning methods. Pruning methods use various metrics to truncate decision trees or to eliminate whole rules or single rule terms from a Prism rule set. For decision trees many pre-pruning and postpruning methods exist, however for Prism algorithms only one pre-pruning method has been developed, J-pruning. Recent work with Prism algorithms examined J-pruning in the context of very large datasets and found that the current method does not use its full potential. This paper revisits the J-pruning method for the Prism family of algorithms and develops a new pruning method Jmax-pruning, discusses it in theoretical terms and evaluates it empirically.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Prism family of algorithms induces modular classification rules in contrast to the Top Down Induction of Decision Trees (TDIDT) approach which induces classification rules in the intermediate form of a tree structure. Both approaches achieve a comparable classification accuracy. However in some cases Prism outperforms TDIDT. For both approaches pre-pruning facilities have been developed in order to prevent the induced classifiers from overfitting on noisy datasets, by cutting rule terms or whole rules or by truncating decision trees according to certain metrics. There have been many pre-pruning mechanisms developed for the TDIDT approach, but for the Prism family the only existing pre-pruning facility is J-pruning. J-pruning not only works on Prism algorithms but also on TDIDT. Although it has been shown that J-pruning produces good results, this work points out that J-pruning does not use its full potential. The original J-pruning facility is examined and the use of a new pre-pruning facility, called Jmax-pruning, is proposed and evaluated empirically. A possible pre-pruning facility for TDIDT based on Jmax-pruning is also discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this article we describe a feature extraction algorithm for pattern classification based on Bayesian Decision Boundaries and Pruning techniques. The proposed method is capable of optimizing MLP neural classifiers by retaining those neurons in the hidden layer that realy contribute to correct classification. Also in this article we proposed a method which defines a plausible number of neurons in the hidden layer based on the stem-and-leaf graphics of training samples. Experimental investigation reveals the efficiency of the proposed method. © 2002 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Includes bibliographical references.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis presents approximation algorithms for some NP-Hard combinatorial optimization problems on graphs and networks; in particular, we study problems related to Network Design. Under the widely-believed complexity-theoretic assumption that P is not equal to NP, there are no efficient (i.e., polynomial-time) algorithms that solve these problems exactly. Hence, if one desires efficient algorithms for such problems, it is necessary to consider approximate solutions: An approximation algorithm for an NP-Hard problem is a polynomial time algorithm which, for any instance of the problem, finds a solution whose value is guaranteed to be within a multiplicative factor of the value of an optimal solution to that instance. We attempt to design algorithms for which this factor, referred to as the approximation ratio of the algorithm, is as small as possible. The field of Network Design comprises a large class of problems that deal with constructing networks of low cost and/or high capacity, routing data through existing networks, and many related issues. In this thesis, we focus chiefly on designing fault-tolerant networks. Two vertices u,v in a network are said to be k-edge-connected if deleting any set of k − 1 edges leaves u and v connected; similarly, they are k-vertex connected if deleting any set of k − 1 other vertices or edges leaves u and v connected. We focus on building networks that are highly connected, meaning that even if a small number of edges and nodes fail, the remaining nodes will still be able to communicate. A brief description of some of our results is given below. We study the problem of building 2-vertex-connected networks that are large and have low cost. Given an n-node graph with costs on its edges and any integer k, we give an O(log n log k) approximation for the problem of finding a minimum-cost 2-vertex-connected subgraph containing at least k nodes. We also give an algorithm of similar approximation ratio for maximizing the number of nodes in a 2-vertex-connected subgraph subject to a budget constraint on the total cost of its edges. Our algorithms are based on a pruning process that, given a 2-vertex-connected graph, finds a 2-vertex-connected subgraph of any desired size and of density comparable to the input graph, where the density of a graph is the ratio of its cost to the number of vertices it contains. This pruning algorithm is simple and efficient, and is likely to find additional applications. Recent breakthroughs on vertex-connectivity have made use of algorithms for element-connectivity problems. We develop an algorithm that, given a graph with some vertices marked as terminals, significantly simplifies the graph while preserving the pairwise element-connectivity of all terminals; in fact, the resulting graph is bipartite. We believe that our simplification/reduction algorithm will be a useful tool in many settings. We illustrate its applicability by giving algorithms to find many trees that each span a given terminal set, while being disjoint on edges and non-terminal vertices; such problems have applications in VLSI design and other areas. We also use this reduction algorithm to analyze simple algorithms for single-sink network design problems with high vertex-connectivity requirements; we give an O(k log n)-approximation for the problem of k-connecting a given set of terminals to a common sink. We study similar problems in which different types of links, of varying capacities and costs, can be used to connect nodes; assuming there are economies of scale, we give algorithms to construct low-cost networks with sufficient capacity or bandwidth to simultaneously support flow from each terminal to the common sink along many vertex-disjoint paths. We further investigate capacitated network design, where edges may have arbitrary costs and capacities. Given a connectivity requirement R_uv for each pair of vertices u,v, the goal is to find a low-cost network which, for each uv, can support a flow of R_uv units of traffic between u and v. We study several special cases of this problem, giving both algorithmic and hardness results. In addition to Network Design, we consider certain Traveling Salesperson-like problems, where the goal is to find short walks that visit many distinct vertices. We give a (2 + epsilon)-approximation for Orienteering in undirected graphs, achieving the best known approximation ratio, and the first approximation algorithm for Orienteering in directed graphs. We also give improved algorithms for Orienteering with time windows, in which vertices must be visited between specified release times and deadlines, and other related problems. These problems are motivated by applications in the fields of vehicle routing, delivery and transportation of goods, and robot path planning.