8 resultados para Pruning algorithms

em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevância:

20.00% 20.00%

Publicador:

Resumo:

There is an increasing interest in the application of Evolutionary Algorithms (EAs) to induce classification rules. This hybrid approach can benefit areas where classical methods for rule induction have not been very successful. One example is the induction of classification rules in imbalanced domains. Imbalanced data occur when one or more classes heavily outnumber other classes. Frequently, classical machine learning (ML) classifiers are not able to learn in the presence of imbalanced data sets, inducing classification models that always predict the most numerous classes. In this work, we propose a novel hybrid approach to deal with this problem. We create several balanced data sets with all minority class cases and a random sample of majority class cases. These balanced data sets are fed to classical ML systems that produce rule sets. The rule sets are combined creating a pool of rules and an EA is used to build a classifier from this pool of rules. This hybrid approach has some advantages over undersampling, since it reduces the amount of discarded information, and some advantages over oversampling, since it avoids overfitting. The proposed approach was experimentally analysed and the experimental results show an improvement in the classification performance measured as the area under the receiver operating characteristics (ROC) curve.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

J.A. Ferreira Neto, E.C. Santos Junior, U. Fra Paleo, D. Miranda Barros, and M.C.O. Moreira. 2011. Optimal subdivision of land in agrarian reform projects: an analysis using genetic algorithms. Cien. Inv. Agr. 38(2): 169-178. The objective of this manuscript is to develop a new procedure to achieve optimal land subdivision using genetic algorithms (GA). The genetic algorithm was tested in the rural settlement of Veredas, located in Minas Gerais, Brazil. This implementation was based on the land aptitude and its productivity index. The sequence of tests in the study was carried out in two areas with eight different agricultural aptitude classes, including one area of 391.88 ha subdivided into 12 lots and another of 404.1763 ha subdivided into 14 lots. The effectiveness of the method was measured using the shunting line standard value of a parceled area lot`s productivity index. To evaluate each parameter, a sequence of 15 calculations was performed to record the best individual fitness average (MMI) found for each parameter variation. The best parameter combination found in testing and used to generate the new parceling with the GA was the following: 320 as the generation number, a population of 40 individuals, 0.8 mutation tax, and a 0.3 renewal tax. The solution generated rather homogeneous lots in terms of productive capacity.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We describe the canonical and microcanonical Monte Carlo algorithms for different systems that can be described by spin models. Sites of the lattice, chosen at random, interchange their spin values, provided they are different. The canonical ensemble is generated by performing exchanges according to the Metropolis prescription whereas in the microcanonical ensemble, exchanges are performed as long as the total energy remains constant. A systematic finite size analysis of intensive quantities and a comparison with results obtained from distinct ensembles are performed and the quality of results reveal that the present approach may be an useful tool for the study of phase transitions, specially first-order transitions. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we present a novel approach for multispectral image contextual classification by combining iterative combinatorial optimization algorithms. The pixel-wise decision rule is defined using a Bayesian approach to combine two MRF models: a Gaussian Markov Random Field (GMRF) for the observations (likelihood) and a Potts model for the a priori knowledge, to regularize the solution in the presence of noisy data. Hence, the classification problem is stated according to a Maximum a Posteriori (MAP) framework. In order to approximate the MAP solution we apply several combinatorial optimization methods using multiple simultaneous initializations, making the solution less sensitive to the initial conditions and reducing both computational cost and time in comparison to Simulated Annealing, often unfeasible in many real image processing applications. Markov Random Field model parameters are estimated by Maximum Pseudo-Likelihood (MPL) approach, avoiding manual adjustments in the choice of the regularization parameters. Asymptotic evaluations assess the accuracy of the proposed parameter estimation procedure. To test and evaluate the proposed classification method, we adopt metrics for quantitative performance assessment (Cohen`s Kappa coefficient), allowing a robust and accurate statistical analysis. The obtained results clearly show that combining sub-optimal contextual algorithms significantly improves the classification performance, indicating the effectiveness of the proposed methodology. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present parallel algorithms on the BSP/CGM model, with p processors, to count and generate all the maximal cliques of a circle graph with n vertices and m edges. To count the number of all the maximal cliques, without actually generating them, our algorithm requires O(log p) communication rounds with O(nm/p) local computation time. We also present an algorithm to generate the first maximal clique in O(log p) communication rounds with O(nm/p) local computation, and to generate each one of the subsequent maximal cliques this algorithm requires O(log p) communication rounds with O(m/p) local computation. The maximal cliques generation algorithm is based on generating all maximal paths in a directed acyclic graph, and we present an algorithm for this problem that uses O(log p) communication rounds with O(m/p) local computation for each maximal path. We also show that the presented algorithms can be extended to the CREW PRAM model.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

For a fixed family F of graphs, an F-packing in a graph G is a set of pairwise vertex-disjoint subgraphs of G, each isomorphic to an element of F. Finding an F-packing that maximizes the number of covered edges is a natural generalization of the maximum matching problem, which is just F = {K(2)}. In this paper we provide new approximation algorithms and hardness results for the K(r)-packing problem where K(r) = {K(2), K(3,) . . . , K(r)}. We show that already for r = 3 the K(r)-packing problem is APX-complete, and, in fact, we show that it remains so even for graphs with maximum degree 4. On the positive side, we give an approximation algorithm with approximation ratio at most 2 for every fixed r. For r = 3, 4, 5 we obtain better approximations. For r = 3 we obtain a simple 3/2-approximation, achieving a known ratio that follows from a more involved algorithm of Halldorsson. For r = 4, we obtain a (3/2 + epsilon)-approximation, and for r = 5 we obtain a (25/14 + epsilon)-approximation. (C) 2008 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A bipartite graph G = (V, W, E) is convex if there exists an ordering of the vertices of W such that, for each v. V, the neighbors of v are consecutive in W. We describe both a sequential and a BSP/CGM algorithm to find a maximum independent set in a convex bipartite graph. The sequential algorithm improves over the running time of the previously known algorithm and the BSP/CGM algorithm is a parallel version of the sequential one. The complexity of the algorithms does not depend on |W|.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We investigate several two-dimensional guillotine cutting stock problems and their variants in which orthogonal rotations are allowed. We first present two dynamic programming based algorithms for the Rectangular Knapsack (RK) problem and its variants in which the patterns must be staged. The first algorithm solves the recurrence formula proposed by Beasley; the second algorithm - for staged patterns - also uses a recurrence formula. We show that if the items are not so small compared to the dimensions of the bin, then these algorithms require polynomial time. Using these algorithms we solved all instances of the RK problem found at the OR-LIBRARY, including one for which no optimal solution was known. We also consider the Two-dimensional Cutting Stock problem. We present a column generation based algorithm for this problem that uses the first algorithm above mentioned to generate the columns. We propose two strategies to tackle the residual instances. We also investigate a variant of this problem where the bins have different sizes. At last, we study the Two-dimensional Strip Packing problem. We also present a column generation based algorithm for this problem that uses the second algorithm above mentioned where staged patterns are imposed. In this case we solve instances for two-, three- and four-staged patterns. We report on some computational experiments with the various algorithms we propose in this paper. The results indicate that these algorithms seem to be suitable for solving real-world instances. We give a detailed description (a pseudo-code) of all the algorithms presented here, so that the reader may easily implement these algorithms. (c) 2007 Elsevier B.V. All rights reserved.