864 results for binary search
Abstract:
This work deals with the problem of minimizing the wasted space that results from the rotational placement of a set of irregular two-dimensional polygons inside a two-dimensional container. The problem is approached with a heuristic based on simulated annealing. Traditional "external penalization" techniques are avoided through the application of the no-fit polygon, which determines the collision-free area for each polygon before its placement. The simulated annealing controls the rotation applied, the placement, and the sequence of placement of the polygons. For each unplaced polygon, a limited-depth binary search is performed to find a scale factor that, when applied to the polygon, would allow it to fit in the container. A crystallization heuristic is proposed in order to increase the number of accepted solutions. The bottom-left and larger-first deterministic heuristics were also studied. The proposed process is suited for non-convex polygons and containers; the containers can have holes inside. (C) 2009 Elsevier Ltd. All rights reserved.
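The limited-depth binary search for a scale factor mentioned in this abstract could look roughly like the sketch below. The predicate `fits` is a hypothetical stand-in for the no-fit-polygon test (it is not taken from the paper), and the sketch assumes feasibility is monotone: if a polygon fits at some scale, it also fits at every smaller scale.

```python
from typing import Callable, Optional


def largest_feasible_scale(
    fits: Callable[[float], bool],  # hypothetical feasibility test, assumed monotone
    max_depth: int = 8,
    lo: float = 0.0,
    hi: float = 1.0,
) -> Optional[float]:
    """Return the largest scale found within max_depth halvings, or None."""
    if fits(hi):
        return hi  # the unscaled polygon already fits
    best = None
    for _ in range(max_depth):
        mid = (lo + hi) / 2.0
        if fits(mid):
            best = mid  # feasible: remember it and try a larger scale
            lo = mid
        else:
            hi = mid  # infeasible: shrink the upper bound
    return best
```

With a depth limit of d, the returned scale is within (hi - lo) / 2^d of the true threshold, which is the usual trade-off between accuracy and the number of feasibility tests.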
Abstract:
Approximate Quickselect, a simple modification of the well known Quickselect algorithm for selection, can be used to efficiently find an element with rank k in a given range [i..j], out of n given elements. We study basic cost measures of Approximate Quickselect by computing exact and asymptotic results for the expected number of passes, comparisons and data moves during the execution of this algorithm. The key element appearing in the analysis of Approximate Quickselect is a trivariate recurrence that we solve in full generality. The general solution of the recurrence proves to be very useful, as it allows us to tackle several related problems, besides the analysis that originally motivated us. In particular, we have been able to carry out a precise analysis of the expected number of moves of the ith element when selecting the jth smallest element with standard Quickselect, where we are able to give both exact and asymptotic results. Moreover, we can apply our general results to obtain exact and asymptotic results for several parameters in binary search trees, namely the expected number of common ancestors of the nodes with rank i and j, the expected size of the subtree rooted at the least common ancestor of the nodes with rank i and j, and the expected distance between the nodes of ranks i and j.
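As an illustration of the selection problem described in this abstract, here is a minimal sketch of a Quickselect variant that stops as soon as the pivot's rank falls inside the requested range. The function name, the random pivot choice, and the Lomuto-style partition are assumptions made for this example; the paper's exact variant and cost model may differ.

```python
import random
from typing import Optional, Sequence


def approximate_quickselect(
    items: Sequence[int], i: int, j: int, rng: Optional[random.Random] = None
) -> int:
    """Return some element whose 1-based rank lies in [i, j]."""
    if not 1 <= i <= j <= len(items):
        raise ValueError("need 1 <= i <= j <= len(items)")
    rng = rng or random.Random()
    a = list(items)  # work on a copy
    lo, hi = 0, len(a) - 1
    while True:
        # Choose a random pivot and move it to the end of the active range.
        p = rng.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        store = lo
        for k in range(lo, hi):
            if a[k] < pivot:
                a[k], a[store] = a[store], a[k]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot is now at its sorted position
        rank = store + 1
        if i <= rank <= j:
            return a[store]  # any rank inside the range is acceptable
        if rank < i:
            lo = store + 1  # target ranks lie to the right of the pivot
        else:
            hi = store - 1  # target ranks lie to the left of the pivot
```

Setting i = j reduces this to ordinary Quickselect for a single rank; a wider range lets the search stop earlier, which is where the savings in passes and comparisons come from.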
Abstract:
This thesis surveys the main halftoning methods, from analog screening to direct binary search by way of ordered dither, with particular attention to error diffusion. These methods are compared from the modern perspective of structure sensitivity. A new error-diffusion halftoning method is presented and subjected to various evaluations. The proposed method aims to be original, simple, as capable of preserving the structural character of images as the state-of-the-art method, and faster than the latter by two to three orders of magnitude. First, the image is decomposed into characteristic local frequencies. Next, the basic behavior of the proposed method is given. Then, a carefully chosen set of parameters makes it possible to modify this behavior so as to match the different local frequency characters. Finally, a calibration determines the right parameters to associate with each possible frequency. Once the algorithm is assembled, any image can be processed very quickly: each pixel is assigned its own frequency, this frequency serves as an index into the calibration table, the appropriate diffusion parameters are retrieved, and the output color determined for the pixel contributes, in expectation, to emphasizing the structure to which it belongs.
Abstract:
Clustering is defined as the grouping of similar items in a set, and is an important process within the field of data mining. As the amount of data for various applications continues to increase, in terms of its size and dimensionality, it is necessary to have efficient clustering methods. A popular clustering algorithm is K-Means, which adopts a greedy approach to produce a set of K-clusters with associated centres of mass, and uses a squared error distortion measure to determine convergence. Methods for improving the efficiency of K-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting a more efficient data structure, notably a multi-dimensional binary search tree (KD-Tree) to store either centroids or data points. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient K-Means techniques in parallel computational environments. In this work, we provide a parallel formulation for the KD-Tree based K-Means algorithm and address its load balancing issues.
Abstract:
One of the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.
Abstract:
K-Means is a popular clustering algorithm which adopts an iterative refinement procedure to determine data partitions and to compute their associated centres of mass, called centroids. The straightforward implementation of the algorithm is often referred to as 'brute force' since it computes a proximity measure from each data point to each centroid at every iteration of the K-Means process. Efficient implementations of the K-Means algorithm have been predominantly based on multi-dimensional binary search trees (KD-Trees). A combination of an efficient data structure and geometrical constraints reduces the number of distance computations required at each iteration. In this work we present a general space partitioning approach for improving the efficiency and the scalability of the K-Means algorithm. We propose to adopt approximate hierarchical clustering methods to generate binary space partitioning trees, in contrast to KD-Trees. In the experimental analysis, we have tested the performance of the proposed Binary Space Partitioning K-Means (BSP-KM) when a divisive clustering algorithm is used. We have carried out extensive experimental tests to compare the proposed approach to the one based on KD-Trees (KD-KM) over a wide range of the parameter space. BSP-KM is more scalable than KD-KM, while keeping the deterministic nature of the 'brute force' algorithm. In particular, the proposed space partitioning approach has been shown to overcome the well-known limitation of KD-Trees in high-dimensional spaces and can also be adopted to improve the efficiency of other algorithms in which KD-Trees have been used.
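To make the 'brute force' baseline concrete, the following sketch performs one K-Means iteration by computing the squared Euclidean distance from every point to every centroid. It is a generic illustration, not the BSP-KM or KD-KM method of the paper; the function name and the choice to leave an empty cluster's centroid unchanged are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def brute_force_kmeans_step(
    points: Sequence[Point], centroids: Sequence[Point]
) -> Tuple[List[int], List[List[float]]]:
    """One iteration: assign each point to its nearest centroid, then update."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    assignment = []
    for p in points:
        # n * k distance computations per iteration: this is the cost that
        # tree-based variants try to cut down.
        nearest = min(
            range(k),
            key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
        )
        assignment.append(nearest)
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    new_centroids = [
        [s / counts[c] for s in sums[c]] if counts[c] else list(centroids[c])
        for c in range(k)  # assumption: an empty cluster keeps its old centroid
    ]
    return assignment, new_centroids
```

Repeating this step until the assignment stops changing gives the standard algorithm; the per-iteration cost grows with the product of the number of points and the number of centroids.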
Abstract:
Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
Abstract:
The global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of the k-means algorithm requires a global reduction operation at each iteration step, which hinders its scalability. This work studies a different parallel formulation of the algorithm where the requirement of global communication can be relaxed while still providing the exact solution of the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution which can either be found in real-world distributed applications or be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error, which allows a further reduction of the communication costs.
Abstract:
An approach for solving reactive power planning problems is presented, which is based on binary search techniques and the use of a special heuristic to obtain a discrete solution. Two versions were developed, one to run on conventional (sequential) computers and the other to run on a distributed memory (hypercube) machine. The latter, parallel processing version employs an asynchronous programming model. Once the set of candidate buses has been defined, the program gives the location and size of the reactive sources needed (if any) in keeping with operating and security constraints.
Abstract:
As an alternative to traditional evolutionary algorithms (EAs), population-based incremental learning (PBIL) maintains a probabilistic model of the best individual(s). Originally, PBIL was applied in binary search spaces. Recently, some work has been done to extend it to continuous spaces. In this paper, we review two such extensions of PBIL. An improved version of PBIL based on a Gaussian model is proposed that combines two main features: a new updating rule that takes into account all the individuals and their fitness values, and a self-adaptive learning rate parameter. Furthermore, a new continuous PBIL employing a histogram probabilistic model is proposed. Some experimental results are presented that highlight the features of the new algorithms.
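For readers unfamiliar with PBIL in its original binary setting, the sketch below shows the basic idea: a vector of per-bit probabilities is sampled to form a population, and the vector is then nudged toward the best sampled individual. This is a simplified illustration without mutation or negative learning, not the Gaussian or histogram extensions proposed in the paper; the function name, parameter defaults, and the OneMax fitness in the usage note are assumptions.

```python
import random
from typing import Callable, List, Tuple


def pbil(
    fitness: Callable[[List[int]], float],
    n_bits: int,
    pop_size: int = 50,
    generations: int = 100,
    learning_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[List[int], List[float]]:
    """Basic binary PBIL; returns the best individual seen and the final model."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # probabilistic model: P(bit == 1) for each position
    best_ever: List[int] = []
    best_fit = float("-inf")
    for _ in range(generations):
        population = [
            [1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)
        ]
        best = max(population, key=fitness)
        if fitness(best) > best_fit:
            best_ever, best_fit = best, fitness(best)
        # Shift the model toward the best individual of this generation.
        probs = [
            (1.0 - learning_rate) * p + learning_rate * b
            for p, b in zip(probs, best)
        ]
    return best_ever, probs
```

A call such as `pbil(sum, n_bits=10)` maximizes the number of ones (OneMax); the returned probability vector shows how far the model has drifted toward the all-ones string.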
Abstract:
Self-dual doubly even linear binary error-correcting codes, often referred to as Type II codes, are codes closely related to many combinatorial structures such as 5-designs. Extremal codes are codes that have the largest possible minimum distance for a given length and dimension. The existence of an extremal (72,36,16) Type II code is still open. Previous results show that the automorphism group of a putative code C with the aforementioned properties has order 5 or dividing 24. In this work, we present a method and the results of an exhaustive search showing that such a code C cannot admit an automorphism group Z6. In addition, we present a so far unpublished construction of the extended Golay code by P. Becker. We generalize the notion and provide an example of another Type II code that can be obtained in this fashion. Consequently, we relate Becker's construction to the construction of binary Type II codes from codes over GF(2^r) via the Gray map.
Abstract:
Feature selection has been actively pursued in recent years, since finding the most discriminative set of features can enhance recognition rates and also make feature extraction faster. In this paper, we propose a new feature selection technique called Binary Cuckoo Search, which is based on the behavior of cuckoo birds. The experiments were carried out in the context of theft detection in power distribution systems, on two datasets obtained from a Brazilian electrical power company, and have demonstrated the robustness of the proposed technique against several other nature-inspired optimization techniques. © 2013 IEEE.
Abstract:
Feature selection aims to find the most important information in a given set of features. As this task can be seen as an optimization problem, the combinatorial growth of the possible solutions may make an exhaustive search infeasible. In this paper we propose a new nature-inspired feature selection technique based on the Charged System Search (CSS), which has never been applied in this context so far. The wrapper approach combines the exploration power of CSS with the speed of the Optimum-Path Forest classifier to find the set of features that maximizes the accuracy on a validation set. Experiments conducted on four public datasets have demonstrated that the proposed approach is valid and can outperform some well-known swarm-based techniques. © 2013 Springer-Verlag.
Abstract:
We present the first results of an all-sky search for continuous gravitational waves from unknown spinning neutron stars in binary systems using LIGO and Virgo data. Using a specially developed analysis program, the TwoSpect algorithm, the search was carried out on data from the sixth LIGO science run and the second and third Virgo science runs. The search covers a range of frequencies from 20 Hz to 520 Hz, a range of orbital periods from 2 h to ~2,254 h and a frequency- and period-dependent range of frequency modulation depths from 0.277 to 100 mHz. This corresponds to a range of projected semimajor axes of the orbit from ~0.6 × 10^-3 ls to ~6,500 ls, assuming the orbit of the binary is circular. While no plausible candidate gravitational wave events survive the pipeline, upper limits are set on the analyzed data. The most sensitive 95% confidence upper limit obtained on gravitational wave strain is 2.3 × 10^-24 at 217 Hz, assuming the source waves are circularly polarized. Although this search has been optimized for circular binary orbits, the upper limits obtained remain valid for orbital eccentricities as large as 0.9. In addition, upper limits are placed on continuous gravitational wave emission from the low-mass X-ray binary Scorpius X-1 between 20 Hz and 57.25 Hz.
Abstract:
The result of the distributed computing project Wieferich@Home is presented: the binary periodic numbers of bit pseudo-length j ≤ 3500 obtained by replication of a bit string of bit pseudo-length k ≤ 24 and increased by one are Wieferich primes only for the cases of 1092 or 3510.
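For context, a prime p is a Wieferich prime when p^2 divides 2^(p-1) - 1. The sketch below checks that condition directly for small numbers and prints the binary form of 1092 and 3510, whose successors 1093 and 3511 are the two cases named in the abstract. The helper names are illustrative, and this direct test is not the search procedure used by the project.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small range used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_wieferich_prime(p: int) -> bool:
    """True if p is prime and p**2 divides 2**(p - 1) - 1."""
    return is_prime(p) and pow(2, p - 1, p * p) == 1


if __name__ == "__main__":
    print([p for p in range(2, 4000) if is_wieferich_prime(p)])  # [1093, 3511]
    # Both predecessors show a repeating bit pattern.
    print(bin(1092))  # 0b10001000100
    print(bin(3510))  # 0b110110110110
```

The three-argument form of `pow` performs modular exponentiation, so the test never builds the full value of 2^(p-1).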