873 results for tag data structure
Abstract:
K-Means is a popular clustering algorithm which adopts an iterative refinement procedure to determine data partitions and to compute their associated centres of mass, called centroids. The straightforward implementation of the algorithm is often referred to as `brute force' since it computes a proximity measure from each data point to each centroid at every iteration of the K-Means process. Efficient implementations of the K-Means algorithm have been predominantly based on multi-dimensional binary search trees (KD-Trees). A combination of an efficient data structure and geometrical constraints makes it possible to reduce the number of distance computations required at each iteration. In this work we present a general space partitioning approach for improving the efficiency and the scalability of the K-Means algorithm. We propose to adopt approximate hierarchical clustering methods to generate binary space partitioning trees, in contrast to KD-Trees. In the experimental analysis, we have tested the performance of the proposed Binary Space Partitioning K-Means (BSP-KM) when a divisive clustering algorithm is used. We have carried out extensive experimental tests to compare the proposed approach to the one based on KD-Trees (KD-KM) in a wide range of the parameter space. BSP-KM is more scalable than KD-KM, while keeping the deterministic nature of the `brute force' algorithm. In particular, the proposed space partitioning approach has been shown to overcome the well-known limitation of KD-Trees in high-dimensional spaces and can also be adopted to improve the efficiency of other algorithms in which KD-Trees have been used.
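A minimal sketch of the `brute force' K-Means loop the abstract refers to, in Python (the function name and toy usage are ours, not the paper's): every iteration computes a distance from each point to each centroid, which is exactly the per-iteration cost that KD-Tree and BSP acceleration structures aim to reduce.

```python
import math

def kmeans(points, centroids, iters=10):
    # Brute-force K-Means: each iteration computes the distance from
    # every point to every centroid (len(points) * k computations).
    k = len(centroids)
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign the point to its nearest centroid.
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        for i, members in enumerate(clusters):
            if members:
                # Recompute the cluster's centre of mass.
                centroids[i] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids
```

Acceleration structures avoid most of the inner-loop distance computations by pruning centroids that cannot be the nearest one for a whole region of space.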
Abstract:
We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL’s Synergistic Processing Elements than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.
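The packing step can be illustrated with a small Python sketch (a hypothetical layout of ours, not the FAMOUS code): four columns are interleaved so that each level holds one value per column, and a single elementwise operation then advances all four columns at once, mimicking 4-wide SIMD lanes.

```python
def pack_columns(cols):
    # Pack exactly four air columns of equal height into one
    # structure-of-arrays: packed[level] holds the four "lanes".
    assert len(cols) == 4 and len({len(c) for c in cols}) == 1
    return [list(level) for level in zip(*cols)]

def scale_packed(packed, factor):
    # One "instruction" applied to all four lanes of every level,
    # standing in for a real SIMD operation on the packed data.
    return [[v * factor for v in lanes] for lanes in packed]
```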
Abstract:
The Bloom filter is a space-efficient randomized data structure for representing a set and supporting membership queries. Bloom filters intrinsically allow false positives. However, the space savings they offer outweigh this disadvantage if the false positive rates are kept sufficiently low. Inspired by the recent application of the Bloom filter in a novel multicast forwarding fabric, this paper proposes a variant of the Bloom filter, the optihash. The optihash introduces an optimization for the false positive rate at the stage of Bloom filter formation, using the same amount of space at the cost of slightly more processing than the classic Bloom filter. Often Bloom filters are used in situations where a fixed amount of space is a primary constraint. We present the optihash as a good alternative to Bloom filters, since the amount of space is the same and the improvement in false positives can justify the additional processing. Specifically, we show via simulations and numerical analysis that using the optihash the occurrence of false positives can be reduced and controlled at the cost of a small amount of additional processing. The simulations are carried out for in-packet forwarding. In this framework, the Bloom filter is used as a compact link/route identifier and is placed in the packet header to encode the route. At each node, the Bloom filter is queried for membership in order to make forwarding decisions. A false positive in the forwarding decision translates into packets forwarded along an unintended outgoing link. By using the optihash, false positives can be reduced. The optimization processing is carried out in an entity termed the Topology Manager, which is part of the control plane of the multicast forwarding fabric. This processing is only carried out on a per-session basis, not for every packet.
The aim of this paper is to present the optihash and evaluate its false-positive performance via simulations, measuring the influence of different parameters on the false positive rate. The false positive rate of the optihash is then compared with the false positive probability of the classic Bloom filter.
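For reference, a classic Bloom filter of the kind the optihash improves on can be sketched in a few lines of Python (class and method names are ours): an item is hashed to k bit positions; `add` sets them, and a membership query checks them all, so absent items can collide into "present" (a false positive) but present items are never missed.

```python
import hashlib

class BloomFilter:
    # Classic Bloom filter: m bits, k hash positions per item.
    def __init__(self, m, k):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        # Derive k deterministic bit positions from salted SHA-256.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True for every added item; may be True for others (false positive).
        return all(self.bits >> pos & 1 for pos in self._positions(item))
```

In the in-packet forwarding setting, the links of a route are `add`ed by the control plane, and each node tests its outgoing links with `in` to make the forwarding decision.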
Abstract:
Moving-least-squares (MLS) surfaces undergoing large deformations need periodic regeneration of the point set (point-set resampling) so as to keep the point-set density quasi-uniform. Previous work by the authors dealt with algebraic MLS surfaces and proposed a resampling strategy based on defining the new points at the intersections of the MLS surface with a suitable set of rays. That strategy has very low memory requirements and is easy to parallelize. In this article new resampling strategies with reduced CPU-time cost are explored. The basic idea is to choose the lines of a regular Cartesian grid as the set of rays, and to fully exploit this grid: as a data structure for search queries, as a spatial structure for traversing the surface in a continuation-like algorithm, and also as an approximation grid for an interpolated version of the MLS surface. It is shown that in this way a very simple and compact resampling technique is obtained, which cuts the resampling cost by half with affordable memory requirements.
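The first of these grid roles, a data structure for search queries, can be sketched in Python (a generic 2D cell-bucketing helper of ours, not the paper's code): points are bucketed by cell index, and a neighbour query only visits the cells adjacent to the query point instead of the whole point set.

```python
def build_grid(points, h):
    # Bucket 2D points into cells of a regular Cartesian grid with
    # spacing h; the cell index is just the floored coordinate / h.
    grid = {}
    for p in points:
        key = tuple(int(c // h) for c in p)
        grid.setdefault(key, []).append(p)
    return grid

def neighbours(grid, q, h):
    # Candidate neighbours: points in the query's cell and the
    # surrounding ring of cells (a 3x3 block).
    cx, cy = (int(c // h) for c in q)
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            out.extend(grid.get((cx + dx, cy + dy), []))
    return out
```

Reusing the same grid for surface traversal and interpolation is what makes the technique compact: one structure serves all three purposes.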
Abstract:
Given two strings A and B of lengths n_a and n_b, n_a <= n_b, respectively, the all-substrings longest common subsequence (ALCS) problem obtains, for every substring B' of B, the length of the longest string that is a subsequence of both A and B'. The ALCS problem has many applications, such as finding approximate tandem repeats in strings, solving the circular alignment of two strings and finding the alignment of one string with several others that have a common substring. We present an algorithm to prepare the basic data structure for ALCS queries that takes O(n_a n_b) time and O(n_a + n_b) space. After this preparation, it is possible to build a matrix of size O(n_b^2) that allows any LCS length to be retrieved in constant time. Some trade-offs between the space required and the querying time are discussed. To our knowledge, this is the first algorithm in the literature for the ALCS problem.
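For context, the single-pair version of the problem is the classic LCS-length dynamic programme, sketched below in Python with a rolling row so it uses O(n_b) working space (this is the textbook algorithm, not the paper's ALCS structure, which answers the query for every substring B' at once).

```python
def lcs_length(a, b):
    # Classic O(len(a) * len(b)) time LCS-length DP, keeping only the
    # previous row of the table (O(len(b)) space).
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)   # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

Running this independently for all O(n_b^2) substrings of B would cost O(n_a n_b^3); the ALCS preparation achieves the needed information in O(n_a n_b) time.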
Abstract:
We revisit the problem of visibility, which is to determine the set of primitives potentially visible in geometry data represented by a data structure such as a mesh of polygons or triangles, and we propose a solution for speeding up three-dimensional visualization processing in applications. We introduce a lean structure, in the sense of data abstraction and reduction, which can be used for online and interactive applications. The visibility problem is especially important in the 3D visualization of scenes represented by large volumes of data, when it is not worthwhile to keep all polygons of the scene in memory: doing so increases rendering time, or is even impossible for huge volumes of data. In these cases, given a viewing position and direction, the main objective is to determine and load a minimum amount of primitives (polygons) of the scene, to accelerate the rendering step. For this purpose, our algorithm culls primitives using a hybrid paradigm based on three known techniques. The scene is divided into a grid of cells, each cell is associated with the primitives that belong to it, and finally the set of potentially visible primitives is determined. The novelty is the use of the Ja1 triangulation to create the subdivision grid. We chose this structure for its adaptivity and algebraic simplicity (ease of calculations). The results show a substantial improvement over the traditional methods applied separately. The method introduced in this work can be used on devices with little or no dedicated processing power, and also to view data over the Internet, as in virtual museum applications.
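The cell-grid culling idea can be sketched in Python (a simplified 2D stand-in of ours for the paper's hybrid method, using a uniform grid rather than the Ja1 triangulation): each primitive is associated with the cells its bounding box overlaps, and only primitives touching the view region's cells are kept for rendering.

```python
def cells_for(bbox, h):
    # All grid cells (spacing h) overlapped by an axis-aligned
    # bounding box ((x0, y0), (x1, y1)).
    (x0, y0), (x1, y1) = bbox
    return {(i, j)
            for i in range(int(x0 // h), int(x1 // h) + 1)
            for j in range(int(y0 // h), int(y1 // h) + 1)}

def potentially_visible(primitives, view_cells, h):
    # Keep only primitives whose cells intersect the view region's
    # cells; everything else is culled before rendering.
    return [name for name, bbox in primitives
            if cells_for(bbox, h) & view_cells]
```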
Abstract:
Leafcutters are the most highly evolved Neotropical ants in the tribe Attini and are model systems for studying caste formation, division of labor and symbiosis with microorganisms. Some species of leafcutters are agricultural pests controlled by chemicals which affect other animals and accumulate in the environment. Aiming to provide a genetic basis for the study of leafcutters and for the development of more specific and environmentally friendly methods for the control of pest leafcutters, we generated expressed sequence tag data from Atta laevigata, one of the pest ants with a broad geographic distribution in South America. Results: The analysis of the expressed sequence tags allowed us to characterize 2,006 unique sequences in Atta laevigata. Sixteen of these genes had a high number of transcripts and are likely positively selected for high levels of gene expression, being responsible for three basic biological functions: energy conservation through redox reactions in mitochondria; cytoskeleton and muscle structuring; and regulation of gene expression and metabolism. Based on the leafcutter lifestyle and reports of genes involved in key processes of other social insects, we identified 146 sequences as potential targets for controlling pest leafcutters. The targets are responsible for antixenobiosis, development and longevity, immunity, resistance to pathogens, pheromone function, cell signaling, behavior, polysaccharide metabolism and arginine kinase activity. Conclusion: The generation and analysis of expressed sequence tags from Atta laevigata have provided an important genetic basis for future studies on the biology of leaf-cutting ants and may contribute to the development of more specific and environmentally friendly methods for the control of agricultural pest leafcutters.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
We examine the problem of combining Mexican inflation predictions or projections provided by a biweekly survey of professional forecasters. Consumer price inflation in Mexico is measured twice a month. We consider several combining methods and advocate the use of dimension reduction techniques, whose performance is compared with different benchmark methods, including the simplest average prediction. Missing values in the database are imputed by two different data-based methods. The results obtained are basically robust to the choice of the imputation method. A preliminary analysis of the data was based on its panel data structure and showed the potential usefulness of dimension reduction techniques for combining the experts' predictions. The main findings are: the first monthly predictions are best combined by way of the first principal component of the available predictions; the best second monthly prediction is obtained by calculating the median prediction, and it is more accurate than the first one.
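Two of the combiners mentioned can be sketched in a few lines of Python (the function name and interface are ours): the median combiner the paper finds best for the second monthly prediction, and the simple-average benchmark, both skipping experts whose entry is missing.

```python
from statistics import mean, median

def combine(forecasts, method="median"):
    # Combine one period's expert forecasts; None marks a missing
    # entry (in the paper these are imputed rather than dropped --
    # skipping them here is a simplification).
    avail = [f for f in forecasts if f is not None]
    return median(avail) if method == "median" else mean(avail)
```

The first-principal-component combiner instead weights experts by their loadings on the dominant direction of co-variation in the panel of predictions.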
Abstract:
Pós-graduação em Agronomia (Energia na Agricultura) - FCA
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
In this work, a program capable of automatically counting vehicles on roads was developed. Vehicle counting currently relies on expensive techniques, which often involve manual counting or degradation of the pavement. The main motivation for this work was the importance of vehicle counting to traffic engineering: it is essential for analyzing the performance of roads, making it possible to assess the need for traffic lights, roundabouts, access ways and other means of ensuring a continuous and safe flow of vehicles. The main objective of this work was to apply a recently developed statistical segmentation technique, based on a nonparametric linear regression model, to solve the segmentation problem of the counting program. The program was built from three major modules: one for segmentation, one for tracking and one for recognition. The segmentation module combines the statistical technique with background-difference segmentation, in order to optimize the process. The tracking module is based on Kalman filters and simple concepts of analytical geometry. The recognition module uses Fourier descriptors and a multilayer perceptron neural network trained by backpropagation. Besides the modules, a control logic was also developed to interconnect them, based mainly on a data structure called state. The analysis of the results covered the counting program and its component modules, and the individual analyses served to establish the parameter values of the techniques used. The final result was positive, since the statistical segmentation technique proved very useful and the developed program was able to count the vehicles belonging to the three goal..
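The background-difference part of the segmentation step can be sketched in plain Python (a stand-in of ours for the thesis's statistical technique, which combines a nonparametric regression model with this difference step): a pixel is marked as foreground (a vehicle candidate) when it deviates enough from the background model.

```python
def segment(frame, background, threshold=25):
    # Background-difference segmentation on grayscale images given as
    # row-major lists of pixel intensities: a pixel is foreground when
    # its absolute difference from the background exceeds the threshold.
    return [[abs(p - b) > threshold for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]
```

The resulting foreground mask is what the tracking module would then follow across frames with a Kalman filter.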
Abstract:
A total of 3,035 lactations of Holstein cows from four farms in the Southeast were analyzed to check the influence of the data structure of milk yield records on the genetic parameters. Four datasets with different structures were tested: weekly controls (CW) with 122,842 test-day records, monthly controls (CM) with 30,883, bimonthly controls (CB) with 15,837 and quarterly controls (CQ) with 12,702. A random regression model was used, in which the additive genetic and permanent environment effects were considered random, and the contemporary group (herd-year-month of test day) and age of cow (linear and quadratic effects) were considered fixed. Heritability estimates showed similar trends among the datasets analyzed, with the greatest similarity between the CW, CM and CB datasets. The CB dataset yielded estimates of genetic parameters with the same trend and similar magnitude as the CW and CM datasets, supporting the claim that the data structure did not influence the estimates of covariance components for the CW, CM and CB datasets. Thus, milk recording could be accomplished with a CB structure.
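The relationship between the four recording structures can be illustrated with a trivial Python helper (ours, not code from the study): the sparser datasets are, in effect, thinned versions of the weekly test-day records.

```python
def thin_records(test_days, step):
    # Keep every `step`-th test-day record from a weekly series:
    # step=1 keeps the weekly structure (CW), step=4 approximates the
    # monthly one (CM), step=8 the bimonthly one (CB), step=12 the
    # quarterly one (CQ).
    return test_days[::step]
```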