877 resultados para Acceleration data structure


Relevância:

80.00% 80.00%

Publicador:

Resumo:

A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two algorithms, one serial and one parallel, to maintain series-parallel (SP) relationships "on the fly" for fork-join multithreaded programs. The serial SP-order algorithm runs in O(1) amortized time per operation. In contrast, the previously best algorithm requires a time per operation that is proportional to Tarjan’s functional inverse of Ackermann’s function. SP-order employs an order-maintenance data structure that allows us to implement a more efficient "English-Hebrew" labeling scheme than was used in earlier race detectors, which immediately yields an improved determinacy-race detector. In particular, any fork-join program running in T₁ time on a single processor can be checked on the fly for determinacy races in O(T₁) time. Corresponding improved bounds can also be obtained for more sophisticated data-race detectors, for example, those that use locks. By combining SP-order with Feng and Leiserson’s serial SP-bags algorithm, we obtain a parallel SP-maintenance algorithm, called SP-hybrid. Suppose that a fork-join program has n threads, T₁ work, and a critical-path length of T[subscript ∞]. When executed on P processors, we prove that SP-hybrid runs in O((T₁/P + PT[subscript ∞]) lg n) expected time. To understand this bound, consider that the original program obtains linear speed-up over a 1-processor execution when P = O(T₁/T[subscript ∞]). In contrast, SP-hybrid obtains linear speed-up when P = O(√T₁/T[subscript ∞]), but the work is increased by a factor of O(lg n).

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Resumen: Este trabajo estudia los resultados en matemáticas y lenguaje de 32000 estudiantes en la prueba saber 11 del 2008, de la ciudad de Bogotá. Este análisis reconoce que los individuos se encuentran contenidos en barrios y colegios, pero no todos los individuos del mismo barrio asisten a la misma escuela y viceversa. Con el fin de modelar esta estructura de datos se utilizan varios modelos econométricos, incluyendo una regresión jerárquica multinivel de efectos cruzados. Nuestro objetivo central es identificar en qué medida y que condiciones del barrio y del colegio se correlacionan con los resultados educacionales de la población objetivo y cuáles características de los barrios y de los colegios están más asociadas al resultado en las pruebas. Usamos datos de la prueba saber 11, del censo de colegios c600, del censo poblacional del 2005 y de la policía metropolitana de Bogotá. Nuestras estimaciones muestran que tanto el barrio como el colegio están correlacionados con los resultados en las pruebas; pero el efecto del colegio parece ser mucho más fuerte que el del barrio. Las características del colegio que están más asociadas con el resultado en las pruebas son la educación de los profesores, la jornada, el valor de la pensión, y el contexto socio económico del colegio. Las características de los barrios más asociadas con el resultado en las pruebas son, la presencia de universitarios en la UPZ, un clúster de altos niveles de educación y nivel de crimen en el barrio que se correlaciona negativamente. Los resultados anteriores fueron hallados teniendo en cuenta controles familiares y personales.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper a cell by cell anisotropic adaptive mesh technique is added to an existing staggered mesh Lagrange plus remap finite element ALE code for the solution of the Euler equations. The quadrilateral finite elements may be subdivided isotropically or anisotropically and a hierarchical data structure is employed. An efficient computational method is proposed, which only solves on the finest level of resolution that exists for each part of the domain with disjoint or hanging nodes being used at resolution transitions. The Lagrangian, equipotential mesh relaxation and advection (solution remapping) steps are generalised so that they may be applied on the dynamic mesh. It is shown that for a radial Sod problem and a two-dimensional Riemann problem the anisotropic adaptive mesh method runs over eight times faster.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Clustering is defined as the grouping of similar items in a set, and is an important process within the field of data mining. As the amount of data for various applications continues to increase, in terms of its size and dimensionality, it is necessary to have efficient clustering methods. A popular clustering algorithm is K-Means, which adopts a greedy approach to produce a set of K-clusters with associated centres of mass, and uses a squared error distortion measure to determine convergence. Methods for improving the efficiency of K-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting a more efficient data structure, notably a multi-dimensional binary search tree (KD-Tree) to store either centroids or data points. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient K-Means techniques in parallel computational environments. In this work, we provide a parallel formulation for the KD-Tree based K-Means algorithm and address its load balancing issues.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In real world applications sequential algorithms of data mining and data exploration are often unsuitable for datasets with enormous size, high-dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context there is the necessity to develop high performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This study investigates the superposition-based cooperative transmission system. In this system, a key point is for the relay node to detect data transmitted from the source node. This issued was less considered in the existing literature as the channel is usually assumed to be flat fading and a priori known. In practice, however, the channel is not only a priori unknown but subject to frequency selective fading. Channel estimation is thus necessary. Of particular interest is the channel estimation at the relay node which imposes extra requirement for the system resources. The authors propose a novel turbo least-square channel estimator by exploring the superposition structure of the transmission data. The proposed channel estimator not only requires no pilot symbols but also has significantly better performance than the classic approach. The soft-in-soft-out minimum mean square error (MMSE) equaliser is also re-derived to match the superimposed data structure. Finally computer simulation results are shown to verify the proposed algorithm.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Web Services for Remote Portlets (WSRP) is gaining attention among portal developers and vendors to enable easy development, increased richness in functionality, pluggability, and flexibility of deployment. Whilst currently not supporting all WSRP functionalities, open-source portal frameworks could in future use WSRP Consumers to access remote portlets found from a WSRP Producer registry service. This implies that we need a central registry for the remote portlets and a more expressive WSRP Consumer interface to implement the remote portlet functions. This paper reports on an investigation into a new system architecture, which includes a Web Services repository, registry, and client interface. The Web Services repository holds portlets as remote resource producers. A new data structure for expressing remote portlets is found and published by populating a Universal Description, Discovery and Integration (UDDI) registry. A remote portlet publish and search engine for UDDI has also been developed. Finally, a remote portlet client interface was developed as a Web application. The client interface supports remote portlet features, as well as window status and mode functions. Copyright (c) 2007 John Wiley & Sons, Ltd.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

K-Means is a popular clustering algorithm which adopts an iterative refinement procedure to determine data partitions and to compute their associated centres of mass, called centroids. The straightforward implementation of the algorithm is often referred to as `brute force' since it computes a proximity measure from each data point to each centroid at every iteration of the K-Means process. Efficient implementations of the K-Means algorithm have been predominantly based on multi-dimensional binary search trees (KD-Trees). A combination of an efficient data structure and geometrical constraints allow to reduce the number of distance computations required at each iteration. In this work we present a general space partitioning approach for improving the efficiency and the scalability of the K-Means algorithm. We propose to adopt approximate hierarchical clustering methods to generate binary space partitioning trees in contrast to KD-Trees. In the experimental analysis, we have tested the performance of the proposed Binary Space Partitioning K-Means (BSP-KM) when a divisive clustering algorithm is used. We have carried out extensive experimental tests to compare the proposed approach to the one based on KD-Trees (KD-KM) in a wide range of the parameters space. BSP-KM is more scalable than KDKM, while keeping the deterministic nature of the `brute force' algorithm. In particular, the proposed space partitioning approach has shown to overcome the well-known limitation of KD-Trees in high-dimensional spaces and can also be adopted to improve the efficiency of other algorithms in which KD-Trees have been used.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL’s Synergistic Processing Elements than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The Bloom filter is a space efficient randomized data structure for representing a set and supporting membership queries. Bloom filters intrinsically allow false positives. However, the space savings they offer outweigh the disadvantage if the false positive rates are kept sufficiently low. Inspired by the recent application of the Bloom filter in a novel multicast forwarding fabric, this paper proposes a variant of the Bloom filter, the optihash. The optihash introduces an optimization for the false positive rate at the stage of Bloom filter formation using the same amount of space at the cost of slightly more processing than the classic Bloom filter. Often Bloom filters are used in situations where a fixed amount of space is a primary constraint. We present the optihash as a good alternative to Bloom filters since the amount of space is the same and the improvements in false positives can justify the additional processing. Specifically, we show via simulations and numerical analysis that using the optihash the false positives occurrences can be reduced and controlled at a cost of small additional processing. The simulations are carried out for in-packet forwarding. In this framework, the Bloom filter is used as a compact link/route identifier and it is placed in the packet header to encode the route. At each node, the Bloom filter is queried for membership in order to make forwarding decisions. A false positive in the forwarding decision is translated into packets forwarded along an unintended outgoing link. By using the optihash, false positives can be reduced. The optimization processing is carried out in an entity termed the Topology Manger which is part of the control plane of the multicast forwarding fabric. This processing is only carried out on a per-session basis, not for every packet. The aim of this paper is to present the optihash and evaluate its false positive performances via simulations in order to measure the influence of different parameters on the false positive rate. The false positive rate for the optihash is then compared with the false positive probability of the classic Bloom filter.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Moving-least-squares (MLS) surfaces undergoing large deformations need periodic regeneration of the point set (point-set resampling) so as to keep the point-set density quasi-uniform. Previous work by the authors dealt with algebraic MLS surfaces, and proposed a resampling strategy based on defining the new points at the intersections of the MLS surface with a suitable set of rays. That strategy has very low memory requirements and is easy to parallelize. In this article new resampling strategies with reduced CPU-time cost are explored. The basic idea is to choose as set of rays the lines of a regular, Cartesian grid, and to fully exploit this grid: as data structure for search queries, as spatial structure for traversing the surface in a continuation-like algorithm, and also as approximation grid for an interpolated version of the MLS surface. It is shown that in this way a very simple and compact resampling technique is obtained, which cuts the resampling cost by half with affordable memory requirements.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Given two strings A and B of lengths n(a) and n(b), n(a) <= n(b), respectively, the all-substrings longest common subsequence (ALCS) problem obtains, for every substring B` of B, the length of the longest string that is a subsequence of both A and B. The ALCS problem has many applications, such as finding approximate tandem repeats in strings, solving the circular alignment of two strings and finding the alignment of one string with several others that have a common substring. We present an algorithm to prepare the basic data structure for ALCS queries that takes O(n(a)n(b)) time and O(n(a) + n(b)) space. After this preparation, it is possible to build that allows any LCS length to be retrieved in constant time. Some trade-offs between the space required and a matrix of size O(n(b)(2)) the querying time are discussed. To our knowledge, this is the first algorithm in the literature for the ALCS problem. (C) 2007 Elsevier B.V. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Usually, a Petri net is applied as an RFID model tool. This paper, otherwise, presents another approach to the Petri net concerning RFID systems. This approach, called elementary Petri net inside an RFID distributed database, or PNRD, is the first step to improve RFID and control systems integration, based on a formal data structure to identify and update the product state in real-time process execution, allowing automatic discovery of unexpected events during tag data capture. There are two main features in this approach: to use RFID tags as the object process expected database and last product state identification; and to apply Petri net analysis to automatically update the last product state registry during reader data capture. RFID reader data capture can be viewed, in Petri nets, as a direct analysis of locality for a specific transition that holds in a specific workflow. Following this direction, RFID readers storage Petri net control vector list related to each tag id is expected to be perceived. This paper presents PNRD cornerstones and a PNRD implementation example in software called DEMIS Distributed Environment in Manufacturing Information Systems.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We revisit the problem of visibility, which is to determine a set of primitives potentially visible in a set of geometry data represented by a data structure, such as a mesh of polygons or triangles, we propose a solution for speeding up the three-dimensional visualization processing in applications. We introduce a lean structure , in the sense of data abstraction and reduction, which can be used for online and interactive applications. The visibility problem is especially important in 3D visualization of scenes represented by large volumes of data, when it is not worthwhile keeping all polygons of the scene in memory. This implies a greater time spent in the rendering, or is even impossible to keep them all in huge volumes of data. In these cases, given a position and a direction of view, the main objective is to determine and load a minimum ammount of primitives (polygons) in the scene, to accelerate the rendering step. For this purpose, our algorithm performs cutting primitives (culling) using a hybrid paradigm based on three known techniques. The scene is divided into a cell grid, for each cell we associate the primitives that belong to them, and finally determined the set of primitives potentially visible. The novelty is the use of triangulation Ja 1 to create the subdivision grid. We chose this structure because of its relevant characteristics of adaptivity and algebrism (ease of calculations). The results show a substantial improvement over traditional methods when applied separately. The method introduced in this work can be used in devices with low or no dedicated processing power CPU, and also can be used to view data via the Internet, such as virtual museums applications