Biblioteca Digital

245 resultados para parallel algorithm

A parallel quantum histogram architecture

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A parallel formulation of an algorithm for the histogram computation of n data items using an on-the-fly data decomposition and a novel quantum-like representation (QR) is developed. The QR transformation separates multiple data read operations from multiple bin update operations thereby making it easier to bind data items into their corresponding histogram bins. Under this model the steps required to compute the histogram is n/s + t steps, where s is a speedup factor and t is associated with pipeline latency. Here, we show that an overall speedup factor, s, is available for up to an eightfold acceleration. Our evaluation also shows that each one of these cells requires less area/time complexity compared to similar proposals found in the literature.

Dynamic group communication for large-scale parallel data mining

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.

Efficient group communication for large-scale parallel clustering

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploit the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of the k-means algorithm requires a global reduction operation at each iteration step, which hinders its scalability. This work studies a different parallel formulation of the algorithm where the requirement of global communication can be relaxed while still providing the exact solution of the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs.

A parallel formulation for the simulation of a generic branch predictor

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A parallel formulation for the simulation of a branch prediction algorithm is presented. This parallel formulation identifies independent tasks in the algorithm which can be executed concurrently. The parallel implementation is based on the multithreading model and two parallel programming platforms: pthreads and Cilk++. Improvement in execution performance by up to 7 times is observed for a generic 2-bit predictor in a 12-core multiprocessor system.

Bit-index sort: a fast non-comparison integer sorting algorithm for permutations

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes a fast integer sorting algorithm, herein referred as Bit-index sort, which is a non-comparison sorting algorithm for partial per-mutations, with linear complexity order in execution time. Bit-index sort uses a bit-array to classify input sequences of distinct integers, and exploits built-in bit functions in C compilers supported by machine hardware to retrieve the ordered output sequence. Results show that Bit-index sort outperforms in execution time to quicksort and counting sort algorithms. A parallel approach for Bit-index sort using two simultaneous threads is included, which obtains speedups up to 1.6.

The diabatic contour advective semi-Lagrangian model

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This article describes a novel algorithmic development extending the contour advective semi-Lagrangian model to include nonconservative effects. The Lagrangian contour representation of finescale tracer fields, such as potential vorticity, allows for conservative, nondiffusive treatment of sharp gradients allowing very high numerical Reynolds numbers. It has been widely employed in accurate geostrophic turbulence and tracer advection simulations. In the present, diabatic version of the model the constraint of conservative dynamics is overcome by including a parallel Eulerian field that absorbs the nonconservative ( diabatic) tendencies. The diabatic buildup in this Eulerian field is limited through regular, controlled transfers of this field to the contour representation. This transfer is done with a fast newly developed contouring algorithm. This model has been implemented for several idealized geometries. In this paper a single-layer doubly periodic geometry is used to demonstrate the validity of the model. The present model converges faster than the analogous semi-Lagrangian models at increased resolutions. At the same nominal spatial resolution the new model is 40 times faster than the analogous semi-Lagrangian model. Results of an orographically forced idealized storm track show nontrivial dependency of storm-track statistics on resolution and on the numerical model employed. If this result is more generally applicable, this may have important consequences for future high-resolution climate modeling.

An improved algorithm for generating global window brightness temperatures from multiple satellite infrared imagery

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An improved algorithm for the generation of gridded window brightness temperatures is presented. The primary data source is the International Satellite Cloud Climatology Project, level B3 data, covering the period from July 1983 to the present. The algorithm rakes window brightness, temperatures from multiple satellites, both geostationary and polar orbiting, which have already been navigated and normalized radiometrically to the National Oceanic and Atmospheric Administration's Advanced Very High Resolution Radiometer, and generates 3-hourly global images on a 0.5 degrees by 0.5 degrees latitude-longitude grid. The gridding uses a hierarchical scheme based on spherical kernel estimators. As part of the gridding procedure, the geostationary data are corrected for limb effects using a simple empirical correction to the radiances, from which the corrected temperatures are computed. This is in addition to the application of satellite zenith angle weighting to downweight limb pixels in preference to nearer-nadir pixels. The polar orbiter data are windowed on the target time with temporal weighting to account for the noncontemporaneous nature of the data. Large regions of missing data are interpolated from adjacent processed images using a form of motion compensated interpolation based on the estimation of motion vectors using an hierarchical block matching scheme. Examples are shown of the various stages in the process. Also shown are examples of the usefulness of this type of data in GCM validation.

A geometric algorithm for an artificial life form based on anachronistic methods

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Modern methods of spawning new technological motifs are not appropriate when it is desired to realize artificial life as an actual real world entity unto itself (Pattee 1995; Brooks 2006; Chalmers 1995). Many fundamental aspects of such a machine are absent in common methods, which generally lack methodologies of construction. In this paper we mix classical and modern studies in order to attempt to realize an artificial life form from first principles. A model of an algorithm is introduced, its methodology of construction is presented, and the fundamental source from which it sprang is discussed.

Algorithm for generating defective graphene sheets

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An algorithm is presented for the generation of molecular models of defective graphene fragments, containing a majority of 6-membered rings with a small number of 5- and 7-membered rings as defects. The structures are generated from an initial random array of points in 2D space, which are then subject to Delaunay triangulation. The dual of the triangulation forms a Voronoi tessellation of polygons with a range of ring sizes. An iterative cycle of refinement, involving deletion and addition of points followed by further triangulation, is performed until the user-defined criteria for the number of defects are met. The array of points and connectivities are then converted to a molecular structure and subject to geometry optimization using a standard molecular modeling package to generate final atomic coordinates. On the basis of molecular mechanics with minimization, this automated method can generate structures, which conform to user-supplied criteria and avoid the potential bias associated with the manual building of structures. One application of the algorithm is the generation of structures for the evaluation of the reactivity of different defect sites. Ab initio electronic structure calculations on a representative structure indicate preferential fluorination close to 5-ring defects.

A contour-advective semi-lagrangian numerical algorithm for simulating fine-scale conservative dynamical fields

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper describes a novel numerical algorithm for simulating the evolution of fine-scale conservative fields in layer-wise two-dimensional flows, the most important examples of which are the earth's atmosphere and oceans. the algorithm combines two radically different algorithms, one Lagrangian and the other Eulerian, to achieve an unexpected gain in computational efficiency. The algorithm is demonstrated for multi-layer quasi-geostrophic flow, and results are presented for a simulation of a tilted stratospheric polar vortex and of nearly-inviscid quasi-geostrophic turbulence. the turbulence results contradict previous arguments and simulation results that have suggested an ultimate two-dimensional, vertically-coherent character of the flow. Ongoing extensions of the algorithm to the generally ageostrophic flows characteristic of planetary fluid dynamics are outlined.

Dynamic load balancing for the distributed mining of molecular structures

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.

A customizable multi-agent system for distributed data mining

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design allows a separation of the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability performances.

A genetic algorithm for the design of a fuzzy controller for active queue management

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Active queue management (AQM) policies are those policies of router queue management that allow for the detection of network congestion, the notification of such occurrences to the hosts on the network borders, and the adoption of a suitable control policy. This paper proposes the adoption of a fuzzy proportional integral (FPI) controller as an active queue manager for Internet routers. The analytical design of the proposed FPI controller is carried out in analogy with a proportional integral (PI) controller, which recently has been proposed for AQM. A genetic algorithm is proposed for tuning of the FPI controller parameters with respect to optimal disturbance rejection. In the paper the FPI controller design metodology is described and the results of the comparison with random early detection (RED), tail drop, and PI controller are presented.

Decentralized load balancing for highly irregular search problems

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we present a distributed computing framework for problems characterized by a highly irregular search tree, whereby no reliable workload prediction is available. The framework is based on a peer-to-peer computing environment and dynamic load balancing. The system allows for dynamic resource aggregation, does not depend on any specific meta-computing middleware and is suitable for large-scale, multi-domain, heterogeneous environments, such as computational Grids. Dynamic load balancing policies based on global statistics are known to provide optimal load balancing performance, while randomized techniques provide high scalability. The proposed method combines both advantages and adopts distributed job-pools and a randomized polling technique. The framework has been successfully adopted in a parallel search algorithm for subgraph mining and evaluated on a molecular compounds dataset. The parallel application has shown good calability and close-to linear speedup in a distributed network of workstations.

Efficient operator pipelining in a bit serial genetic algorithm engine

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The authors propose a bit serial pipeline used to perform the genetic operators in a hardware genetic algorithm. The bit-serial nature of the dataflow allows the operators to be pipelined, resulting in an architecture which is area efficient, easily scaled and is independent of the lengths of the chromosomes. An FPGA implementation of the device achieves a throughput of >25 million genes per second

«
1
2
3
4
5
6
7
8
...
16
17
»