989 results for Parallel computation


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Streaming SIMD Extensions (SSE) is a unique feature embedded in the Pentium III and P4 classes of microprocessors. By fully exploiting SSE, parallel algorithms can be implemented on a standard personal computer and a theoretical speedup of four can be achieved. In this paper, we demonstrate the implementation of a parallel LU matrix decomposition algorithm for solving power systems network equations with SSE and discuss advantages and disadvantages of this approach.
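The factor of four quoted above follows from SSE registers holding four single-precision values, so one instruction can update four matrix entries at a time. As a rough sketch only (not the paper's implementation), the row update inside a dense LU decomposition can be written in chunks of four to mirror that pattern. The function name `lu_decompose` and the `width` parameter are illustrative assumptions, and the sketch uses no pivoting, so it assumes every pivot is nonzero.

```python
def lu_decompose(a, width=4):
    """Doolittle LU decomposition without pivoting (assumes nonzero pivots).

    The row update is processed in chunks of `width` entries to mimic how a
    4-wide SIMD unit would handle it; in plain Python this is only a model
    of the data access pattern, not an actual speedup.
    """
    n = len(a)
    upper = [row[:] for row in a]
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            factor = upper[i][k] / upper[k][k]
            lower[i][k] = factor
            # Each chunk corresponds to one hypothetical SIMD multiply-subtract.
            for start in range(k, n, width):
                stop = min(start + width, n)
                upper[i][start:stop] = [
                    x - factor * y
                    for x, y in zip(upper[i][start:stop], upper[k][start:stop])
                ]
    return lower, upper


lower, upper = lu_decompose([[4.0, 3.0], [6.0, 3.0]])
# lower is [[1.0, 0.0], [1.5, 1.0]] and upper is [[4.0, 3.0], [0.0, -1.5]]
```

Power system network matrices are typically sparse, so a practical solver would use a sparse storage scheme; the dense form here is chosen only to keep the chunked update visible.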

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Morse-Smale complex is a useful topological data structure for the analysis and visualization of scalar data. This paper describes an algorithm that processes all mesh elements of the domain in parallel to compute the Morse-Smale complex of large two-dimensional data sets at interactive speeds. We employ a reformulation of the Morse-Smale complex using Forman's Discrete Morse Theory and achieve scalability by computing the discrete gradient using local accesses only. We also introduce a novel approach to merge gradient paths that ensures accurate geometry of the computed complex. We demonstrate that our algorithm performs well on both multicore environments and on massively parallel architectures such as the GPU.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Morse-Smale complex is a topological structure that captures the behavior of the gradient of a scalar function on a manifold. This paper discusses scalable techniques to compute the Morse-Smale complex of scalar functions defined on large three-dimensional structured grids. Computing the Morse-Smale complex of three-dimensional domains is challenging as compared to two-dimensional domains because of the non-trivial structure introduced by the two types of saddle criticalities. We present a parallel shared-memory algorithm to compute the Morse-Smale complex based on Forman's discrete Morse theory. The algorithm achieves scalability via synergistic use of the CPU and the GPU. We first prove that the discrete gradient on the domain can be computed independently for each cell and hence can be implemented on the GPU. Second, we describe a two-step graph traversal algorithm to compute the 1-saddle-2-saddle connections efficiently and in parallel on the CPU. Simultaneously, the extrema-saddle connections are computed using a tree traversal algorithm on the GPU.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper studies the development of a real-time stereovision system to track multiple infrared markers attached to a surgical instrument. A multistage pipeline is developed in a field-programmable gate array (FPGA) to recognize the targets in both left and right image planes and to give each target a unique label. The pipeline architecture includes a smoothing filter, an adaptive threshold module, a connected component labeling operation, and a centroid extraction process. A parallel distortion correction method is proposed and implemented in a dual-core DSP. A suitable kinematic model is established for the moving targets, and a novel set of parallel and interactive computation mechanisms is proposed to position and track the targets; these are carried out by a cross-computation method in a dual-core DSP. The proposed tracking system can track the 3-D coordinate, velocity, and acceleration of four infrared markers with a delay of 9.18 ms. Furthermore, it is capable of tracking a maximum of 110 infrared markers without frame dropping at a frame rate of 60 frames/s. The accuracy of the proposed system can reach 0.37 mm RMS along the x- and y-directions and 0.45 mm RMS along the depth direction (the depth is from 0.8 to 0.45 m). The performance of the proposed system can meet the requirements of applications such as surgical navigation, which demand both real-time operation and high accuracy.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A vernier offset is detected at once among straight lines, and reaction times are almost independent of the number of simultaneously presented stimuli (distractors), indicating parallel processing of vernier offsets. Reaction times for identifying a vernier offset to one side among verniers offset to the opposite side increase with the number of distractors, indicating serial processing. Even deviations below a photoreceptor diameter can be detected at once. The visual system thus attains positional accuracy below the photoreceptor diameter simultaneously at different positions. I conclude that deviation from straightness, or change of orientation, is detected in parallel over the visual field. Discontinuities or gradients in orientation may represent an elementary feature of vision.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We describe recent progress of an ongoing research programme aimed at producing computational science software that can exploit high performance architectures in the atomic physics application domain. We examine the computational bottleneck of matrix construction in a suite of two-dimensional R-matrix propagation programs, 2DRMP, that are aimed at creating virtual electron collision experiments on HPC architectures. We build on Ixaru's extended frequency dependent quadrature rules (EFDQR) for Slater integrals and examine the challenge of constructing Hamiltonian matrices in parallel across an m-processor compute node in a block cyclic distribution for subsequent diagonalization by ScaLAPACK.
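For readers unfamiliar with the block cyclic distribution mentioned above, the following is a minimal sketch of the standard one-dimensional index mapping (0-based, first block on process 0); a two-dimensional layout applies the same rule independently to rows and columns. The function names `owner_and_local` and `owner_2d` are hypothetical helpers for illustration, not ScaLAPACK routines, and nothing here is taken from the 2DRMP code.

```python
def owner_and_local(g, block_size, num_procs):
    """Map global index g to (owning process, local index) in a 1-D
    block cyclic distribution with the first block on process 0."""
    block = g // block_size
    proc = block % num_procs
    local = (block // num_procs) * block_size + g % block_size
    return proc, local


def owner_2d(i, j, block_size, proc_rows, proc_cols):
    """Apply the 1-D rule to the row and column index of entry (i, j)
    on a proc_rows x proc_cols process grid."""
    p_row, local_i = owner_and_local(i, block_size, proc_rows)
    p_col, local_j = owner_and_local(j, block_size, proc_cols)
    return (p_row, p_col), (local_i, local_j)


# With block size 2 on 2 processes, process 0 owns global indices
# 0, 1, 4, 5, 8, 9, ... so global index 5 is its local index 3.
proc, local = owner_and_local(5, 2, 2)
```

Under this mapping, each process can construct only the matrix entries it owns, so the assembled matrix is already in the layout a distributed eigensolver expects.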

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Studies have shown that most of the computers in a non-dedicated cluster are often idle or lightly loaded. The underutilized computers in a non-dedicated cluster can be employed to execute parallel applications. The aim of this study is to learn how concurrent execution of computation-bound parallel and sequential applications influences their execution performance and cluster utilization. The results of the study demonstrate that a computation-bound parallel application benefits from load balancing, while sequential applications suffer only an insignificant slowdown. Overall, the utilization of a non-dedicated cluster is improved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

"This report reproduces a thesis of the same title submitted to the Department of Electrical Engineering, Massachusetts Institute of Technology, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, May 1970."--p. 2

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Three paradigms for distributed-memory parallel computation that free the application programmer from the details of message passing are compared for an archetypal structured scientific computation -- a nonlinear, structured-grid partial differential equation boundary value problem -- using the same algorithm on the same hardware. All of the paradigms -- parallel languages represented by the Portland Group's HPF, (semi-)automated serial-to-parallel source-to-source translation represented by CAPTools from the University of Greenwich, and parallel libraries represented by Argonne's PETSc -- are found to be easy to use for this problem class, and all are reasonably effective in exploiting concurrency after a short learning curve. The level of involvement required by the application programmer under any paradigm includes specification of the data partitioning, corresponding to a geometrically simple decomposition of the domain of the PDE. Programming in SPMD style for the PETSc library requires writing only the routines that discretize the PDE and its Jacobian, managing subdomain-to-processor mappings (affine global-to-local index mappings), and interfacing to library solver routines. Programming for HPF requires a complete sequential implementation of the same algorithm as a starting point, introduction of concurrency through subdomain blocking (a task similar to the index mapping), and modest experimentation with rewriting loops to elucidate to the compiler the latent concurrency. Programming with CAPTools involves feeding the same sequential implementation to the CAPTools interactive parallelization system, and guiding the source-to-source code transformation by responding to various queries about quantities knowable only at runtime. Results representative of "the state of the practice" for a scaled sequence of structured grid problems are given on three of the most important contemporary high-performance platforms: the IBM SP, the SGI Origin 2000, and the Cray T3E.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The real-time parallel computation of histograms using an array of pipelined cells is proposed and prototyped in this paper with application to consumer imaging products. The array operates in two modes: histogram computation and histogram reading. The proposed parallel computation method does not use any memory blocks. The resulting histogram bins can be stored into an external memory block in a pipelined fashion for subsequent reading or streaming of the results. The array of cells can be tuned to accommodate the required data path width in a VLSI image processing engine as present in many imaging consumer devices. The architectures presented in this paper, synthesized on an FPGA, are shown to compute the real-time histogram of images streamed at over 36 megapixels at 30 frames/s by processing 1, 2, or 4 pixels per clock cycle in parallel.
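As a loose software analogy to the cell array described above (not the paper's hardware design), the sketch below models one counter cell per histogram bin and feeds it 1, 2, or 4 pixels per simulated clock cycle. The function name `histogram_cycles`, the `lanes` parameter, and the cell behavior (each cell counts how many pixels in the current group equal its bin index) are assumptions made for illustration.

```python
def histogram_cycles(pixels, num_bins, lanes=4):
    """Simulate an array of per-bin counter cells that consumes `lanes`
    pixels per clock cycle. Returns (bin counts, cycles used)."""
    bins = [0] * num_bins
    cycles = 0
    for start in range(0, len(pixels), lanes):
        group = pixels[start:start + lanes]
        # In hardware every cell would compare and count simultaneously;
        # this loop visits the cells one after another.
        for b in range(num_bins):
            bins[b] += sum(1 for px in group if px == b)
        cycles += 1
    return bins, cycles


bins, cycles = histogram_cycles([0, 1, 1, 3, 2, 1], num_bins=4, lanes=4)
# bins is [1, 3, 1, 1]; six pixels at four per cycle take 2 cycles
```

Increasing `lanes` leaves the counts unchanged and only reduces the number of cycles, which is the trade-off between data path width and throughput that the abstract describes.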