105 resultados para GPU computing


Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we propose a multi-camera application capable of processing high resolution images and extracting features based on colors patterns over graphic processing units (GPU). The goal is to work in real time under the uncontrolled environment of a sport event like a football match. Since football players are composed for diverse and complex color patterns, a Gaussian Mixture Models (GMM) is applied as segmentation paradigm, in order to analyze sport live images and video. Optimization techniques have also been applied over the C++ implementation using profiling tools focused on high performance. Time consuming tasks were implemented over NVIDIA's CUDA platform, and later restructured and enhanced, speeding up the whole process significantly. Our resulting code is around 4-11 times faster on a low cost GPU than a highly optimized C++ version on a central processing unit (CPU) over the same data. Real time has been obtained processing until 64 frames per second. An important conclusion derived from our study is the scalability of the application to the number of cores on the GPU. © 2011 Springer-Verlag.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Wireless sensor node platforms are very diversified and very constrained, particularly in power consumption. When choosing or sizing a platform for a given application, it is necessary to be able to evaluate in an early design stage the impact of those choices. Applied to the computing platform implemented on the sensor node, it requires a good understanding of the workload it must perform. Nevertheless, this workload is highly application-dependent. It depends on the data sampling frequency together with application-specific data processing and management. It is thus necessary to have a model that can represent the workload of applications with various needs and characteristics. In this paper, we propose a workload model for wireless sensor node computing platforms. This model is based on a synthetic application that models the different computational tasks that the computing platform will perform to process sensor data. It allows to model the workload of various different applications by tuning data sampling rate and processing. A case study is performed by modeling different applications and by showing how it can be used for workload characterization. © 2011 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The R-matrix method when applied to the study of intermediate energy electron scattering by the hydrogen atom gives rise to a large number of two electron integrals between numerical basis functions. Each integral is evaluated independently of the others, thereby rendering this a prime candidate for a parallel implementation. In this paper, we present a parallel implementation of this routine which uses a Graphical Processing Unit as a co-processor, giving a speedup of approximately 20 times when compared with a sequential version. We briefly consider properties of this calculation which make a GPU implementation appropriate with a view to identifying other calculations which might similarly benet.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A novel cost-effective and low-latency wormhole router for packet-switched NoC designs, tailored for FPGA, is presented. This has been designed to be scalable at system level to fully exploit the characteristics and constraints of FPGA based systems, rather than custom ASIC technology. A key feature is that it achieves a low packet propagation latency of only two cycles per hop including both router pipeline delay and link traversal delay - a significant enhancement over existing FPGA designs - whilst being very competitive in terms of performance and hardware complexity. It can also be configured in various network topologies including 1-D, 2-D, and 3-D. Detailed design-space exploration has been carried for a range of scaling parameters, with the results of various design trade-offs being presented and discussed. By taking advantage of abundant buildin reconfigurable logic and routing resources, we have been able to create a new scalable on-chip FPGA based router that exhibits high dimensionality and connectivity. The architecture proposed can be easily migrated across many FPGA families to provide flexible, robust and cost-effective NoC solutions suitable for the implementation of high-performance FPGA computing systems. © 2011 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The scheduling problem in distributed data-intensive computing environments has become an active research topic due to the tremendous growth in grid and cloud computing environments. As an innovative distributed intelligent paradigm, swarm intelligence provides a novel approach to solving these potentially intractable problems. In this paper, we formulate the scheduling problem for work-flow applications with security constraints in distributed data-intensive computing environments and present a novel security constraint model. Several meta-heuristic adaptations to the particle swarm optimization algorithm are introduced to deal with the formulation of efficient schedules. A variable neighborhood particle swarm optimization algorithm is compared with a multi-start particle swarm optimization and multi-start genetic algorithm. Experimental results illustrate that population based meta-heuristics approaches usually provide a good balance between global exploration and local exploitation and their feasibility and effectiveness for scheduling work-flow applications. © 2010 Elsevier Inc. All rights reserved.