774 resultados para GPGPU Parallel Computing


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Space applications are challenged by the reliability of parallel computing systems (FPGAs) employed in space crafts due to Single-Event Upsets. The work reported in this paper aims to achieve self-managing systems which are reliable for space applications by applying autonomic computing constructs to parallel computing systems. A novel technique, 'Swarm-Array Computing' inspired by swarm robotics, and built on the foundations of autonomic and parallel computing is proposed as a path to achieve autonomy. The constitution of swarm-array computing comprising for constituents, namely the computing system, the problem / task, the swarm and the landscape is considered. Three approaches that bind these constituents together are proposed. The feasibility of one among the three proposed approaches is validated on the SeSAm multi-agent simulator and landscapes representing the computing space and problem are generated using the MATLAB.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Space applications demand the need for building reliable systems. Autonomic computing defines such reliable systems as self-managing systems. The work reported in this paper combines agent based and swarm robotic approaches leading to swarm-array computing, a novel technique to achieve autonomy for distributed parallel computing systems. Two swarm-array computing approaches based on swarms of computational resources and swarms of tasks are explored. FPGA is considered as the computing system. The feasibility of the two proposed approaches that binds the computing system and the task together is simulated on the SeSAm multi-agent simulator.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The locally twisted cube is a newly introduced interconnection network for parallel computing. Ring embedding is an important issue for evaluating the performance of an interconnection network. In this paper, we investigate the problem of embedding rings into a locally twisted cube. Our main contribution is to find that, for each integer l is an element of (4,5,...,2(n)}, a ring of length I can be embedded into an n-dimensional locally twisted cube so that both the dilation and the load factor are one. As a result, a locally twisted cube is Hamiltonian. We conclude that a locally twisted cube is superior to a hypercube in terms of ring embedding capability. (C) 2004 Elsevier Ltd. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

An interconnection network with n nodes is four-pancyclic if it contains a cycle of length l for each integer l with 4 <= l <= n. An interconnection network is fault-tolerant four-pancyclic if the surviving network is four-pancyclic in the presence of faults. The fault-tolerant four-pancyclicity of interconnection networks is a desired property because many classical parallel algorithms can be mapped onto such networks in a communication-efficient fashion, even in the presence of failing nodes or edges. Due to some attractive properties as compared with its hypercube counterpart of the same size, the Mobius cube has been proposed as a promising candidate for interconnection topology. Hsieh and Chen [S.Y. Hsieh, C.H. Chen, Pancyclicity on Mobius cubes with maximal edge faults, Parallel Computing, 30(3) (2004) 407-421.] showed that an n-dimensional Mobius cube is four-pancyclic in the presence of up to n-2 faulty edges. In this paper, we show that an n-dimensional Mobius cube is four-pancyclic in the presence of up to n-2 faulty nodes. The obtained result is optimal in that, if n-1 nodes are removed, the surviving network may not be four-pancyclic. (C) 2005 Elsevier B.V. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Processor virtualization for process migration in distributed parallel computing systems has formed a significant component of research on load balancing. In contrast, the potential of processor virtualization for fault tolerance has been addressed minimally. The work reported in this paper is motivated towards extending concepts of processor virtualization towards ‘intelligent cores’ as a means to achieve fault tolerance in distributed parallel computing systems. Intelligent cores are an abstraction of the hardware processing cores, with the incorporation of cognitive capabilities, on which parallel tasks can be executed and migrated. When a processing core executing a task is predicted to fail the task being executed is proactively transferred onto another core. A parallel reduction algorithm incorporating concepts of intelligent cores is implemented on a computer cluster using Adaptive MPI and Charm ++. Preliminary results confirm the feasibility of the approach.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Recent research in multi-agent systems incorporate fault tolerance concepts, but does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The feasibility of the approach is validated by implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Recent research in multi-agent systems incorporate fault tolerance concepts, but does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator, and implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents a review of the design and development of the Yorick series of active stereo camera platforms and their integration into real-time closed loop active vision systems, whose applications span surveillance, navigation of autonomously guided vehicles (AGVs), and inspection tasks for teleoperation, including immersive visual telepresence. The mechatronic approach adopted for the design of the first system, including head/eye platform, local controller, vision engine, gaze controller and system integration, proved to be very successful. The design team comprised researchers with experience in parallel computing, robot control, mechanical design and machine vision. The success of the project has generated sufficient interest to sanction a number of revisions of the original head design, including the design of a lightweight compact head for use on a robot arm, and the further development of a robot head to look specifically at increasing visual resolution for visual telepresence. The controller and vision processing engines have also been upgraded, to include the control of robot heads on mobile platforms and control of vergence through tracking of an operator's eye movement. This paper details the hardware development of the different active vision/telepresence systems.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In order to gain knowledge from large databases, scalable data mining technologies are needed. Data are captured on a large scale and thus databases are increasing at a fast pace. This leads to the utilisation of parallel computing technologies in order to cope with large amounts of data. In the area of classification rule induction, parallelisation of classification rules has focused on the divide and conquer approach, also known as the Top Down Induction of Decision Trees (TDIDT). An alternative approach to classification rule induction is separate and conquer which has only recently been in the focus of parallelisation. This work introduces and evaluates empirically a framework for the parallel induction of classification rules, generated by members of the Prism family of algorithms. All members of the Prism family of algorithms follow the separate and conquer approach.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Artificial neural networks are usually applied to solve complex problems. In problems with more complexity, by increasing the number of layers and neurons, it is possible to achieve greater functional efficiency. Nevertheless, this leads to a greater computational effort. The response time is an important factor in the decision to use neural networks in some systems. Many argue that the computational cost is higher in the training period. However, this phase is held only once. Once the network trained, it is necessary to use the existing computational resources efficiently. In the multicore era, the problem boils down to efficient use of all available processing cores. However, it is necessary to consider the overhead of parallel computing. In this sense, this paper proposes a modular structure that proved to be more suitable for parallel implementations. It is proposed to parallelize the feedforward process of an RNA-type MLP, implemented with OpenMP on a shared memory computer architecture. The research consistes on testing and analizing execution times. Speedup, efficiency and parallel scalability are analyzed. In the proposed approach, by reducing the number of connections between remote neurons, the response time of the network decreases and, consequently, so does the total execution time. The time required for communication and synchronization is directly linked to the number of remote neurons in the network, and so it is necessary to investigate which one is the best distribution of remote connections

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The increasing demand for high performance wireless communication systems has shown the inefficiency of the current model of fixed allocation of the radio spectrum. In this context, cognitive radio appears as a more efficient alternative, by providing opportunistic spectrum access, with the maximum bandwidth possible. To ensure these requirements, it is necessary that the transmitter identify opportunities for transmission and the receiver recognizes the parameters defined for the communication signal. The techniques that use cyclostationary analysis can be applied to problems in either spectrum sensing and modulation classification, even in low signal-to-noise ratio (SNR) environments. However, despite the robustness, one of the main disadvantages of cyclostationarity is the high computational cost for calculating its functions. This work proposes efficient architectures for obtaining cyclostationary features to be employed in either spectrum sensing and automatic modulation classification (AMC). In the context of spectrum sensing, a parallelized algorithm for extracting cyclostationary features of communication signals is presented. The performance of this features extractor parallelization is evaluated by speedup and parallel eficiency metrics. The architecture for spectrum sensing is analyzed for several configuration of false alarm probability, SNR levels and observation time for BPSK and QPSK modulations. In the context of AMC, the reduced alpha-profile is proposed as as a cyclostationary signature calculated for a reduced cyclic frequencies set. This signature is validated by a modulation classification architecture based on pattern matching. The architecture for AMC is investigated for correct classification rates of AM, BPSK, QPSK, MSK and FSK modulations, considering several scenarios of observation length and SNR levels. The numerical results of performance obtained in this work show the eficiency of the proposed architectures

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents vectorized methods of construction and descent of quadtrees that can be easily adapted to message passing parallel computing. A time complexity analysis for the present approach is also discussed. The proposed method of tree construction requires a hash table to index nodes of a linear quadtree in the breadth-first order. The hash is performed in two steps: an internal hash to index child nodes and an external hash to index nodes in the same level (depth). The quadtree descent is performed by considering each level as a vector segment of a linear quadtree, so that nodes of the same level can be processed concurrently. © 2012 Springer-Verlag.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Transactional memory (TM) is a new synchronization mechanism devised to simplify parallel programming, thereby helping programmers to unleash the power of current multicore processors. Although software implementations of TM (STM) have been extensively analyzed in terms of runtime performance, little attention has been paid to an equally important constraint faced by nearly all computer systems: energy consumption. In this work we conduct a comprehensive study of energy and runtime tradeoff sin software transactional memory systems. We characterize the behavior of three state-of-the-art lock-based STM algorithms, along with three different conflict resolution schemes. As a result of this characterization, we propose a DVFS-based technique that can be integrated into the resolution policies so as to improve the energy-delay product (EDP). Experimental results show that our DVFS-enhanced policies are indeed beneficial for applications with high contention levels. Improvements of up to 59% in EDP can be observed in this scenario, with an average EDP reduction of 16% across the STAMP workloads. © 2012 IEEE.