34 results for Parallel or distributed processing


Relevance:

40.00%

Publisher:

Abstract:

Bhutani N, Ray S, Murthy A. Is saccade averaging determined by visual processing or movement planning? J Neurophysiol 108: 3161-3171, 2012. First published September 26, 2012; doi:10.1152/jn.00344.2012. Saccadic averaging, which causes subjects' gaze to land between the locations of two targets when they are faced with simultaneously or sequentially presented stimuli, has often been used as a probe to investigate the nature of the computations that transform sensory representations into an oculomotor plan. Since saccadic movements involve at least two processing stages (a visual stage that selects a target and a movement stage that prepares the response), saccade averaging can occur due to interference in either visual processing or movement planning. By having human subjects perform two versions of a saccadic double-step task, in which the stimuli remained the same but different instructions were provided (REDIRECT gaze to the later-appearing target vs. FOLLOW the sequence of targets in their order of appearance), we tested these two alternative hypotheses. If saccade averaging were due to visual processing alone, the pattern of saccade averaging would be expected to remain the same across task conditions. However, whereas subjects produced averaged saccades between two targets in the FOLLOW condition, they produced hypometric saccades in the direction of the initial target in the REDIRECT condition, suggesting that the interaction between competing movement plans produces saccade averaging.

Relevance:

40.00%

Publisher:

Abstract:

We consider the wireless two-way relay channel, in which two-way data transfer takes place between the end nodes with the help of a relay. For the Denoise-And-Forward (DNF) protocol, it was shown by Koike-Akino et al. that adaptively changing the network coding map used at the relay greatly reduces the impact of Multiple Access Interference at the relay. The harmful effect of deep channel fade conditions can be effectively mitigated by a proper choice of these network coding maps at the relay. Alternatively, in this paper we propose a Distributed Space Time Coding (DSTC) scheme, which effectively removes most of the deep fade channel conditions at the transmitting nodes themselves, without any CSIT and without any need to adaptively change the network coding map used at the relay. It is shown that deep fades occur when the channel fade coefficient vector falls in one of a finite number of vector subspaces, which are referred to as the singular fade subspaces. A DSTC design criterion, referred to as the singularity minimization criterion, under which the number of such vector subspaces is minimized, is obtained. Also, a criterion to maximize the coding gain of the DSTC is obtained. Explicit low-decoding-complexity DSTC designs that satisfy the singularity minimization criterion and maximize the coding gain for QAM and PSK signal sets are provided. Simulation results show that at high Signal to Noise Ratio, the DSTC scheme provides large gains when compared to the conventional Exclusive OR network code and performs better than the adaptive network coding scheme.
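
As a point of reference for the baseline mentioned in the abstract above, the conventional Exclusive OR network code can be sketched in a few lines. This is only an illustration, not the DSTC scheme of the paper: it idealizes the multiple access phase by assuming the relay already holds both messages and that all links are noiseless. The function names `relay_xor` and `recover` are hypothetical.

```python
# Minimal sketch of the conventional XOR network code on a two-way relay.
# Assumptions (not from the abstract): noiseless links, and the relay
# already knows both end-node messages. All names are hypothetical.

def relay_xor(msg_a: bytes, msg_b: bytes) -> bytes:
    """Relay combines two equal-length messages into one broadcast."""
    if len(msg_a) != len(msg_b):
        raise ValueError("messages must have equal length")
    return bytes(x ^ y for x, y in zip(msg_a, msg_b))


def recover(broadcast: bytes, own: bytes) -> bytes:
    """An end node XORs out its own message to obtain the other node's."""
    if len(broadcast) != len(own):
        raise ValueError("messages must have equal length")
    return bytes(x ^ y for x, y in zip(broadcast, own))


if __name__ == "__main__":
    msg_a = b"\x0f\xa5"
    msg_b = b"\xf0\x5a"
    broadcast = relay_xor(msg_a, msg_b)
    # Each end node recovers the other's message from a single broadcast.
    assert recover(broadcast, msg_a) == msg_b
    assert recover(broadcast, msg_b) == msg_a
```

The relay sends one combined message instead of forwarding two, which is why the XOR map is the usual baseline against which adaptive maps and the proposed DSTC scheme are compared.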

Relevance:

40.00%

Publisher:

Abstract:

With the proliferation of chip multicores (CMPs) on desktops and embedded platforms, multi-threaded programs have become ubiquitous. The existence of multiple threads may cause resource contention, for example in the on-chip shared cache and interconnects, depending on how the threads access resources. Hence, we propose a tool, the Thread Contention Predictor (TCP), to help quantify the number of threads sharing data and their sharing pattern. We demonstrate its use to predict a more profitable shared, last-level on-chip cache (LLC) access policy on CMPs. Our cache configuration predictor is 2.2 times faster than cycle-accurate simulations. We also demonstrate its use for identifying hot data structures in a program that may cause performance degradation due to false data sharing. We fix the layout of such data structures and show up to 10% and 18% improvement in execution time and energy-delay product (EDP), respectively.
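
The kind of measurement described in the abstract above (how many threads share data, and whether the sharing is false) can be illustrated with a small trace analysis. This is a hypothetical sketch, not the TCP tool: it assumes a 64-byte cache line, a trace of `(thread_id, byte_address, is_write)` tuples, and byte-granular accesses. The name `sharing_report` is made up for illustration.

```python
from collections import defaultdict

CACHE_LINE_BYTES = 64  # assumed line size; real hardware may differ


def sharing_report(trace, line_bytes=CACHE_LINE_BYTES):
    """Summarize per-cache-line sharing from a memory access trace.

    trace: iterable of (thread_id, byte_address, is_write) tuples.
    Returns {line_index: (num_threads, false_sharing_candidate)}.
    A line is a false-sharing candidate when several threads touch it,
    at least one access is a write, and no single address inside the
    line is touched by more than one thread.
    """
    threads_by_line = defaultdict(set)
    threads_by_addr = defaultdict(set)
    written_lines = set()
    for tid, addr, is_write in trace:
        line = addr // line_bytes
        threads_by_line[line].add(tid)
        threads_by_addr[addr].add(tid)
        if is_write:
            written_lines.add(line)

    # Lines in which some individual address is used by 2+ threads.
    truly_shared_lines = {
        addr // line_bytes
        for addr, tids in threads_by_addr.items()
        if len(tids) > 1
    }

    report = {}
    for line, tids in threads_by_line.items():
        candidate = (
            len(tids) > 1
            and line in written_lines
            and line not in truly_shared_lines
        )
        report[line] = (len(tids), candidate)
    return report


if __name__ == "__main__":
    trace = [
        (0, 0, True),     # thread 0 writes byte 0   (line 0)
        (1, 8, True),     # thread 1 writes byte 8   (line 0, other address)
        (0, 128, False),  # both threads read byte 128 (line 2)
        (1, 128, False),
    ]
    print(sharing_report(trace))
```

In this toy trace, line 0 is flagged because two threads write different addresses in the same line, which is the pattern a layout fix such as padding or regrouping fields would target. Line 2 is shared but read-only, so it is not flagged.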

Relevance:

40.00%

Publisher:

Abstract:

Programming for parallel architectures that do not have a shared address space is extremely difficult due to the need for explicit communication between the memories of different compute devices. A heterogeneous system with CPUs and multiple GPUs and a distributed-memory cluster are examples of such systems. Past works that try to automate data movement for distributed-memory architectures can lead to excessive redundant communication. In this paper, we propose an automatic data movement scheme that minimizes the volume of communication between compute devices in heterogeneous and distributed-memory systems. We show that by partitioning data dependences in a particular non-trivial way, one can generate data movement code that results in the minimum volume for a vast majority of cases. The techniques are applicable to any sequence of affine loop nests and work on top of any choice of loop transformations, parallelization, and computation placement. The data movement code generated minimizes the volume of communication for a particular configuration of these. We use a combination of powerful static analyses relying on the polyhedral compiler framework, and the lightweight runtime routines they generate, to build a source-to-source transformation tool that automatically generates communication code. We demonstrate that the tool is scalable and leads to substantial gains in efficiency. On a heterogeneous system, the communication volume is reduced by a factor of 11X to 83X over the state of the art, translating into a mean execution time speedup of 1.53X. On a distributed-memory cluster, our scheme reduces the communication volume by a factor of 1.4X to 63.5X over the state of the art, resulting in a mean speedup of 1.55X. In addition, our scheme yields a mean speedup of 2.19X over hand-optimized UPC codes.
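
To make the notion of redundant communication in the abstract above concrete, the following toy model counts elements moved for a one-dimensional stencil loop split across devices. It is a hedged illustration only, not the paper's polyhedral analysis: the "naive" baseline (every device receives every remote block in full) is a hypothetical strawman, and the names `elements_read` and `comm_volume` are invented here.

```python
def elements_read(lo, hi, offsets, n):
    """Indices of a[] read by iterations lo..hi-1 of a loop whose body
    reads a[i + d] for every d in offsets (reads outside 0..n-1 skipped)."""
    return {i + d for i in range(lo, hi) for d in offsets if 0 <= i + d < n}


def comm_volume(owner_ranges, offsets, n):
    """Compare two data movement strategies for a block-partitioned array.

    owner_ranges: list of (lo, hi) half-open index ranges, one per device;
    each device owns a[lo:hi] and runs iterations lo..hi-1.
    Returns (naive, needed), counted in array elements:
      naive  - each device receives every other device's block in full
      needed - each device receives only the remote elements it reads
    """
    naive = 0
    needed = 0
    for dev, (lo, hi) in enumerate(owner_ranges):
        reads = elements_read(lo, hi, offsets, n)
        # Elements this device reads but does not own must be received.
        needed += sum(1 for x in reads if not lo <= x < hi)
        # Strawman: ship all remote blocks regardless of what is read.
        naive += sum(
            h - l for other, (l, h) in enumerate(owner_ranges) if other != dev
        )
    return naive, needed


if __name__ == "__main__":
    # 12 elements on 3 devices, 3-point stencil reading a[i-1], a[i], a[i+1].
    ranges = [(0, 4), (4, 8), (8, 12)]
    naive, needed = comm_volume(ranges, offsets=(-1, 0, 1), n=12)
    print(f"naive={naive} elements, needed={needed} elements")
```

Here only the boundary elements between neighbouring blocks need to move, so the required volume is a small fraction of the full-block transfer. Deriving such sets automatically for arbitrary affine loop nests is the harder problem the abstract addresses.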