263 resultados para 291704 Computer Communications Networks


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Multicore computational accelerators such as GPUs are now commodity components for highperformance computing at scale. While such accelerators have been studied in some detail as stand-alone computational engines, their integration in large-scale distributed systems raises new challenges and trade-offs. In this paper, we present an exploration of resource management alternatives for building asymmetric accelerator-based distributed systems. We present these alternatives in the context of a capabilities-aware framework for data-intensive computing, which uses an enhanced implementation of the MapReduce programming model for accelerator-based clusters, compared to the state of the art. The framework can transparently utilize heterogeneous accelerators for deriving high performance with low programming effort. Our work is the first to compare heterogeneous types of accelerators, GPUs and a Cell processors, in the same environment and the first to explore the trade-offs between compute-efficient and control-efficient accelerators on data-intensive systems. Our investigation shows that our framework scales well with the number of different compute nodes. Furthermore, it runs simultaneously on two different types of accelerators, successfully adapts to the resource capabilities, and performs 26.9% better on average than a static execution approach.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper investigates a systematic approach for the identification and control of Hammerstein systems over a physical IEEE 802.11b wireless channel.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Advances in silicon technology have been a key development in the realisation of many telecommunication and signal processing systems. In many cases, the development of application-specific digital signal processing (DSP) chips is the most cost-effective solution and provides the highest performance. Advances made in computer-aided design (CAD) tools and design methodologies now allow designers to develop complex chips within months or even weeks. This paper gives an insight into the challenges and design methodologies of implementing advanced highperformance chips for DSP. In particular, the paper reviews some of the techniques used to develop circuit architectures from high-level descriptions and the tools which are then used to realise silicon layout.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Traditional static analysis fails to auto-parallelize programs with a complex control and data flow. Furthermore, thread-level parallelism in such programs is often restricted to pipeline parallelism, which can be hard to discover by a programmer. In this paper we propose a tool that, based on profiling information, helps the programmer to discover parallelism. The programmer hand-picks the code transformations from among the proposed candidates which are then applied by automatic code transformation techniques.

This paper contributes to the literature by presenting a profiling tool for discovering thread-level parallelism. We track dependencies at the whole-data structure level rather than at the element level or byte level in order to limit the profiling overhead. We perform a thorough analysis of the needs and costs of this technique. Furthermore, we present and validate the belief that programs with complex control and data flow contain significant amounts of exploitable coarse-grain pipeline parallelism in the program’s outer loops. This observation validates our approach to whole-data structure dependencies. As state-of-the-art compilers focus on loops iterating over data structure members, this observation also explains why our approach finds coarse-grain pipeline parallelism in cases that have remained out of reach for state-of-the-art compilers. In cases where traditional compilation techniques do find parallelism, our approach allows to discover higher degrees of parallelism, allowing a 40% speedup over traditional compilation techniques. Moreover, we demonstrate real speedups on multiple hardware platforms.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

As a result of resource limitations, state in branch predictors is frequently shared between uncorrelated branches. This interference can significantly limit prediction accuracy. In current predictor designs, the branches sharing prediction information are determined by their branch addresses and thus branch groups are arbitrarily chosen during compilation. This feasibility study explores a more analytic and systematic approach to classify branches into clusters with similar behavioral characteristics. We present several ways to incorporate this cluster information as an additional information source in branch predictors.