Digital Library

338 results for cache-oblivious

The Cache Inference Problem and its Application to Content and Request Routing

Relevance: 20.00%

Abstract:

In many networked applications, independent caching agents cooperate by servicing each other's miss streams, without revealing the operational details of the caching mechanisms they employ. Inference of such details could be instrumental for many other processes. For example, it could be used for optimized forwarding (or routing) of one's own miss stream (or content) to available proxy caches, or for making cache-aware resource management decisions. In this paper, we introduce the Cache Inference Problem (CIP) as that of inferring the characteristics of a caching agent, given the miss stream of that agent. While CIP is insolvable in its most general form, there are special cases of practical importance in which it is solvable, including when the request stream follows an Independent Reference Model (IRM) with generalized power-law (GPL) demand distribution. To that end, we design two basic "litmus" tests that are able to detect LFU and LRU replacement policies, the effective size of the cache and of the object universe, and the skewness of the GPL demand for objects. Using extensive experiments under synthetic as well as real traces, we show that our methods infer such characteristics accurately and quite efficiently, and that they remain robust even when the IRM/GPL assumptions do not hold, and even when the underlying replacement policies are not "pure" LFU or LRU. We exemplify the value of our inference framework by considering example applications.

Cache-Integrated Network Interfaces: Flexible On-Chip Communication and Synchronization for Large-Scale CMPs

Relevance: 20.00%

Abstract:

Per-core scratchpad memories (or local stores) allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architectures become more distributed. We have designed cache-integrated network interfaces, appropriate for scalable multicores, that combine the best of two worlds – the flexibility of caches and the efficiency of scratchpad memories: on-chip SRAM is configurably shared among caching, scratchpad, and virtualized network interface (NI) functions. This paper presents our architecture, which provides local and remote scratchpad access, to either individual words or multiword blocks through RDMA copy. Furthermore, we introduce event responses as a technique that enables software-configurable communication and synchronization primitives. We present three event response mechanisms that expose NI functionality to software: multiword transfer initiation, completion notifications for software-selected sets of arbitrary-size transfers, and multi-party synchronization queues. We implemented these mechanisms in a four-core FPGA prototype, and measured the logic overhead over a cache-only design for basic NI functionality to be less than 20%. We also evaluate the on-chip communication performance on the prototype, as well as the performance of synchronization functions through simulation of CMPs with up to 128 cores. We demonstrate efficient synchronization, low-overhead communication, and amortized-overhead bulk transfers, which allow parallelization gains for fine-grain tasks and efficient exploitation of the hardware bandwidth.

Constructing optimal XOR-functions to minimize cache conflict misses

Relevance: 20.00%

Application-specific reconfigurable XOR-indexing to eliminate cache conflict misses

Relevance: 20.00%

Efficient profile-based evaluation of randomising set index functions for cache memories

Relevance: 20.00%

A technique for high bandwidth and deterministic low latency load/store accesses to multiple cache banks

Relevance: 20.00%

Prefetching and Cache Management using Task Lifetimes

Relevance: 20.00%

Abstract:

Task-based dataflow programming models and runtimes are emerging as promising candidates for programming multicore and manycore architectures. These programming models dynamically analyze task dependencies at runtime and schedule independent tasks concurrently on the processing elements. In such models, cache locality, which is critical for performance, becomes more challenging in the presence of fine-grain tasks and in architectures with many simple cores.

This paper presents a combined hardware-software approach to improve cache locality and offer better performance in terms of execution time and energy in the memory system. We propose the explicit bulk prefetcher (EBP) and epoch-based cache management (ECM) to help runtimes prefetch task data and guide the replacement decisions in caches. The runtime software can use this hardware support to expose its internal knowledge about the tasks to the architecture and achieve more efficient task-based execution. Our combined scheme outperforms HW-only prefetchers and state-of-the-art replacement policies, improves performance by an average of 17%, generates on average 26% fewer L2 misses, and consumes on average 28% less energy in the components of the memory system.

Fast Synchronization on Scalable Cache-Coherent Multiprocessors using Hybrid Primitives

Relevance: 20.00%

Code and Data Transformations for Improving Shared Cache Performance on SMT Processors

Relevance: 20.00%

MESA: Reducing Cache Conflicts by Integrating Static and Run-Time Methods

Relevance: 20.00%

On-chip Communication and Synchronization Mechanisms with Cache-Integrated Network Interfaces

Relevance: 20.00%

L'être humain se cache dans les détails

Relevance: 20.00%

Où se cache le bonheur (2e édition) / par H. Roux-Ferrand

Relevance: 20.00%

La pure vérité cachée

Relevance: 20.00%

Abstract:

[Mazarinade. 1649]

À cache-cache : comédie en un acte en vers et en prose / par Louis Péricaud et Carle Le Dhuy

Relevance: 20.00%
