Digital Library

957 results for cache placement

Thermographic assessment of dentine pin placement

Relevance: 20.00%

Effects of groove placement on retention/resistance of maxillary anterior resin bonded retainers

Relevance: 20.00%

RF performance of a 418-MHz radio telemeter packaged for human vaginal placement

Relevance: 20.00%

Abstract:

The electrical and communication performance of a 0.8-μW UHF temperature telemeter designed for human vaginal placement is discussed; a solenoidal loop antenna was used, occupying a volume of 0.1 cm³. In situ, measured power absorption was between 19 and 25 dB, resulting in an effective operating range of 10 m. Capacitive loading lowered the antenna's resonant frequency by 1.4%, and there was a significant polarization change in the radiated output.

Cache-Integrated Network Interfaces: Flexible On-Chip Communication and Synchronization for Large-Scale CMPs

Relevance: 20.00%

Abstract:

Per-core scratchpad memories (or local stores) allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architectures become more distributed. We have designed cache-integrated network interfaces, appropriate for scalable multicores, that combine the best of two worlds – the flexibility of caches and the efficiency of scratchpad memories: on-chip SRAM is configurably shared among caching, scratchpad, and virtualized network interface (NI) functions. This paper presents our architecture, which provides local and remote scratchpad access to either individual words or multiword blocks through RDMA copy. Furthermore, we introduce event responses, a technique that enables software-configurable communication and synchronization primitives. We present three event response mechanisms that expose NI functionality to software: multiword transfer initiation, completion notifications for software-selected sets of arbitrary-size transfers, and multi-party synchronization queues. We implemented these mechanisms in a four-core FPGA prototype, and measured the logic overhead over a cache-only design for basic NI functionality to be less than 20%. We also evaluate the on-chip communication performance on the prototype, as well as the performance of synchronization functions through simulation of CMPs with up to 128 cores. We demonstrate efficient synchronization, low-overhead communication, and amortized-overhead bulk transfers, which allow parallelization gains for fine-grain tasks and efficient exploitation of the hardware bandwidth.

Critical Path-Based Thread Placement for NUMA Systems

Relevance: 20.00%

Constructing optimal XOR-functions to minimize cache conflict misses

Relevance: 20.00%

Application-specific reconfigurable XOR-indexing to eliminate cache conflict misses

Relevance: 20.00%

Efficient profile-based evaluation of randomising set index functions for cache memories

Relevance: 20.00%

A technique for high bandwidth and deterministic low latency load/store accesses to multiple cache banks

Relevance: 20.00%

Teacher Placement and Staffing

Relevance: 20.00%

Empowerment, quality of life and service satisfaction: comparisons between a hospital and community placement group

Relevance: 20.00%

Prefetching and Cache Management using Task Lifetimes

Relevance: 20.00%

Abstract:

Task-based dataflow programming models and runtimes are emerging as promising candidates for programming multicore and manycore architectures. These programming models dynamically analyze task dependencies at runtime and schedule independent tasks concurrently on the processing elements. In such models, cache locality, which is critical for performance, becomes harder to achieve in the presence of fine-grain tasks and in architectures with many simple cores.

This paper presents a combined hardware-software approach to improve cache locality and offer better performance in terms of execution time and energy in the memory system. We propose the explicit bulk prefetcher (EBP) and epoch-based cache management (ECM) to help runtimes prefetch task data and guide the replacement decisions in caches. The runtime software can use this hardware support to expose its internal knowledge about the tasks to the architecture and achieve more efficient task-based execution. Our combined scheme outperforms HW-only prefetchers and state-of-the-art replacement policies, improves performance by an average of 17%, generates on average 26% fewer L2 misses, and consumes on average 28% less energy in the components of the memory system.
