117 results for Network on chip
in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast
Abstract:
Per-core scratchpad memories (or local stores) allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architectures become more distributed. We have designed cache-integrated network interfaces, appropriate for scalable multicores, that combine the best of two worlds – the flexibility of caches and the efficiency of scratchpad memories: on-chip SRAM is configurably shared among caching, scratchpad, and virtualized network interface (NI) functions. This paper presents our architecture, which provides local and remote scratchpad access, either to individual words or to multiword blocks through RDMA copy. Furthermore, we introduce event responses, a technique that enables software-configurable communication and synchronization primitives. We present three event response mechanisms that expose NI functionality to software: multiword transfer initiation, completion notifications for software-selected sets of arbitrary-size transfers, and multi-party synchronization queues. We implemented these mechanisms in a four-core FPGA prototype and measured the logic overhead of basic NI functionality, relative to a cache-only design, at less than 20%. We also evaluate on-chip communication performance on the prototype, as well as the performance of synchronization functions through simulation of CMPs with up to 128 cores. We demonstrate efficient synchronization, low-overhead communication, and amortized-overhead bulk transfers, which enable parallelization gains for fine-grain tasks and efficient exploitation of the hardware bandwidth.
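To make the idea of completion notifications for a software-selected set of transfers more concrete, here is a minimal Python sketch of one way such a primitive could behave. It assumes, hypothetically, that software arms a counter with the total number of bytes expected from the chosen transfers and that each finished transfer decrements it; the class `CompletionCounter` and its methods are illustrative names, not the paper's hardware interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CompletionCounter:
    """Hypothetical completion counter armed by software for a set of transfers."""

    remaining: int                   # bytes still expected from the selected transfers
    on_complete: Callable[[], None]  # notification hook, e.g. wake a waiting task
    fired: bool = False

    def transfer_done(self, nbytes: int) -> None:
        """Account for one finished transfer of arbitrary size."""
        if self.fired:
            raise RuntimeError("counter has already fired")
        if nbytes <= 0 or nbytes > self.remaining:
            raise ValueError("transfer size does not fit the armed byte count")
        self.remaining -= nbytes
        if self.remaining == 0:
            self.fired = True
            self.on_complete()  # notify software once the whole set has arrived


events: List[str] = []
counter = CompletionCounter(remaining=4096, on_complete=lambda: events.append("done"))
for size in (1024, 512, 2560):  # three transfers of different sizes
    counter.transfer_done(size)
print(events)  # ['done']
```

Because the sketch counts bytes rather than transfers, the transfers in a set can have arbitrary, differing sizes and still trigger a single notification.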
Abstract:
This paper presents a thorough investigation of combined allocator design for Networks-on-Chip (NoC). In particular, we discuss the interlock of the combined NoC allocator, which is caused by the lock mechanism for priority updating between the local and global arbiters. The architectures and implementations of three interlock-free combined allocators are presented in detail. Their cost, critical path, and network-level performance are evaluated using 65-nm standard-cell technology.
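The abstract attributes the interlock to the way priority updates couple the local and global arbiters. As background only, the following Python sketch shows a generic separable (input-first) allocator in which a local arbiter's round-robin pointer advances only when its nominee is also granted by a global arbiter. The class and function names and this iSLIP-style update rule are assumptions for illustration; the sketch does not reproduce the paper's three interlock-free designs.

```python
from typing import Dict, List, Optional, Tuple


class RoundRobinArbiter:
    """Round-robin arbiter whose priority pointer is updated explicitly."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.ptr = 0  # requester index that currently has highest priority

    def pick(self, requests: List[bool]) -> Optional[int]:
        """Return the first requester at or after the pointer (no state change)."""
        for k in range(self.n):
            i = (self.ptr + k) % self.n
            if requests[i]:
                return i
        return None

    def update(self, winner: int) -> None:
        """Move the winner to lowest priority for the next cycle."""
        self.ptr = (winner + 1) % self.n


def allocate(
    reqs: List[List[Optional[int]]],  # reqs[input][vc] = requested output, or None
    local: List[RoundRobinArbiter],   # one arbiter per input, over its VCs
    glob: List[RoundRobinArbiter],    # one arbiter per output, over the inputs
) -> Dict[int, Tuple[int, int]]:
    """One allocation cycle; returns {output: (input, vc)}."""
    # Stage 1 (local): each input nominates one requesting VC.
    nominee: Dict[int, int] = {}
    for inp, vcs in enumerate(reqs):
        vc = local[inp].pick([r is not None for r in vcs])
        if vc is not None:
            nominee[inp] = vc
    # Stage 2 (global): each output chooses among inputs whose nominee targets it.
    grants: Dict[int, Tuple[int, int]] = {}
    for out, arb in enumerate(glob):
        wants = [inp in nominee and reqs[inp][nominee[inp]] == out for inp in range(len(reqs))]
        winner = arb.pick(wants)
        if winner is not None:
            grants[out] = (winner, nominee[winner])
            arb.update(winner)
            # Coupled update: the local pointer moves only after a global grant.
            local[winner].update(nominee[winner])
    return grants


local = [RoundRobinArbiter(2) for _ in range(2)]  # 2 inputs x 2 VCs
glob = [RoundRobinArbiter(2) for _ in range(2)]   # 2 outputs, 2 inputs each
reqs = [[0, 1], [0, None]]  # input 0: VC0->out 0, VC1->out 1; input 1: VC0->out 0
first = allocate(reqs, local, glob)
second = allocate(reqs, local, glob)
print(first)   # {0: (0, 0)}
print(second)  # {0: (1, 0), 1: (0, 1)}
```

In the first cycle input 0 nominates VC0 and wins output 0, so only its local pointer advances; in the second cycle it nominates VC1 instead, and both outputs are granted.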
Abstract:
A methodology for producing silicon cores for wavelet packet decomposition has been developed. The scheme uses efficient, scalable architectures for both orthonormal and biorthogonal wavelet transforms. The resulting cores can be readily scaled for any wavelet function and are easily configurable for any subband structure. The cores are fully parameterized in terms of wavelet choice and the appropriate wordlengths. The designs are portable across a range of silicon foundries as well as FPGA and PLD technologies. A number of exemplar implementations have been produced.
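As an illustration of what wavelet packet decomposition computes (not of the hardware architectures themselves), here is a small Python sketch using the orthonormal Haar filter pair. Unlike a plain wavelet transform, every subband, not only the approximation, is split again at each level. The Haar choice, the full-tree subband structure, and the function names are assumptions for this sketch; the cores described above support other wavelets and configurable subband trees.

```python
import math
from typing import List, Tuple

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_split(x: List[float]) -> Tuple[List[float], List[float]]:
    """One analysis stage: Haar low-pass and high-pass filtering, downsampled by 2."""
    if len(x) % 2:
        raise ValueError("subband length must be even")
    low = [(x[i] + x[i + 1]) * INV_SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * INV_SQRT2 for i in range(0, len(x), 2)]
    return low, high


def wavelet_packet(x: List[float], levels: int) -> List[List[float]]:
    """Full wavelet packet tree: every subband is split again at each level.

    Returns the 2**levels leaf subbands, each of length len(x) / 2**levels.
    """
    bands = [list(x)]
    for _ in range(levels):
        next_bands: List[List[float]] = []
        for band in bands:
            low, high = haar_split(band)
            next_bands.extend([low, high])
        bands = next_bands
    return bands


signal = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 2.0, 2.0]
leaves = wavelet_packet(signal, levels=2)
print([[round(v, 6) for v in band] for band in leaves])
energy = sum(v * v for band in leaves for v in band)
print(round(energy, 9))  # 88.0, same as sum(v * v for v in signal)
```

With an orthonormal filter pair the total energy of the leaf subbands equals that of the input, which makes a convenient sanity check; a configurable subband structure would simply choose which bands to split at each level.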
Abstract:
A novel Networks-on-Chip (NoC) router architecture with configurable Virtual Channels (VCs), designed for FPGA-based implementation, is presented. Each pipeline stage of the proposed architecture has been optimized to achieve low packet propagation latency and reduced hardware overhead. The proposed architecture enables high-performance, cost-effective VC-based NoC interconnects for on-chip systems to be deployed on FPGAs.
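The notion of a configurable virtual-channel count can be illustrated with a short Python model of one router input port. It assumes, hypothetically, that each flit carries a VC identifier, that every VC has its own FIFO of equal depth, and that a flit addressed to a full VC is refused so the sender must hold it. None of these names or policies come from the paper, and the pipeline optimizations it describes are not modelled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Flit:
    vc: int       # virtual channel the flit belongs to
    payload: int


class VCInputPort:
    """Hypothetical router input port with a configurable number of VC buffers."""

    def __init__(self, num_vcs: int, depth: int) -> None:
        if num_vcs < 1 or depth < 1:
            raise ValueError("num_vcs and depth must be positive")
        self.depth = depth
        self.fifos: List[Deque[Flit]] = [deque() for _ in range(num_vcs)]

    def accept(self, flit: Flit) -> bool:
        """Buffer a flit in its VC FIFO; return False (refuse) if that VC is full."""
        fifo = self.fifos[flit.vc]
        if len(fifo) >= self.depth:
            return False
        fifo.append(flit)
        return True

    def free_slots(self, vc: int) -> int:
        """Number of empty buffer slots left in one VC."""
        return self.depth - len(self.fifos[vc])

    def pop(self, vc: int) -> Optional[Flit]:
        """Remove and return the head flit of a VC, or None if it is empty."""
        return self.fifos[vc].popleft() if self.fifos[vc] else None


port = VCInputPort(num_vcs=2, depth=2)
accepted = [port.accept(Flit(vc=0, payload=p)) for p in (1, 2, 3)]
print(accepted)            # [True, True, False]: VC0 is full after two flits
print(port.free_slots(1))  # 2: VC1 is unaffected by VC0 filling up
```

A separate FIFO per VC is what lets traffic on one channel keep moving while another is blocked; `num_vcs` and `depth` trade buffer storage for that flexibility.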
Abstract:
In this study, we describe a simple and efficient method for on-chip storage of reagents for point-of-care (POC) diagnostics. The method is based on gelification of all reagents required for on-chip PCR-based diagnostics into a ready-to-use product. The result reported here is a key step towards a fully integrated Lab-on-a-chip (LOC) system that is ready and easy to use, for fast, cost-effective and efficient POC diagnostic analysis.
Abstract:
The end of Dennard scaling has made power consumption a first-order concern for current systems, on par with performance. As a result, near-threshold voltage computing (NTVC) has been proposed as a potential means to tackle the limited cooling capacity of CMOS technology. Hardware operating in the NTV region consumes significantly less power, at the cost of lower frequency, and thus reduced performance, as well as increased error rates. In this paper, we investigate whether a low-power system-on-chip built on ARM's asymmetric big.LITTLE technology can be an alternative to conventional high-performance multicore processors in terms of power and energy in an unreliable scenario. For our study, we use the Conjugate Gradient solver, an algorithm representative of the computations performed by a wide range of scientific and engineering codes.
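For readers unfamiliar with the benchmark, the following is a textbook, unpreconditioned Conjugate Gradient iteration for a small dense symmetric positive-definite system, written in plain Python. It is only a sketch of the algorithm, with function names chosen for illustration; it is not the implementation evaluated in the study.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(a: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = b[:]  # residual r = b - A x, which equals b for x = 0
    p = r[:]  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:  # stop when the residual norm is small enough
            break
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)  # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # New direction, A-conjugate to the previous ones.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


a = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(a, b)
print(x)  # close to [1/11, 7/11], i.e. about [0.0909, 0.6364]
```

Each iteration is dominated by one matrix-vector product plus a few dot products and vector updates, which is why CG is a common proxy for memory-bound scientific codes.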
Abstract:
A solvent-vapour thermoplastic bonding process is reported that provides high-strength bonding of PMMA over a large area for multi-channel and multi-layer microfluidic devices with shallow, high-resolution channel features. The process uses a low-temperature vacuum thermal fusion step, with prior exposure of the substrate to chloroform (CHCl3) vapour to reduce the bond temperature below the PMMA glass transition temperature. Peak tensile and shear bond strengths greater than 3 MPa were achieved for a typical channel depth reduction of 25 µm. Device-equivalent bond performance was evaluated for multiple layers and high-resolution channel features using double-sided and single-sided exposure of the bonding pieces. A single-sided exposure process suited to multi-layer bonding with channel alignment was achieved, at the expense of greater depth loss and a reduction in peak bond strength. However, leak and burst tests demonstrate bond integrity up to at least 10 bar channel pressure over the full substrate area of 100 mm × 100 mm. Including metal tracks within the bond resulted in no loss of performance. Vertical wall integrity between channels was found to be compromised by solvent permeation for wall thicknesses of 100 µm, which has implications for high-resolution serpentine structures. Bond strength is reduced considerably for multi-layer patterned substrates where the features on each layer are not aligned, despite the presence of an intermediate blank substrate. Overall, a high-performance bonding process has been developed that has the potential to meet the stringent specifications for lab-on-chip deployment in harsh environmental conditions, for applications such as deep-ocean profiling.
Abstract:
The end of Dennard scaling has made low power consumption a first-order concern for computing systems. However, conventional power conservation schemes such as voltage and frequency scaling are reaching their limits in performance-constrained environments. New technologies are required to break the power wall while sustaining performance on future processors. Low-power embedded processors and near-threshold voltage computing (NTVC) have been proposed as viable solutions to tackle the power wall in future computing systems. Unfortunately, these technologies may also compromise per-core performance and, in the case of NTVC, reliability. These limitations would make them unsuitable for HPC systems and datacenters. To demonstrate that emerging low-power processing technologies can effectively replace conventional technologies, this study relies on ARM’s big.LITTLE processors as both a real and an emulation platform, together with state-of-the-art implementations of the Conjugate Gradient (CG) solver. For NTVC in particular, the paper describes how efficient algorithm-based fault tolerance schemes preserve the power and energy benefits of very-low-voltage operation.
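The abstract does not detail the fault tolerance schemes. As a generic illustration of the algorithm-based fault tolerance idea, the Python sketch below applies the classic checksum test to a matrix-vector product, the dominant kernel of CG: the column sums c = 1^T A are computed once, and for a correct product y = A x, sum(y) must equal c · x up to rounding. The function names and the tolerance are assumptions, and the sketch only detects an error; it does not show how the paper's schemes recover from one.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Plain (unprotected) dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def column_checksum(a: Matrix) -> Vector:
    """c = 1^T A: the sum of each column, computed once before the solve."""
    return [sum(col) for col in zip(*a)]


def checksum_ok(y: Vector, x: Vector, c: Vector, rel_tol: float = 1e-9) -> bool:
    """For y = A x, sum(y) must equal c . x up to rounding error."""
    expected = sum(ci * xi for ci, xi in zip(c, x))
    return abs(sum(y) - expected) <= rel_tol * max(1.0, abs(expected))


a = [[4.0, 1.0], [1.0, 3.0]]
x = [1.0, 2.0]
c = column_checksum(a)       # [5.0, 4.0]
y = matvec(a, x)             # [6.0, 7.0]
print(checksum_ok(y, x, c))  # True
y[1] += 0.5                  # simulate a silent data corruption in one entry
print(checksum_ok(y, x, c))  # False: the error is detected
```

The check costs only two dot-product-sized reductions per product, far less than recomputing the product; on a mismatch, a scheme could, for example, recompute the affected product. The fixed `rel_tol` here is a placeholder for a tolerance matched to the expected rounding error.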