Digital Library

140 results for Hardware

REDEFINE: Architecture of a SoC Fabric for Runtime Composition of Computation Structures

Abstract:

In this paper we propose the architecture of a SoC fabric onto which applications described in a high-level language (HLL) are synthesized. The fabric is a homogeneous layout of computation, storage and communication resources on silicon. Through a process of composition of resources (as opposed to decomposition of applications), application-specific computational structures are defined on the fabric at runtime to realize different modules of the applications in hardware. Applications synthesized on this fabric offer performance comparable to ASICs while retaining the programmability of processing cores. We outline the application synthesis methodology through examples, and compare our results with software implementations on traditional platforms with unbounded resources.

MEMS based pressure sensor with triple modular redundancy

Abstract:

In this paper, the design and development of a micro-electro-mechanical systems (MEMS) based pressure sensor with triple modular redundancy (TMR) for space applications are presented. To minimize the mass of the system and to avoid uncertainty in the pressure measurements of three independent hardware units, an integrated approach with TMR is adopted. The sequential steps of the TMR logic and the test results obtained are included.
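The abstract does not spell out the TMR voting rule. For three analog readings a common choice is median selection, which masks any single faulty channel. The sketch below assumes that rule; the agreement `tolerance` is a hypothetical parameter, not something taken from the paper.

```python
def tmr_vote(a: float, b: float, c: float, tolerance: float) -> tuple[float, bool]:
    """Median-select voter for three redundant pressure readings.

    Returns the median reading and a flag that is True when at least two
    channels agree within `tolerance` (a hypothetical threshold).
    """
    lo, mid, hi = sorted([a, b, c])
    # In sorted order, two channels agree iff an adjacent pair is close enough.
    agree = (mid - lo <= tolerance) or (hi - mid <= tolerance)
    return mid, agree
```

For example, `tmr_vote(101.2, 101.3, 250.0, 0.5)` returns the median `101.3` with agreement `True`: the wild third channel is outvoted.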

Total Factor Productivity Growth and Output Growth in Indian Electronics Industry in the Liberalization Era: An Empirical Examination

Abstract:

This paper analyses the efficiency and productivity growth of the electronics industry, considered a vibrant and rapidly growing manufacturing sub-sector of India in the liberalization era since 1991. The main objective of the paper is to examine the extent and growth of Total Factor Productivity (TFP), its components, namely Technical Efficiency Change (TEC) and Technological Progress (TP), and its contribution to total output growth. In this study, the electronics industry is broadly classified into communication equipment, computer hardware, consumer electronics and other electronics, in order to compare productivity growth across these sub-sectors for the period 1993-2004. The paper finds that the sub-sectors have improved in terms of economies of scale and contribution of capital. Technical efficiency change and technological progress moved in opposite directions. Three of the four sub-sectors experienced output growth driven primarily by TFP growth (TFPG), with a negative or negligible contribution from input growth; the exception is computer hardware, where both input growth and TFPG contributed prominently to output growth. The paper also explores possible reasons for the low technical efficiency and technological progress in the industry.
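The abstract does not say how TFP is measured. Splitting TFP change into TEC and TP is characteristic of the Malmquist productivity index, so this sketch assumes that multiplicative decomposition. It also assumes a simple growth-accounting identity for the TFP share of output growth. Both functions are illustrative, not the paper's estimation procedure.

```python
def tfp_change(tec: float, tp: float) -> float:
    """Malmquist-style TFP change index (assumed): the product of technical
    efficiency change (TEC) and technological progress (TP).
    Values above 1 indicate productivity growth."""
    return tec * tp


def tfp_share_of_output_growth(output_growth: float, tfp_growth: float) -> float:
    """Share of output growth attributable to TFP growth, assuming
    output growth = input-growth contribution + TFP growth."""
    if output_growth == 0:
        raise ValueError("share is undefined when output growth is zero")
    return tfp_growth / output_growth
```

Consider hypothetical values TEC = 0.97 (efficiency falls 3%) and TP = 1.05. They give a TFP change of about 1.0185. This shows how components moving in opposite directions can still net out to modest TFP growth.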

VLSI Implementation of Spatial Prediction Based Image Compression Scheme

Abstract:

We propose the design and implementation of a hardware architecture for a spatial-prediction-based image compression scheme, which consists of a prediction phase and a quantization phase. In the prediction phase, a hierarchical tree structure obtained from the test image is used to predict every central pixel of the image from its four neighboring pixels. The prediction scheme generates an error image, to which a wavelet/sub-band coding algorithm can be applied to obtain efficient compression. The software model is evaluated in terms of entropy and standard deviation. Memory and silicon area constraints play a vital role in realizing the hardware for hand-held devices. The hardware architecture constructed for the proposed scheme exploits parallelism in both instructions and data. The processor consists of pipelined functional units to maximize throughput and operating speed. The hardware model is analyzed in terms of throughput, speed and power. The results indicate that the proposed architecture is suitable for power-constrained implementations with higher data rates.
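The sketch below illustrates two ideas from the abstract: building a prediction error image from four neighbours, and measuring entropy. It is a flat, non-causal predictor that uses rounded integer means. That is an assumption made for illustration only; it is not the paper's hierarchical tree scheme and cannot be decoded on its own.

```python
import math
from collections import Counter


def prediction_error_image(img):
    """Residual image: each interior pixel minus the rounded mean of its four
    neighbours (up, down, left, right). Border pixels are kept unchanged."""
    h, w = len(img), len(img[0])
    err = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            s = img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
            err[y][x] = img[y][x] - (s + 2) // 4  # integer mean with rounding
    return err


def entropy_bits(img):
    """Empirical entropy of the pixel values, in bits per pixel."""
    counts = Counter(v for row in img for v in row)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

On a smooth ramp such as `img[y][x] = 10*x + 5*y`, every interior residual is zero. The error image therefore has lower entropy than the original, which is what makes the subsequent sub-band coding efficient.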

Compiler Assisted Leakage Energy Optimization for Clustered VLIW Architectures

Abstract:

Miniaturization of devices and the ensuing decrease in threshold voltage have led to a substantial increase in the leakage component of total processor energy consumption. Because VLIW and clustered VLIW architectures have relatively simple issue logic and a large number of functional units, a large fraction of their leakage energy is consumed in the functional units. However, functional units are not fully utilized in VLIW architectures because of inherent variations in the ILP of programs. This underutilization is even more pronounced in clustered VLIW architectures, because contention for the limited number of slow intercluster communication channels leads to many short idle cycles. In the past, architectural schemes have been proposed to obtain leakage energy benefits by aggressively exploiting the idleness of functional units. However, the presence of many short idle cycles causes frequent transitions from active mode to sleep mode and vice versa, which adversely affects the energy benefits of a purely hardware-based scheme. In this paper, we propose and evaluate a compiler instruction scheduling algorithm that assists such a hardware-based scheme in the context of VLIW and clustered VLIW architectures. The proposed scheme exploits the scheduling slack of instructions to orchestrate the functional unit mapping, with the objective of reducing the number of transitions in functional units and thereby keeping them off for longer durations. Over a hardware-only scheme for a VLIW architecture, the proposed compiler-assisted scheme obtains a further 12% reduction in the energy consumption of functional units with negligible performance degradation. The benefits are 15% and 17% for 2-clustered and 4-clustered VLIW architectures, respectively. Our test bed uses the Trimaran compiler infrastructure.
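The sketch below illustrates why functional-unit mapping matters. It counts active-to-sleep transitions under a hypothetical timeout policy: a unit sleeps after `timeout` idle cycles and wakes when next needed. It then compares spreading four operations over two units with packing them onto one. The policy and traces are assumptions for illustration, not the paper's hardware scheme or scheduler.

```python
def idle_runs(busy):
    """Lengths of maximal runs of idle cycles in a per-cycle busy trace."""
    runs, cur = [], 0
    for b in busy:
        if b:
            if cur:
                runs.append(cur)
            cur = 0
        else:
            cur += 1
    if cur:
        runs.append(cur)
    return runs


def sleep_stats(traces, timeout):
    """(active->sleep transitions, sleep cycles) summed over functional units,
    under a hypothetical timeout policy: a unit sleeps once it has been idle
    for `timeout` cycles and stays asleep until it is next needed."""
    transitions = sleep_cycles = 0
    for trace in traces:
        for run in idle_runs(trace):
            if run > timeout:
                transitions += 1
                sleep_cycles += run - timeout
    return transitions, sleep_cycles


# Four operations issued in cycles 0, 2, 4, 6 on a hypothetical 2-FU machine.
spread = [[1, 0, 0, 0, 1, 0, 0, 0],  # FU0 executes the ops in cycles 0 and 4
          [0, 0, 1, 0, 0, 0, 1, 0]]  # FU1 executes the ops in cycles 2 and 6
packed = [[1, 0, 1, 0, 1, 0, 1, 0],  # all four ops mapped to FU0
          [0, 0, 0, 0, 0, 0, 0, 0]]  # FU1 stays idle for the whole window
```

With `timeout=2`, `sleep_stats(spread, 2)` gives `(3, 3)`, while `sleep_stats(packed, 2)` gives `(1, 6)`. Packing yields fewer transitions and longer sleep, the effect the compiler-assisted mapping aims for.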

Design of a constrained high data rate CDMA system

Abstract:

This paper deals with the design of a high data rate code-division multiple-access (CDMA) system under a specified jamming margin as well as hardware and bandwidth limitations. Several choices had to be made in arriving at the design, such as the number of subcarriers, the choice of spreading codes and the nature of the modulation. The rationale behind each of these choices is given. Descriptions of the transmitter and receiver are also included, along with relevant simulations of cross-correlation.
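The abstract does not name the spreading codes used. As a hedged illustration of the kind of cross-correlation check it mentions, the sketch builds Walsh-Hadamard codes (Sylvester construction) and computes periodic correlation between them. The choice of Walsh codes is an assumption made here for illustration only.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two).
    Each row is a +/-1 Walsh spreading code of length n."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def cross_correlation(a, b, lag=0):
    """Periodic cross-correlation of two +/-1 sequences at the given lag."""
    n = len(a)
    return sum(a[i] * b[(i + lag) % n] for i in range(n))
```

Distinct rows of `hadamard(8)` have zero cross-correlation at lag 0, which is what separates synchronous users. Off-peak values can be large, though: `cross_correlation(H[1], H[1], 1)` is `-8`. This sensitivity to timing is one reason code choice matters in such designs.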

Speed and Area Optimized Implementation of H.264 8X8 DCT Transform and Quantizer

Abstract:

H.264 is a video codec standard that delivers high-resolution video even at low bit rates. Hardware implementations are essential to provide high throughput at low bit rates. In this paper, we propose speed- and area-optimized hardware implementations of the DCT and quantizer modules, in the form of two architectures. The first architecture is speed-optimized; it delivers high throughput and can meet the requirements of 4096×2304 frames at 30 frames/s. The second architecture is area-optimized; it occupies 2009 LUTs in Altera's Stratix II and can meet the requirements of 1080 HD at 30 frames/s.
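This sketch shows the integer 8x8 forward core transform used in H.264 High profile, written as Y = C X C^T with the matrix scaled by 8. It also includes a back-of-envelope block-rate calculation for the two target formats. The matrix products are a plain reference model, not the paper's hardware datapath, and the rounding/shift stages are omitted. The 4:2:0 chroma factor is an assumption.

```python
# Integer 8x8 forward core transform of H.264 High profile, scaled by 8.
# Row norms differ (512, 578, 320, ...), so normalisation is folded into the quantiser.
C8 = [
    [8, 8, 8, 8, 8, 8, 8, 8],
    [12, 10, 6, 3, -3, -6, -10, -12],
    [8, 4, -4, -8, -8, -4, 4, 8],
    [10, -3, -12, -6, 6, 12, 3, -10],
    [8, -8, -8, 8, 8, -8, -8, 8],
    [6, -12, 3, 10, -10, -3, 12, -6],
    [4, -8, 8, -4, -4, 8, -8, 4],
    [3, -6, 10, -12, 12, -10, 6, -3],
]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(r) for r in zip(*m)]


def forward_8x8(block):
    """Unnormalised 2-D forward transform Y = C8 X C8^T (no rounding or shifts)."""
    return matmul(matmul(C8, block), transpose(C8))


def blocks_per_second(width, height, fps, chroma_factor=1.5):
    """Nominal 8x8 blocks per second; chroma_factor=1.5 assumes 4:2:0 sampling
    and ignores the padding encoders apply (e.g. 1080 -> 1088 lines)."""
    return width * height * chroma_factor / 64 * fps
```

Under these assumptions, 4096×2304 at 30 frames/s needs about 6.6 million 8x8 transforms per second (`6635520.0`), whereas 1080 HD at 30 frames/s needs about 1.46 million (`1458000.0`). This gap is consistent with the speed-optimized and area-optimized designs targeting different formats.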

Accelerating Numerical Linear Algebra Kernels on a Scalable Run Time Reconfigurable Platform

Abstract:

Numerical Linear Algebra (NLA) kernels are at the heart of all computational problems. These kernels require hardware acceleration for increased throughput. NLA solvers for dense and sparse matrices differ in the way the matrices are stored and operated upon, although they exhibit similar computational properties. While ASIC solutions for NLA solvers can deliver high performance, they are not scalable and hence are not commercially viable. In this paper, we show how NLA kernels can be accelerated on REDEFINE, a scalable runtime reconfigurable hardware platform. Compared to a software implementation, the Direct Solver (Modified Faddeev's algorithm) on REDEFINE shows a 29X improvement on average, and the Iterative Solver (Conjugate Gradient algorithm) shows a 15-20% improvement. We further show that the solution on REDEFINE scales to larger problem sizes without any notable degradation in performance.
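For reference, the sketch below shows the textbook Conjugate Gradient iteration named in the abstract as the iterative solver. It is a plain software version for a symmetric positive-definite matrix stored as a list of rows. It says nothing about how the kernel is mapped onto REDEFINE, and the tolerance and iteration cap are arbitrary choices.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(a, v):
    return [dot(row, v) for row in a]


def conjugate_gradient(a, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite matrix A (list of rows)."""
    n = len(b)
    max_iter = max_iter or 10 * n
    x = [0.0] * n
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old               # keeps search directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

For `A = [[4, 1], [1, 3]]` and `b = [1, 2]`, the result approaches `[1/11, 7/11]`. Each iteration is dominated by one matrix-vector product plus a few dot products, which are the operations a hardware platform would need to accelerate.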

Design space exploration of systolic realization of QR factorization on a runtime reconfigurable platform

Abstract:

In high-performance computing, considerable effort has been put into accelerating Numerical Linear Algebra (NLA) kernels such as QR Decomposition (QRD), with the added advantage of reconfigurability and scalability. While popular custom hardware solutions in the form of systolic arrays can deliver high performance, they are not scalable and hence not commercially viable. In this paper, we show how systolic solutions of QRD can be realized efficiently on REDEFINE, a scalable runtime reconfigurable hardware platform. We propose various enhancements to REDEFINE to meet the specific needs of accelerating NLA kernels. We further perform design space exploration of the proposed solution for an arbitrary application of size n × n. We determine the right size of the sub-array in accordance with the optimal pipeline depth of the core execution units and the number of such units to be used per sub-array.
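Systolic QR arrays are typically built from Givens rotations. Boundary cells compute a rotation that zeroes one sub-diagonal entry, and internal cells apply it to the rest of the row. The sketch below is a sequential software model of that Givens-rotation QR, offered only to show the arithmetic each cell performs. It is not the paper's REDEFINE mapping or sub-array partitioning.

```python
import math


def givens_qr(a):
    """QR factorisation by Givens rotations; returns (Q, R) with A = Q R."""
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]
    q = [[float(i == j) for j in range(m)] for i in range(m)]
    for j in range(n):
        for i in range(m - 1, j, -1):          # zero column j from the bottom up
            x, y = r[i - 1][j], r[i][j]
            if y == 0:
                continue
            h = math.hypot(x, y)
            c, s = x / h, y / h                # "boundary cell": compute rotation
            for k in range(n):                 # "internal cells": rotate rows i-1, i
                r1, r2 = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * r1 + s * r2
                r[i][k] = -s * r1 + c * r2
            for k in range(m):                 # accumulate Q = Q G^T so A = Q R
                q1, q2 = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * q1 + s * q2
                q[k][i] = -s * q1 + c * q2
    return q, r
```

Each rotation touches only two rows, so rotations on disjoint row pairs can proceed in parallel. That independence is what a systolic or sub-array realization pipelines.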

A simple and fast scheme for code compression for VLIW processors

Abstract:

Summary form only given. A code compression scheme is proposed whose fast decompression algorithm can be implemented in simple hardware. The effectiveness of the scheme is evaluated on the TMS320C62x architecture, including the overhead of a line address table (LAT), and compression rates ranging from 70% to 80% are obtained. Two schemes for decompression are proposed. The basic idea underlying the scheme is a simple clustering algorithm that partially maps a block of instructions into a set of clusters. The clustering algorithm is a greedy algorithm based on the frequency of occurrence of various instructions.
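The summary is too brief to reproduce the clustering itself. The sketch below shows only the generic idea such schemes build on: greedily picking the most frequent instruction words for a dictionary, then computing the compression rate as compressed size over original size (lower is better). The encoding is hypothetical: a 1-bit flag plus either an index or the raw 32-bit word. The LAT overhead is not modelled.

```python
from collections import Counter


def build_dictionary(words, max_entries):
    """Greedily select the `max_entries` most frequent instruction words."""
    return [w for w, _ in Counter(words).most_common(max_entries)]


def compressed_bits(words, dictionary, word_bits=32):
    """Size in bits under a hypothetical encoding: dictionary hits cost a 1-bit
    flag plus an index, misses cost a 1-bit flag plus the raw word, and the
    dictionary itself is stored uncompressed."""
    index_bits = max(1, (len(dictionary) - 1).bit_length())
    in_dict = set(dictionary)
    body = sum(1 + (index_bits if w in in_dict else word_bits) for w in words)
    return body + len(dictionary) * word_bits


def compression_rate(words, dictionary, word_bits=32):
    """Compressed size as a fraction of the original size."""
    return compressed_bits(words, dictionary, word_bits) / (len(words) * word_bits)
```

The greedy, frequency-ordered choice is cheap to compute. Because decoding a hit is a single table lookup, decompression hardware stays simple.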

On-Chip Memory Architecture Exploration Framework for DSP Processor-Based Embedded System on Chip

Abstract:

Today's SoCs are complex designs with multiple embedded processors, memory subsystems, and application-specific peripherals. The memory architecture of embedded SoCs strongly influences the power and performance of the entire system. Further, the memory subsystem constitutes a major part (typically up to 70%) of the silicon area of current-day SoCs. In this article, we address on-chip memory architecture exploration for DSP processors whose memory is organized as multiple banks, where banks can be single- or dual-ported with non-uniform bank sizes. We propose two different methods for physical memory architecture exploration and systematically identify the strengths and applicability of each. Both methods address memory architecture exploration for a given target application by considering the application's data access characteristics, and generate a set of Pareto-optimal design points that are interesting from a power, performance and VLSI area perspective. To the best of our knowledge, this is the first comprehensive work on memory space exploration at the physical memory level that integrates data layout and memory exploration to address system objectives from both the hardware design and the application software development perspectives. Further, we propose an automatic framework that explores the design space, identifying hundreds of Pareto-optimal design points within a few hours of running on a standard desktop configuration.
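"Pareto-optimal" here means that no other design point is at least as good in power, performance and area while being strictly better in one of them. The sketch below filters a list of design points down to that front. The point names and numbers are hypothetical, and the quadratic filter is for clarity rather than speed.

```python
from typing import NamedTuple


class DesignPoint(NamedTuple):
    name: str
    power_mw: float   # lower is better
    cycles: int       # lower is better
    area_mm2: float   # lower is better


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b in every objective and differs in at least one."""
    ka = (a.power_mw, a.cycles, a.area_mm2)
    kb = (b.power_mw, b.cycles, b.area_mm2)
    return all(x <= y for x, y in zip(ka, kb)) and ka != kb


def pareto_front(points):
    """Keep only the points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

With hypothetical points A (10 mW, 1000 cycles, 2.0 mm²), B (12, 900, 2.0), C (11, 1100, 2.5) and D (9, 1200, 3.0), only C is dominated (by A). The front is therefore A, B and D, each a different trade-off a designer might pick.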

Differential Diffractive Reflectance Modulation Sensing

Abstract:

We fabricated a reflectance-based sensor that relies on the diffraction pattern generated by a bio-microarray, where an underlying thin-film structure enhances the diffracted intensity from molecular layers. The zero-order diffraction represents the background signal, and the higher orders represent the phase difference between the array elements and the background. By taking the differential ratio of the first- and zero-order diffraction signals, we obtain a quantitative measure of molecular binding while simultaneously rejecting common-mode fluctuations. This ratiometric approach improved the signal-to-noise ratio by an order of magnitude compared to conventional single-channel detection. In addition, we use a lithography-based approach for fabricating microarrays, which yields spot sizes as small as 5 micron diameter, unlike the 100 micron spots from inkjet printing, and is therefore capable of a high degree of multiplexing. We describe real-time measurement of the adsorption of charged polymers, and bulk refractometry, using this technique. The absence of moving parts for point scanning of the microarray, together with differential ratiometric measurements using diffracted orders from the same probe beam, allows us to make real-time measurements despite noise arising from thermal or mechanical fluctuations in the fluid sample above the sensor surface. Further, the absence of moving parts considerably simplifies the readout hardware, permitting the use of this technique in compact point-of-care sensors.
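Taking a ratio rejects common-mode fluctuations because both diffraction orders come from the same probe beam. Any drift in source power therefore multiplies both signals equally and cancels. The abstract does not give the exact form of its "differential ratio", so this sketch assumes the plain ratio I1/I0. The signal model (sinusoidal power drift, linear binding growth) is hypothetical.

```python
import math


def ratiometric(i_first, i_zero):
    """Per-sample ratio of first-order to zero-order diffracted intensity."""
    return [i1 / i0 for i1, i0 in zip(i_first, i_zero)]


def simulate(n, binding_rate=0.001, drift=0.2):
    """Hypothetical model: source power drifts (common to both orders), the
    zero-order efficiency is constant, and the first-order efficiency grows
    linearly with molecular binding."""
    eta0 = 0.9
    i_first, i_zero = [], []
    for t in range(n):
        power = 1.0 + drift * math.sin(0.7 * t)   # common-mode fluctuation
        eta1 = 0.01 + binding_rate * t            # binding-dependent signal
        i_zero.append(power * eta0)
        i_first.append(power * eta1)
    return i_first, i_zero
```

The raw first-order signal wobbles with the ±20% power drift. The ratio, however, tracks `(0.01 + binding_rate * t) / 0.9` exactly, i.e. only the binding term survives.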

Receive Antenna Selection for Time-Varying Channels Using Discrete Prolate Spheroidal Sequences

Abstract:

Receive antenna selection (AS) has been shown to maintain the diversity benefits of multiple antennas while potentially reducing hardware costs. However, the promised diversity gains of receive AS depend on the assumptions of perfect channel knowledge at the receiver and slowly time-varying fading. By explicitly accounting for practical constraints imposed by next-generation wireless standards, such as training, packetization and antenna switching time, we propose a single receive AS method for time-varying fading channels. The method exploits the low training overhead and high accuracy achievable with reduced-rank subspace projection techniques based on discrete prolate spheroidal (DPS) sequences. It requires only knowledge of the Doppler bandwidth, not detailed correlation knowledge. Closed-form expressions are also derived for the channel prediction and estimation error, as well as for the symbol error probability (SEP) of M-ary phase-shift keying (MPSK) with symbol-by-symbol receive AS. It is shown that the proposed AS scheme, even after accounting for the practical limitations mentioned above, outperforms both the ideal conventional single-input single-output (SISO) system with perfect CSI and no AS at the receiver, and AS with conventional estimation based on complex exponential basis functions.
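The sketch below illustrates reduced-rank estimation with DPS sequences. It obtains the sequences as the leading eigenvectors of the prolate matrix, for a block of `m` samples and a normalised Doppler bandwidth `w`. It then projects noisy channel samples onto the span of the first `d` of them. The subspace dimension `d`, the test channel and the noise level are assumptions, and the paper's pilot layout and prediction step are not modelled.

```python
import numpy as np


def dps_basis(m, w, d):
    """First d discrete prolate spheroidal sequences of length m for normalised
    bandwidth w (cycles/sample): leading eigenvectors of the prolate matrix
    C[k, l] = sin(2*pi*w*(k - l)) / (pi*(k - l)), with C[k, k] = 2*w."""
    k = np.arange(m)
    diff = k[:, None] - k[None, :]
    c = 2 * w * np.sinc(2 * w * diff)   # np.sinc(x) = sin(pi x) / (pi x)
    _, vecs = np.linalg.eigh(c)         # eigenvalues in ascending order
    return vecs[:, ::-1][:, :d]         # keep the d most band-concentrated vectors


def dps_estimate(y, w, d):
    """Reduced-rank channel estimate: project noisy samples y onto the DPS subspace."""
    u = dps_basis(len(y), w, d)
    return u @ (u.T @ y)
```

A channel whose Doppler spectrum lies within `[-w, w]` is well captured by roughly `2*m*w + 1` such vectors. Projection then keeps only about `d/m` of the white noise power, which is where the accuracy at low training overhead comes from.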

Probabilistic Shared Cache Management (PriSM)

Abstract:

Effective sharing of the last-level cache has a significant influence on the overall performance of a multicore system. We observe that existing solutions control cache occupancy at a coarse granularity, do not scale well to large core counts, and in some cases lack the flexibility to support a variety of performance goals. In this paper, we propose Probabilistic Shared Cache Management (PriSM), a framework that manages the cache occupancy of different cores at cache-block granularity by controlling their eviction probabilities. The proposed framework requires only simple hardware changes to implement, scales to larger core counts, and is flexible enough to support a variety of performance goals. We demonstrate the flexibility of PriSM by computing the eviction probabilities needed to achieve goals such as hit maximization, fairness and QoS. In a sixteen-core machine, PriSM-HitMax improves performance by 18.7% over LRU and by 11.8% over previously proposed schemes. PriSM-Fairness improves fairness over existing solutions by 23.3%, along with a performance improvement of 19.0%. PriSM-QOS successfully achieves the desired QoS targets.
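The abstract says PriSM steers occupancy through per-core eviction probabilities, but it does not describe the replacement step itself. The sketch below assumes one plausible mechanism and is hypothetical, not PriSM's actual logic. On a miss, a victim core is drawn according to the given probabilities, restricted to cores that own a block in the set. That core's least-recently-used block is evicted, with a fallback to plain LRU. Computing the probabilities for HitMax, Fairness or QoS is out of scope here.

```python
import random


def choose_victim(cache_set, evict_prob, rng):
    """Return the index of the block to evict from one cache set.

    cache_set: list of (core_id, last_use) pairs, one per block.
    evict_prob: dict core_id -> eviction probability (hypothetical input).
    """
    owners = {core for core, _ in cache_set}
    cores = [c for c in evict_prob if c in owners and evict_prob[c] > 0]
    if not cores:
        # No eligible core: fall back to global LRU within the set.
        return min(range(len(cache_set)), key=lambda i: cache_set[i][1])
    core = rng.choices(cores, weights=[evict_prob[c] for c in cores])[0]
    candidates = [i for i, (c, _) in enumerate(cache_set) if c == core]
    # Evict the chosen core's least-recently-used block.
    return min(candidates, key=lambda i: cache_set[i][1])
```

Decisions are made per block, so a core's occupancy drifts toward whatever share its eviction probability implies. No way-partitioning is needed, which is one reason a probability-based design can scale to many cores.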

Evaluation of dynamic voltage and frequency scaling for stream programs

Abstract:

Dynamic Voltage and Frequency Scaling (DVFS) offers huge potential for designing trade-offs involving the energy, power, temperature and performance of computing systems. In this paper, we evaluate three different DVFS schemes using multithreaded stream applications in a full-system Chip Multiprocessor (CMP) simulator: our extension to stream programs of a Petri net performance model based DVFS method for sequential programs, a simple profile-based Linear Scaling method, and an existing hardware-based DVFS method for multithreaded applications. From our evaluation, we find that the software-based methods achieve significant Energy/Throughput^2 (ET^-2) improvements, whereas the hardware-based scheme degrades performance heavily and suffers an ET^-2 loss. Our results indicate that the simple profile-based scheme achieves the benefits of the complex Petri net based scheme for stream programs. They also present a strong case for independent voltage/frequency control for different cores of CMPs, which is lacking in most state-of-the-art CMPs. This contrasts with the conclusions of a recent evaluation of per-core DVFS schemes for multithreaded applications on CMPs.
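The ET^-2 metric divides energy by throughput squared, so lower is better. It penalizes throughput loss more heavily than an energy-only metric would. The sketch computes the metric and shows one plausible reading of "profile-based linear scaling": assume throughput grows linearly with core frequency, then pick the lowest frequency that meets a target. The scaling rule and all numbers are hypothetical; the paper's method may differ.

```python
def et_minus2(energy_j: float, throughput: float) -> float:
    """Energy / Throughput^2 (ET^-2); lower is better."""
    return energy_j / throughput ** 2


def linear_scaled_frequency(f_max: float, measured_throughput: float,
                            target_throughput: float, f_min: float = 0.0) -> float:
    """Hypothetical linear scaling: assuming throughput is proportional to
    frequency, choose the lowest frequency that still meets the target,
    clamped to [f_min, f_max]."""
    f = f_max * target_throughput / measured_throughput
    return min(f_max, max(f_min, f))
```

With made-up numbers, cutting energy from 10 J to 6 J at a 10% throughput loss (100 to 90 items/s) still improves ET^-2. By contrast, a scheme whose throughput drops sharply can lose on ET^-2 even if it saves energy.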
