42 results for embedded Systems
at Indian Institute of Science - Bangalore - India
Abstract:
The memory subsystem is a major contributor to the performance, power, and area of complex SoCs used in feature-rich multimedia products. Hence, the memory architecture of the embedded DSP is complex and usually custom designed, with multiple banks of single-ported or dual-ported on-chip scratch pad memory and multiple banks of off-chip memory. Building software for such large, complex memories, with many of the software components delivered as individually optimized software IPs, is a big challenge. To obtain good performance and reduce memory stalls, the data buffers of the application need to be placed carefully in different types of memory. In this paper we present a unified framework (MODLEX) that combines different data layout optimizations to address complex DSP memory architectures. Our method models the data layout problem as a multi-objective genetic algorithm (GA), with performance and power as the objectives, and presents a set of solution points that is attractive from a platform design viewpoint. While most work in the literature assumes that performance and power are non-conflicting objectives, our work demonstrates that a significant trade-off (up to 70%) is possible between power and performance.
Abstract:
Today's feature-rich multimedia products require embedded system solutions with a complex System-on-Chip (SoC) to meet market expectations of high performance at low cost and lower energy consumption. The memory architecture of the embedded system strongly influences these parameters. Hence the embedded system designer performs a complete memory architecture exploration. This is a multi-objective optimization problem and can be tackled as a two-level optimization problem. The outer level explores various memory architectures, while the inner level explores the placement of data sections (the data layout problem) to minimize memory stalls. Further, the designer is interested in multiple optimal design points to address various market segments. However, tight time-to-market constraints enforce a short design cycle time. In this paper we address the multi-level multi-objective memory architecture exploration problem through a combination of a Multi-objective Genetic Algorithm (for memory architecture exploration) and an efficient heuristic data placement algorithm. At the outer level, the memory architecture exploration is done by picking memory modules directly from an ASIC memory library. This helps in performing the memory architecture exploration in an integrated framework, where memory allocation, memory exploration and data layout work in a tightly coupled way to yield optimal design points with respect to area, power and performance. We evaluated our approach on 3 embedded applications; it explores several thousand memory architectures for each application, yielding a few hundred optimal design points in a few hours of computation time on a standard desktop.
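The "optimal design points" in a multi-objective exploration like the one described above are usually the non-dominated (Pareto-optimal) points. The following is a minimal sketch of that filtering step only, not the authors' genetic algorithm or data placement heuristic. The tuple layout (area, power, memory stall cycles) and the function names are assumptions made for illustration, and all three objectives are assumed to be minimized.

```python
from typing import List, Tuple

# Hypothetical design point: (area, power, memory_stall_cycles), all minimized.
DesignPoint = Tuple[float, float, float]


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """Return True if a is no worse than b in every objective
    and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep only the points that no other point dominates (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

A quadratic scan is adequate for a few thousand candidate architectures; a larger search would normally use a sorted or incremental non-dominated archive instead.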
Abstract:
This paper presents a low-energy memory decoder architecture for ultra-low-voltage systems containing multiple voltage domains. Due to limitations in the scalability of memory supply voltages, these systems typically contain a core operating at subthreshold voltages and memories operating at a higher voltage. This difference in voltage provides timing slack on the memory path as the core supply is scaled. The paper analyzes the feasibility and trade-offs of utilizing this timing slack to operate a greater section of the memory decoder circuitry at the lower supply. A 256x16-bit SRAM interface has been designed in a UMC 65nm low-leakage process to evaluate this technique, with the core and memory operating at 280 mV and 500 mV respectively. The technique provides a reduction of up to 20% in energy/cycle of the row decoder without any penalty in area or system delay.
Abstract:
As conventional MOSFET scaling approaches the limit imposed by short channel effects, Double Gate (DG) MOS transistors are emerging as the most feasible candidates, in terms of technology, for sub-45nm technology nodes. As the short channel effect in a DG transistor is controlled by the device geometry, an undoped or lightly doped body is used to sustain the channel. There exists a disparity in the threshold voltage calculation criteria for undoped-body symmetric double gate transistors, which use two definitions: one is potential based and the other is charge based. In this paper, a novel concept of a "crossover point" is introduced, which proves that the charge-based definition is more accurate than the potential-based definition. The change in threshold voltage with body thickness variation for a fixed channel length is anomalous as predicted by the potential-based definition, while it is monotonic for the charge-based definition. The threshold voltage is then extracted from drain current versus gate voltage characteristics using linear extrapolation and the "Third Derivative of Drain-Source Current" method, or simply the "TD" method. The trend of threshold voltage variation is found to be the same in both cases, which supports the charge-based definition.
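The linear extrapolation step mentioned above can be sketched as follows: find the point of maximum transconductance on the drain current versus gate voltage curve, draw the tangent there, and take its intercept with the gate voltage axis. This is a generic illustration on sampled data, not the authors' extraction procedure. The function name and the central-difference derivative are assumptions, and some conventions additionally subtract half the drain voltage from the intercept, which this sketch omits.

```python
from typing import Sequence


def vth_linear_extrapolation(vg: Sequence[float], i_d: Sequence[float]) -> float:
    """Estimate the threshold voltage from sampled Id-Vg data.

    The tangent at the point of maximum transconductance gm = dId/dVg
    is extrapolated to Id = 0; the gate voltage at that intercept is
    returned. Assumes vg is strictly increasing.
    """
    if len(vg) != len(i_d) or len(vg) < 3:
        raise ValueError("need at least three matching (Vg, Id) samples")

    # Central-difference transconductance at interior samples 1 .. n-2.
    gm = [
        (i_d[k + 1] - i_d[k - 1]) / (vg[k + 1] - vg[k - 1])
        for k in range(1, len(vg) - 1)
    ]

    # Index (into gm) of the largest transconductance; +1 maps it back to vg.
    j = max(range(len(gm)), key=gm.__getitem__)
    k_max = j + 1

    # Tangent line: Id = gm * (Vg - Vth)  =>  Vth = Vg - Id / gm.
    return vg[k_max] - i_d[k_max] / gm[j]
```

On measured or simulated curves the derivative is noisy, so smoothing the data before differentiating is usually needed. That matters even more for a third-derivative method, which amplifies noise further.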
Abstract:
In this paper, we present a Dynamic Voltage and Frequency Managed 256 x 64 SRAM block in 65nm technology, for frequencies ranging from 100MHz to 1GHz. The total energy is minimized for any operating frequency in this range, and leakage energy is minimized during standby mode. Since the noise margin of the SRAM cell deteriorates at low voltages, we propose Static Noise Margin improvement circuitry, which symmetrizes the SRAM cell by controlling the body bias of the pull-down NMOS transistor. We used a 9T SRAM cell that isolates the Read and Hold Noise Margins and has less leakage. We have implemented an efficient technique of pushing the address decoder into zigzag-super-cut-off in standby mode without affecting its performance in the active mode of operation. The Read Bit Line (RBL) voltage drop is controlled, and the bit lines are pre-charged only when needed, to reduce power wastage.
Abstract:
In this work, for the first time, we present a physically based analytical threshold voltage model for the omega-gate silicon nanowire transistor. This model is developed for a long channel cylindrical body structure. The potential distribution at every point of the wire is derived from a closed-form solution of the two-dimensional Poisson's equation, which is then used to model the threshold voltage. The proposed model can be treated as a generalized model, valid for both surround-gate and semi-surround-gate cylindrical transistors. The accuracy of the proposed model is verified for different device geometries against results obtained from three-dimensional numerical device simulators, and close agreement is observed.
Abstract:
This paper presents the design of a low-power 256x72 bit TCAM in 0.13um CMOS technology. In contrast to the conventional match line (ML) sensing scheme, in which equal power is consumed irrespective of match or mismatch, the ML scheme employed in this design allocates less power to match decisions involving a large number of mismatched bits. Typically, the probability of mismatch is high, so this scheme results in a significant CAM power reduction. We propose to use this technique along with pipelining of the search operation, in which the MLs are broken into several segments. Since most words fail to match in the first segment, the search operation for subsequent segments is discontinued, resulting in a further reduction in power consumption. The above architecture provides a 70% power reduction while performing a search in 3ns.
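The pipelined search described above, in which a word's later match-line segments are skipped once an earlier segment mismatches, can be illustrated with a small behavioural model. This is a software sketch only, not the circuit from the paper. The string encoding with `x` as the don't-care symbol, the function names, and the use of a segment-evaluation count as a rough proxy for match-line activity are all assumptions.

```python
from typing import List, Tuple


def segment_matches(stored: str, key: str) -> bool:
    """Ternary compare: a stored 'x' matches either key bit."""
    return all(s == "x" or s == k for s, k in zip(stored, key))


def pipelined_search(
    words: List[str], key: str, segment_width: int
) -> Tuple[List[int], int]:
    """Search every stored word segment by segment.

    Returns the indices of matching words and the number of segments
    actually evaluated. A word stops being evaluated at its first
    mismatching segment, which models discontinuing the search for
    the remaining match-line segments.
    """
    matches: List[int] = []
    segments_evaluated = 0
    for index, word in enumerate(words):
        matched = True
        for start in range(0, len(key), segment_width):
            end = start + segment_width
            segments_evaluated += 1
            if not segment_matches(word[start:end], key[start:end]):
                matched = False
                break  # later segments of this word are never activated
        if matched:
            matches.append(index)
    return matches, segments_evaluated
```

With four 8-bit words, 4-bit segments, and two words failing in the first segment, this model evaluates 6 segments instead of the 8 a non-pipelined search would activate.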
Abstract:
In this paper the static noise margin for SET (single electron transistor) logic is defined and compact models for the noise margin are developed by making use of the MIB (Mahapatra-Ionescu-Banerjee) model. The variation of the noise margin with temperature and background charge is also studied. A chain of SET inverters is simulated to validate the definition of various logic levels (like VIH, VOH, etc.) and noise margin. Finally the noise immunity of SET logic is compared with current CMOS logic.
Abstract:
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving clock speed, reducing the energy consumption of the logic, and making the design simpler, it introduces extra overheads by way of inter-cluster communication. This communication happens over long global wires, which leads to delays in execution and significantly high energy consumption. In this paper, we propose a new instruction scheduling algorithm that exploits the scheduling slacks of instructions and the communication slacks of data values together to achieve better energy-performance trade-offs for clustered architectures with heterogeneous interconnect. Our instruction scheduling algorithm achieves 35% and 40% reductions in communication energy, whereas the overall energy-delay product improves by 4.5% and 6.5%, respectively, for 2-cluster and 4-cluster machines, with a marginal increase (1.6% and 1.1%) in execution time. Our test bed uses the Trimaran compiler infrastructure.
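As a reading aid for the figures above, the relation between energy-delay product (EDP), energy, and execution time can be worked through numerically. The sketch assumes EDP is the plain product of total energy and execution time, that "improves by" means a fractional reduction in EDP, and that the 1.6% and 1.1% slowdowns pair with the 2-cluster and 4-cluster results respectively. The implied energy ratio is derived arithmetic under those assumptions, not a result reported in the abstract, and the function name is hypothetical.

```python
def implied_energy_ratio(edp_improvement: float, time_increase: float) -> float:
    """Return E_new / E_old implied by an EDP reduction and a slowdown.

    With EDP = E * D:
        EDP_new / EDP_old = 1 - edp_improvement
        D_new / D_old     = 1 + time_increase
    so  E_new / E_old     = (1 - edp_improvement) / (1 + time_increase).
    Both arguments are fractions, e.g. 0.045 for 4.5%.
    """
    if time_increase <= -1.0:
        raise ValueError("time_increase must be greater than -1")
    return (1.0 - edp_improvement) / (1.0 + time_increase)


# Assumed pairing of the abstract's figures (2-cluster, then 4-cluster).
two_cluster = implied_energy_ratio(0.045, 0.016)
four_cluster = implied_energy_ratio(0.065, 0.011)
```

Under these assumptions both ratios come out below 1, meaning the implied total energy saving is slightly larger than the EDP improvement because part of it is offset by the longer execution time.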
Abstract:
We propose a compact model for small-signal non-quasi-static analysis of the long channel symmetric double gate MOSFET. The model is based on the EKV formalism, is valid in all regions of operation, and is thus suitable for RF circuit design. The proposed model is verified against a professional numerical device simulator, and excellent agreement is found well beyond the cut-off frequency.
Abstract:
Conventional Random Access Scan (RAS) for testing has lower test application time, lower power dissipation, and lower test data volume than standard serial scan chain based design. In this paper, we present two cluster-based techniques, namely Serial Input Random Access Scan and Variable Word Length Random Access Scan, to reduce test application time even further by exploiting the parallelism among the clusters and performing write operations on multiple bits. Experimental results on benchmark circuits show, on average, a 2-3 times speed-up in test write time and a 60% reduction in write test data volume.
Abstract:
Sensor network nodes exhibit characteristics of both embedded systems and general-purpose systems. A sensor network operating system is a kind of embedded operating system, but unlike a typical embedded operating system, a sensor network operating system may not be real time, and it is constrained by memory and energy. Most sensor network operating systems are based on an event-driven approach. The event-driven approach is efficient in terms of time and space, and it does not require a separate stack for each execution context. With this model, however, it is difficult to implement long-running tasks, such as cryptographic operations. Thread-based computation requires a separate stack for each execution context and is less efficient in terms of time and space. In this paper, we propose a thread-based execution model that uses only a fixed number of stacks. In this execution model, the number of stacks at each priority level is fixed. It minimizes the stack requirement for a multi-threading environment and at the same time provides ease of programming. We give an implementation of this model in Contiki OS by separating the thread implementation from the protothread implementation completely. We have tested our OS by implementing a clock synchronization protocol on it.
Abstract:
In this paper we explore an implementation of a high-throughput streaming application on REDEFINE-v2, which is an enhancement of REDEFINE. REDEFINE is a polymorphic ASIC combining the flexibility of a programmable solution with the execution speed of an ASIC. In REDEFINE, Compute Elements are arranged in an 8x8 grid connected via a Network on Chip (NoC) called RECONNECT, to realize the various macrofunctional blocks of an equivalent ASIC. For a 1024-FFT we carry out an application-architecture design space exploration by examining the various characterizations of Compute Elements in terms of the size of the instruction store. We further study the impact of using application-specific, vectorized FUs. By setting up different partitions of the FFT algorithm for persistent execution on REDEFINE-v2, we derive the benefits of setting up pipelined execution for higher performance. The impact of the REDEFINE-v2 micro-architecture for any arbitrary N-point FFT (N > 4096) is also analyzed. We report the various algorithm-architecture trade-offs in terms of area and execution speed compared with those of an ASIC implementation. In addition, we compare the performance gain with respect to a GPP.
Abstract:
A low-power frequency multiplication technique, developed for ZigBee (IEEE 802.15.4)-like applications, is presented. We provide an estimate of the power consumption for a given output voltage swing using our technique. The advantages and disadvantages that determine the application areas of the technique are discussed. The issues related to design, layout and process variation are also addressed. Finally, a design is presented for operation in the 2.405-2.485-GHz band of a ZigBee receiver. SpectreRF simulations show a 30% improvement in efficiency for our circuit, with regard to conversion of DC bias current to output amplitude, over an LC-VCO. To establish the low-power credentials, we have compared our circuit with an existing technique; our circuit performs better with just 1/3 of the total current from the supply, and uses one inductor as against three in the latter case. A test chip was implemented in a UMC 0.13-µm RF process with spiral on-chip inductors and the MIM (metal-insulator-metal) capacitor option.