Abstract:
The H.264 video standard achieves high video quality along with high data compression when compared to other existing video standards. H.264 uses context-based adaptive variable length coding (CAVLC) to code residual data in the Baseline profile. In this paper we describe a novel architecture for a CAVLC decoder, including the coeff-token decoder, level decoder, total-zeros decoder and run-before decoder. A UMC library in 0.13 μm CMOS technology is used to synthesize the proposed design. The proposed design reduces chip area and improves the critical-path performance of the CAVLC decoder in comparison with [1]. Macroblock-level (including luma and chroma) pipeline processing for CAVLC is implemented with an average of 141 cycles (including pipeline buffering) per macroblock at a 250 MHz clock frequency. To compare our results with [1], the clock frequency is constrained to 125 MHz. The area required for the proposed architecture is 17586 gates, a 22.1% improvement in comparison to [1]. We obtain a throughput of 1.73 × 10^6 macroblocks/second, which is 28% higher than that reported in [1]. The proposed design meets the processing requirement of 1080HD [5] video at 30 frames/second.
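As a rough arithmetic check on the figures quoted above (not part of the abstract; the 1920×1080 frame size for "1080HD" is an assumption), the cycle budget and the 1080HD requirement relate as follows:

```python
# Arithmetic check on the throughput figures quoted in the abstract.
# Assumes "1080HD" means 1920x1080 frames, i.e. (1920/16) x (1088/16)
# = 120 x 68 = 8160 macroblocks of 16x16 pixels per frame (1080 rows
# padded to 1088 for whole macroblocks); this convention is an
# assumption, not stated in the abstract.

clock_hz = 250e6         # reported clock frequency
cycles_per_mb = 141      # reported average cycles per macroblock

throughput = clock_hz / cycles_per_mb        # ~1.77e6 macroblocks/s,
print(f"{throughput:.2e} MB/s")              # consistent with the 1.73e6 reported

required = (1920 // 16) * (1088 // 16) * 30  # 1080HD at 30 frames/s
print(required, "MB/s needed")               # 244800, well within budget
```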
Abstract:
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving clock speed, reducing the energy consumption of the logic, and making the design simpler, it introduces extra overhead by way of inter-cluster communication. This communication happens over long global wires, which leads to delays in execution and significantly higher energy consumption. In this paper, we propose a new instruction scheduling algorithm that exploits the scheduling slacks of instructions and the communication slacks of data values together to achieve better energy-performance trade-offs for clustered architectures with heterogeneous interconnect. Our instruction scheduling algorithm achieves 35% and 40% reductions in communication energy for 2-cluster and 4-cluster machines respectively, while the overall energy-delay product improves by 4.5% and 6.5%, with only a marginal increase (1.6% and 1.1%) in execution time. Our test bed uses the Trimaran compiler infrastructure.
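The core idea above is that a data transfer whose consumer does not need the value immediately can be routed over a slower, lower-energy wire. A minimal sketch of that decision rule follows; the wire names and the latency/energy numbers are hypothetical illustrations, with no claim to match the paper's actual algorithm:

```python
# Hedged sketch: choosing an interconnect for an inter-cluster transfer
# based on the communication slack of the value being moved.
# The wire parameters below are made-up illustrative numbers, not taken
# from the paper.

WIRES = {
    "fast": {"latency": 1, "energy": 4.0},   # cycles, arbitrary energy units
    "slow": {"latency": 3, "energy": 1.0},
}

def pick_wire(ready_cycle, consumer_start_cycle):
    """Route over the slow wire whenever the transfer's slack allows it."""
    slack = consumer_start_cycle - ready_cycle
    if slack >= WIRES["slow"]["latency"]:
        return "slow"          # lower energy, still arrives in time
    return "fast"              # no slack to spare: pay for the fast wire

# Example: value ready at cycle 10, consumer scheduled at cycle 15 -> slack 5
print(pick_wire(10, 15))   # "slow"
print(pick_wire(10, 11))   # "fast"
```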
Abstract:
Purpose: Limbal stem cell deficiency is a challenging clinical problem, and current treatment involves replenishing the depleted limbal stem cell (LSC) pool by either limbal tissue transplantation or the use of cultivated limbal epithelial cells (LEC). Our experience of cultivating LEC on denuded human amniotic membrane using a feeder-cell-free method led to the identification of mesenchymal cells of the limbus (MC-L), which showed phenotypic resemblance to bone marrow derived mesenchymal stem cells (MSC-BM). To understand the transcriptional profile of these cells, microarray experiments were carried out. Methods: RNA was isolated from cultured LEC, MC-L and MSC-BM, and microarray experiments were carried out using an Agilent chip (4×44k). The microarray data were validated using real-time and semiquantitative reverse transcription polymerase chain reaction. Results: The microarray analysis revealed specific gene signatures of LEC and MC-L, as well as their complementary roles related to cytokine and growth factor profiles, thus supporting the nurturing role of the MC-L. We also observed similar and differential gene expression between MC-L and MSC-BM. Conclusions: This study represents the first extensive gene expression analysis of limbal explant culture derived epithelial and mesenchymal cells and as such reveals new insight into the biology, ontogeny, and in vivo function of these cells.
Abstract:
A new Schmitt trigger circuit based on the lambda bipolar transistor is presented. This circuit, which exhibits hysteresis in its transfer characteristic, appears to occupy a smaller chip area than many of the circuits proposed so far.
Abstract:
The increasing variability in device leakage has made the design of keepers for wide OR structures a challenging task. Conventional feedback keepers (CONV) can no longer improve the performance of wide dynamic gates for future technologies. In this paper, we propose an adaptive keeper technique called the rate sensing keeper (RSK) that enables faster switching and tracks variation across different process corners. It can switch up to 1.9× faster (for 20 legs) than CONV and can scale up to 32 legs, as against 20 legs for CONV, in a 130-nm 1.2-V process. The delay tracking is within 8% across the different process corners. We demonstrate the circuit operation of RSK using a 32 × 8 register file implemented in an industrial 130-nm 1.2-V CMOS process. The performance of individual dynamic logic gates is also evaluated on chip for various keeper techniques. We show that the RSK technique gives superior performance compared to alternatives such as the conditional keeper (CKP) and the current mirror-based keeper (LCR).
Abstract:
The system gain of two CCD systems in regular use at the Vainu Bappu Observatory, Kavalur, is determined at a few gain settings. The procedure used for the determination of system gain and base-level noise is described in detail. The Photometrics CCD system at the 1-m reflector uses a Thomson-CSF TH 7882 CDA chip coated for increased ultraviolet sensitivity. The gain is programme-selected through the parameter 'cgain', which varies between 0 and 4095 in steps of 1. The inverse system gain for this system varies almost linearly from 27.7 electrons DN⁻¹ at cgain = 0 to 1.5 electrons DN⁻¹ at cgain = 500. The readout noise is ≲ 11 electrons at cgain = 66. The Astromed CCD system at the 2.3-m Vainu Bappu Telescope uses a GEC P8603 chip which is also coated for enhanced ultraviolet sensitivity. The amplifier gain is selected in discrete steps using switches in the controller. The inverse system gain is 4.15 electrons DN⁻¹ at the gain setting of 9.2, and the readout noise is approximately 8 electrons.
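The abstract does not spell out the procedure, but the system gain of a CCD is commonly determined with the photon-transfer (mean-variance) method; below is a minimal sketch under that assumption. The array names are hypothetical, and the paper's exact procedure may differ:

```python
import numpy as np

# Hedged sketch of the standard photon-transfer method for estimating
# inverse system gain (electrons per data number, DN) and read noise.
# flat1/flat2 are two flat-field frames at the same illumination;
# bias1/bias2 are two zero-exposure (bias) frames.

def inverse_gain_and_read_noise(flat1, flat2, bias1, bias2):
    flat_diff = flat1.astype(float) - flat2.astype(float)
    bias_diff = bias1.astype(float) - bias2.astype(float)
    # Differencing two frames removes fixed-pattern structure; the variance
    # of the difference is twice the per-frame (shot + read) variance.
    var_flat = flat_diff.var() / 2.0
    var_bias = bias_diff.var() / 2.0
    mean_signal = (flat1.mean() + flat2.mean()) / 2.0 \
                - (bias1.mean() + bias2.mean()) / 2.0
    g = mean_signal / (var_flat - var_bias)   # inverse gain, electrons DN^-1
    read_noise_e = g * np.sqrt(var_bias)      # read noise in electrons
    return g, read_noise_e
```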
Abstract:
Abrasive Jet Machining (AJM), or Micro Blast Machining, is a non-traditional machining process wherein material removal is effected by the erosive action of a high-velocity jet of gas, carrying fine-grained abrasive particles, impacting the work surface. The AJM process differs from conventional sand blasting in that the abrasive is much finer and the process parameters and cutting action are carefully controlled. The process is particularly suitable for cutting intricate shapes in hard and brittle materials which are sensitive to heat and have a tendency to chip easily. In other words, AJM can handle virtually any hard or brittle material. The process has already found its way into dozens of applications, sometimes replacing conventional alternatives and often doing jobs that could not be done in any other way. This paper reviews the current status of this non-conventional machining process and discusses its unique advantages and possible applications.
Abstract:
We consider a system comprising a finite number of nodes, with infinite packet buffers, that use unslotted ALOHA with Code Division Multiple Access (CDMA) to share a channel for transmitting packetised data. We propose a simple model for packet transmission and retransmission at each node, and show that the saturation throughput in this model yields a sufficient condition for the stability of the packet buffers; we interpret this as the capacity of the access method. We calculate and compare the capacities of CDMA-ALOHA (with and without code sharing) and TDMA-ALOHA; we also consider carrier-sensing and collision-detection versions of these protocols. In each case, the saturation throughput can be obtained via analysis of a continuous-time Markov chain. Our results show how saturation throughput degrades with code sharing. Finally, we also present some simulation results for mean packet delay. Our work is motivated by optical CDMA, in which "chips" can be optically generated, and hence the achievable chip rate can exceed the achievable TDMA bit rate, which is limited by electronics. Code sharing may be useful in the optical CDMA context as it reduces the number of optical correlators at the receivers. Our throughput results help to quantify by how much the CDMA chip rate should exceed the TDMA bit rate so that CDMA-ALOHA yields better capacity than TDMA-ALOHA.
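The last sentence suggests a simple way to read such throughput results; here is a hedged sketch of the comparison, where the normalization and the sample throughput values are illustrative assumptions, not numbers from the paper:

```python
# Hedged sketch: how much faster the CDMA chip rate must be than the
# TDMA bit rate for CDMA-ALOHA capacity to exceed TDMA-ALOHA capacity.
# s_cdma_per_chip and s_tdma_per_bit are saturation throughputs
# normalized to the channel rate (bits per chip for CDMA, bits per
# bit-slot for TDMA); the values below are illustrative only.

def required_chip_rate_ratio(s_cdma_per_chip, s_tdma_per_bit):
    """Smallest R_chip / R_bit for which CDMA-ALOHA beats TDMA-ALOHA,
    assuming capacity scales linearly with the channel rate."""
    return s_tdma_per_bit / s_cdma_per_chip

print(required_chip_rate_ratio(s_cdma_per_chip=0.05, s_tdma_per_bit=0.4))  # 8.0
```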
Abstract:
An in-situ power monitoring technique for Dynamic Voltage and Threshold Scaling (DVTS) systems is proposed, which measures the total power consumed by the load circuit using a sleep transistor acting as a power sensor. Design details of the power monitor are examined using a simulation framework in a UMC 90 nm CMOS process. Experimental results of a test chip fabricated in an AMS 0.35 µm CMOS process are presented. The test chip has variable activity between 0.05 and 0.5 and has PMOS VTH control through the n-well contact. The maximum resolution obtained from the power monitor is 0.25 mV. The overhead of the power monitor in terms of its power consumption is 0.244 mW (2.2% of the total power of the load circuit). Lastly, the power monitor is used to demonstrate a closed-loop DVTS system. The DVTS algorithm shows 46.3% power savings using the in-situ power monitor.
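To illustrate what a closed-loop DVTS controller of the kind mentioned above might look like, here is a minimal sketch; the knob names, ranges, and step sizes are hypothetical, and this is not the paper's algorithm:

```python
# Hedged sketch of a closed-loop dynamic voltage and threshold scaling
# (DVTS) controller. All ranges and step sizes are illustrative
# assumptions, not values from the paper.

VDD_MIN, VDD_MAX = 0.6, 1.2   # supply voltage bounds (V), hypothetical
VBB_MIN, VBB_MAX = 0.0, 0.5   # reverse body-bias range (V), hypothetical
STEP = 0.05                   # adjustment step (V), hypothetical

def dvts_step(vdd, vbb, meets_timing):
    """One control iteration: trade speed for power while timing is met,
    back off as soon as the circuit fails timing."""
    if meets_timing:
        vdd = max(VDD_MIN, vdd - STEP)
        vbb = min(VBB_MAX, vbb + STEP)  # more reverse bias raises VTH
    else:
        vdd = min(VDD_MAX, vdd + STEP)
        vbb = max(VBB_MIN, vbb - STEP)
    return vdd, vbb
```

In the measured system, the in-situ power monitor would supply the power reading used to confirm the savings at each operating point.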
Abstract:
In this paper, a method of tracking the peak power in a wind energy conversion system (WECS) is proposed, which is independent of the turbine parameters and air density. The algorithm searches for the peak power by varying the speed in the desired direction. The generator is operated in the speed control mode with the speed reference being dynamically modified in accordance with the magnitude and direction of change of active power. The peak power points in the P-ω curve correspond to dP/dω = 0. This fact is made use of in the optimum point search algorithm. The generator considered is a wound rotor induction machine whose stator is connected directly to the grid and whose rotor is fed through back-to-back pulse-width-modulation (PWM) converters. Stator flux-oriented vector control is applied to control the active and reactive current loops independently. The turbine characteristics are generated by a dc motor fed from a commercial dc drive. All of the control loops are executed by a single-chip digital signal processor (DSP) controller (TMS320F240). Experimental results show that the performance of the control algorithm compares well with the conventional torque control method.
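The search rule described above (perturb the speed reference, observe the change in active power, and keep moving while dP/dω > 0) is a hill-climbing scheme; a minimal sketch under that reading follows, with the step size and variable names as assumptions:

```python
# Hedged sketch of the peak-power search described in the abstract:
# perturb the speed reference and observe the change in active power.
# The step size and sign convention are illustrative assumptions.

def next_speed_ref(omega_ref, d_power, d_omega, step=0.5):
    """Move the speed reference in the direction that increased power.
    At the peak of the P-omega curve, dP/d(omega) = 0 and the reference
    settles, oscillating slightly around the optimum."""
    if d_power * d_omega > 0:
        return omega_ref + step    # power rose with speed: keep increasing
    else:
        return omega_ref - step    # power fell: reverse direction
```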
Abstract:
In this work, we evaluate performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as 2-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot-products and additions. We implement this algorithm on a nVidia GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the nVidia GPU suffers from: (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization from the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best performing versions on the Power7, Nehalem, and GTX 285 run in 1.02 s, 1.82 s, and 1.75 s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.
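For concreteness, a naive dense cross-correlation of the kind characterized above (O(n^4) work: an up-to-n^2-point dot product for each of roughly n^2 shifts) might look like the following sketch; this is an illustrative implementation, not the authors' code:

```python
import numpy as np

# Hedged sketch of an O(n^4) dense cross-correlation between an image
# and a reference, both n x n single-precision matrices. For each 2-D
# shift (dy, dx) it accumulates a dot product over the overlap region.

def cross_correlate(img: np.ndarray, ref: np.ndarray) -> np.ndarray:
    n = img.shape[0]
    out = np.zeros((2 * n - 1, 2 * n - 1), dtype=np.float32)
    for dy in range(-n + 1, n):
        for dx in range(-n + 1, n):
            acc = np.float32(0.0)
            # Only indices where both img[y+dy, x+dx] and ref[y, x] exist.
            for y in range(max(0, -dy), min(n, n - dy)):
                for x in range(max(0, -dx), min(n, n - dx)):
                    acc += img[y + dy, x + dx] * ref[y, x]
            out[dy + n - 1, dx + n - 1] = acc
    return out
```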
Abstract:
Building flexible constraint-length Viterbi decoders requires the ability to realize de Bruijn networks of various sizes on the physically provided interconnection network. This paper considers the case when the physical network is itself a de Bruijn network and presents a scalable technique for realizing any n-node de Bruijn network on an N-node de Bruijn network, where n < N. The technique ensures that the length of the longest path realized on the network is minimized and that each physical connection is utilized to send only one data item, both of which are desirable in order to reduce the hardware complexity of the network and to obtain the best possible performance.
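As background for readers unfamiliar with the topology (this definition is standard, not specific to the paper): in a binary de Bruijn network on N = 2^k nodes, node i has edges to nodes 2i mod N and (2i + 1) mod N, i.e. a left shift of the node's k-bit label with a 0 or 1 shifted in. A small sketch:

```python
# Standard binary de Bruijn network on N = 2**k nodes: each node i has
# directed edges to (2*i) % N and (2*i + 1) % N. Background illustration
# only; the paper's embedding technique is not reproduced here.

def de_bruijn_edges(k: int):
    N = 1 << k
    return {i: ((2 * i) % N, (2 * i + 1) % N) for i in range(N)}

print(de_bruijn_edges(3)[5])   # node 5 of 8 connects to nodes 2 and 3
```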
Abstract:
Today's feature-rich multimedia products require embedded system solutions with complex Systems-on-Chip (SoC) to meet market expectations of high performance at low cost and low energy consumption. The memory architecture of an embedded system strongly influences critical system design objectives such as area, power and performance. Hence the embedded system designer performs a complete memory architecture exploration to custom design a memory architecture for a given set of applications. Further, the designer is interested in multiple optimal design points to address various market segments. However, tight time-to-market constraints enforce short design cycle times. In this paper we address the multi-level multi-objective memory architecture exploration problem through a combination of exhaustive-search based memory exploration at the outer level and a two-step integrated data layout for SPRAM-Cache based architectures at the inner level. The two-step integrated data layout approach for SPRAM-Cache based hybrid architectures first performs data partitioning, which divides data between SPRAM and Cache, and then performs cache-conscious data layout. We formulate the cache-conscious data layout as a graph partitioning problem and show that our approach gives up to 34% improvement over an existing approach and also optimizes the off-chip memory address space. We evaluated our approach on 3 embedded multimedia applications; it explores several hundred memory configurations for each application, yielding several optimal design points in a few hours of computation on a standard desktop.
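The abstract casts cache-conscious layout as graph partitioning but gives no details; purely to illustrate that style of formulation (the edge weights and the greedy heuristic below are assumptions, not the paper's algorithm), one might spread data objects that interfere in time across different cache regions:

```python
# Hedged illustration of cache-conscious data layout as graph
# partitioning. Vertices are data objects; conflict[(a, b)] counts how
# often objects a and b are accessed close together in time, so placing
# them in the same cache region risks conflict misses. The greedy rule
# below puts each object in the region where its conflict weight is
# smallest. Entirely illustrative; not the paper's algorithm.

def layout(objects, conflict, n_regions):
    regions = [set() for _ in range(n_regions)]
    # Place the objects with the heaviest total conflict weight first.
    order = sorted(objects,
                   key=lambda o: -sum(w for (a, b), w in conflict.items()
                                      if o in (a, b)))
    for obj in order:
        def cost(r):
            return sum(conflict.get((obj, o), 0) + conflict.get((o, obj), 0)
                       for o in regions[r])
        regions[min(range(n_regions), key=cost)].add(obj)
    return regions
```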
Abstract:
The work reported in this thesis is an attempt to enhance heat transfer in electronic devices with the use of impinging air jets on pin-finned heat sinks. The cooling performance of electronic devices has attracted increased attention owing to the demand for compact size, higher power densities and demands on system performance and reliability. Although the technology of cooling has greatly advanced, the main cause of malfunction of electronic devices remains overheating. The problem arises due to restriction of space and also due to high heat dissipation rates, which have increased from a fraction of a W/cm² to hundreds of W/cm². Although several researchers have attempted to address this at the design stage, unfortunately the speed of invention of cooling mechanisms has not kept pace with the ever-increasing requirement of heat removal from electronic chips. As a result, efficient cooling of electronic chips remains a challenge in thermal engineering.

Heat transfer can be enhanced in several ways, such as air cooling, liquid cooling and phase change cooling. However, in certain applications, due to limitations on cost and weight (e.g. airborne applications), air cooling is imperative. The heat transfer can be increased in two ways: first, by increasing the heat transfer coefficient (forced convection), and second, by increasing the surface area of heat transfer (finned heat sinks). From previous literature it was established that, for a given volumetric air flow rate, jet impingement is the best option for enhancing the heat transfer coefficient, and for a given volume of heat sink material, pin-finned heat sinks are the best option because of their high surface area to volume ratio. There are certain applications where very high jet velocities cannot be used because of limitations of noise and the presence of delicate components.

This process can further be improved by pulsating the jet. A steady jet often stabilizes the boundary layer on the surface to be cooled. Enhancement in the convective heat transfer can be achieved if the boundary layer is broken. Disruptions in the boundary layer can be caused by pulsating the impinging jet, i.e., making the jet unsteady. Besides, the pulsations lead to chaotic mixing, i.e., the fluid particles no longer follow well-defined streamlines but move unpredictably through the stagnation region. Thus the flow mimics turbulence at low Reynolds numbers. The pulsation should be done in such a way that the boundary layer is disturbed periodically and yet adequate coolant is made available, so that there is not much variation in temperature during one pulse cycle. From previous literature it was found that a square waveform is most effective in enhancing heat transfer.

In the present study the combined effect of a pin-finned heat sink and an impinging slot jet, both steady and unsteady, has been investigated for both laminar and turbulent flows. The effect of fin height and height of impingement has been studied. The jets have been pulsated in a square waveform to study the effect of frequency and duty cycle. This thesis attempts to increase our understanding of slot jet impingement on pin-finned heat sinks through numerical investigations. A systematic study is carried out using the finite-volume code FLUENT (Version 6.2) to solve the thermal and flow fields. The standard k-ε model for the turbulence equations and the two-layer zonal model for the wall function are used in the problem. Pressure-velocity coupling is handled using the SIMPLE algorithm with a staggered grid.
The parameters that affect the heat transfer coefficient are: height of the fins, total height of impingement, jet exit Reynolds number, frequency of the jet, and duty cycle (the percentage of time the jet is flowing during one complete cycle of the pulse). From the studies carried out it was found that: a) beyond a certain fin height, the rate of enhancement of heat transfer becomes very low with further increase in height; b) the heat transfer enhancement is much more sensitive to changes at low Reynolds numbers than at high Reynolds numbers; c) for a given total height of impingement, the use of fins and a pulsated jet increases the effective heat transfer coefficient by almost 200% for the same average Reynolds number; d) for all cases, the optimum frequency of impingement is around 50-100 Hz and the optimum duty cycle around 25-33.33%; e) for turbulent jets, the enhancement in heat transfer due to pulsations is much smaller than for laminar jets.
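To make the duty-cycle parameter concrete, here is simple arithmetic on the optimum range quoted above (an illustration, not an additional result):

```python
# Worked example of the duty-cycle definition used above.
freq_hz = 50                   # pulsation frequency within the optimum range
duty = 0.25                    # 25% duty cycle, lower end of the optimum
period_ms = 1000 / freq_hz     # 20 ms per pulse cycle
on_time_ms = duty * period_ms  # jet flows for 5 ms of every 20 ms cycle
print(period_ms, on_time_ms)   # 20.0 5.0
```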
Abstract:
Owing to developments in semiconductor technology, fault tolerance has become important not only for safety-critical systems but also for general-purpose (non-safety-critical) systems. However, for general-purpose systems, instead of guaranteeing that deadlines are always met, it is important to minimize the average execution time (AET) while ensuring fault tolerance. For a given job and a soft (transient) error probability, we define mathematical formulas for AET that include bus communication overhead for both voting (active replication) and rollback recovery with checkpointing (RRC). And, for a given multi-processor system-on-chip (MPSoC), we define integer linear programming (ILP) models that minimize AET, including bus communication overhead, when: (1) selecting the number of checkpoints when using RRC, (2) finding the number of processors and the job-to-processor assignment when using voting, and (3) defining the fault-tolerance scheme (voting or RRC) per job and defining its usage for each job. Experiments demonstrate significant savings in AET.
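To illustrate the kind of trade-off such AET formulas capture, here is a simplified, textbook-style sketch for RRC alone; it ignores the bus communication overhead the paper's formulas explicitly include, and the error model (constant soft-error rate, errors detected at checkpoints) is an assumption, so this is not the paper's formula:

```python
import math

# Hedged sketch: average execution time (AET) of a job of total length T
# protected by rollback-recovery with k equidistant checkpoints, each
# costing overhead c. A segment of length T/k fails with probability
# 1 - exp(-lam * T/k) (constant soft-error rate lam) and is re-executed
# until it succeeds, giving a geometric number of tries per segment.

def aet(T, k, c, lam):
    segment = T / k + c
    p_err = 1.0 - math.exp(-lam * (T / k))
    return k * segment / (1.0 - p_err)

# More checkpoints shrink the re-executed work but add overhead, so an
# intermediate k minimizes AET. The sample values are illustrative only.
T, c, lam = 100.0, 2.0, 0.01
best_k = min(range(1, 21), key=lambda k: aet(T, k, c, lam))
print(best_k, round(aet(T, best_k, c, lam), 2))
```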