Biblioteca Digital

957 resultados para Multiprocessor on chip

A new look at reversible logic implementation of decimal adder

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Reversibility plays a fundamental role when logic gates such as AND, OR, and XOR are not reversible. computations with minimal energy dissipation are considered. Hence, these gates dissipate heat and may reduce the life of In recent years, reversible logic has emerged as one of the most the circuit. So, reversible logic is in demand in power aware important approaches for power optimization with its circuits. application in low power CMOS, quantum computing and A reversible conventional BCD adder was proposed in using conventional reversible gates.

Reconfigurable Architectures for General-Purpose Computing

Relevância:

80.00% 80.00%

Publicador:

Resumo:

General-purpose computing devices allow us to (1) customize computation after fabrication and (2) conserve area by reusing expensive active circuitry for different functions in time. We define RP-space, a restricted domain of the general-purpose architectural space focussed on reconfigurable computing architectures. Two dominant features differentiate reconfigurable from special-purpose architectures and account for most of the area overhead associated with RP devices: (1) instructions which tell the device how to behave, and (2) flexible interconnect which supports task dependent dataflow between operations. We can characterize RP-space by the allocation and structure of these resources and compare the efficiencies of architectural points across broad application characteristics. Conventional FPGAs fall at one extreme end of this space and their efficiency ranges over two orders of magnitude across the space of application characteristics. Understanding RP-space and its consequences allows us to pick the best architecture for a task and to search for more robust design points in the space. Our DPGA, a fine- grained computing device which adds small, on-chip instruction memories to FPGAs is one such design point. For typical logic applications and finite- state machines, a DPGA can implement tasks in one-third the area of a traditional FPGA. TSFPGA, a variant of the DPGA which focuses on heavily time-switched interconnect, achieves circuit densities close to the DPGA, while reducing typical physical mapping times from hours to seconds. Rigid, fabrication-time organization of instruction resources significantly narrows the range of efficiency for conventional architectures. To avoid this performance brittleness, we developed MATRIX, the first architecture to defer the binding of instruction resources until run-time, allowing the application to organize resources according to its needs. Our focus MATRIX design point is based on an array of 8-bit ALU and register-file building blocks interconnected via a byte-wide network. With today's silicon, a single chip MATRIX array can deliver over 10 Gop/s (8-bit ops). On sample image processing tasks, we show that MATRIX yields 10-20x the computational density of conventional processors. Understanding the cost structure of RP-space helps us identify these intermediate architectural points and may provide useful insight more broadly in guiding our continual search for robust and efficient general-purpose computing structures.

Pullpipelining: A technique for systolic pipelined circuits

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Pullpipelining, a pipeline technique where data is pulled from successor stages from predecessor stages is proposed Control circuits using a synchronous, a semi-synchronous and an asynchronous approach are given. Simulation examples for a DLX generic RISC datapath show that common control pipeline circuit overhead is avoided using the proposal. Applications to linear systolic arrays in cases when computation is finished at early stages in the array are foreseen. This would allow run-time data-driven digital frequency modulation of synchronous pipelined designs. This has applications to implement algorithms exhibiting average-case processing time using a synchronous approach.

An efficient low power FFT implementation for multiband full-rate ultra-wideband (UWB) receivers

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper discusses the design, implementation and synthesis of an FFT module that has been specifically optimized for use in the OFDM based Multiband UWB system, although the work is generally applicable to many other OFDM based receiver systems. Previous work has detailed the requirements for the receiver FFT module within the Multiband UWB ODFM based system and this paper draws on those requirements coupled with modern digital architecture principles and low power design criteria to converge on our optimized solution. The FFT design obtained in this paper is also applicable for implementation of the transmitter IFFT module therefore only needing one FFT module for half-duplex operation. The results from this paper enable the baseband designers of the 200Mbit/sec variant of Multiband UWB systems (and indeed other OFDM based receivers) using System-on-Chip (SoC), FPGA and ASIC technology to create cost effective and low power solutions biased toward the competitive consumer electronics market.

A low clock frequency FFT core implementation for multiband full-rate ultra-wideband (UWB) receivers

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper discusses the design, implementation and synthesis of an FFT module that has been specifically optimized for use in the OFDM based Multiband UWB system, although the work is generally applicable to many other OFDM based receiver systems. Previous work has detailed the requirements for the receiver FFT module within the Multiband UWB ODFM based system and this paper draws on those requirements coupled with modern digital architecture principles and low power design criteria to converge on our optimized solution particularly aimed at a low-clock rate implementation. The FFT design obtained in this paper is also applicable for implementation of the transmitter IFFT module therefore only needing one FFT module in the device for half-duplex operation. The results from this paper enable the baseband designers of the 200Mbit/sec variant of Multiband UWB systems (and indeed other OFDM based receivers) using System-on-Chip (SoC), FPGA and ASIC technology to create cost effective and low power consumer electronics product solutions biased toward the very competitive market.

Active micromachined integrated terahertz circuits

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Schottky barrier diodes have been integrated into on-chip rectangular waveguides. Two novel techniques have been developed to fabricate diodes with posts suitable for integration into waveguides. One technique produces diodes with anode diameters of the order of microns with post heights from 90 to 125 microns and the second technique produces sub-micron anodes with post heights around 20 microns. A method has been developed to incorporate these structures into a rectangular waveguide and provide a top contact onto the anode which could be used as an I.F. output in a mixer circuit. Devices have been fabricated and D.C. characterized.

Technique for micro-machining millimetre-wave rectangular waveguide

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A new technique is reported for micro-machining millimetre-wave rectangular waveguide components. S-parameter measurements on these structures show that they achieve lower loss than those produced using any other on-chip fabrication technique, have highly accurate dimensions, are physically robust, and are cheap and easy to manufacture.

A Parallel Hardware Architecture for Scale and Rotation Invariant Feature Detection

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper proposes a parallel hardware architecture for image feature detection based on the Scale Invariant Feature Transform algorithm and applied to the Simultaneous Localization And Mapping problem. The work also proposes specific hardware optimizations considered fundamental to embed such a robotic control system on-a-chip. The proposed architecture is completely stand-alone; it reads the input data directly from a CMOS image sensor and provides the results via a field-programmable gate array coupled to an embedded processor. The results may either be used directly in an on-chip application or accessed through an Ethernet connection. The system is able to detect features up to 30 frames per second (320 x 240 pixels) and has accuracy similar to a PC-based implementation. The achieved system performance is at least one order of magnitude better than a PC-based solution, a result achieved by investigating the impact of several hardware-orientated optimizations oil performance, area and accuracy.

Projeto e implementação de uma plataforma MP-SoC usando SystemC

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This work presents the concept, design and implementation of a MP-SoC platform, named STORM (MP-SoC DirecTory-Based PlatfORM). Currently the platform is composed of the following modules: SPARC V8 processor, GPOP processor, Cache module, Memory module, Directory module and two different modles of Network-on-Chip, NoCX4 and Obese Tree. All modules were implemented using SystemC, simulated and validated, individually or in group. The modules description is presented in details. For programming the platform in C it was implemented a SPARC assembler, fully compatible with gcc s generated assembly code. For the parallel programming it was implemented a library for mutex managing, using the due assembler s support. A total of 10 simulations of increasing complexity are presented for the validation of the presented concepts. The simulations include real parallel applications, such as matrix multiplication, Mergesort, KMP, Motion Estimation and DCT 2D

Design of a reconfigurable pseudorandom number generator for use in intelligent systems

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper deals with the design of a network-on-chip reconfigurable pseudorandom number generation unit that can map and execute meta-heuristic algorithms in hardware. The unit can be configured to implement one of the following five linear generator algorithms: a multiplicative congruential, a mixed congruential, a standard multiple recursive, a mixed multiple recursive, and a multiply-with-carry. The generation unit can be used both as a pseudorandom and a message passing-based server, which is able to produce pseudorandom numbers on demand, sending them to the network-on-chip blocks that originate the service request. The generator architecture has been mapped to a field programmable gate array, and showed that millions of numbers in 32-, 64-, 96-, or 128-bit formats can be produced in tens of milliseconds. (C) 2011 Elsevier B.V. All rights reserved.

A petri net based timing model for hardware/software co-design of digital systems

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We have recently proposed an extension to Petri nets in order to be able to directly deal with all aspects of embedded digital systems. This extension is meant to be used as an internal model of our co-design environment. After analyzing relevant related work, and presenting a short introduction to our extension as a background material, we describe the details of the timing model we use in our approach, which is mainly based in Merlin's time model. We conclude the paper by discussing an example of its usage. © 2004 IEEE.

Approximation of hyperbolic tangent activation function using hybrid methods

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Artificial Neural Networks are widely used in various applications in engineering, as such solutions of nonlinear problems. The implementation of this technique in reconfigurable devices is a great challenge to researchers by several factors, such as floating point precision, nonlinear activation function, performance and area used in FPGA. The contribution of this work is the approximation of a nonlinear function used in ANN, the popular hyperbolic tangent activation function. The system architecture is composed of several scenarios that provide a tradeoff of performance, precision and area used in FPGA. The results are compared in different scenarios and with current literature on error analysis, area and system performance. © 2013 IEEE.

Identificação de perfis diferenciais de metilação do DNA na endometriose

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Efeito do fresamento com alta velocidade de corte na usinabilidade de aços ferríticos com grãos ultrafinos

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

A multi-objective adaptive immune algorithm for multi-application NoC mapping

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Current SoC design trends are characterized by the integration of larger amount of IPs targeting a wide range of application fields. Such multi-application systems are constrained by a set of requirements. In such scenario network-on-chips (NoC) are becoming more important as the on-chip communication structure. Designing an optimal NoC for satisfying the requirements of each individual application requires the specification of a large set of configuration parameters leading to a wide solution space. It has been shown that IP mapping is one of the most critical parameters in NoC design, strongly influencing the SoC performance. IP mapping has been solved for single application systems using single and multi-objective optimization algorithms. In this paper we propose the use of a multi-objective adaptive immune algorithm (M(2)AIA), an evolutionary approach to solve the multi-application NoC mapping problem. Latency and power consumption were adopted as the target multi-objective functions. To compare the efficiency of our approach, our results are compared with those of the genetic and branch and bound multi-objective mapping algorithms. We tested 11 well-known benchmarks, including random and real applications, and combines up to 8 applications at the same SoC. The experimental results showed that the M(2)AIA decreases in average the power consumption and the latency 27.3 and 42.1 % compared to the branch and bound approach and 29.3 and 36.1 % over the genetic approach.

«
1
2
...
7
8
9
10
11
12
13
...
63
64
»