992 resultados para Hardware reconfigurable


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Reconfigurable platforms are a promising technology that offers an interesting trade-off between flexibility and performance, which many recent embedded system applications demand, especially in fields such as multimedia processing. These applications typically involve multiple ad-hoc tasks for hardware acceleration, which are usually represented using formalisms such as Data Flow Diagrams (DFDs), Data Flow Graphs (DFGs), Control and Data Flow Graphs (CDFGs) or Petri Nets. However, none of these models is able to capture at the same time the pipeline behavior between tasks (that therefore can coexist in order to minimize the application execution time), their communication patterns, and their data dependencies. This paper proves that the knowledge of all this information can be effectively exploited to reduce the resource requirements and the timing performance of modern reconfigurable systems, where a set of hardware accelerators is used to support the computation. For this purpose, this paper proposes a novel task representation model, named Temporal Constrained Data Flow Diagram (TCDFD), which includes all this information. This paper also presents a mapping-scheduling algorithm that is able to take advantage of the new TCDFD model. It aims at minimizing the dynamic reconfiguration overhead while meeting the communication requirements among the tasks. Experimental results show that the presented approach achieves up to 75% of resources saving and up to 89% of reconfiguration overhead reduction with respect to other state-of-the-art techniques for reconfigurable platforms.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Reconfigurable hardware can be used to build a multitasking system where tasks are assigned to HW resources at run-time according to the requirements of the running applications. These tasks are frequently represented as direct acyclic graphs and their execution is typically controlled by an embedded processor that schedules the graph execution. In order to improve the efficiency of the system, the scheduler can apply prefetch and reuse techniques that can greatly reduce the reconfiguration latencies. For an embedded processor all these computations represent a heavy computational load that can significantly reduce the system performance. To overcome this problem we have implemented a HW scheduler using reconfigurable resources. In addition we have implemented both prefetch and replacement techniques that obtain as good results as previous complex SW approaches, while demanding just a few clock cycles to carry out the computations. We consider that the HW cost of the system (in our experiments 3% of a Virtex-II PRO xc2vp30 FPGA) is affordable taking into account the great efficiency of the techniques applied to hide the reconfiguration latency and the negligible run-time penalty introduced by the scheduler computations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Weblabs are spreading their influence in Science and Engineering (S&E) courses providing a way to remotely conduct real experiments. Typically, they are implemented by different architectures and infrastructures supported by Instruments and Modules (I&Ms) able to be remotely controlled and observed. Besides the inexistence of a standard solution for implementing weblabs, their reconfiguration is limited to a setup procedure that enables interconnecting a set of preselected I&Ms into an Experiment Under Test (EUT). Moreover, those I&Ms are not able to be replicated or shared by different weblab infrastructures, since they are usually based on hardware platforms. Thus, to overcome these limitations, this paper proposes a standard solution that uses I&Ms embedded into Field-Programmable Gate Array (FPGAs) devices. It is presented an architecture based on the IEEE1451.0 Std. supported by a FPGA-based weblab infrastructure able to be remotely reconfigured with I&Ms, described through standard Hardware Description Language (HDL) files, using a Reconfiguration Tool (RecTool).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Institutions have been creating their own specific weblab infrastructures. Usually, they use distinct software and hardware architectures comprehending instruments and modules (I&M) able to be parameterized but difficult to be shared. These aspects are impairing their widespread in education, since collaboration between institutions, in developing and sharing resources, is still low. To handle both aspects, this paper proposes the adoption of the IEEE1451.0 Std. with FPGA technology for creating reconfigurable weblab infrastructures. It is suggested the adoption of an IEEE1451.0 infrastructure with compatible instruments, described in Hardware Description Languages (HDL), to be reconfigured in FPGA-based boards. Besides an overview of the IEEE1451.0 Std., this paper presents a solution currently under development which seeks to enable the reconfiguration and the remote control of weblab infrastructures using a set of IEEE1451.0 HTTP commands.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Dissertação de mestrado integrado em Engenharia Electrónica Industrial e Computadores

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this work, the feasibility of the floating-gate technology in analog computing platforms in a scaled down general-purpose CMOS technology is considered. When the technology is scaled down the performance of analog circuits tends to get worse because the process parameters are optimized for digital transistors and the scaling involves the reduction of supply voltages. Generally, the challenge in analog circuit design is that all salient design metrics such as power, area, bandwidth and accuracy are interrelated. Furthermore, poor flexibility, i.e. lack of reconfigurability, the reuse of IP etc., can be considered the most severe weakness of analog hardware. On this account, digital calibration schemes are often required for improved performance or yield enhancement, whereas high flexibility/reconfigurability can not be easily achieved. Here, it is discussed whether it is possible to work around these obstacles by using floating-gate transistors (FGTs), and analyze problems associated with the practical implementation. FGT technology is attractive because it is electrically programmable and also features a charge-based built-in non-volatile memory. Apart from being ideal for canceling the circuit non-idealities due to process variations, the FGTs can also be used as computational or adaptive elements in analog circuits. The nominal gate oxide thickness in the deep sub-micron (DSM) processes is too thin to support robust charge retention and consequently the FGT becomes leaky. In principle, non-leaky FGTs can be implemented in a scaled down process without any special masks by using “double”-oxide transistors intended for providing devices that operate with higher supply voltages than general purpose devices. However, in practice the technology scaling poses several challenges which are addressed in this thesis. To provide a sufficiently wide-ranging survey, six prototype chips with varying complexity were implemented in four different DSM process nodes and investigated from this perspective. The focus is on non-leaky FGTs, but the presented autozeroing floating-gate amplifier (AFGA) demonstrates that leaky FGTs may also find a use. The simplest test structures contain only a few transistors, whereas the most complex experimental chip is an implementation of a spiking neural network (SNN) which comprises thousands of active and passive devices. More precisely, it is a fully connected (256 FGT synapses) two-layer spiking neural network (SNN), where the adaptive properties of FGT are taken advantage of. A compact realization of Spike Timing Dependent Plasticity (STDP) within the SNN is one of the key contributions of this thesis. Finally, the considerations in this thesis extend beyond CMOS to emerging nanodevices. To this end, one promising emerging nanoscale circuit element - memristor - is reviewed and its applicability for analog processing is considered. Furthermore, it is discussed how the FGT technology can be used to prototype computation paradigms compatible with these emerging two-terminal nanoscale devices in a mature and widely available CMOS technology.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Mémoire numérisé par la Division de la gestion de documents et des archives de l'Université de Montréal

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The memory hierarchy is the main bottleneck in modern computer systems as the gap between the speed of the processor and the memory continues to grow larger. The situation in embedded systems is even worse. The memory hierarchy consumes a large amount of chip area and energy, which are precious resources in embedded systems. Moreover, embedded systems have multiple design objectives such as performance, energy consumption, and area, etc. Customizing the memory hierarchy for specific applications is a very important way to take full advantage of limited resources to maximize the performance. However, the traditional custom memory hierarchy design methodologies are phase-ordered. They separate the application optimization from the memory hierarchy architecture design, which tend to result in local-optimal solutions. In traditional Hardware-Software co-design methodologies, much of the work has focused on utilizing reconfigurable logic to partition the computation. However, utilizing reconfigurable logic to perform the memory hierarchy design is seldom addressed. In this paper, we propose a new framework for designing memory hierarchy for embedded systems. The framework will take advantage of the flexible reconfigurable logic to customize the memory hierarchy for specific applications. It combines the application optimization and memory hierarchy design together to obtain a global-optimal solution. Using the framework, we performed a case study to design a new software-controlled instruction memory that showed promising potential.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We advocate the use of systolic design techniques to create custom hardware for Custom Computing Machines. We have developed a hardware genetic algorithm based on systolic arrays to illustrate the feasibility of the approach. The architecture is independent of the lengths of chromosomes used and can be scaled in size to accommodate different population sizes. An FPGA prototype design can process 16 million genes per second.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In academia, it is common to create didactic processors, facing practical disciplines in the area of Hardware Computer and can be used as subjects in software platforms, operating systems and compilers. Often, these processors are described without ISA standard, which requires the creation of compilers and other basic software to provide the hardware / software interface and hinder their integration with other processors and devices. Using reconfigurable devices described in a HDL language allows the creation or modification of any microarchitecture component, leading to alteration of the functional units of data path processor as well as the state machine that implements the control unit even as new needs arise. In particular, processors RISP enable modification of machine instructions, allowing entering or modifying instructions, and may even adapt to a new architecture. This work, as the object of study addressing educational soft-core processors described in VHDL, from a proposed methodology and its application on two processors with different complexity levels, shows that it s possible to tailor processors for a standard ISA without causing an increase in the level hardware complexity, ie without significant increase in chip area, while its level of performance in the application execution remains unchanged or is enhanced. The implementations also allow us to say that besides being possible to replace the architecture of a processor without changing its organization, RISP processor can switch between different instruction sets, which can be expanded to toggle between different ISAs, allowing a single processor become adaptive hybrid architecture, which can be used in embedded systems and heterogeneous multiprocessor environments

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Blind Source Separation (BSS) refers to the problem of estimate original signals from observed linear mixtures with no knowledge about the sources or the mixing process. Independent Component Analysis (ICA) is a technique mainly applied to BSS problem and from the algorithms that implement this technique, FastICA is a high performance iterative algorithm of low computacional cost that uses nongaussianity measures based on high order statistics to estimate the original sources. The great number of applications where ICA has been found useful reects the need of the implementation of this technique in hardware and the natural paralelism of FastICA favors the implementation of this algorithm on digital hardware. This work proposes the implementation of FastICA on a reconfigurable hardware platform for the viability of it's use in blind source separation problems, more specifically in a hardware prototype embedded in a Field Programmable Gate Array (FPGA) board for the monitoring of beds in hospital environments. The implementations will be carried out by Simulink models and it's synthesizing will be done through the DSP Builder software from Altera Corporation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This work proposes hardware architecture, VHDL described, developed to embedded Artificial Neural Network (ANN), Multilayer Perceptron (MLP). The present work idealizes that, in this architecture, ANN applications could easily embed several different topologies of MLP network industrial field. The MLP topology in which the architecture can be configured is defined by a simple and specifically data input (instructions) that determines the layers and Perceptron quantity of the network. In order to set several MLP topologies, many components (datapath) and a controller were developed to execute these instructions. Thus, an user defines a group of previously known instructions which determine ANN characteristics. The system will guarantee the MLP execution through the neural processors (Perceptrons), the components of datapath and the controller that were developed. In other way, the biases and the weights must be static, the ANN that will be embedded must had been trained previously, in off-line way. The knowledge of system internal characteristics and the VHDL language by the user are not needed. The reconfigurable FPGA device was used to implement, simulate and test all the system, allowing application in several real daily problems

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Motion estimation is the main responsible for data reduction in digital video encoding. It is also the most computational damanding step. H.264 is the newest standard for video compression and was planned to double the compression ratio achievied by previous standards. It was developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a partnership effort known as the Joint Video Team (JVT). H.264 presents novelties that improve the motion estimation efficiency, such as the adoption of variable block-size, quarter pixel precision and multiple reference frames. This work defines an architecture for motion estimation in hardware/software, using a full search algorithm, variable block-size and mode decision. This work consider the use of reconfigurable devices, soft-processors and development tools for embedded systems such as Quartus II, SOPC Builder, Nios II and ModelSim

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper deals with the design of a network-on-chip reconfigurable pseudorandom number generation unit that can map and execute meta-heuristic algorithms in hardware. The unit can be configured to implement one of the following five linear generator algorithms: a multiplicative congruential, a mixed congruential, a standard multiple recursive, a mixed multiple recursive, and a multiply-with-carry. The generation unit can be used both as a pseudorandom and a message passing-based server, which is able to produce pseudorandom numbers on demand, sending them to the network-on-chip blocks that originate the service request. The generator architecture has been mapped to a field programmable gate array, and showed that millions of numbers in 32-, 64-, 96-, or 128-bit formats can be produced in tens of milliseconds. (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Uma arquitetura reconfigurável e multiprocessada para a implementação física de Redes de Petri foi desenvolvida em VHDL e mapeada sobre um FPGA. Convencionalmente, as Redes de Petri são transformadas em uma linguagem de descrição de hardware no nível de transferências entre registradores e um processo de síntese de alto nível é utilizado para gerar as funções booleanas e tabelas de transição de estado para que se possa, finalmente, mapeá-las num FPGA (Morris et al., 2000) (Soto and Pereira, 2001). A arquitetura proposta possui blocos lógicos reconfiguráveis desenvolvidos exclusivamente para a implementação dos lugares e das transições da rede, não sendo necessária a descrição da rede em níveis de abstração intermediários e nem a utilização de um processo de síntese para realizar o mapeamento da rede na arquitetura. A arquitetura permite o mapeamento de modelos de Redes de Petri com diferenciação entre as marcas e associação de tempo no disparo das transições, sendo composta por um arranjo de processadores reconfiguráveis, cada um dos quais representando o comportamento de uma transição da Rede de Petri a ser mapeada e por um sistema de comunicação, implementado por um conjunto de roteadores que são capazes de enviar pacotes de dados de um processador reconfigurável a outro. A arquitetura proposta foi validada num FPGA de 10.570 elementos lógicos com uma topologia que permitiu a implementação de Redes de Petri de até 9 transições e 36 lugares, atingindo uma latência de 15,4ns e uma vazão de até 17,12GB/s com uma freqüência de operação de 64,58MHz.