998 results for Processor architecture


Relevance:

100.00% 100.00%

Publisher:

Abstract:

An embedded architecture for an optical vector-matrix multiplier (OVMM) is presented. The architecture aims to optimise the data flow of the vector-matrix multiplier (VMM) in order to improve its performance. Data dependence is discussed for the case where the OVMM is connected to a cluster system. A simulator is built to analyse the performance of the architecture. Based on the simulation results, Amdahl's law is used to analyse the hybrid opto-electronic system. The electronic part and its interaction with the optical part are found to form the bottleneck of the system.
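
As a rough illustration of the Amdahl's-law reasoning in this abstract, the following sketch computes the overall speedup when only part of a workload is accelerated. The function name `amdahl_speedup` and all numbers are hypothetical and are not taken from the paper; they only show why the non-accelerated (here, electronic) part limits the achievable gain.

```python
def amdahl_speedup(accelerated_fraction: float, acceleration: float) -> float:
    """Overall speedup when `accelerated_fraction` of the runtime is sped up
    by a factor of `acceleration` and the rest is left unchanged."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if acceleration <= 0.0:
        raise ValueError("acceleration must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / acceleration)


# Hypothetical workload: 90% of the time is spent in the part that the
# optical multiplier could take over; the remaining 10% stays electronic.
for factor in (10, 100, 1000):
    print(factor, round(amdahl_speedup(0.9, factor), 2))
```

However fast the accelerated part becomes, the speedup in this example stays below 1 / (1 - 0.9) = 10, which is the sense in which the remaining part forms the bottleneck.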

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Message-Driven Processor is a node of a large-scale multiprocessor being developed by the Concurrent VLSI Architecture Group. It is intended to support fine-grained, message passing, parallel computation. It contains several novel architectural features, such as a low-latency network interface, extensive type-checking hardware, and on-chip memory that can be used as an associative lookup table. This document is a programmer's guide to the MDP. It describes the processor's register architecture, instruction set, and the data types supported by the processor. It also details the MDP's message sending and exception handling facilities.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

An area-efficient, high-throughput architecture based on distributed arithmetic is proposed for the 3D discrete wavelet transform (DWT). The 3D DWT processor was designed in VHDL and mapped to a Xilinx Virtex-E FPGA. The processor runs at up to 85 MHz and can perform the five-level DWT analysis of a 128 x 128 x 128 fMRI volume image in 20 ms.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A novel power-efficient systolic array architecture is proposed for full search block matching (FSBM) motion estimation, where the partial distortion elimination algorithm is used to dynamically switch off the computation of eliminated partial candidate blocks. The RTL-level simulation shows that the proposed architecture can reduce the power consumption of the computation part of the algorithm to about 60% of that of the conventional 2D systolic arrays.
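
To make the partial distortion elimination (PDE) idea concrete, here is a minimal software sketch of full-search block matching that abandons a candidate as soon as its partial sum of absolute differences (SAD) can no longer beat the best match found so far. It is a plain sequential analogue, not the systolic hardware of the paper; the function names, the row-by-row check granularity, and the use of SAD as the distortion measure are assumptions made for illustration.

```python
def sad_with_pde(block, candidate, best_so_far):
    """Return the SAD of two equally sized blocks, or None if the running
    partial sum reaches `best_so_far` and the candidate is eliminated."""
    partial = 0
    for row_b, row_c in zip(block, candidate):
        partial += sum(abs(a - b) for a, b in zip(row_b, row_c))
        if partial >= best_so_far:
            return None  # eliminated: remaining rows are never computed
    return partial


def full_search(block, frame, top, left, radius):
    """Search all displacements within +/- radius of (top, left) in `frame`
    for the best match of a square `block`. Returns ((dy, dx), sad)."""
    n = len(block)
    best_sad, best_mv = float("inf"), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > len(frame) or x + n > len(frame[0]):
                continue  # candidate block would fall outside the frame
            candidate = [row[x:x + n] for row in frame[y:y + n]]
            sad = sad_with_pde(block, candidate, best_sad)
            if sad is not None:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```

The skipped rows in `sad_with_pde` correspond, loosely, to the computation that the proposed architecture switches off to save power.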

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Current data-intensive image processing applications push traditional embedded architectures to their limits. FPGA-based hardware acceleration is a potential solution, but the programmability gap and the time-consuming HDL design flow are significant obstacles. The proposed approach, an FPGA-based programmable hardware acceleration platform that uses a large number of Streaming Image processing Processors (SIPPro), potentially addresses these issues. SIPPro is a pipelined, in-order soft-core processor architecture with specific optimisations for image processing applications. Each SIPPro core uses 1 DSP48, 2 Block RAMs and 370 slice registers, making the processor as compact as possible whilst maintaining flexibility and programmability. It is an area-efficient, scalable and high-performance soft-core architecture capable of delivering 530 MIPS per core on a Xilinx Zynq SoC (ZC7Z020-3). To evaluate the feasibility of the proposed architecture, a Traffic Sign Recognition (TSR) algorithm has been prototyped on a Zedboard with the color and morphology operations accelerated using multiple SIPPros. Simulation and experimental results demonstrate that the processing platform achieves speedups of 15 and 33 times for color filtering and morphology operations respectively, with significantly reduced design effort and time.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

A high-sample-rate 3D median filtering processor architecture is proposed, based on a novel 3D median filtering algorithm that reduces the computational complexity in comparison with the traditional bubble sorting algorithm. A 3 x 3 x 3 filter processor is implemented in VHDL, and simulation verifies that the processor can process a 128 x 128 x 96 MRI image in 0.03 seconds while running at 50 MHz.
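
For reference, a 3 x 3 x 3 median filter can be sketched in software as follows. This is the straightforward sort-based baseline, not the novel algorithm of the paper, and the border handling (border voxels are copied unchanged) is an assumption made to keep the example short.

```python
def median_filter_3d(volume):
    """Apply a 3 x 3 x 3 median filter to a volume given as nested lists
    indexed [z][y][x]. Border voxels are copied unchanged (assumption)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    out = [[list(row) for row in plane] for plane in volume]
    offsets = (-1, 0, 1)
    for z in range(1, depth - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                window = [
                    volume[z + dz][y + dy][x + dx]
                    for dz in offsets
                    for dy in offsets
                    for dx in offsets
                ]
                # 27 samples: the median is the 14th smallest (index 13).
                out[z][y][x] = sorted(window)[13]
    return out
```

The figures quoted in the abstract imply a throughput of roughly one voxel per clock cycle: a 128 x 128 x 96 volume has 1,572,864 voxels, and 0.03 seconds at 50 MHz is 1,500,000 cycles.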

Relevance:

70.00% 70.00%

Publisher:

Abstract:

In this paper, a novel scalable public-key processor architecture is presented that supports modular exponentiation and Elliptic Curve Cryptography over both prime GF(p) and binary GF(2^m) extension fields. This is achieved by a high-performance instruction set that provides a comprehensive range of integer and polynomial basis field arithmetic. The instruction set and associated hardware are generic in nature and do not specifically support any cryptographic algorithms or protocols. Firmware within the device is used to efficiently implement complex and data-intensive arithmetic. A firmware library has been developed in order to demonstrate support for numerous exponentiation and ECC approaches, such as different coordinate systems and integer recoding methods. The processor has been developed as a high-performance asymmetric cryptography platform in the form of a scalable Verilog RTL core. Various features of the processor may be scaled, such as the pipeline width and local memory subsystem, in order to suit area, speed and power requirements. The processor is evaluated and compares favourably with previous work in terms of performance while offering an unparalleled degree of flexibility. © 2006 IEEE.
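
As a point of reference for the modular exponentiation mentioned in this abstract, the sketch below shows the textbook right-to-left binary (square-and-multiply) method. It is not the processor's firmware and uses none of the recoding methods the paper refers to; the function name `mod_exp` is hypothetical.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Compute (base ** exponent) % modulus by square-and-multiply,
    scanning the exponent bits from least to most significant."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1 % modulus          # handles modulus == 1
    base %= modulus
    while exponent:
        if exponent & 1:          # current bit set: multiply into result
            result = (result * base) % modulus
        base = (base * base) % modulus  # square for the next bit
        exponent >>= 1
    return result
```

For example, `mod_exp(4, 13, 497)` returns 445, in agreement with Python's built-in `pow(4, 13, 497)`. The loop performs one squaring per exponent bit plus one multiplication per set bit, which is why integer recoding methods that reduce the number of non-zero digits can lower the cost.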

Relevance:

70.00% 70.00%

Publisher:

Abstract:

A scheduling method for implementing a generic linear QR array processor architecture is presented. This improves on previous work. It also considerably simplifies the derivation of schedules for a folded linear system, where detailed account has to be taken of processor cell latency. The architecture and scheduling derived provide the basis of a generator for the rapid design of System-on-a-Chip (SoC) cores for QR decomposition.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Multi-core processing is a design philosophy that has become mainstream in scientific and engineering applications. The increasing performance and gate capacity of recent FPGA devices have permitted complex logic systems to be implemented on a single programmable device. Here we present a VHDL implementation of a multi-core processor built on the PLASMA IP core, which is based on most of the MIPS I ISA. We give an overview of the processor architecture and share the execution results.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

In this paper, a parallel-matching processor architecture with early jump-out (EJO) control is proposed to carry out high-speed biometric fingerprint database retrieval. The processor performs the fingerprint retrieval by using minutia point matching. An EJO method is applied to the proposed architecture to speed up the large database retrieval. The processor is implemented on a Xilinx Virtex-E, and occupies 6,825 slices and runs at up to 65 MHz. The software/hardware co-simulation benchmark with a database of 10,000 fingerprints verifies that the matching speed can achieve the rate of up to 1.22 million fingerprints per second. EJO results in about a 22% gain in computing efficiency.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

A parallel processor architecture based on a communicating sequential processor chip, the transputer, is described. The architecture is easily linearly extensible to enable separate functions to be included in the controller. To demonstrate the power of the resulting controller some experimental results are presented comparing PID and full inverse dynamics on the first three joints of a Puma 560 robot. Also examined are some of the sample rate issues raised by the asynchronous updating of inertial parameters, and the need for full inverse dynamics at every sample interval is questioned.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This project aims to create a general procedure for deploying image processing applications on IP video cameras and for distributing the resulting information through Service-Oriented Architectures (SOA). The main goal is to create an application that runs on an IP video camera and performs basic processing on the captured images (detection of colors, shapes and patterns), allowing the result of the processing to be distributed through the SOA architectures described in the DPWS specification (Device Profile for Web Services). The study focuses primarily on the automatic transformation of image processing code written in Matlab (.m files) into ANSI C code (.c files), which is then compiled for the processor architecture of the camera (the CRIS architecture, similar to RISC but with a reduced instruction set).

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Paper submitted to Euromicro Symposium on Digital Systems Design (DSD), Belek-Antalya, Turkey, 2003.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Over the past few decades, we have enjoyed tremendous benefits from the revolutionary advancement of computing systems, driven mainly by remarkable semiconductor technology scaling and increasingly complicated processor architectures. However, the exponentially increased transistor density has directly led to exponentially increased power consumption and dramatically elevated system temperature, which not only adversely affects the system's cost, performance and reliability, but also increases leakage and thus the overall power consumption. Today, power and thermal issues pose enormous challenges and threaten to slow down the continuous evolution of computer technology. Effective power/thermal-aware design techniques are urgently needed at all design abstraction levels, from the circuit level and the logic level to the architectural level and the system level. In this dissertation, we present our research efforts to employ real-time scheduling techniques to solve resource-constrained power/thermal-aware design-optimization problems. In our research, we developed a set of simple yet accurate system-level models to capture the processor's thermal dynamics as well as the interdependency of leakage power consumption, temperature, and supply voltage. Based on these models, we investigated the fundamental principles of power/thermal-aware scheduling and developed real-time scheduling techniques targeting a variety of design objectives, including peak temperature minimization, overall energy reduction, and performance maximization. The novelty of this work is that we integrate cutting-edge research on power and thermal issues at the circuit and architectural levels into a set of accurate yet simplified system-level models, and are able to conduct system-level analysis and design based on these models. The theoretical study in this work serves as a solid foundation for guiding the development of power/thermal-aware scheduling algorithms in practical computing systems.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Today's SoCs are complex designs with multiple embedded processors, memory subsystems, and application-specific peripherals. The memory architecture of embedded SoCs strongly influences the power and performance of the entire system. Further, the memory subsystem constitutes a major part (typically up to 70%) of the silicon area of a current-day SoC. In this article, we address on-chip memory architecture exploration for DSP processors whose memory is organized as multiple banks, where banks can be single- or dual-ported with non-uniform sizes. We propose two different methods for physical memory architecture exploration and identify the strengths and applicability of these methods in a systematic way. Both methods address memory architecture exploration for a given target application by considering the application's data access characteristics, and generate a set of Pareto-optimal design points that are interesting from a power, performance and VLSI area perspective. To the best of our knowledge, this is the first comprehensive work on memory space exploration at the physical memory level that integrates data layout and memory exploration to address the system objectives from both the hardware design and the application software development perspectives. Further, we propose an automatic framework that explores the design space, identifying hundreds of Pareto-optimal design points within a few hours of running on a standard desktop configuration.
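
To illustrate what "Pareto-optimal design points" means here, the following sketch filters a list of candidate memory configurations down to those that no other candidate beats on every objective. The tuple layout (power, cycles, area), the function names, and the sample numbers are hypothetical and only serve the illustration; the paper's exploration methods are not reproduced.

```python
def dominates(a, b):
    """True if design `a` is no worse than `b` on every objective and
    strictly better on at least one (all objectives are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the designs that are not dominated by any other design."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical design points: (power in W, execution cycles, area in mm^2).
designs = [
    (1.0, 900, 4.0),
    (1.2, 700, 4.5),
    (1.3, 950, 5.0),   # worse than the first design on all three objectives
    (0.9, 1200, 3.5),
]
print(pareto_front(designs))
```

In this example the third design is dropped because the first is better on power, cycles and area, while the other three remain as genuine trade-offs between the objectives.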