991 resultados para Hardware implementations


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we investigate various algorithms for performing Fast Fourier Transformation (FFT)/Inverse Fast Fourier Transformation (IFFT), and proper techniques for maximizing the FFT/IFFT execution speed, such as pipelining or parallel processing, and use of memory structures with pre-computed values (look up tables -LUT) or other dedicated hardware components (usually multipliers). Furthermore, we discuss the optimal hardware architectures that best apply to various FFT/IFFT algorithms, along with their abilities to exploit parallel processing with minimal data dependences of the FFT/IFFT calculations. An interesting approach that is also considered in this paper is the application of the integrated processing-in-memory Intelligent RAM (IRAM) chip to high speed FFT/IFFT computing. The results of the assessment study emphasize that the execution speed of the FFT/IFFT algorithms is tightly connected to the capabilities of the FFT/IFFT hardware to support the provided parallelism of the given algorithm. Therefore, we suggest that the basic Discrete Fourier Transform (DFT)/Inverse Discrete Fourier Transform (IDFT) can also provide high performances, by utilizing a specialized FFT/IFFT hardware architecture that can exploit the provided parallelism of the DFT/IDF operations. The proposed improvements include simplified multiplications over symbols given in polar coordinate system, using sinе and cosine look up tables, and an approach for performing parallel addition of N input symbols.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we investigate various algorithms for performing Fast Fourier Transformation (FFT)/Inverse Fast Fourier Transformation (IFFT), and proper techniquesfor maximizing the FFT/IFFT execution speed, such as pipelining or parallel processing, and use of memory structures with pre-computed values (look up tables -LUT) or other dedicated hardware components (usually multipliers). Furthermore, we discuss the optimal hardware architectures that best apply to various FFT/IFFT algorithms, along with their abilities to exploit parallel processing with minimal data dependences of the FFT/IFFT calculations. An interesting approach that is also considered in this paper is the application of the integrated processing-in-memory Intelligent RAM (IRAM) chip to high speed FFT/IFFT computing. The results of the assessment study emphasize that the execution speed of the FFT/IFFT algorithms is tightly connected to the capabilities of the FFT/IFFT hardware to support the provided parallelism of the given algorithm. Therefore, we suggest that the basic Discrete Fourier Transform (DFT)/Inverse Discrete Fourier Transform (IDFT) can also provide high performances, by utilizing a specialized FFT/IFFT hardware architecture that can exploit the provided parallelism of the DFT/IDF operations. The proposed improvements include simplified multiplications over symbols given in polar coordinate system, using sinе and cosine look up tables,and an approach for performing parallel addition of N input symbols.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Various digital watermarking (WM) techniques for still imaging have been studied in the last several years. Recently, many new WM schemes have been proposed for other types of digital multimedia data, such as text, audio and video. This paper presents a brief overview of existing digital video WM. We classify WM techniques and discuss the properties of video WM. Since each WM application has its own specific requirements, WM design must take the intended application into consideration. Video WM applications are also discussed in the paper. The features of video WM implementations in software and hardware and their differences are presented through the description of four examples of existing work.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The design of full programmable type-2 membership function circuit is presented in this paper. This circuit is used to implement the fuzzifier block of Type-2 Fuzzy Logic Controller chip. In this paper the type-2 fuzzy set was obtained by blurring the width of the type-1 fuzzy set. This circuit allows programming the height and the shape of the membership function. It operates in current mode, with supply voltage of 3.3V. The simulation results of interval type-2 membership function circuit have been done in CMOS 0.35μm technology using Mentor Graphics software. © 2011 IEEE.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper presents an up to date review of digital watermarking (WM) from a VLSI designer point of view. The reader is introduced to basic principles and terms in the field of image watermarking. It goes through a brief survey on WM theory, laying out common classification criterions and discussing important design considerations and trade-offs. Elementary WM properties such as robustness, computational complexity and their influence on image quality are discussed. Common attacks and testing benchmarks are also briefly mentioned. It is shown that WM design must take the intended application into account. The difference between software and hardware implementations is explained through the introduction of a general scheme of a WM system and two examples from previous works. A versatile methodology to aid in a reliable and modular design process is suggested. Relating to mixed-signal VLSI design and testing, the proposed methodology allows an efficient development of a CMOS image sensor with WM capabilities.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The constant increase in digital systems complexity definitely demands the automation of the corresponding synthesis process. This paper presents a computational environment designed to produce both software and hardware implementations of a system. The tool for code generation has been named ACG8051. As for the hardware synthesis there has been produced a larger environment consisting of four programs, namely: PIPE2TAB, AGPS, TABELA, and TAB2VHDL. ACG8051 and PIPE2TAB use place/transition net descriptions from PIPE as inputs. ACG8051 is aimed at generating assembly code for the 8051 micro-controller. PIPE2TAB produces a tabular version of a Mealy type finite state machine of the system, its output is fed into AGPS that is used for state allocation. The resulting digital system is then input to TABELA, which minimizes control functions and outputs of the digital system. Finally, the output generated by TABELA is fed to TAB2VHDL that produces a VHDL description of the system at the register transfer level. Thus, we present here a set of tools designed to take a high-level description of a digital system, represented by a place/transition net, and produces as output both an assembly code that can be immediately run on an 8051 micro-controller, and a VHDL description that can be used to directly implement the hardware parts either on an FPGA or as an ASIC.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper proposes a solution to improve the performance of the first order Early Error Sensing (EES) Adaptive Time Delay Tanlock Loops (ATDTL) presented in (Al-Zaabi, Al-Qutayri e Al-Araji, 2005), regarding to frequency estimation and tracking time. The EES-ATDTL are phaselocked-loops (PLL) used to hardware implementations, due to their simple structure. Fixed-points theorems are used to determine conditions for rapid convergence of the estimation process and a estimative of the frecuency input is obtained with a Gaussian filter that improves the gain adaptation. The mathematical models are the presented by (Al-Araji, Al-Qutayri e Al-Zaabi, 2006). Simulations have been performed to evaluate the theoretical results.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Within the membrane computing research field, there are many papers about software simulations and a few about hardware implementations. In both cases, algorithms for implementing membrane systems in software and hardware that try to take advantages of massive parallelism are implemented. P-systems are parallel and non deterministic systems which simulate membranes behavior when processing information. This paper presents software techniques based on the proper utilization of virtual memory of a computer. There is a study of how much virtual memory is necessary to host a membrane model. This method improves performance in terms of time.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Information reconciliation is a crucial procedure in the classical post-processing of quantum key distribution (QKD). Poor reconciliation e?ciency, revealing more information than strictly needed, may compromise the maximum attainable distance, while poor performance of the algorithm limits the practical throughput in a QKD device. Historically, reconciliation has been mainly done using close to minimal information disclosure but heavily interactive procedures, like Cascade, or using less e?cient but also less interactive ?just one message is exchanged? procedures, like the ones based in low-density parity-check (LDPC) codes. The price to pay in the LDPC case is that good e?ciency is only attained for very long codes and in a very narrow range centered around the quantum bit error rate (QBER) that the code was designed to reconcile, thus forcing to have several codes if a broad range of QBER needs to be catered for. Real world implementations of these methods are thus very demanding, either on computational or communication resources or both, to the extent that the last generation of GHz clocked QKD systems are ?nding a bottleneck in the classical part. In order to produce compact, high performance and reliable QKD systems it would be highly desirable to remove these problems. Here we analyse the use of short-length LDPC codes in the information reconciliation context using a low interactivity, blind, protocol that avoids an a priori error rate estimation. We demonstrate that 2×103 bits length LDPC codes are suitable for blind reconciliation. Such codes are of high interest in practice, since they can be used for hardware implementations with very high throughput.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Esta tesis presenta un novedoso marco de referencia para el análisis y optimización del retardo de codificación y descodificación para vídeo multivista. El objetivo de este marco de referencia es proporcionar una metodología sistemática para el análisis del retardo en codificadores y descodificadores multivista y herramientas útiles en el diseño de codificadores/descodificadores para aplicaciones con requisitos de bajo retardo. El marco de referencia propuesto caracteriza primero los elementos que tienen influencia en el comportamiento del retardo: i) la estructura de predicción multivista, ii) el modelo hardware del codificador/descodificador y iii) los tiempos de proceso de cuadro. En segundo lugar, proporciona algoritmos para el cálculo del retardo de codificación/ descodificación de cualquier estructura arbitraria de predicción multivista. El núcleo de este marco de referencia consiste en una metodología para el análisis del retardo de codificación/descodificación multivista que es independiente de la arquitectura hardware del codificador/descodificador, completada con un conjunto de modelos que particularizan este análisis del retardo con las características de la arquitectura hardware del codificador/descodificador. Entre estos modelos, aquellos basados en teoría de grafos adquieren especial relevancia debido a su capacidad de desacoplar la influencia de los diferentes elementos en el comportamiento del retardo en el codificador/ descodificador, mediante una abstracción de su capacidad de proceso. Para revelar las posibles aplicaciones de este marco de referencia, esta tesis presenta algunos ejemplos de su utilización en problemas de diseño que afectan a codificadores y descodificadores multivista. Este escenario de aplicación cubre los siguientes casos: estrategias para el diseño de estructuras de predicción que tengan en consideración requisitos de retardo además del comportamiento tasa-distorsión; diseño del número de procesadores y análisis de los requisitos de velocidad de proceso en codificadores/ descodificadores multivista dado un retardo objetivo; y el análisis comparativo del comportamiento del retardo en codificadores multivista con diferentes capacidades de proceso e implementaciones hardware. ABSTRACT This thesis presents a novel framework for the analysis and optimization of the encoding and decoding delay for multiview video. The objective of this framework is to provide a systematic methodology for the analysis of the delay in multiview encoders and decoders and useful tools in the design of multiview encoders/decoders for applications with low delay requirements. The proposed framework characterizes firstly the elements that have an influence in the delay performance: i) the multiview prediction structure ii) the hardware model of the encoder/decoder and iii) frame processing times. Secondly, it provides algorithms for the computation of the encoding/decoding delay of any arbitrary multiview prediction structure. The core of this framework consists in a methodology for the analysis of the multiview encoding/decoding delay that is independent of the hardware architecture of the encoder/decoder, which is completed with a set of models that particularize this delay analysis with the characteristics of the hardware architecture of the encoder/decoder. Among these models, the ones based in graph theory acquire special relevance due to their capacity to detach the influence of the different elements in the delay performance of the encoder/decoder, by means of an abstraction of its processing capacity. To reveal possible applications of this framework, this thesis presents some examples of its utilization in design problems that affect multiview encoders and decoders. This application scenario covers the following cases: strategies for the design of prediction structures that take into consideration delay requirements in addition to the rate-distortion performance; design of number of processors and analysis of processor speed requirements in multiview encoders/decoders given a target delay; and comparative analysis of the encoding delay performance of multiview encoders with different processing capabilities and hardware implementations.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Efficient hardware implementations of arithmetic operations in the Galois field are highly desirable for several applications, such as coding theory, computer algebra and cryptography. Among these operations, multiplication is of special interest because it is considered the most important building block. Therefore, high-speed algorithms and hardware architectures for computing multiplication are highly required. In this paper, bit-parallel polynomial basis multipliers over the binary field GF(2(m)) generated using type II irreducible pentanomials are considered. The multiplier here presented has the lowest time complexity known to date for similar multipliers based on this type of irreducible pentanomials.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Conferência: 39th Annual Conference of the IEEE Industrial-Electronics-Society (IECON), Vienna, Austria, Nov 10-14, 2013

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This Thesis has the main target to make a research about FPAA/dpASPs devices and technologies applied to control systems. These devices provide easy way to emulate analog circuits that can be reconfigurable by programming tools from manufactures and in case of dpASPs are able to be dynamically reconfigurable on the fly. It is described different kinds of technologies commercially available and also academic projects from researcher groups. These technologies are very recent and are in ramp up development to achieve a level of flexibility and integration to penetrate more easily the market. As occurs with CPLD/FPGAs, the FPAA/dpASPs technologies have the target to increase the productivity, reducing the development time and make easier future hardware reconfigurations reducing the costs. FPAA/dpAsps still have some limitations comparing with the classic analog circuits due to lower working frequencies and emulation of complex circuits that require more components inside the integrated circuit. However, they have great advantages in sensor signal condition, filter circuits and control systems. This thesis focuses practical implementations of these technologies to control system PID controllers. The result of the experiments confirms the efficacy of FPAA/dpASPs on signal condition and control systems.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The furious pace of Moore's Law is driving computer architecture into a realm where the the speed of light is the dominant factor in system latencies. The number of clock cycles to span a chip are increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has previously made migration an unserviceable solution so far. I present an architecture, implementation, and mechanisms that reduces the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.