Biblioteca Digital

17 resultados para Floating point

em Repositório Científico do Instituto Politécnico de Lisboa - Portugal

Efficient Implementation Of A Single-Precision Floating-Point Arithmetic Unit on FPGA

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a single precision floating point arithmetic unit with support for multiplication, addition, fused multiply-add, reciprocal, square-root and inverse squareroot with high-performance and low resource usage. The design uses a piecewise 2nd order polynomial approximation to implement reciprocal, square-root and inverse square-root. The unit can be configured with any number of operations and is capable to calculate any function with a throughput of one operation per cycle. The floatingpoint multiplier of the unit is also used to implement the polynomial approximation and the fused multiply-add operation. We have compared our implementation with other state-of-the-art proposals, including the Xilinx Core-Gen operators, and conclude that the approach has a high relative performance/area efficiency. © 2014 Technical University of Munich (TUM).

Optimal design of piezolaminated structures using B-spline strip finite element models

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A package of B-spline finite strip models is developed for the linear analysis of piezolaminated plates and shells. This package is associated to a global optimization technique in order to enhance the performance of these types of structures, subjected to various types of objective functions and/or constraints, with discrete and continuous design variables. The models considered are based on a higher-order displacement field and one can apply them to the static, free vibration and buckling analyses of laminated adaptive structures with arbitrary lay-ups, loading and boundary conditions. Genetic algorithms, with either binary or floating point encoding of design variables, were considered to find optimal locations of piezoelectric actuators as well as to determine the best voltages applied to them in order to obtain a desired structure shape. These models provide an overall economy of computing effort for static and vibration problems.

Receptor MIMO em FPGA baseado no esquema de Alamouti

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A oferta de serviços baseados em comunicações sem fios tem vindo a crescer exponencialmente na última década. Cada vez mais são exigidas maiores taxas de transmissão assim como uma melhor QoS, sem comprometer a potência de transmissão ou argura de banda disponível. A tecnologia MIMO consegue oferecer um aumento da capacidade destes sistemas sem requerer aumento da largura de banda ou da potência transmitida. O trabalho desenvolvido nesta dissertação consistiu no estudo dos sistemas MIMO, caracterizados pela utilização de múltiplas antenas para transmitir e receber a informação. Com um sistema deste tipo consegue-se obter um ganho de diversidade espacial utilizando códigos espaço-temporais, que exploram simultaneamente o domínio espacial e o domínio do tempo. Nesta dissertação é dado especial ênfase à codificação por blocos no espaço-tempo de Alamouti, a qual será implementada em FPGA, nomeadamente a parte de recepção. Esta implementação é efectuada para uma configuração de antenas 2x1, utilizando vírgula flutuante e para três tipos de modulação: BPSK, QPSK e 16-QAM. Por fim será analisada a relação entre a precisão alcançada na representação numérica dos resultados e os recursos consumidos pela FPGA. Com a arquitectura adoptada conseguem se obter taxas de transferência na ordem dos 29,141 Msimb/s (sem pipelines) a 262,674 Msimb/s (com pipelines), para a modulação BPSK.

An Efficient Long Distance Echo Canceller

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper describes an implementation of a long distance echo canceller, operating on full-duplex with hands-free and in real-time with a single Digital Signal Processor (DSP). The proposed solution is based on short length adaptive filters centered on the positions of the most significant echoes, which are tracked by time delay estimators, for which we use a new approach. To deal with double talking situations a speech detector is employed. The floating-point DSP TMS320C6713 from Texas Instruments is used with software written in C++, with compiler optimizations for fast execution. The resulting algorithm enables long distance echo cancellation with low computational requirements, suited for embbeded systems. It reaches greater echo return loss enhancement and shows faster convergence speed when compared to the conventional approach. The experimental results approach the CCITT G.165 recommendation levels.

Implementação em hardware reconfigurável de método de separação de dados hiperespetrais

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Relatório do Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia de Electrónica e Telecomunicações

Sistema de reconhecimento de impressões digitais baseado em FPGA

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Trabalho de Projeto para obtenção do grau de Mestre em Engenharia de Eletrónica e Telecomunicações

Trends Of CPU, GPU and FPGA for high-performance computing

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Floating-point computing with more than one TFLOP of peak performance is already a reality in recent Field-Programmable Gate Arrays (FPGA). General-Purpose Graphics Processing Units (GPGPU) and recent many-core CPUs have also taken advantage of the recent technological innovations in integrated circuit (IC) design and had also dramatically improved their peak performances. In this paper, we compare the trends of these computing architectures for high-performance computing and survey these platforms in the execution of algorithms belonging to different scientific application domains. Trends in peak performance, power consumption and sustained performances, for particular applications, show that FPGAs are increasing the gap to GPUs and many-core CPUs moving them away from high-performance computing with intensive floating-point calculations. FPGAs become competitive for custom floating-point or fixed-point representations, for smaller input sizes of certain algorithms, for combinational logic problems and parallel map-reduce problems. © 2014 Technical University of Munich (TUM).

Algorithm-oriented design of efficient many-core architectures applied to dense matrix multiplication

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Recent integrated circuit technologies have opened the possibility to design parallel architectures with hundreds of cores on a single chip. The design space of these parallel architectures is huge with many architectural options. Exploring the design space gets even more difficult if, beyond performance and area, we also consider extra metrics like performance and area efficiency, where the designer tries to design the architecture with the best performance per chip area and the best sustainable performance. In this paper we present an algorithm-oriented approach to design a many-core architecture. Instead of doing the design space exploration of the many core architecture based on the experimental execution results of a particular benchmark of algorithms, our approach is to make a formal analysis of the algorithms considering the main architectural aspects and to determine how each particular architectural aspect is related to the performance of the architecture when running an algorithm or set of algorithms. The architectural aspects considered include the number of cores, the local memory available in each core, the communication bandwidth between the many-core architecture and the external memory and the memory hierarchy. To exemplify the approach we did a theoretical analysis of a dense matrix multiplication algorithm and determined an equation that relates the number of execution cycles with the architectural parameters. Based on this equation a many-core architecture has been designed. The results obtained indicate that a 100 mm(2) integrated circuit design of the proposed architecture, using a 65 nm technology, is able to achieve 464 GFLOPs (double precision floating-point) for a memory bandwidth of 16 GB/s. This corresponds to a performance efficiency of 71 %. Considering a 45 nm technology, a 100 mm(2) chip attains 833 GFLOPs which corresponds to 84 % of peak performance These figures are better than those obtained by previous many-core architectures, except for the area efficiency which is limited by the lower memory bandwidth considered. The results achieved are also better than those of previous state-of-the-art many-cores architectures designed specifically to achieve high performance for matrix multiplication.

A many-core co-processor for embedded parallel computing on FPGA

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Single processor architectures are unable to provide the required performance of high performance embedded systems. Parallel processing based on general-purpose processors can achieve these performances with a considerable increase of required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations interconnected with a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and number of interfacing ports. The architecture was tested in a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than that achieved with a parallel generalpurpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.

Parallel GPU architecture for hyperspectral unmixing based on augmented Lagrangian method

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Hyperspectral imaging has become one of the main topics in remote sensing applications, which comprise hundreds of spectral bands at different (almost contiguous) wavelength channels over the same area generating large data volumes comprising several GBs per flight. This high spectral resolution can be used for object detection and for discriminate between different objects based on their spectral characteristics. One of the main problems involved in hyperspectral analysis is the presence of mixed pixels, which arise when the spacial resolution of the sensor is not able to separate spectrally distinct materials. Spectral unmixing is one of the most important task for hyperspectral data exploitation. However, the unmixing algorithms can be computationally very expensive, and even high power consuming, which compromises the use in applications under on-board constraints. In recent years, graphics processing units (GPUs) have evolved into highly parallel and programmable systems. Specifically, several hyperspectral imaging algorithms have shown to be able to benefit from this hardware taking advantage of the extremely high floating-point processing performance, compact size, huge memory bandwidth, and relatively low cost of these units, which make them appealing for onboard data processing. In this paper, we propose a parallel implementation of an augmented Lagragian based method for unsupervised hyperspectral linear unmixing on GPUs using CUDA. The method called simplex identification via split augmented Lagrangian (SISAL) aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The efficient implementation of SISAL method presented in this work exploits the GPU architecture at low level, using shared memory and coalesced accesses to memory.

Grid structure impact in sparse point representation of derivatives

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the Sparse Point Representation (SPR) method the principle is to retain the function data indicated by significant interpolatory wavelet coefficients, which are defined as interpolation errors by means of an interpolating subdivision scheme. Typically, a SPR grid is coarse in smooth regions, and refined close to irregularities. Furthermore, the computation of partial derivatives of a function from the information of its SPR content is performed in two steps. The first one is a refinement procedure to extend the SPR by the inclusion of new interpolated point values in a security zone. Then, for points in the refined grid, such derivatives are approximated by uniform finite differences, using a step size proportional to each point local scale. If required neighboring stencils are not present in the grid, the corresponding missing point values are approximated from coarser scales using the interpolating subdivision scheme. Using the cubic interpolation subdivision scheme, we demonstrate that such adaptive finite differences can be formulated in terms of a collocation scheme based on the wavelet expansion associated to the SPR. For this purpose, we prove some results concerning the local behavior of such wavelet reconstruction operators, which stand for SPR grids having appropriate structures. This statement implies that the adaptive finite difference scheme and the one using the step size of the finest level produce the same result at SPR grid points. Consequently, in addition to the refinement strategy, our analysis indicates that some care must be taken concerning the grid structure, in order to keep the truncation error under a certain accuracy limit. Illustrating results are presented for 2D Maxwell's equation numerical solutions.

A Step-up mu-Power Converter for Solar Energy Harvesting Applications, using Hill Climbing Maximum Power Point Tracking

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a step-up micro-power converter for solar energy harvesting applications. The circuit uses a SC voltage tripler architecture, controlled by an MPPT circuit based on the Hill Climbing algorithm. This circuit was designed in a 0.13 mu m CMOS technology in order to work with an a-Si PV cell. The circuit has a local power supply voltage, created using a scaled down SC voltage tripler, controlled by the same MPPT circuit, to make the circuit robust to load and illumination variations. The SC circuits use a combination of PMOS and NMOS transistors to reduce the occupied area. A charge re-use scheme is used to compensate the large parasitic capacitors associated to the MOS transistors. The simulation results show that the circuit can deliver a power of 1266 mu W to the load using 1712 mu W of power from the PV cell, corresponding to an efficiency as high as 73.91%. The simulations also show that the circuit is capable of starting up with only 19% of the maximum illumination level.

Sensing Cloud Optimization to Solve ED of Units with Valve-Point Effects and Multi-fuels

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper a solution to an highly constrained and non-convex economical dispatch (ED) problem with a meta-heuristic technique named Sensing Cloud Optimization (SCO) is presented. The proposed meta-heuristic is based on a cloud of particles whose central point represents the objective function value and the remaining particles act as sensors "to fill" the search space and "guide" the central particle so it moves into the best direction. To demonstrate its performance, a case study with multi-fuel units and valve- point effects is presented.

Offshore wind turbine simulation: multibody drive train. Bacl-to-back NPC (Neutral Point Clamped) converters: Fractional-order control

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper is on offshore wind energy conversion systems installed on the deep water and equipped with back-to-back neutral point clamped full-power converter, permanent magnet synchronous generator with an AC link. The model for the drive train is a five-mass model which incorporates the dynamic of the structure and the tower in order to emulate the effect of the moving surface. A three-level converter and a four-level converter are the two options with a fractional-order control strategy considered to equip the conversion system. Simulation studies are carried out to assess the quality of the energy injected into the electric grid. Finally, conclusions are presented. (C) 2014 Elsevier Ltd. All rights reserved.

Logarithmically proportional objective function for planar surfaces recognition in 3D point cloud

Relevância:

20.00% 20.00%

Publicador:

Resumo:

3D laser scanning is becoming a standard technology to generate building models of a facility's as-is condition. Since most constructions are constructed upon planar surfaces, recognition of them paves the way for automation of generating building models. This paper introduces a new logarithmically proportional objective function that can be used in both heuristic and metaheuristic (MH) algorithms to discover planar surfaces in a point cloud without exploiting any prior knowledge about those surfaces. It can also adopt itself to the structural density of a scanned construction. In this paper, a metaheuristic method, genetic algorithm (GA), is used to test this introduced objective function on a synthetic point cloud. The results obtained show the proposed method is capable to find all plane configurations of planar surfaces (with a wide variety of sizes) in the point cloud with a minor distance to the actual configurations. © 2014 IEEE.

«
1
2
»