669 results for Programmable calculators.
Abstract:
ASICs offer the best realization of DSP algorithms in terms of performance, but the cost is prohibitive, especially when the volumes involved are low. However, if the architecture synthesis trajectory for such algorithms is such that the target architecture can be identified as an interconnection of elementary parameterized computational structures, then it is possible to attain a close match, both in terms of performance and power with respect to an ASIC, for any algorithmic parameters of the given algorithm. Such an architecture is weakly programmable (configurable) and can be viewed as an application specific integrated processor (ASIP). In this work, we present a methodology to synthesize ASIPs for DSP algorithms. (C) 1999 Elsevier Science B.V. All rights reserved.
Abstract:
In-situ transmission electron microscopy (TEM) has developed rapidly over the last decade. In particular, the inclusion of scanning probes in TEM holders allows both mechanical and electrical testing to be performed whilst simultaneously imaging the microstructure at high resolution. In-situ TEM nanoindentation and tensile experiments require only an axial displacement perpendicular to the test surface. Here, however, the development of a novel in-situ TEM triboprobe with a fully programmable 3D positioning system makes other surface characterisation experiments possible. Programmable lateral displacement control allows scratch tests to be performed at high resolution with simultaneous imaging of the changing microstructure. With the addition of repeated cyclic movements, both nanoscale fatigue and friction experiments can also now be performed. We demonstrate a range of movement profiles for a variety of applications, in particular lateral sliding wear. The developed NanoLAB TEM triboprobe also includes a new closed loop vision control system for intuitive control during positioning and alignment. It includes an automated online calibration to ensure that the fine piezotube is controlled accurately throughout any type of test. Both the 3D programmability and the closed loop vision feedback system are demonstrated here.
Abstract:
In this paper we explore an implementation of a high-throughput, streaming application on REDEFINE-v2, which is an enhancement of REDEFINE. REDEFINE is a polymorphic ASIC combining the flexibility of a programmable solution with the execution speed of an ASIC. In REDEFINE, Compute Elements are arranged in an 8x8 grid connected via a Network on Chip (NoC) called RECONNECT, to realize the various macrofunctional blocks of an equivalent ASIC. For a 1024-FFT we carry out an application-architecture design space exploration by examining the various characterizations of Compute Elements in terms of the size of the instruction store. We further study the impact of using application specific, vectorized FUs. By setting up different partitions of the FFT algorithm for persistent execution on REDEFINE-v2, we derive the benefits of setting up pipelined execution for higher performance. The impact of the REDEFINE-v2 micro-architecture for any arbitrary N-point FFT (N > 4096) is also analyzed. We report the various algorithm-architecture tradeoffs in terms of area and execution speed relative to an ASIC implementation. In addition we compare the performance gain with respect to a GPP.
Abstract:
In this paper we propose the architecture of a SoC fabric onto which applications described in a HLL are synthesized. The fabric is a homogeneous layout of computation, storage and communication resources on silicon. Through a process of composition of resources (as opposed to decomposition of applications), application specific computational structures are defined on the fabric at runtime to realize different modules of the applications in hardware. Applications synthesized on this fabric offer performance comparable to ASICs while retaining the programmability of processing cores. We outline the application synthesis methodology through examples, and compare our results with software implementations on traditional platforms with unbounded resources.
Abstract:
A methodology is presented for the synthesis of analog circuits using piecewise linear (PWL) approximations. The function to be synthesized is divided into PWL segments such that each segment can be realized using elementary MOS current-mode programmable-gain circuits. By connecting a number of these elementary current-mode circuits in parallel, it is possible to realize a piecewise linear approximation of any arbitrary analog function within the allowed approximation error bounds. Simulation results show a close agreement between the desired function and the synthesized output. The number of PWL segments used for approximation, and hence the circuit area, is determined by the required accuracy and the smoothness of the resulting function.
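The trade-off between accuracy and segment count described in this abstract can be illustrated numerically. The following is a minimal sketch, not the paper's synthesis method: it assumes uniformly spaced breakpoints and doubles the segment count until a sampled error bound is met. All function names are hypothetical.

```python
import math

def pwl_breakpoints(f, lo, hi, segments):
    """Sample f at uniformly spaced breakpoints on [lo, hi] (uniform spacing is an assumption)."""
    step = (hi - lo) / segments
    points = [(lo + i * step, f(lo + i * step)) for i in range(segments)]
    points.append((hi, f(hi)))  # pin the last breakpoint exactly at hi
    return points

def evaluate_pwl(points, x):
    """Evaluate the PWL approximation at x by linear interpolation."""
    x = min(max(x, points[0][0]), points[-1][0])  # clamp against rounding at the ends
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the approximation interval")

def max_error(f, points, samples=1000):
    """Largest sampled deviation between f and its PWL approximation."""
    lo, hi = points[0][0], points[-1][0]
    xs = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - evaluate_pwl(points, x)) for x in xs)

def segments_for_error(f, lo, hi, bound, max_segments=1024):
    """Double the segment count until the sampled error is within the bound."""
    segments = 1
    while segments <= max_segments:
        points = pwl_breakpoints(f, lo, hi, segments)
        if max_error(f, points) <= bound:
            return segments, points
        segments *= 2
    raise ValueError("error bound not reached within max_segments")

if __name__ == "__main__":
    count, pts = segments_for_error(math.tanh, 0.0, 2.0, bound=0.01)
    print(count, max_error(math.tanh, pts))
```

A tighter bound or a less smooth target function drives the segment count up, which in the circuit setting corresponds to more elementary cells and therefore more area.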
Abstract:
ASICs offer the best realization of DSP algorithms in terms of performance, but the cost is prohibitive, especially when the volumes involved are low. However, if the architecture synthesis trajectory for such algorithms is such that the target architecture can be identified as an interconnection of elementary parameterized computational structures, then it is possible to attain a close match, both in terms of performance and power with respect to an ASIC, for any algorithmic parameters of the given algorithm. Such an architecture is weakly programmable (configurable) and can be viewed as an application specific instruction-set processor (ASIP). In this work, we present a methodology to synthesize ASIPs for DSP algorithms.
Abstract:
An all-digital technique is proposed for generating an accurate delay irrespective of the inaccuracies of a controllable delay line. A subsampling technique-based delay measurement unit (DMU) capable of measuring delays accurately for the full period range is used as the feedback element to build accurate fractional period delays based on input digital control bits. The proposed delay generation system periodically measures and corrects the error and maintains it at the minimum value without requiring any special calibration phase. Up to 40x improvement in accuracy is demonstrated for a commercial programmable delay generator chip. The time-precision trade-off feature of the DMU is utilized to reduce the locking time. Loop dynamics are adjusted to stabilize the delay after the minimum error is achieved, thus avoiding additional jitter. Measurement results from a high-end oscilloscope also validate the effectiveness of the proposed system in improving accuracy.
Abstract:
A CMOS gas sensor array platform with digital read-out containing 27 sensor pixels and a reference pixel is presented. A signal conditioning circuit at each pixel includes digitally programmable gain stages for sensor signal amplification followed by a second order continuous time delta sigma modulator for digitization. Each sensor pixel can be functionalized with a distinct sensing material that facilitates transduction based on impedance change. The impedance spectrum (up to 10 kHz) of the sensor is obtained off-chip by computing the fast Fourier transform of the sensor and reference pixel outputs. The reference pixel also compensates for the phase shift introduced by the signal processing circuits. The chip also contains a temperature sensor with digital readout for ambient temperature measurement. A sensor pixel is functionalized with polycarbazole conducting polymer for sensing volatile organic gases and measurement results are presented. The chip is fabricated in a 0.35 μm CMOS technology and requires a single step post processing for functionalization. It consumes 57 mW from a 3.3 V supply.
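The off-chip step of comparing sensor and reference outputs in the frequency domain can be sketched as follows. This is a minimal illustration under stated assumptions: it evaluates a single DFT bin directly instead of a full FFT, and it only computes the complex ratio of sensor to reference. How the paper maps that ratio to an impedance value is not stated in the abstract, and the function names are hypothetical.

```python
import cmath
import math

def dft_bin(samples, k):
    """Discrete Fourier transform of `samples` evaluated at bin k only."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))

def relative_response(sensor, reference, k):
    """Complex sensor/reference ratio at bin k.

    Gain and phase shift common to both signal paths cancel in the ratio,
    which is the compensating role the reference pixel plays.
    """
    return dft_bin(sensor, k) / dft_bin(reference, k)

if __name__ == "__main__":
    n, k = 256, 8
    circuit_phase = 0.3  # assumed phase shift shared by both read-out paths
    reference = [math.cos(2 * math.pi * k * i / n + circuit_phase) for i in range(n)]
    sensor = [0.5 * math.cos(2 * math.pi * k * i / n + circuit_phase - 0.4) for i in range(n)]
    ratio = relative_response(sensor, reference, k)
    print(abs(ratio), cmath.phase(ratio))
```

In the synthetic example the shared 0.3 rad shift drops out, leaving only the sensor's own attenuation and phase lag.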
Abstract:
With ever increasing network speed, scalable and reliable detection of network port scans has become a major challenge. In this paper, we present a scalable and flexible architecture and a novel algorithm to detect and block port scans in real time. The proposed architecture detects fast scanners as well as stealth scanners having large inter-probe periods. An FPGA implementation of the proposed system gives an average throughput of 2 Gbps with a system clock frequency of 100 MHz on a Xilinx Virtex-II Pro FPGA. Experimental results on a real network trace show the effectiveness of the proposed system in detecting and blocking network scans with very low false positives and false negatives.
Abstract:
Video decoders used in emerging applications need to be flexible to handle a large variety of video formats and deliver scalable performance to handle wide variations in workloads. In this paper we propose a unified software and hardware architecture for video decoding to achieve scalable performance with flexibility. The lightweight processor tiles and the reconfigurable hardware tiles in our architecture enable software and hardware implementations to co-exist, while a programmable interconnect enables dynamic interconnection of the tiles. Our process network oriented compilation flow achieves realization agnostic application partitioning and enables seamless migration across uniprocessor, multi-processor, semi hardware and full hardware implementations of a video decoder. An application quality of service aware scheduler monitors and controls the operation of the entire system. We prove the concept through a prototype of the architecture on an off-the-shelf FPGA. The FPGA prototype shows a scaling in performance from QCIF to 1080p resolutions in four discrete steps. We also demonstrate that the reconfiguration time is short enough to allow migration from one configuration to the other without any frame loss.
Abstract:
This paper presents a comparative evaluation of the average and switching models of a dc-dc boost converter from the point of view of real-time simulation. Both models are used to simulate the converter in real time on a Field Programmable Gate Array (FPGA) platform. The converter is considered to function over a wide range of operating conditions, and can transition between continuous conduction mode (CCM) and discontinuous conduction mode (DCM). While the average model is known to be computationally efficient from the perspective of off-line simulation, it is shown here to consume more logic resources than the switching model for real-time simulation of the dc-dc converter. Further, evaluation of the boundary condition between CCM and DCM is found to be the main reason for the increased consumption of resources by the average model.
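To make the CCM/DCM boundary evaluation concrete, here is a minimal sketch based on the standard textbook steady-state relations for an ideal, lossless boost converter. It is not the model used in the paper. It assumes the usual dimensionless parameter K = 2L/(R·Ts) and critical value Kcrit = D(1 − D)²; the function and parameter names are hypothetical.

```python
import math

def boost_conversion_ratio(duty, inductance, load_resistance, switching_period):
    """Return (mode, Vout/Vin) for an ideal boost converter in steady state.

    Textbook relations (assumed, not taken from the paper):
      K      = 2 L / (R Ts)
      K_crit = D (1 - D)^2
      CCM when K > K_crit:  M = 1 / (1 - D)
      DCM otherwise:        M = (1 + sqrt(1 + 4 D^2 / K)) / 2
    """
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty cycle must lie in [0, 1)")
    k = 2.0 * inductance / (load_resistance * switching_period)
    k_crit = duty * (1.0 - duty) ** 2
    if k > k_crit:
        return "CCM", 1.0 / (1.0 - duty)
    return "DCM", (1.0 + math.sqrt(1.0 + 4.0 * duty ** 2 / k)) / 2.0

if __name__ == "__main__":
    # Same converter at heavy and light load: only the load resistance changes.
    print(boost_conversion_ratio(0.5, 100e-6, 10.0, 10e-6))
    print(boost_conversion_ratio(0.5, 100e-6, 1000.0, 10e-6))
```

Even in this idealized form, the averaged description needs a comparison plus a square root and a division on the DCM branch, which gives some intuition for why boundary evaluation can be costly in hardware.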
Abstract:
A Field Programmable Gate Array (FPGA) based hardware accelerator for multi-conductor parasitic capacitance extraction, using Method of Moments (MoM), is presented in this paper. Due to the prohibitive cost of solving a dense algebraic system formed by MoM, linear complexity fast solver algorithms have been developed in the past to expedite the matrix-vector product computation in a Krylov sub-space based iterative solver framework. However, as the number of conductors in a system increases, leading to a corresponding increase in the number of right-hand-side (RHS) vectors, the computational cost for multiple matrix-vector products presents a time bottleneck, especially for ill-conditioned system matrices. In this work, an FPGA based hardware implementation is proposed to parallelize the iterative matrix solution for multiple RHS vectors in a low-rank compression based fast solver scheme. The method is applied to accelerate electrostatic parasitic capacitance extraction of multiple conductors in a Ball Grid Array (BGA) package. Speed-ups of up to 13x over an equivalent software implementation on an Intel Core i5 processor for dense matrix-vector products and 12x for QR compressed matrix-vector products are achieved using a Virtex-6 XC6VLX240T FPGA on Xilinx's ML605 board.
Abstract:
Support vector machines (SVM) are a popular class of supervised models in machine learning. The associated compute intensive learning algorithm limits their use in real-time applications. This paper presents a fully scalable architecture of a coprocessor, which can compute multiple rows of the kernel matrix in parallel. Further, we propose an extended variant of the popular decomposition technique, sequential minimal optimization, which we call the hybrid working set (HWS) algorithm, to effectively utilize the benefits of cached kernel columns and the parallel computational power of the coprocessor. The coprocessor is implemented on a Xilinx Virtex 7 field-programmable gate array-based VC707 board and achieves a speedup of up to 25x for kernel computation over single threaded computation on an Intel Core i5. An application speedup of up to 15x over the software implementation of LIBSVM and a speedup of up to 23x over SVMLight are achieved using the HWS algorithm in unison with the coprocessor. The reduction in the number of iterations and the sensitivity of the optimization time to variation in cache size using the HWS algorithm are also shown.
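The unit of work that the coprocessor parallelizes, a row of the kernel matrix, can be sketched in software. This is a minimal illustration only: the RBF kernel, the `gamma` value, and the function names are assumptions, since the abstract does not say which kernels the hardware supports.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel between two feature vectors (kernel choice is an assumption)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_rows(data, row_indices, gamma=0.5):
    """Compute the requested rows of the kernel matrix.

    Each row is independent of the others, which is what lets several rows
    be evaluated in parallel; here they are simply computed one by one.
    """
    return {i: [rbf_kernel(data[i], x, gamma) for x in data] for i in row_indices}

if __name__ == "__main__":
    samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
    for index, row in kernel_rows(samples, [0, 2]).items():
        print(index, row)
```

A decomposition method such as sequential minimal optimization requests rows like these for the samples in its current working set, so caching them and computing several at once both reduce the time spent per iteration.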
Abstract:
In this article, a Field Programmable Gate Array (FPGA)-based hardware accelerator for 3D electromagnetic extraction, using Method of Moments (MoM), is presented. As the number of nets or ports in a system increases, leading to a corresponding increase in the number of right-hand-side (RHS) vectors, the computational cost for multiple matrix-vector products presents a time bottleneck in a linear-complexity fast solver framework. In this work, an FPGA-based hardware implementation is proposed toward a two-level parallelization scheme: (i) matrix level parallelization for single RHS and (ii) pipelining for multiple-RHS. The method is applied to accelerate electrostatic parasitic capacitance extraction of multiple nets in a Ball Grid Array (BGA) package. The acceleration is shown to be linearly scalable with FPGA resources, and speed-ups of over 10x against an equivalent software implementation on a 2.4 GHz Intel Core i5 processor are achieved using a Virtex-6 XC6VLX240T FPGA on Xilinx's ML605 board with the implemented design operating at a 200 MHz clock frequency. (c) 2016 Wiley Periodicals, Inc. Microwave Opt Technol Lett 58:776-783, 2016
Abstract:
We present a method of rapidly producing computer-generated holograms that exhibit geometric occlusion in the reconstructed image. Conceptually, a bundle of rays is shot from every hologram sample into the object volume. We use z buffering to find the nearest intersecting object point for every ray and add its complex field contribution to the corresponding hologram sample. Each hologram sample belongs to an independent operation, allowing us to exploit the parallel computing capability of modern programmable graphics processing units (GPUs). Unlike algorithms that use points or planar segments as the basis for constructing the hologram, our algorithm's complexity is dependent on fixed system parameters, such as the number of ray-casting operations, and can therefore handle complicated models more efficiently. The finite number of hologram pixels is, in effect, a windowing function, and from analyzing the Wigner distribution function of the windowed free-space transfer function we find an upper limit on the cone angle of the ray bundle. Experimentally, we found that an angular sampling distance of 0.01° for a 2.66° cone angle produces acceptable reconstruction quality. © 2009 Optical Society of America.