97 results for implementations


Relevance:

10.00% 10.00%

Publisher:

Abstract:

Real time digital signal processing demands high performance implementations of division and square root. This can only be achieved by the design of fast and efficient arithmetic algorithms which address practical VLSI architectural design issues. In this paper, new algorithms for division and square root are described. The new schemes are based on pre-scaling the operands and modifying the classical SRT method such that the result digits and the remainders are computed concurrently and the computations in adjacent rows are overlapped. Consequently, their performance exceeds that of the SRT methods. The hardware cost for higher radices is considerably more than that of the SRT methods but for many applications, this is not prohibitive. A system of equations is presented which enables both an analysis of the method for any radix and the parameters of implementations to be easily determined. This is illustrated for the case of radix 2 and radix 4. In addition, a highly regular array architecture combining the division and square root method is described. © 1994 Kluwer Academic Publishers.
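
As a point of reference for the SRT baseline that the abstract compares against, here is a minimal sketch of classical radix-2 SRT division. It is not the pre-scaled, overlapped scheme the paper proposes. The function name `srt_radix2_divide`, the exact `Fraction` arithmetic, and the normalisation of the divisor to [1/2, 1) are assumptions made for illustration only.

```python
from fractions import Fraction


def srt_radix2_divide(x, d, n_digits):
    """Classical radix-2 SRT division with the redundant digit set {-1, 0, 1}.

    Assumes 1/2 <= d < 1 and |x| <= d (illustrative normalisation).
    Returns (quotient, final partial remainder, list of quotient digits).
    """
    x, d = Fraction(x), Fraction(d)
    half = Fraction(1, 2)
    if not (half <= d < 1):
        raise ValueError("divisor must be normalised to [1/2, 1)")
    if abs(x) > d:
        raise ValueError("dividend magnitude must not exceed the divisor")

    w = x                    # partial remainder
    q = Fraction(0)          # accumulated quotient
    digits = []
    for j in range(1, n_digits + 1):
        shifted = 2 * w
        # Digit selection only compares the shifted remainder with +/- 1/2,
        # so it does not need a full comparison against the divisor.
        if shifted >= half:
            digit = 1
        elif shifted < -half:
            digit = -1
        else:
            digit = 0
        w = shifted - digit * d
        q += Fraction(digit, 2 ** j)
        digits.append(digit)
    return q, w, digits


if __name__ == "__main__":
    x, d, n = Fraction(1, 3), Fraction(2, 3), 12
    q, w, digits = srt_radix2_divide(x, d, n)
    # Invariant of the recurrence: x = d * q + w * 2^(-n)
    assert x == d * q + w / 2 ** n
    print(digits, float(q))
```

Each iteration depends on the remainder produced by the previous one, which is the serial dependency that the overlapped computation described in the abstract is meant to shorten.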

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper, a new reconfigurable multi-standard architecture is introduced for integer-pixel motion estimation and a standard-cell based chip design study is presented. This has been designed to cover most of the common block-based video compression standards, including MPEG-2, MPEG-4, H.263, H.264, AVS and WMV-9. The architecture exhibits simpler control, high throughput and relatively low hardware cost, and is highly competitive when compared with existing designs for specific video standards. It can also, through the use of control signals, be dynamically reconfigured at run-time to accommodate different system constraints, such as the trade-off between power dissipation and video quality. The computational rates achieved make the circuit suitable for high-end video processing applications. Silicon design studies indicate that circuits based on this approach incur only a relatively small penalty in terms of power dissipation and silicon area when compared with implementations for specific standards.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A methodology has been developed which allows a non-specialist to rapidly design silicon wavelet transform cores for a variety of specifications. The cores include both forward and inverse orthonormal wavelet transforms. This methodology is based on efficient, modular and scaleable architectures utilising time-interleaved coefficients for the wavelet transform filters. The cores are parameterized in terms of wavelet type and data and coefficient word lengths. The designs have been captured in VHDL and are hence portable across a range of silicon foundries as well as FPGA and PLD implementations.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A generic, parameterisable key scheduling core is presented, which can be utilised in pipelinable private-key encryption algorithms. The data encryption standard (DES) algorithm, which lends itself readily to pipelining, is utilised to exemplify this novel key scheduling method and the broader applicability of the method to other encryption algorithms is illustrated. The DES design is implemented on Xilinx Virtex FPGA technology. Utilising the novel method, a 16-stage pipelined DES design is achieved, which can run at an encryption rate of 3.87 Gbit/s. This result is among the fastest hardware implementations and is a factor of 28 faster than software implementations.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A rapid design methodology for biorthogonal wavelet transform cores has been developed. This methodology is based on a generic, scaleable architecture for the wavelet filters. The architecture offers efficient hardware utilization by combining the linear phase property of biorthogonal filters with decimation in a MAC based implementation. The design has been captured in VHDL and parameterized in terms of wavelet type, data word length and coefficient word length. The control circuit is embedded within the cores and allows them to be cascaded without any interface glue logic for any desired level of decomposition. The design time to produce silicon layout of a biorthogonal wavelet based system is typically less than a day. The resulting silicon cores produced are comparable in area and performance to hand-crafted designs. The designs are portable across a range of foundries and are also applicable to FPGA and PLD implementations.
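
The idea of combining linear phase with decimation in a MAC-based filter can be sketched in software as follows. This is a hedged illustration, not the architecture from the paper: the function names are hypothetical, the filter is applied in "valid" mode without boundary extension, and the example taps are the 5/3 low-pass analysis filter scaled by 8, chosen here only as a familiar symmetric filter.

```python
def decimated_symmetric_fir(x, h):
    """Filter x with an odd-length symmetric filter h and keep every 2nd output.

    Symmetric taps share one multiply (pre-add the two mirrored samples), and
    only the outputs that survive decimation are computed at all.
    """
    n_taps = len(h)
    if n_taps % 2 == 0 or list(h) != list(reversed(h)):
        raise ValueError("expects an odd-length symmetric (linear-phase) filter")
    centre = n_taps // 2
    out = []
    for start in range(0, len(x) - n_taps + 1, 2):   # step 2 = decimation
        acc = h[centre] * x[start + centre]
        for k in range(centre):
            # One multiply covers both mirrored taps h[k] and h[n_taps-1-k].
            acc += h[k] * (x[start + k] + x[start + n_taps - 1 - k])
        out.append(acc)
    return out


def direct_fir_then_decimate(x, h):
    """Reference: compute every output with one multiply per tap, then discard half."""
    n_taps = len(h)
    full = [
        sum(h[k] * x[i + k] for k in range(n_taps))
        for i in range(len(x) - n_taps + 1)
    ]
    return full[::2]


if __name__ == "__main__":
    taps = [-1, 2, 6, 2, -1]          # illustrative symmetric taps (scaled by 8)
    samples = [1, 2, 3, 4, 5, 6, 7, 8]
    print(decimated_symmetric_fir(samples, taps))
    print(direct_fir_then_decimate(samples, taps))
```

For a 5-tap filter the folded form needs 3 multiplies per retained output instead of 5, and it never computes the outputs that decimation would throw away.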

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A methodology for the production of silicon cores for wavelet packet decomposition has been developed. The scheme utilizes efficient scalable architectures for both orthonormal and biorthogonal wavelet transforms. The cores produced from these architectures can be readily scaled for any wavelet function and are easily configurable for any subband structure. The cores are fully parameterized in terms of wavelet choice and appropriate wordlengths. Designs produced are portable across a range of silicon foundries as well as FPGA and PLD technologies. A number of exemplar implementations have been produced.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper presents a thorough investigation of the combined allocator design for Networks-on-Chip (NoC). Particularly, we discuss the interlock of the combined NoC allocator, which is caused by the lock mechanism of priority updating between the local and global arbiters. Architectures and implementations of three interlock-free combined allocators are presented in detail. Their cost, critical path, as well as network level performance are demonstrated based on 65-nm standard cell technology.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

As ubiquitous computing becomes a reality, sensitive information is increasingly processed and transmitted by smart cards, mobile devices and various types of embedded systems. This has led to the requirement of a new class of lightweight cryptographic algorithm to ensure security in these resource constrained environments. The International Organization for Standardization (ISO) has recently standardised two low-cost block ciphers for this purpose, Clefia and Present. In this paper we provide the first comprehensive hardware architecture comparison between these ciphers, as well as a comparison with the current National Institute of Standards and Technology (NIST) standard, the Advanced Encryption Standard.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

High-dimensional gene expression data provide a rich source of information because they capture the expression level of genes in dynamic states that reflect the biological functioning of a cell. For this reason, such data are suitable to reveal systems-related properties inside a cell, e.g., in order to elucidate molecular mechanisms of complex diseases like breast or prostate cancer. However, this is not only strongly dependent on the sample size and the correlation structure of a data set, but also on the statistical hypotheses tested. Many different approaches have been developed over the years to analyze gene expression data to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure in pathways. In this paper, we review statistical methods for all three types of approaches, including subtypes, in the context of cancer data, provide links to software implementations and tools, and also address the general problem of multiple hypothesis testing. Further, we provide recommendations for the selection of such analysis methods.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Refactoring is the process of changing the structure of a program without changing its behaviour. Refactoring has so far only really been deployed effectively for sequential programs. However, with the increased availability of multicore (and, soon, manycore) systems, refactoring can play an important role in helping both expert and non-expert parallel programmers structure and implement their parallel programs. This paper describes the design of a new refactoring tool that is aimed at increasing the programmability of parallel systems. To motivate our design, we refactor a number of examples in C, C++ and Erlang into good parallel implementations, using a set of formal pattern rewrite rules. © 2013 Springer-Verlag Berlin Heidelberg.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

With the over-provisioned routing resources on FPGA, the topology choice for NoC implementation on FPGA is more flexible than on ASIC. However, it is well understood that the global wire routing impacts the performance of NoC on FPGA because the topology is routed using a fixed routing fabric. An important question that arises is: will the benefit of diameter reduction from using a highly connected topology outweigh the impact of global routing? To answer this question, we investigate FPGA-based packet-switched NoC implementations with different sizes and topologies, and quantitatively measure the impact of global routing on each of these networks. The result shows that with sufficient routing resources on modern FPGA, the global routing is not on the critical path of the system, and thus is not a dominating factor for the performance of practical multi-hop NoC systems. © 2011 IEEE.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper investigates sub-integer implementations of the adaptive Gaussian mixture model (GMM) for background/foreground segmentation to allow the deployment of the method on low-cost, low-power processors that lack a floating-point unit (FPU). We propose two novel integer computer arithmetic techniques to update Gaussian parameters. Specifically, the mean value and the variance of each Gaussian are updated by a redefined and generalised "round" operation that emulates the original updating rules for a large set of learning rates. Weights are represented by counters that are updated following stochastic rules to allow a wider range of learning rates, and the weight trend is approximated by a line or a staircase. We demonstrate that the memory footprint and computational cost of GMM are significantly reduced, without significantly affecting the performance of background/foreground segmentation.
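
The difficulty that motivates such techniques can be illustrated with a small sketch: with integer state and a small learning rate, plain rounding of the update makes the mean stall. The code below contrasts that with generic stochastic rounding, which is a swapped-in, well-known workaround and not the "round" operation or counter rules defined in the paper. The function names, the power-of-two learning rate `1 / 2**shift`, and the test values are all assumptions.

```python
import random


def naive_round_step(mu, x, shift):
    """Integer mean update with ordinary rounding; stalls when |x - mu| is small."""
    return mu + round((x - mu) / (1 << shift))


def stochastic_round_step(mu, x, shift, rng):
    """Integer mean update whose expected step equals (x - mu) / 2**shift.

    The fractional part of the ideal step is turned into a probability of
    adding one extra unit, so small differences still move the mean over time.
    """
    scale = 1 << shift
    # divmod floors, so 0 <= remainder < scale even when x < mu.
    step, remainder = divmod(x - mu, scale)
    if rng.randrange(scale) < remainder:
        step += 1
    return mu + step


if __name__ == "__main__":
    rng = random.Random(0)
    shift = 4                      # learning rate 1/16 (illustrative)
    mu_naive = mu_stoch = 100
    for _ in range(2000):
        mu_naive = naive_round_step(mu_naive, 103, shift)
        mu_stoch = stochastic_round_step(mu_stoch, 103, shift, rng)
    print(mu_naive, mu_stoch)
```

Only integer additions, a shift-sized division and one random draw are used per update, which is the kind of operation budget available on a processor without an FPU.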

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Enhancing sampling and analyzing simulations are central issues in molecular simulation. Recently, we introduced PLUMED, an open-source plug-in that provides some of the most popular molecular dynamics (MD) codes with implementations of a variety of different enhanced sampling algorithms and collective variables (CVs). The rapid changes in this field, in particular new directions in enhanced sampling and dimensionality reduction together with new hardware, require a code that is more flexible and more efficient. We therefore present PLUMED 2 here, a complete rewrite of the code in an object-oriented programming language (C++). This new version introduces greater flexibility and greater modularity, which both extends its core capabilities and makes it far easier to add new methods and CVs. It also has a simpler interface with the MD engines and provides a single software library containing both tools and core facilities. Ultimately, the new code better serves the ever-growing community of users and contributors in coping with the new challenges arising in the field.

Program summary

Program title: PLUMED 2

Catalogue identifier: AEEE_v2_0

Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEE_v2_0.html

Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland

Licensing provisions: Yes

No. of lines in distributed program, including test data, etc.: 700646

No. of bytes in distributed program, including test data, etc.: 6618136

Distribution format: tar.gz

Programming language: ANSI-C++.

Computer: Any computer capable of running an executable produced by a C++ compiler.

Operating system: Linux operating system, Unix OSs.

Has the code been vectorized or parallelized?: Yes, parallelized using MPI.

RAM: Depends on the number of atoms, the method chosen and the collective variables used.

Classification: 3, 7.7, 23.

Catalogue identifier of previous version: AEEE_v1_0.

Journal reference of previous version: Comput. Phys. Comm. 180 (2009) 1961.

External routines: GNU libmatheval, Lapack, Blas, MPI.

© 2013 Elsevier B.V. All rights reserved.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We introduce a general scheme for sequential one-way quantum computation where static systems with long-living quantum coherence (memories) interact with moving systems that may possess very short coherence times. Both the generation of the cluster state needed for the computation and its consumption by measurements are carried out simultaneously. As a consequence, effective clusters of one spatial dimension fewer than in the standard approach are sufficient for computation. In particular, universal computation requires only a one-dimensional array of memories. The scheme applies to discrete-variable systems of any dimension as well as to continuous-variable ones, and both are treated equivalently under the light of local complementation of graphs. In this way our formalism introduces a general framework that encompasses and generalizes in a unified manner some previous system-dependent proposals. The procedure is intrinsically well suited for implementations with atom-photon interfaces.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We propose a general framework to effectively 'open' a high-Q resonator, that is, to release the quantum state initially prepared in it in the form of a traveling electromagnetic wave. This is achieved by employing a mediating mode that coherently scatters the radiation from the resonator into a one-dimensional continuum of modes such as a waveguide. The same mechanism may be used to 'feed' a desired quantum field to an initially empty cavity. Switching between an 'open' and a 'closed' resonator may then be obtained by controlling either the detuning of the scatterer or the amount of time it spends in the resonator. First, we introduce the model in its general form, identifying (i) the traveling mode that optimally retains the full quantum information of the resonator field and (ii) a suitable figure of merit that we study analytically in terms of the system parameters. Then, we discuss two feasible implementations based on ensembles of two-level atoms interacting with cavity fields. In addition, we discuss how to integrate traditional cavity QED in our proposal using three-level atoms.