10 results for Compute Unified Device Architecture (CUDA)
Abstract:
Field programmable gate array (FPGA) technology is a powerful platform for implementing computationally complex digital signal processing (DSP) systems. Applications that are multi-modal, however, are designed for worst-case conditions. In this paper, genetic sequencing techniques are applied to give a more sophisticated decomposition of the algorithmic variations, thus allowing a unified hardware architecture which gives a 10-25% area saving and 15% power saving for a digital radar receiver.
Abstract:
In this theoretical paper, the analysis of the effect that ON-state active-device resistance has on the performance of a Class-E tuned power amplifier using a shunt inductor topology is presented. The work is focused on the relatively unexplored area of design facilitation of Class-E tuned amplifiers where intrinsically low-output-capacitance monolithic microwave integrated circuit switching devices such as pseudomorphic high electron mobility transistors are used. In the paper, the switching voltage and current waveforms in the presence of ON-resistance are analyzed in order to provide insight into circuit properties such as RF output power, drain efficiency, and power-output capability. For a given amplifier specification, a design procedure is illustrated whereby it is possible to compute optimal circuit component values which account for prescribed switch resistance loss. Furthermore, insight into how ON-resistance affects transistor selection in terms of peak switch voltage and current requirements is described. Finally, a design example is given in order to validate the theoretical analysis against numerical simulation.
Abstract:
This paper outlines the design and development of a Java-based, unified and flexible natural language dialogue system that enables users to interact using natural language, e.g. speech. A number of software development issues are considered with the aim of designing an architecture that enables different discourse components to be readily and flexibly combined in a manner that permits information to be easily shared. Use of XML schemas assists this component interaction. The paper describes how a range of Java language features were employed to support the development of the architecture, providing an illustration of how a modern programming language makes tractable the development of a complex dialogue system.
Abstract:
Continuing achievements in hardware technology are bringing ubiquitous computing closer to reality. The notion of a connected, interactive and autonomous environment is common to all sensor networks, biosystems and radio frequency identification (RFID) devices, and the emergence of significant deployments and sophisticated applications can be expected. However, as more information is collected and transmitted, security issues will become vital for such a fully connected environment. In this study the authors consider adding security features to low-cost devices such as RFID tags. In particular, the authors consider the implementation of a digital signature architecture that can be used for device authentication, to prevent tag cloning, and for data authentication, to prevent transmission forgery. The scheme is built around the signature variant of the cryptoGPS identification scheme and the SHA-1 hash function. When implemented on 130 nm CMOS the full design uses 7494 gates and consumes 4.72 µW of power, making it smaller and more power efficient than previous low-cost digital signature designs. The study also presents a low-cost SHA-1 hardware architecture which is the smallest standardised hash function design to date.
Abstract:
Task dataflow languages simplify the specification of parallel programs by dynamically detecting and enforcing dependencies between tasks. These languages are, however, often restricted to a single level of parallelism. This language design is reflected in the runtime system, where a master thread explicitly generates a task graph and worker threads execute ready tasks and wake up their dependents. Such an approach is incompatible with state-of-the-art schedulers such as the Cilk scheduler, which minimize the creation of idle tasks (work-first principle) and place all task creation and scheduling off the critical path. This paper proposes an extension to the Cilk scheduler in order to reconcile task dependencies with the work-first principle. We discuss the impact of task dependencies on the properties of the Cilk scheduler. Furthermore, we propose a low-overhead ticket-based technique for dependency tracking and enforcement at the object level. Our scheduler also supports renaming of objects in order to increase task-level parallelism. Renaming is implemented using versioned objects, a new type of hyperobject. Experimental evaluation shows that the unified scheduler is as efficient as the Cilk scheduler when tasks have no dependencies. Moreover, the unified scheduler is more efficient than SMPSS, a particular implementation of a task dataflow language.
Abstract:
Passive person detection and localization is an emerging area in UWB localization systems, whereby people are not required to carry any UWB ranging device. Based on experimental data, we propose a novel method to detect static persons in the absence of template waveforms, and to compute distances to these persons. Our method makes very few assumptions about the environment and can achieve ranging performance on the order of 50 cm, using off-the-shelf UWB devices. © 2013 IEEE.
Abstract:
In this paper, we present a unified approach to an energy-efficient, variation-tolerant design of the Discrete Wavelet Transform (DWT) in the context of image processing applications. In most image processing applications it is not necessary to produce exactly correct numerical outputs. We exploit this feature and propose a design methodology for DWT which exposes energy-quality tradeoffs at each level of the design hierarchy, from the algorithm level down to the architecture and circuit levels, by taking advantage of the limited perceptual ability of the Human Visual System. A unique feature of this design methodology is that it guarantees robustness under process variability and facilitates aggressive voltage over-scaling. Simulation results show that, compared to a conventional design, the approach achieves significant energy savings (74% - 83%) with minor degradation in output image quality and averts catastrophic failures under process variations. © 2010 IEEE.
Abstract:
Side-channel analysis of cryptographic systems can allow for the recovery of secret information by an adversary even where the underlying algorithms have been shown to be provably secure. This is achieved by exploiting the unintentional leakages inherent in the underlying implementation of the algorithm in software or hardware. Within this field of research, a class of attacks known as profiling attacks, or more specifically, as used here, template attacks, have been shown to be extremely efficient at extracting secret keys. Template attacks assume a strong adversarial model, in that an attacker has an identical device with which to profile the power consumption of various operations. This profile can then be used to efficiently attack the target device. Inherent in this assumption is that the power consumption across the devices under test is somewhat similar. This central tenet of the attack is largely unexplored in the literature, with the research community generally performing the profiling stage on the same device as the one being attacked. This is beneficial for evaluation or penetration testing, as it is essentially the best-case scenario for an attacker, where the model built during the profiling stage matches exactly that of the target device; however, it is not necessarily a reflection of how the attack will work in reality. In this work, a large-scale evaluation of this assumption is performed, comparing the key recovery performance across 20 identical smart-cards when performing a profiling attack.
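The two stages of a template attack described in this abstract (profile on one device, then attack another) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the evaluation performed in the paper: the leakage model (Hamming weight of plaintext XOR key plus Gaussian noise), the single-sample traces, the pooled variance, and all names such as `build_templates`, `recover_key` and `target_offset` are hypothetical choices made for illustration only.

```python
import math
import random


def hw(x):
    """Hamming weight of an 8-bit value."""
    return bin(x).count("1")


def leak(value, rng, offset=0.0, sigma=0.5):
    """Assumed leakage model: Hamming weight + device offset + Gaussian noise."""
    return hw(value) + offset + rng.gauss(0.0, sigma)


def build_templates(rng, offset=0.0, reps=8):
    """Profiling stage: estimate a mean per Hamming-weight class and a pooled variance."""
    samples = {h: [] for h in range(9)}
    for _ in range(reps):
        for v in range(256):  # cover every intermediate value so no class is empty
            samples[hw(v)].append(leak(v, rng, offset))
    means = {h: sum(xs) / len(xs) for h, xs in samples.items()}
    n_total = sum(len(xs) for xs in samples.values())
    sq_dev = sum((x - means[h]) ** 2 for h, xs in samples.items() for x in xs)
    pooled_var = sq_dev / (n_total - len(samples))
    return means, pooled_var


def log_likelihood(traces, key_guess, templates):
    """Gaussian log-likelihood of the observed leakages under one key hypothesis."""
    means, var = templates
    total = 0.0
    for plaintext, leakage in traces:
        mean = means[hw(plaintext ^ key_guess)]
        total += -0.5 * math.log(2 * math.pi * var) - (leakage - mean) ** 2 / (2 * var)
    return total


def recover_key(traces, templates):
    """Attack stage: return the key byte whose templates best explain the traces."""
    return max(range(256), key=lambda k: log_likelihood(traces, k, templates))


def simulate_attack(secret_key, target_offset=0.0, n_traces=300, seed=1):
    """Profile on a simulated device, then attack a target that may differ by an offset."""
    rng = random.Random(seed)
    templates = build_templates(rng)  # profiling device, offset 0
    traces = []
    for _ in range(n_traces):
        plaintext = rng.randrange(256)
        traces.append((plaintext, leak(plaintext ^ secret_key, rng, target_offset)))
    return recover_key(traces, templates)


if __name__ == "__main__":
    print(hex(simulate_attack(0x3C)))
```

Calling `simulate_attack` with a non-zero `target_offset` gives a crude way to experiment with profiling on one device and attacking another. Real devices differ in ways this one-parameter model does not capture, so the sketch says nothing about the results reported for the 20 smart-cards.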
Abstract:
Very high speed and low area hardware architectures of the SHACAL-1 encryption algorithm are presented in this paper. The SHACAL algorithm was a submission to the New European Schemes for Signatures, Integrity and Encryption (NESSIE) project and it is based on the SHA-1 hash algorithm. To date, there have been no performance metrics published on hardware implementations of this algorithm. A fully pipelined SHACAL-1 encryption architecture is described in this paper and when implemented on a Virtex-II X2V4000 FPGA device, it runs at a throughput of 17 Gbps. A fully pipelined decryption architecture achieves a speed of 13 Gbps when implemented on the same device. In addition, iterative architectures of the algorithm are presented. The SHACAL-1 decryption algorithm is derived and also presented in this paper, since it was not provided in the submission to NESSIE. © Springer-Verlag Berlin Heidelberg 2003.
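As the abstract notes, SHACAL-1 is based on SHA-1. The following is a minimal software sketch, assuming the usual description of the cipher: the SHA-1 compression function applied to a 160-bit block, with the key zero-padded to 512 bits and expanded by the SHA-1 message schedule, and decryption obtained by running the rounds backwards. The function names are illustrative and are not taken from the paper, and the code does not reflect the pipelined or iterative hardware architectures it describes.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
SHA1_IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def round_fn(t, b, c, d):
    """SHA-1 round function and constant for round t (0..79)."""
    if t < 20:
        return (b & c) | (~b & d), 0x5A827999
    if t < 40:
        return b ^ c ^ d, 0x6ED9EBA1
    if t < 60:
        return (b & c) | (b & d) | (c & d), 0x8F1BBCDC
    return b ^ c ^ d, 0xCA62C1D6


def expand_key(key):
    """Zero-pad the key to 512 bits and expand it to 80 words (SHA-1 schedule)."""
    if len(key) > 64:
        raise ValueError("key must be at most 64 bytes")
    w = list(struct.unpack(">16I", key.ljust(64, b"\x00")))
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


def encrypt(block, key):
    """Encrypt five 32-bit words with the 80 SHA-1 rounds (no feed-forward)."""
    a, b, c, d, e = block
    for t, w in enumerate(expand_key(key)):
        f, k = round_fn(t, b, c, d)
        a, b, c, d, e = (rotl(a, 5) + f + e + k + w) & MASK, a, rotl(b, 30), c, d
    return (a, b, c, d, e)


def decrypt(block, key):
    """Invert the rounds in reverse order to recover the plaintext words."""
    a, b, c, d, e = block
    w = expand_key(key)
    for t in range(79, -1, -1):
        # Undo the word shuffle; rotating left by 2 undoes the rotate-left by 30.
        pa, pb, pc, pd = b, rotl(c, 2), d, e
        f, k = round_fn(t, pb, pc, pd)
        pe = (a - rotl(pa, 5) - f - k - w[t]) & MASK
        a, b, c, d, e = pa, pb, pc, pd, pe
    return (a, b, c, d, e)


def sha1_one_block(message):
    """Cross-check: SHA-1 of a short message is IV + E_block(IV), word by word."""
    if len(message) > 55:
        raise ValueError("message must fit in one padded SHA-1 block")
    pad_len = 55 - len(message)
    block = message + b"\x80" + b"\x00" * pad_len + struct.pack(">Q", 8 * len(message))
    out = encrypt(SHA1_IV, block)
    return struct.pack(">5I", *((h + x) & MASK for h, x in zip(SHA1_IV, out)))


if __name__ == "__main__":
    key = bytes(range(16))
    ct = encrypt((1, 2, 3, 4, 5), key)
    print(decrypt(ct, key) == (1, 2, 3, 4, 5))
    print(sha1_one_block(b"abc") == hashlib.sha1(b"abc").digest())
```

The `sha1_one_block` helper is only a sanity check on the round logic: it rebuilds a single-block SHA-1 digest from `encrypt` and can be compared with `hashlib.sha1`.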