Biblioteca Digital

We propose a methodology for optimizing the execution of data parallel (sub-)tasks on CPU and GPU cores of the same heterogeneous architecture. The methodology is based on two main components: i) an analytical performance model for scheduling tasks among CPU and GPU cores, such that the global execution time of the overall data parallel pattern is optimized; and ii) an autonomic module which uses the analytical performance model to implement the data parallel computations in a completely autonomic way, requiring no programmer intervention to optimize the computation across CPU and GPU cores. The analytical performance model uses a small set of simple parameters to devise a partitioning-between CPU and GPU cores-of the tasks derived from structured data parallel patterns/algorithmic skeletons. The model takes into account both hardware related and application dependent parameters. It computes the percentage of tasks to be executed on CPU and GPU cores such that both kinds of cores are exploited and performance figures are optimized. The autonomic module, implemented in FastFlow, executes a generic map (reduce) data parallel pattern scheduling part of the tasks to the GPU and part to CPU cores so as to achieve optimal execution time. Experimental results on state-of-the-art CPU/GPU architectures are shown that assess both performance model properties and autonomic module effectiveness. © 2013 IEEE.

Veja mais

Histogram of oriented gradients front end processing: An FPGA based processor approach

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The Field Programmable Gate Array (FPGA) implementation of the commonly used Histogram of Oriented Gradients (HOG) algorithm is explored. The HOG algorithm is employed to extract features for object detection. A key focus has been to explore the use of a new FPGA-based processor which has been targeted at image processing. The paper gives details of the mapping and scheduling factors that influence the performance and the stages that were undertaken to allow the algorithm to be deployed on FPGA hardware, whilst taking into account the specific IPPro architecture features. We show that multi-core IPPro performance can exceed that of against state-of-the-art FPGA designs by up to 3.2 times with reduced design and implementation effort and increased flexibility all on a low cost, Zynq programmable system.

Veja mais

Turbo-smt: Accelerating coupled sparse matrix-tensor factorizations by 200x

Relevância:

80.00% 80.00%

Publicador:

Resumo:

How can we correlate the neural activity in the human brain as it responds to typed words, with properties of these terms (like ‘edible’, ‘fits in hand’)? In short, we want to find latent variables, that jointly explain both the brain activity, as well as the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem.

Can we accelerate any CMTF solver, so that it runs within a few minutes instead of tens of hours to a day, while maintaining good accuracy? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm, by up to 200x, along with an up to 65 fold increase in sparsity, with comparable accuracy to the baseline.

We apply Turbo-SMT to BrainQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. Turbo-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy.

Veja mais

Accelerating integer-based fully homomorphic encryption using Comba multiplication

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Fully Homomorphic Encryption (FHE) is a recently developed cryptographic technique which allows computations on encrypted data. There are many interesting applications for this encryption method, especially within cloud computing. However, the computational complexity is such that it is not yet practical for real-time applications. This work proposes optimised hardware architectures of the encryption step of an integer-based FHE scheme with the aim of improving its practicality. A low-area design and a high-speed parallel design are proposed and implemented on a Xilinx Virtex-7 FPGA, targeting the available DSP slices, which offer high-speed multiplication and accumulation. Both use the Comba multiplication scheduling method to manage the large multiplications required with uneven sized multiplicands and to minimise the number of read and write operations to RAM. Results show that speed up factors of 3.6 and 10.4 can be achieved for the encryption step with medium-sized security parameters for the low-area and parallel designs respectively, compared to the benchmark software implementation on an Intel Core2 Duo E8400 platform running at 3 GHz.

Veja mais

Multi-bubble sonoluminescence: Laboratory curiosity, or real world application?

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Sonoluminescence (SL) involves the conversion of mechanical [ultra]sound energy into light. Whilst the phenomenon is invariably inefficient, typically converting just 10^-4 of the incident acoustic energy into photons, it is nonetheless extraordinary, as the resultant energy density of the emergent photons exceeds that of the ultrasonic driving field by a factor of some 10 ¹². Sonoluminescence has specific [as yet untapped] advantages in that it can be effected at remote locations in an essentially wireless format. The only [usual] requirement is energy transduction via the violent oscillation of microscopic bubbles within the propagating medium. The dependence of sonoluminescent output on the generating sound field's parameters, such as pulse duration, duty cycle, and position within the field, have been observed and measured previously, and several relevant aspects are discussed presently. We also extrapolate the logic from a recently published analysis relating to the ensuing dynamics of bubble 'clouds' that have been stimulated by ultrasound. Here, the intention was to develop a relevant [yet computationally simplistic] model that captured the essential physical qualities expected from real sonoluminescent microbubble clouds. We focused on the inferred temporal characteristics of SL light output from a population of such bubbles, subjected to intermediate [0.5-2MPa] ultrasonic pressures. Finally, whilst direct applications for sonoluminescent light output are thought unlikely in the main, we proceed to frame the state-of-the- art against several presently existing technologies that could form adjunct approaches with distinct potential for enhancing present sonoluminescent light output that may prove useful in real world [biomedical] applications.

Veja mais

Empirical evaluation of multi-device profiling side-channel attacks

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Side-channel analysis of cryptographic systems can allow for the recovery of secret information by an adversary even where the underlying algorithms have been shown to be provably secure. This is achieved by exploiting the unintentional leakages inherent in the underlying implementation of the algorithm in software or hardware. Within this field of research, a class of attacks known as profiling attacks, or more specifically as used here template attacks, have been shown to be extremely efficient at extracting secret keys. Template attacks assume a strong adversarial model, in that an attacker has an identical device with which to profile the power consumption of various operations. This can then be used to efficiently attack the target device. Inherent in this assumption is that the power consumption across the devices under test is somewhat similar. This central tenet of the attack is largely unexplored in the literature with the research community generally performing the profiling stage on the same device as being attacked. This is beneficial for evaluation or penetration testing as it is essentially the best case scenario for an attacker where the model built during the profiling stage matches exactly that of the target device, however it is not necessarily a reflection on how the attack will work in reality. In this work, a large scale evaluation of this assumption is performed, comparing the key recovery performance across 20 identical smart-cards when performing a profiling attack.

Veja mais

Vertex Clustering of Augmented Graph Streams

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper we propose a graph stream clustering algorithm with a unied similarity measure on both structural and attribute properties of vertices, with each attribute being treated as a vertex. Unlike others, our approach does not require an input parameter for the number of clusters, instead, it dynamically creates new sketch-based clusters and periodically merges existing similar clusters. Experiments on two publicly available datasets reveal the advantages of our approach in detecting vertex clusters in the graph stream. We provide a detailed investigation into how parameters affect the algorithm performance. We also provide a quantitative evaluation and comparison with a well-known offline community detection algorithm which shows that our streaming algorithm can achieve comparable or better average cluster purity.

Veja mais

998 resultados para 010200 APPLIED MATHEMATICS

Filtro por publicador