Biblioteca Digital

116 resultados para parallel processing systems

VLSI architectures for vector quantization

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The real time implementation of an efficient signal compression technique, Vector Quantization (VQ), is of great importance to many digital signal coding applications. In this paper, we describe a new family of bit level systolic VLSI architectures which offer an attractive solution to this problem. These architectures are based on a bit serial, word parallel approach and high performance and efficiency can be achieved for VQ applications of a wide range of bandwidths. Compared with their bit parallel counterparts, these bit serial circuits provide better alternatives for VQ implementations in terms of performance and cost. © 1995 Kluwer Academic Publishers.

High performance VLSI architecture for Wave Digital Filtering

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The application of fine grain pipelining techniques in the design of high performance Wave Digital Filters (WDFs) is described. It is shown that significant increases in the sampling rate of bit parallel circuits can be achieved using most significant bit (msb) first arithmetic. A novel VLSI architecture for implementing two-port adaptor circuits is described which embodies these ideas. The circuit in question is highly regular, uses msb first arithmetic and is implemented using simple carry-save adders. © 1992 Kluwer Academic Publishers.

Bit-Level systolic architectures for high performance IIR filtering

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Several novel systolic architectures for implementing densely pipelined bit parallel IIR filter sections are presented. The fundamental problem of latency in the feedback loop is overcome by employing redundant arithmetic in combination with bit-level feedback, allowing a basic first-order section to achieve a wordlength-independent latency of only two clock cycles. This is extended to produce a building block from which higher order sections can be constructed. The architecture is then refined by combining the use of both conventional and redundant arithmetic, resulting in two new structures offering substantial hardware savings over the original design. In contrast to alternative techniques, bit-level pipelinability is achieved with no net cost in hardware. © 1989 Kluwer Academic Publishers.

Efficient Runtime Thread Management for the Nano-Threads Programming Model

Relevância:

90.00% 90.00%

Publicador:

A Tool to Schedule Parallel Applications on Multiprocessors: The NANOS CPU Manager

Relevância:

90.00% 90.00%

Publicador:

Inference and Declaration of Independence in Task-Parallel Programs

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The inherent difficulty of thread-based shared-memory programming has recently motivated research in high-level, task-parallel programming models. Recent advances of Task-Parallel models add implicit synchronization, where the system automatically detects and satisfies data dependencies among spawned tasks. However, dynamic dependence analysis incurs significant runtime overheads, because the runtime must track task resources and use this information to schedule tasks while avoiding conflicts and races.
We present SCOOP, a compiler that effectively integrates static and dynamic analysis in code generation. SCOOP combines context-sensitive points-to, control-flow, escape, and effect analyses to remove redundant dependence checks at runtime. Our static analysis can work in combination with existing dynamic analyses and task-parallel runtimes that use annotations to specify tasks and their memory footprints. We use our static dependence analysis to detect non-conflicting tasks and an existing dynamic analysis to handle the remaining dependencies. We evaluate the resulting hybrid dependence analysis on a set of task-parallel programs.

An accelerated method of developing DBT projects using CDIO partner institutions as parallel processors

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The cycle of the academic year impacts on efforts to refine and improve major group design-build-test (DBT) projects since the time to run and evaluate projects is generally a full calendar year. By definition these major projects have a high degree of complexity since they act as the vehicle for the application of a range of technical knowledge and skills. There is also often an extensive list of desired learning outcomes which extends to include professional skills and attributes such as communication and team working. It is contended that student project definition and operation, like any other designed product, requires a number of iterations to achieve optimisation. The problem however is that if this cycle takes four or more years then by the time a project’s operational structure is fine tuned it is quite possible that the project theme is no longer relevant. The majority of the students will also inevitably experience a sub-optimal project experience over the 5 year development period. It would be much better if the ratio were flipped so that in 1 year an optimised project definition could be achieved which had sufficient longevity that it could run in the same efficient manner for 4 further years. An increased number of parallel investigators would also enable more varied and adventurous project concepts to be examined than a single institution could undertake alone in the same time frame.
This work-in-progress paper describes a parallel processing methodology for the accelerated definition of new student DBT project concepts. This methodology has been devised and implemented by a number of CDIO partner institutions in the UK & Ireland region. An agreed project theme was operated in parallel in one academic year with the objective of replacing a multi-year iterative cycle. Additionally the close collaboration and peer learning derived from the interaction between the coordinating academics facilitated the development of faculty teaching skills in line with CDIO standard 10.

IPPro: FPGA based Image Processing Processor

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The paper presents IPPro which is a high performance, scalable soft-core processor targeted for image processing applications. It has been based on the Xilinx DSP48E1 architecture using the ZYNQ Field Programmable Gate Array and is a scalar 16-bit RISC processor that operates at 526MHz, giving 526MIPS of performance. Each IPPro core uses 1 DSP48, 1 Block RAM and 330 Kintex-7 slice-registers, thus making the processor as compact as possible whilst maintaining flexibility and programmability. A key aspect of the approach is in reducing the application design time and implementation effort by using multiple IPPro processors in a SIMD mode. For different applications, this allows us to exploit different levels of parallelism and mapping for the specified processing architecture with the supported instruction set. In this context, a Traffic Sign Recognition (TSR) algorithm has been prototyped on a Zedboard with the colour and morphology operations accelerated using multiple IPPros. Simulation and experimental results demonstrate that the processing platform is able to achieve a speedup of 15 to 33 times for colour filtering and morphology operations respectively, with a reduced design effort and time.

Logic and Memory Design Based on Unequal Error Protection for Voltage-scalable, Robust and Adaptive DSP Systems

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this paper, we propose a system level design approach considering voltage over-scaling (VOS) that achieves error resiliency using unequal error protection of different computation elements, while incurring minor quality degradation. Depending on user specifications and severity of process variations/channel noise, the degree of VOS in each block of the system is adaptively tuned to ensure minimum system power while providing "just-the-right" amount of quality and robustness. This is achieved, by taking into consideration block level interactions and ensuring that under any change of operating conditions, only the "less-crucial" computations, that contribute less to block/system output quality, are affected. The proposed approach applies unequal error protection to various blocks of a system-logic and memory-and spans multiple layers of design hierarchy-algorithm, architecture and circuit. The design methodology when applied to a multimedia subsystem shows large power benefits ( up to 69% improvement in power consumption) at reasonable image quality while tolerating errors introduced due to VOS, process variations, and channel noise.

Histogram of oriented gradients front end processing: An FPGA based processor approach

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The Field Programmable Gate Array (FPGA) implementation of the commonly used Histogram of Oriented Gradients (HOG) algorithm is explored. The HOG algorithm is employed to extract features for object detection. A key focus has been to explore the use of a new FPGA-based processor which has been targeted at image processing. The paper gives details of the mapping and scheduling factors that influence the performance and the stages that were undertaken to allow the algorithm to be deployed on FPGA hardware, whilst taking into account the specific IPPro architecture features. We show that multi-core IPPro performance can exceed that of against state-of-the-art FPGA designs by up to 3.2 times with reduced design and implementation effort and increased flexibility all on a low cost, Zynq programmable system.

Network Based Malware Detection in Virtualised Environment

Relevância:

90.00% 90.00%

Publicador:

Resumo:

While virtualisation can provide many benefits to a networks infrastructure, securing the virtualised environment is a big challenge. The security of a fully virtualised solution is dependent on the security of each of its underlying components, such as the hypervisor, guest operating systems and storage.

This paper presents a single security service running on the hypervisor that could potentially work to provide security service to all virtual machines running on the system. This paper presents a hypervisor hosted framework which performs specialised security tasks for all underlying virtual machines to protect against any malicious attacks by passively analysing the network traffic of VMs. This framework has been implemented using Xen Server and has been evaluated by detecting a Zeus Server setup and infected clients, distributed over a number of virtual machines. This framework is capable of detecting and identifying all infected VMs with no false positive or false negative detection.

Accelerating integer-based fully homomorphic encryption using Comba multiplication

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Fully Homomorphic Encryption (FHE) is a recently developed cryptographic technique which allows computations on encrypted data. There are many interesting applications for this encryption method, especially within cloud computing. However, the computational complexity is such that it is not yet practical for real-time applications. This work proposes optimised hardware architectures of the encryption step of an integer-based FHE scheme with the aim of improving its practicality. A low-area design and a high-speed parallel design are proposed and implemented on a Xilinx Virtex-7 FPGA, targeting the available DSP slices, which offer high-speed multiplication and accumulation. Both use the Comba multiplication scheduling method to manage the large multiplications required with uneven sized multiplicands and to minimise the number of read and write operations to RAM. Results show that speed up factors of 3.6 and 10.4 can be achieved for the encryption step with medium-sized security parameters for the low-area and parallel designs respectively, compared to the benchmark software implementation on an Intel Core2 Duo E8400 platform running at 3 GHz.

Cross-layer Energy-Efficiency Optimization of Packet Based Wireless MIMO Communication Systems

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Energy in today's short-range wireless communication is mostly spent on the analog- and digital hardware rather than on radiated power. Hence,purely information-theoretic considerations fail to achieve the lowest energy per information bit and the optimization process must carefully consider the overall transceiver. In this paper, we propose to perform cross-layer optimization, based on an energy-aware rate adaptation scheme combined with a physical layer that is able to properly adjust its processing effort to the data rate and the channel conditions to minimize the energy consumption per information bit. This energy proportional behavior is enabled by extending the classical system modes with additional configuration parameters at the various layers. Fine grained models of the power consumption of the hardware are developed to provide awareness of the physical layer capabilities to the medium access control layer. The joint application of the proposed energy-aware rate adaptation and modifications to the physical layer of an IEEE802.11n system, improves energy-efficiency (averaged over many noise and channel realizations) in all considered scenarios by up to 44%.

FPGA-based Tabu Search for Detection in Large-Scale MIMO Systems

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The increasing scale of Multiple-Input Multiple- Output (MIMO) topologies employed in forthcoming wireless communications standards presents a substantial implementation challenge to designers of embedded baseband signal processing architectures for MIMO transceivers. Specifically the increased scale of such systems has a substantial impact on the perfor- mance/cost balance of detection algorithms for these systems. Whilst in small-scale systems Sphere Decoding (SD) algorithms offer the best quasi-ML performance/cost balance, in larger systems heuristic detectors, such Tabu-Search (TS) detectors are superior. This paper addresses a dearth of research in architectures for TS-based MIMO detection, presenting the first known realisations of TS detectors for 4 × 4 and 10 × 10 MIMO systems. To the best of the authors’ knowledge, these are the largest single-chip detectors on record.

A neural network based tool for semi-automatic code transformation

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A neural network based tool has been developed to assist in the process of code transformation. The tool offers advice on appropriate transformations within a knowledge-driven, semi-automatic parallelisation environment. We have identified the essential characteristics of codes relevant to loop transformations. A Kohonen network is used to discover structure in the characterised codes thus revealing new knowledge that may be brought to bear on the mapping between codes and transformations or transformation sequences. A transform selector based on this process has been developed and successfully applied to the parallelisation of sequential codes.

«
1
2
3
4
5
6
7
8
»