955 results for Modular programming.
Abstract:
Performance evaluation of parallel software and architectural exploration of innovative hardware support face a common challenge with emerging manycore platforms: they are limited by the slow running time and the low accuracy of software simulators. Manycore FPGA prototypes are difficult to build, but they offer great rewards. Software running on such prototypes runs orders of magnitude faster than on current simulators. Moreover, researchers gain significant architectural insight during the modeling process. We use the Formic FPGA prototyping board [1], which specifically targets scalable and cost-efficient multi-board prototyping, to build and test a 64-board model of a 512-core, MicroBlaze-based, non-coherent hardware prototype with a full network-on-chip in a 3D-mesh topology. We expand the hardware architecture to include the ARM Versatile Express platforms and build a 520-core heterogeneous prototype of 8 Cortex-A9 cores and 512 MicroBlaze cores. We then develop an MPI library for the prototype and evaluate it extensively using several bare-metal and MPI benchmarks. We find that our processor prototype is highly scalable, faithfully models single-chip multicore architectures, and is a very efficient platform for parallel programming research, being 50,000 times faster than software simulation.
Abstract:
This paper introduces hybrid address spaces as a fundamental design methodology for implementing scalable runtime systems on many-core architectures without hardware support for cache coherence. We use hybrid address spaces for an implementation of MapReduce, a programming model for large-scale data processing, and for the implementation of a remote memory access (RMA) model. Both implementations are available on the Intel SCC and are portable to similar architectures. We present the design and implementation of HyMR, a MapReduce runtime system in which the different stages, and the synchronization operations between them, alternate between a distributed memory address space and a shared memory address space to improve performance and scalability. We compare HyMR to a reference implementation and find that HyMR improves performance by a factor of 1.71× over a set of representative MapReduce benchmarks. We also compare HyMR with Phoenix++, a state-of-the-art implementation for systems with hardware-managed cache coherence, in terms of scalability and sustained-to-peak data processing bandwidth, where HyMR demonstrates improvements of a factor of 3.1× and 3.2×, respectively. We further evaluate our hybrid remote memory access (HyRMA) programming model and find its performance to be superior to that of message passing.
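For readers unfamiliar with the MapReduce programming model mentioned in this abstract, a minimal single-process sketch of its three stages (map, shuffle, reduce) follows. It only illustrates the generic model with a word count; it does not reproduce HyMR, its hybrid address spaces, or anything specific to the Intel SCC. All function names are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(documents):
    """Map stage: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    """Shuffle stage: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce stage: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three stages in sequence on a list of strings."""
    return reduce_phase(shuffle_phase(map_phase(documents)))
```

In a real runtime the map and reduce stages run in parallel on many cores, and the shuffle stage is where data moves between them; that data movement is the part a design such as the one described above would target.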
Abstract:
A formal specification of a complex programming language statement is presented. The subject matter was selected as being typical of the kind confronting a small software house. It is shown that formal specification notations may be applied, with benefit, to 'messy' problems. Emphasis is placed upon producing a specification which is readable by, and useful to, a reader not familiar with formal notations.
Abstract:
This paper describes the result of a project to develop climate adaptation design strategies funded by the UK’s Technology Strategy Board. The aim of the project was to look at the threats and opportunities presented by industrialized and house-building techniques in the light of predicted future increases in flooding and overheating due to anthropogenic climate change. The paper shows that the thermal performance of houses built to the current UK Building Regulations is not adequate to cope with changing weather patterns, and in light of this, develops a detailed design for a new house: one that is industrially produced and climatically resilient, but affordable. This detailed concept IDEAhaus of a modular house is not only flood-proof to a water depth of 750 mm, but also is designed to utilize passive cooling, which dramatically reduces the amount of overheating, both now and in the future.
Abstract:
Large loads result in expensive foundations which are a substantial proportion of the capital cost of flap-type Wave Energy Converters (WECs). Devices such as Oyster 800, currently deployed at the European Marine Energy Centre (EMEC), comprise a single flap for the full width of the machine. Splitting a flap-type device into smaller vertical flap modules, to make a ‘modular-flap’, might reduce the total foundation loads, whilst still providing acceptable performance in terms of energy conversion.
This paper investigates the foundation loads of an undamped modular-flap device, comparing them to those for a rigid flap of an equivalent width. Physical modelling in a wave tank is used, with loads recorded using a six degree of freedom (DoF) load cell. Both fatigue and extreme loading analyses were conducted. The rotations of the flaps were also recorded, using a motion-tracking system.
Abstract:
Structured parallel programming is recognised as a viable and effective means of tackling parallel programming problems. Recently, a set of simple and powerful parallel building blocks (RISC-pb2l) has been proposed to support modelling and implementation of parallel frameworks. In this work we demonstrate how that same parallel building block set may be used to model both general-purpose parallel programming abstractions, not usually listed in classical skeleton sets, and more specialized domain-specific parallel patterns. We show how an implementation of RISC-pb2l can be realised via the FastFlow framework and present experimental evidence of the feasibility and efficiency of the approach.
Abstract:
Electrolytic capacitors are extensively used in power converters but they are bulky, unreliable, and have short lifetimes. This paper proposes a new capacitor-free high step-up dc-dc converter design for renewable energy applications such as photovoltaics (PVs) and fuel cells. The primary side of the converter includes three interleaved inductors, three main switches, and an active clamp circuit. As a result, the input current ripple is greatly reduced, eliminating the necessity for an input capacitor. In addition, zero voltage switching (ZVS) is achieved during switching transitions for all active switches, so that switching losses can be greatly reduced. Furthermore, a three-phase modular structure and six pulse rectifiers are employed to reduce the output voltage ripple. Since magnetic energy stored in the leakage inductance is recovered, the reverse-recovery issue of the diodes is effectively solved. The proposed converter is justified by simulation and experimental tests on a 1-kW prototype.
Abstract:
The overall aim of the work presented in this paper has been to develop Montgomery modular multiplication architectures suitable for implementation on modern reconfigurable hardware. Accordingly, novel high-radix systolic array Montgomery multiplier designs are presented, as we believe that their inherent regular structure and absence of global interconnect make them well-suited for implementation on modern FPGAs. Unlike previous approaches, each processing element (PE) comprises both an adder and a multiplier. The inclusion of a multiplier in the PE means that the need to pre-compute or store any multiples of the operands is avoided. This also allows very high-radix implementations to be realised, further reducing the number of clock cycles per modular multiplication, while still maintaining a competitive critical delay. For demonstrative purposes, 512-bit and 1024-bit FPGA implementations using radices of 2^8 and 2^16 are presented. The resulting throughput rates are the fastest reported to date.
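The arithmetic behind the abstract above can be sketched in software as a word-serial, high-radix Montgomery multiplication. This is only an illustration of the general algorithm under stated assumptions, not the paper's systolic array architecture: the function name and parameters are hypothetical, the modulus `n` is assumed odd, the operands are assumed smaller than `n`, and `n` is assumed smaller than R = 2^(k * num_words). A higher radix 2^k means fewer loop iterations, which mirrors the reduction in clock cycles the abstract describes.

```python
def montgomery_multiply(a, b, n, k, num_words):
    """Return a * b * R^(-1) mod n, with R = 2**(k * num_words).

    Assumptions (illustrative sketch): n is odd, 0 <= a, b < n, n < R.
    Requires Python 3.8+ for pow() with a negative exponent.
    """
    if n % 2 == 0:
        raise ValueError("modulus must be odd")
    radix = 1 << k
    mask = radix - 1
    # n_prime = -n^(-1) mod 2^k, the per-word reduction constant.
    n_prime = (-pow(n, -1, radix)) % radix

    t = 0
    for i in range(num_words):
        a_i = (a >> (k * i)) & mask        # i-th radix-2^k digit of a
        t += a_i * b                       # partial product
        q = ((t & mask) * n_prime) & mask  # quotient digit
        t = (t + q * n) >> k               # low k bits are zero, so exact
    # The loop keeps t < 2n, so one conditional subtraction suffices.
    if t >= n:
        t -= n
    return t
```

To multiply ordinary residues, operands are first converted to Montgomery form (x * R mod n), multiplied with the function above, and converted back by multiplying with 1.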
Abstract:
In this paper we extend the minimum-cost network flow approach to multi-target tracking by incorporating a motion model, allowing the tracker to better cope with long-term occlusions and missed detections. In our new method, the tracking problem is solved iteratively. First, an initial tracking solution is found without the help of motion information. Given this initial set of tracklets, the motion at each detection is estimated and used to refine the tracking solution. Finally, special edges are added to the tracking graph, allowing a further revised tracking solution to be found, where distant tracklets may be linked based on motion similarity. Our system has been tested on the PETS S2.L1 and Oxford town-center sequences, outperforming the baseline system and achieving results comparable with the current state of the art.
Abstract:
Approximate execution is a viable technique for energy-constrained environments, provided that applications have the mechanisms to produce outputs of the highest possible quality within the given energy budget.
We introduce a framework for energy-constrained execution with controlled and graceful quality loss. A simple programming model allows users to express the relative importance of computations for the quality of the end result, as well as minimum quality requirements. The significance-aware runtime system uses an application-specific analytical energy model to identify the degree of concurrency and approximation that maximizes quality while meeting user-specified energy constraints. Evaluation on a dual-socket 8-core server shows that the proposed framework predicts the optimal configuration with high accuracy, enabling energy-constrained executions that result in significantly higher quality compared to loop perforation, a compiler approximation technique.
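Loop perforation, the baseline named in this abstract, trades accuracy for work by skipping a fraction of a loop's iterations. The sketch below applies the idea by hand to a mean computation; it is an assumption-laden illustration of the baseline technique only, not of the proposed significance-aware framework, and the function name and `stride` parameter are hypothetical.

```python
def perforated_mean(values, stride):
    """Approximate the mean by running only every `stride`-th iteration.

    stride == 1 executes the full loop (exact result); larger strides
    skip iterations, doing less work at the cost of accuracy.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    total = 0.0
    count = 0
    for i in range(0, len(values), stride):
        total += values[i]
        count += 1
    # Guard against an empty input, where no iteration runs.
    return total / count if count else 0.0
```

Perforation skips iterations uniformly, whereas the framework described above lets the programmer state which computations matter most for output quality.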