991 results for parallel programs
Abstract:
Multi-core and many-core platforms are becoming increasingly heterogeneous and asymmetric. This significantly increases the porting and tuning effort required for parallel codes, which in turn often leads to a growing gap between peak machine power and actual application performance. This work discusses a first step toward the automated optimization of high-level skeleton-based parallel code. The paper presents an abstract annotation model for skeleton programs aimed at formally describing suitable mappings of parallel activities onto a high-level platform representation. The derived mapping and scheduling strategies are used to generate optimized run-time code. © 2013 Springer-Verlag Berlin Heidelberg.
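A concrete instance of such a skeleton is the task farm, where a fixed parallel structure is instantiated with a worker count and placement chosen by the mapping. The following minimal C++ farm is an illustrative sketch only; the paper's annotation model and runtime are not shown, and the hard-coded nworkers parameter stands in for what a derived mapping would supply.

    #include <atomic>
    #include <cstddef>
    #include <cstdio>
    #include <functional>
    #include <thread>
    #include <vector>

    // A minimal "farm" skeleton: apply f to every item using nworkers threads.
    // In an annotated skeleton program, nworkers (and worker placement) would
    // be derived from the platform model rather than hard-coded here.
    template <typename T>
    void farm(std::vector<T>& items, const std::function<void(T&)>& f,
              int nworkers) {
        std::atomic<std::size_t> next{0};  // shared index of the next item
        std::vector<std::thread> pool;
        for (int w = 0; w < nworkers; ++w)
            pool.emplace_back([&] {
                for (std::size_t i; (i = next.fetch_add(1)) < items.size(); )
                    f(items[i]);           // each worker grabs items greedily
            });
        for (auto& t : pool) t.join();
    }

    int main() {
        std::vector<int> v{1, 2, 3, 4, 5, 6, 7, 8};
        farm<int>(v, [](int& x) { x *= 2; }, 4);
        for (int x : v) std::printf("%d ", x);
        std::printf("\n");
    }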
Abstract:
The management of non-functional features (performance, security, power management, etc.) is traditionally a difficult, error-prone task for programmers of parallel applications. To handle these non-functional features, autonomic managers may be used that run policies, represented as rules, which use sensors and actuators to monitor and transform a running parallel application. We discuss an approach aimed at providing formal tool support for the integration of independently developed autonomic managers, each taking care of a different non-functional concern, within the same parallel application. Our approach builds on the Behavioural Skeleton experience (autonomic management of non-functional features in structured parallel applications) and on previous results on conflict detection and resolution in rule-based systems. © 2013 Springer-Verlag Berlin Heidelberg.
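As a hedged illustration of the rule style described above (not the Behavioural Skeleton API itself), the C++ sketch below runs one policy rule in a monitor-and-react loop; the sensor reading and the actuator effect are simulated values invented for the example.

    #include <cstdio>
    #include <functional>
    #include <vector>

    // A rule pairs a condition over sensor readings with an actuator action.
    struct Rule {
        std::function<bool()> condition;  // reads sensors
        std::function<void()> action;     // drives actuators
    };

    int main() {
        double service_time = 0.9;  // simulated sensor reading
        int nworkers = 2;           // simulated actuator target

        std::vector<Rule> rules = {
            { [&] { return service_time > 0.5; },          // too slow?
              [&] { ++nworkers; service_time *= 0.6; } }   // add a worker
        };

        for (int tick = 0; tick < 3; ++tick)  // the manager's control loop
            for (auto& r : rules)
                if (r.condition()) r.action();

        std::printf("workers = %d, service time = %.2f\n",
                    nworkers, service_time);
    }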
Abstract:
Processor architectures have taken a turn towards many-core processors, which integrate multiple processing cores on a single chip to increase overall performance, and there are no signs that this trend will stop in the near future. Many-core processors are harder to program than multi-core and single-core processors due to the need to write parallel or concurrent programs with high degrees of parallelism. Moreover, many-cores have to operate in a mode of strong scaling because of memory bandwidth constraints. In strong scaling, increasingly fine-grained parallelism must be extracted in order to keep all processing cores busy.
Task dataflow programming models have a high potential to simplify parallel programming because they relieve the programmer of identifying precisely all inter-task dependences when writing programs. Instead, the task dataflow runtime system detects and enforces inter-task dependences during execution based on a description of the memory each task accesses. The runtime constructs a task dataflow graph that captures all tasks and their dependences. Tasks are scheduled to execute in parallel, taking into account the dependences specified in the task graph.
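A minimal sketch of this mechanism, assuming a simple last-writer rule per memory object (WAR dependences and renaming are deliberately omitted; this is not the system studied in the paper):

    #include <cstdio>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    enum class Access { In, Out, InOut };

    struct Task {
        std::string name;
        std::vector<int> deps;  // indices of tasks this task must wait for
    };

    class TaskGraph {
        std::vector<Task> tasks_;
        std::map<const void*, int> last_writer_;  // memory object -> task id
    public:
        int add_task(std::string name,
                     std::vector<std::pair<const void*, Access>> args) {
            int id = static_cast<int>(tasks_.size());
            Task t{std::move(name), {}};
            for (auto& [obj, mode] : args) {
                auto it = last_writer_.find(obj);
                if (it != last_writer_.end())
                    t.deps.push_back(it->second);  // RAW/WAW dependence
                if (mode != Access::In)
                    last_writer_[obj] = id;        // this task becomes writer
            }
            tasks_.push_back(std::move(t));
            return id;
        }
        void dump() const {
            for (std::size_t i = 0; i < tasks_.size(); ++i) {
                std::printf("task %zu (%s) depends on:",
                            i, tasks_[i].name.c_str());
                for (int d : tasks_[i].deps) std::printf(" %d", d);
                std::printf("\n");
            }
        }
    };

    int main() {
        double a = 0, b = 0;
        TaskGraph g;
        g.add_task("produce_a", {{&a, Access::Out}});
        g.add_task("produce_b", {{&b, Access::Out}});
        g.add_task("combine",   {{&a, Access::In}, {&b, Access::In}});
        g.dump();  // "combine" depends on tasks 0 and 1
    }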
Several papers report significant overheads for task dataflow systems, which severely limit the scalability and usability of such systems. In this paper we study efficient schemes to manage task graphs and analyze their scalability. We assume a programming model that supports input, output and in/out annotations on task arguments, as well as commutative in/out and reductions. We analyze the structure of task graphs and identify versions and generations as key concepts for their efficient management. We then present three schemes to manage task graphs, building on graph representations, hypergraphs and lists. We also consider a fourth, edge-less scheme that synchronizes tasks using integers. Analysis using micro-benchmarks shows that the graph representation is not always scalable and that the edge-less scheme introduces the least overhead in nearly all situations.
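The edge-less idea can be pictured with per-object integer tickets: at creation a task records how many writes to each of its inputs must have completed before it may run, so readiness becomes an integer comparison instead of a traversal of stored edges. The C++ sketch below is an illustrative reconstruction under that assumption, not the scheme evaluated in the paper.

    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    struct Object {
        std::atomic<int> writes_issued{0};    // tickets handed to writer tasks
        std::atomic<int> writes_completed{0}; // tickets retired on completion
    };

    struct ReadDep { Object* obj; int ticket; }; // wait: completed >= ticket

    // Spin until every input has retired the writes this task depends on.
    static void wait_ready(const std::vector<ReadDep>& deps) {
        for (const auto& d : deps)
            while (d.obj->writes_completed.load() < d.ticket)
                std::this_thread::yield();
    }

    int main() {
        Object a;
        int ticket = ++a.writes_issued;  // writer takes a ticket at creation

        std::thread writer([&] {
            // ... produce a ...
            a.writes_completed.fetch_add(1);  // retire the ticket
        });
        std::thread reader([&, ticket] {
            wait_ready({{&a, ticket}});  // integer comparison, no stored edges
            std::printf("reader ran after writer\n");
        });

        writer.join();
        reader.join();
    }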
Abstract:
This work presents a novel algorithm for decomposing NFAs (nondeterministic finite automata) into one-state-active modules for parallel execution on Multiprocessor Systems on Chip (MP-SoC). Furthermore, performance studies based on a 16-PE system for Snort, Bro and Linux-L7 regular expressions are presented. © 2009 IEEE.
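For context, simulating an NFA amounts to tracking which states are active after each input symbol; the decomposition above assigns each such state (a one-state-active module) to its own PE. The single-threaded C++ sketch below shows only the per-state activity bookkeeping and is not the paper's parallel algorithm.

    #include <cstdio>
    #include <string>

    int main() {
        // NFA for the regex "ab*a" over {a, b}: states 0 (start), 1, 2 (accept).
        const int N = 3;
        // trans[state][symbol] -> successor states encoded as a bitmask
        unsigned trans[N][2] = {
            {0b010, 0b000},  // 0 --a--> 1
            {0b100, 0b010},  // 1 --a--> 2, 1 --b--> 1
            {0b000, 0b000},  // 2: accepting, no outgoing transitions
        };

        std::string input = "abba";
        unsigned active = 0b001;         // only the start state is active
        for (char c : input) {
            int sym = (c == 'a') ? 0 : 1;
            unsigned next = 0;
            for (int s = 0; s < N; ++s)  // each state = one "module"
                if (active & (1u << s)) next |= trans[s][sym];
            active = next;
        }
        std::printf("accepted: %s\n", (active & 0b100) ? "yes" : "no");
    }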
Abstract:
Performance evaluation of parallel software and architectural exploration of innovative hardware support face a common challenge with emerging manycore platforms: they are limited by the slow running time and the low accuracy of software simulators. Manycore FPGA prototypes are difficult to build, but they offer great rewards. Software running on such prototypes runs orders of magnitude faster than on current simulators. Moreover, researchers gain significant architectural insight during the modeling process. We use the Formic FPGA prototyping board [1], which specifically targets scalable and cost-efficient multi-board prototyping, to build and test a 64-board model of a 512-core, MicroBlaze-based, non-coherent hardware prototype with a full network-on-chip in a 3D-mesh topology. We expand the hardware architecture to include the ARM Versatile Express platforms and build a 520-core heterogeneous prototype of 8 Cortex-A9 cores and 512 MicroBlaze cores. We then develop an MPI library for the prototype and evaluate it extensively using several bare-metal and MPI benchmarks. We find that our processor prototype is highly scalable, faithfully models single-chip multicore architectures, and is a very efficient platform for parallel programming research, being 50,000 times faster than software simulation.
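A typical MPI microbenchmark for evaluating such a library is a two-rank ping-pong that measures average round-trip latency. The generic C++ sketch below is illustrative and is not the authors' benchmark suite.

    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    // Two-rank ping-pong: ranks 0 and 1 bounce a small message and report
    // the average round-trip latency; any extra ranks simply sit out.
    // Run with at least 2 ranks, e.g. mpirun -np 2 ./pingpong
    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int iters = 1000;
        const int bytes = 8;
        std::vector<char> buf(bytes, 0);

        if (rank < 2) {
            double t0 = MPI_Wtime();
            for (int i = 0; i < iters; ++i) {
                if (rank == 0) {
                    MPI_Send(buf.data(), bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf.data(), bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                } else {
                    MPI_Recv(buf.data(), bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    MPI_Send(buf.data(), bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
                }
            }
            double t1 = MPI_Wtime();
            if (rank == 0)
                std::printf("avg round-trip: %.3f us\n",
                            (t1 - t0) / iters * 1e6);
        }
        MPI_Finalize();
        return 0;
    }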
Abstract:
Technical market indicators are tools used by technical analysts to understand trends in trading markets. Technical (market) indicators are often calculated in real time, as trading progresses. This paper presents a mathematically founded framework for calculating technical indicators. Our framework consists of a domain-specific language for the unambiguous specification of technical indicators, and a runtime system based on Click for computing the indicators. We argue that our solution enhances ease of programming by aligning our domain-specific language with the mathematical description of technical indicators, and that it enables executing programs in kernel space for decreased latency, without exposing the system to users' programming errors.
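As a concrete example of the kind of indicator such a framework would specify, the simple moving average over the last n ticks is SMA_n(t) = (1/n) * (price(t) + ... + price(t-n+1)). The streaming C++ sketch below is a plain illustration and uses none of the paper's DSL or its Click-based runtime.

    #include <cstddef>
    #include <cstdio>
    #include <deque>

    // Streaming simple moving average over the last n prices.
    class MovingAverage {
        std::deque<double> window_;
        double sum_ = 0;
        const std::size_t n_;
    public:
        explicit MovingAverage(std::size_t n) : n_(n) {}
        // Push one tick; returns the SMA over the ticks seen so far
        // (at most the last n of them).
        double update(double price) {
            window_.push_back(price);
            sum_ += price;
            if (window_.size() > n_) {
                sum_ -= window_.front();
                window_.pop_front();
            }
            return sum_ / window_.size();
        }
    };

    int main() {
        MovingAverage sma(3);
        for (double p : {10.0, 11.0, 12.0, 13.0})
            std::printf("SMA = %.2f\n", sma.update(p));
    }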
Abstract:
This article presents a systematic review of research on the achievement outcomes of all types of approaches to teaching science in elementary schools. Study inclusion criteria included use of randomized or matched control groups, a study duration of at least 4 weeks, and use of achievement measures independent of the experimental treatment. A total of 23 studies met these criteria. Among studies evaluating inquiry-based teaching approaches, programs that used science kits did not show positive outcomes on science achievement measures (weighted ES=+0.02 in 7 studies), but inquiry-based programs that emphasized professional development but not kits did show positive outcomes (weighted ES=+0.36 in 10 studies). Technological approaches integrating video and computer resources with teaching and cooperative learning showed positive outcomes in a few small, matched studies (ES=+0.42 in 6 studies). The review concludes that science teaching methods focused on enhancing teachers’ classroom instruction throughout the year, such as cooperative learning and science-reading integration, as well as approaches that give teachers technology tools to enhance instruction, have significant potential to improve science learning.
Abstract:
The cycle of the academic year constrains efforts to refine and improve major group design-build-test (DBT) projects, since the time to run and evaluate a project is generally a full calendar year. By definition these major projects have a high degree of complexity, since they act as the vehicle for the application of a range of technical knowledge and skills. There is also often an extensive list of desired learning outcomes, extending to professional skills and attributes such as communication and team working. It is contended that student project definition and operation, like any other designed product, requires a number of iterations to achieve optimisation. The problem, however, is that if this cycle takes four or more years, then by the time a project's operational structure is fine-tuned it is quite possible that the project theme is no longer relevant. The majority of students will also inevitably have a sub-optimal project experience over the 5-year development period. It would be much better if the ratio were flipped, so that in 1 year an optimised project definition could be achieved with sufficient longevity to run in the same efficient manner for 4 further years. An increased number of parallel investigators would also enable more varied and adventurous project concepts to be examined than a single institution could undertake alone in the same time frame.
This work-in-progress paper describes a parallel processing methodology for the accelerated definition of new student DBT project concepts. The methodology has been devised and implemented by a number of CDIO partner institutions in the UK & Ireland region. An agreed project theme was operated in parallel across the institutions within one academic year, with the objective of replacing a multi-year iterative cycle. Additionally, the close collaboration and peer learning derived from the interaction between the coordinating academics facilitated the development of faculty teaching skills in line with CDIO Standard 10.
Abstract:
Following earlier work demonstrating the utility of Orc as a means of specifying and reasoning about grid applications, we propose enhancing such specifications with metadata that extend an Orc specification with implementation-oriented information. We argue that such specifications provide a useful refinement step, allowing reasoning about implementation-related issues ahead of actual implementation or even prototyping. As examples, we demonstrate how such extended specifications can be used to investigate security-related issues and to evaluate the cost of handling grid resource faults. The approach emphasises a semi-formal style of reasoning that makes maximum use of programmer domain knowledge and experience.
Abstract:
This article proposes a closed-loop control scheme based on joint-angle feedback for cable-driven parallel manipulators (CDPMs), which is able to overcome various difficulties resulting from the flexible nature of the cables and thereby achieve higher control accuracy. By introducing a unique structural design that accommodates built-in encoders in passive joints, the seven-degrees-of-freedom (7-DOF) CDPM can obtain joint-angle values without external sensing devices, and these values are used for feedback control together with a proper closed-loop control algorithm. The control algorithm is derived from the time differential of the kinematic formulation, which relates the joint angular velocities to the time derivative of the cable lengths. In addition, Lyapunov stability theory and the Monte Carlo method have been used to mathematically verify that the self-feedback control law tolerates parameter errors. With the aid of a co-simulation technique, the self-feedback closed-loop control is applied to a 7-DOF CDPM and shows higher motion accuracy than open-loop control. A trajectory tracking experiment on the motion control of the 7-DOF CDPM demonstrated good performance of the self-feedback control method.
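In generic notation (the paper's exact formulation and gains may differ), differentiating the cable-length kinematics gives such a law: with cable lengths \ell, joint angles q, and cable Jacobian J(q),

    \dot{\ell} = J(q)\,\dot{q}
    \quad\Longrightarrow\quad
    \dot{\ell}_{\mathrm{cmd}} = J(q)\,\bigl(\dot{q}_d + K_p\,(q_d - q)\bigr),

where q_d is the desired joint trajectory and K_p is a positive-definite gain, so the measured joint angles close the loop on the commanded cable rates.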
Abstract:
A 3-DOF (degrees-of-freedom) multi-mode translational/spherical PM (parallel mechanism) with lockable joints is a novel reconfigurable PM. It has both a 3-DOF spatial translational operation mode and a 3-DOF spherical operation mode. This paper presents an approach to the type synthesis of translational/spherical PMs with lockable joints. Using the proposed approach, several 3-DOF translational/spherical PMs are obtained. It is found that these translational/spherical PMs do not encounter constraint-singular configurations or self-motion of a leg's sub-chain during reconfiguration. The approach can also be used to synthesize other classes of multi-mode PMs with lockable joints, multi-mode PMs with variable kinematic joints, partially decoupled PMs, and reconfigurable PMs with a reconfigurable platform.
Abstract:
A parallel robot (PR) is a mechanical system that uses multiple computer-controlled limbs to support one common platform or end effector. Compared to a serial robot, a PR generally has higher precision and dynamic performance and can therefore be applied to many applications. PR research has attracted a great deal of attention in the last three decades, but many challenging issues remain to be solved before PRs achieve their full potential. This chapter introduces state-of-the-art PRs in the aspects of synthesis, design, analysis, and control. Future directions are also discussed at the end.