161 results for NPB (NAS parallel benchmarks)
Abstract:
We propose a methodology for optimizing the execution of data parallel (sub-)tasks on CPU and GPU cores of the same heterogeneous architecture. The methodology is based on two main components: i) an analytical performance model for scheduling tasks among CPU and GPU cores, such that the global execution time of the overall data parallel pattern is optimized; and ii) an autonomic module which uses the analytical performance model to implement the data parallel computations in a completely autonomic way, requiring no programmer intervention to optimize the computation across CPU and GPU cores. The analytical performance model uses a small set of simple parameters to partition, between CPU and GPU cores, the tasks derived from structured data parallel patterns/algorithmic skeletons. The model takes into account both hardware-related and application-dependent parameters. It computes the percentage of tasks to be executed on CPU and GPU cores such that both kinds of cores are exploited and performance figures are optimized. The autonomic module, implemented in FastFlow, executes a generic map (reduce) data parallel pattern, scheduling part of the tasks to the GPU and part to the CPU cores so as to achieve optimal execution time. Experimental results on state-of-the-art CPU/GPU architectures assess both the properties of the performance model and the effectiveness of the autonomic module. © 2013 IEEE.
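The idea of computing the percentage of tasks for each kind of core can be illustrated with a minimal sketch. It is not the model from the paper: it assumes a deliberately simple throughput model in which every task takes `cpu_task_time` seconds on one CPU core and the whole GPU completes one task every `gpu_task_time` seconds (transfer overhead included). The function name and all parameters are hypothetical.

```python
def split_tasks(n_tasks, cpu_task_time, n_cpu_cores, gpu_task_time):
    """Split n_tasks so that CPU cores and GPU finish at about the same time.

    Assumed (hypothetical) model: constant per-task times, no startup cost.
    """
    cpu_rate = n_cpu_cores / cpu_task_time   # tasks per second on all CPU cores
    gpu_rate = 1.0 / gpu_task_time           # tasks per second on the GPU
    # Both sides finish together when work is proportional to throughput.
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    n_gpu = round(n_tasks * gpu_share)
    n_cpu = n_tasks - n_gpu
    # The slower of the two sides determines the estimated completion time.
    est_time = max(n_cpu / cpu_rate, n_gpu / gpu_rate)
    return gpu_share, n_cpu, n_gpu, est_time


if __name__ == "__main__":
    share, n_cpu, n_gpu, t = split_tasks(1000, cpu_task_time=4.0,
                                         n_cpu_cores=4, gpu_task_time=0.25)
    print(f"GPU share: {share:.0%}, CPU tasks: {n_cpu}, "
          f"GPU tasks: {n_gpu}, time: {t:.1f} s")
```

Under these assumptions neither device idles while the other still has work, which is the balance condition a partitioning of this kind aims for.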
Abstract:
Bonded-in rod connections in timber possess many desirable attributes in terms of efficiency, manufacture, performance, aesthetics and cost. In recent years, research has been conducted on such connections using fibre reinforced polymers (FRPs) as an alternative to steel. This research programme investigates the pull-out capacity of basalt FRP (BFRP) rods bonded into low-grade Irish Sitka spruce. After rod diameter, embedded length is thought to be the variable with the greatest influence on the pull-out capacity of bonded-in rods. Previous work has established an optimum embedded length of 15 times the hole diameter. However, that work only considered the effects of axial stress on the bond, using a pull-compression testing system, which may have given an artificially high pull-out capacity because bending effects were neglected. Here, a hinge system was used that allows the effects of bending forces to be taken into consideration along with axial forces in a pull-out test. This paper describes an experimental programme in which such pull-bending tests were carried out on samples constructed of 12 mm diameter BFRP bars with a 2 mm glueline thickness and embedded lengths between 80 mm and 280 mm, bonded into low-grade timber with an epoxy resin. Nine repetitions of each were tested. A clear increase in pull-out strength was found with increasing embedded length.
Abstract:
For some time, the satisfiability formulae that have been the most difficult to solve for their size have been crafted to be unsatisfiable by the use of cardinality constraints. Recent solvers have introduced explicit checking of such constraints, rendering previously difficult formulae trivial to solve. A family of unsatisfiable formulae is described that is derived from the sgen4 family but cannot be solved using cardinality constraint detection and reasoning alone. These formulae were found to be the most difficult in the SAT2014 competition by a significant margin and include the shortest unsolved benchmark in the competition, sgen6-1200-5-1.cnf.
Abstract:
We introduce a new parallel pattern derived from a specific application domain and show how it turns out to have application beyond its domain of origin. The pool evolution pattern models the parallel evolution of a population subject to mutations and evolving in such a way that a given fitness function is optimized. The pattern has been demonstrated to be suitable for capturing and modeling the parallel patterns underpinning various evolutionary algorithms, as well as other parallel patterns typical of symbolic computation. In this paper we introduce the pattern, we discuss its implementation on modern multi/many core architectures and finally present experimental results obtained with FastFlow and Erlang implementations to assess its feasibility and scalability.
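A rough sketch of how a pattern of this kind might be structured follows. It is not the FastFlow or Erlang implementation mentioned in the abstract; the function `pool_evolution` and its parameters `select`, `evolve`, `keep` and `done` are hypothetical names chosen for illustration, and Python threads are used only to show where the parallelism sits.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def pool_evolution(population, select, evolve, keep, done,
                   workers=4, max_iters=100):
    """Iterate: select individuals, evolve them in parallel, keep survivors."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_iters):
            if done(population):
                break
            chosen = select(population)
            # The evolution step is independent per individual, so it can be mapped.
            offspring = list(pool.map(evolve, chosen))
            population = keep(population, offspring)
    return population


# Toy usage: evolve integers towards the maximum of fitness(x) = -(x - 10)^2.
def fitness(x):
    return -(x - 10) ** 2


def mutate(x):
    return x + random.choice((-1, 1))


def keep_best(old, new):
    # Elitist filter: keep the fittest individuals, population size unchanged.
    return sorted(old + new, key=fitness, reverse=True)[:len(old)]


if __name__ == "__main__":
    start = [0, 3, 25, -7]
    final = pool_evolution(start, select=lambda p: p, evolve=mutate,
                           keep=keep_best, done=lambda p: 10 in p)
    print(final)
```

Because the filter in this toy example is elitist, the best fitness in the population never decreases from one iteration to the next.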
Abstract:
Inspired by the commercial application of the Exechon machine, this paper proposes a novel parallel kinematic machine (PKM) named Exe-Variant. By exchanging the sequence of kinematic pairs in each limb of the Exechon machine, the Exe-Variant PKM adopts a 2UPR/1SPR topology, consisting of two identical UPR limbs and one SPR limb. The inverse kinematics of the 2UPR/1SPR parallel mechanism is first analyzed, and a conceptual design of the Exe-Variant is carried out on that basis. An algorithm for searching the reachable workspace of the Exe-Variant and the Exechon is then proposed. Finally, the workspaces of two example systems, an Exechon and an Exe-Variant with approximate dimensions, are numerically simulated and compared. The comparison shows that the workspace of the Exe-Variant is competitive with that of the Exechon machine, indicating that it is a promising reconfigurable module for a hybrid 5-DOF machine tool system.
Abstract:
To enable high-precision machining of large, thin-walled aerospace structural components with complex surfaces, this paper proposes a novel parallel kinematic machine (PKM) and formulates a semi-analytical theoretical stiffness model for it that accounts for gravitational effects and is verified by stiffness experiments. In terms of topology, the novel PKM consists of two substructures, in the form of redundant and overconstrained parallel mechanisms, that are connected by two interlinked revolute joints. The theoretical stiffness model of the novel PKM is established using the virtual work principle and the deformation superposition principle, after mapping the stiffness models of the substructures from joint space to operated space by Jacobian matrices and considering the deformation contributions of the interlinked revolute joints to the two substructures. The component gravities are treated as external payloads exerted on the end reference point of the novel PKM, using the static equivalence principle. This approach is validated by comparing the theoretical stiffness values with experimental stiffness values in the same configurations, which also indicates that an equivalent gravity can describe the actual distributed gravities with acceptable accuracy. Finally, on the basis of the verified theoretical stiffness model, the stiffness distributions of the novel PKM are illustrated and the contributions of component gravities to its stiffness are discussed.
Abstract:
Energy efficiency is an essential requirement for all contemporary computing systems. We thus need tools to measure the energy consumption of computing systems and to understand how workloads affect it. Significant recent research effort has targeted direct power measurements on production computing systems using on-board sensors or external instruments. These direct methods have in turn guided studies of software techniques to reduce energy consumption via workload allocation and scaling. Unfortunately, direct energy measurements are hampered by the low sampling frequency of power sensors. The coarse granularity of power sensing limits our understanding of how power is allocated in systems and our ability to optimize energy efficiency via workload allocation.
We present ALEA, a tool to measure power and energy consumption at the granularity of basic blocks, using a probabilistic approach. ALEA provides fine-grained energy profiling via statistical sampling, which overcomes the limitations of power sensing instruments. Compared to state-of-the-art energy measurement tools, ALEA provides finer granularity without sacrificing accuracy. ALEA achieves low overhead energy measurements with mean error rates between 1.4% and 3.5% in 14 sequential and parallel benchmarks tested on both Intel and ARM platforms. The sampling method caps execution time overhead at approximately 1%. ALEA is thus suitable for online energy monitoring and optimization. Finally, ALEA is a user-space tool with a portable, machine-independent sampling method. We demonstrate two use cases of ALEA, where we reduce the energy consumption of a k-means computational kernel by 37% and an ocean modelling code by 33%, compared to high-performance execution baselines, by varying the power optimization strategy between basic blocks.
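The general idea behind sampling-based energy attribution can be sketched as follows. This is not ALEA's algorithm or interface: it assumes, hypothetically, that each sample records the code block executing at a random instant together with a power reading, and it estimates each block's execution time from its share of the samples. The function name and data layout are illustrative only.

```python
from collections import defaultdict


def estimate_block_energy(samples, total_time_s):
    """Estimate per-block time and energy from (block_id, power_watts) samples.

    Assumes samples were taken at uniformly random instants over a run
    lasting total_time_s seconds (a hypothetical setup).
    """
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    count = defaultdict(int)
    power_sum = defaultdict(float)
    for block, power in samples:
        count[block] += 1
        power_sum[block] += power
    estimates = {}
    for block in count:
        # Share of samples approximates the share of time spent in the block.
        time_s = total_time_s * count[block] / n
        # Energy = estimated time * mean power observed in the block.
        energy_j = time_s * power_sum[block] / count[block]
        estimates[block] = {"time_s": time_s, "energy_j": energy_j}
    return estimates


if __name__ == "__main__":
    samples = [("A", 10.0)] * 3 + [("B", 20.0)]
    for block, est in estimate_block_energy(samples, total_time_s=8.0).items():
        print(block, est)
```

With more samples the estimates tighten, so in a scheme like this the sampling rate trades measurement overhead against accuracy.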