855 results for Parallel Computation


Relevance:

30.00% 30.00%

Publisher:

Abstract:

Includes bibliographies.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This work was partially supported by the Bulgarian National Science Fund under Contract No. MM 1405. Part of the results were announced at the Fifth International Workshop on Optimal Codes and Related Topics (OCRT), White Lagoon, Bulgaria, June 2007.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This research focuses on automatically adapting the size of a search engine in response to fluctuations in query workload. Deploying a search engine in an Infrastructure as a Service (IaaS) cloud facilitates allocating or deallocating computer resources to or from the engine. Our solution is an adaptive search engine that repeatedly re-evaluates its load and, when appropriate, switches over to a different number of active processors. We break the problem into three sub-problems: Continually determining the Number of Processors (CNP), the New Grouping Problem (NGP) and the Regrouping Order Problem (ROP). CNP is the problem of determining, in light of changes in the query workload, the ideal number of processors p to keep active in the search engine at any given time. NGP arises once a change in the number of processors has been decided: it must then be determined which groups of search data will be distributed across the processors. ROP is how to redistribute this data onto processors while keeping the engine responsive and minimising both the switchover time and the incurred network load. We propose solutions for these sub-problems. For NGP we propose an algorithm for incrementally adjusting the index to fit the varying number of virtual machines. For ROP we present an efficient method for redistributing data among processors while keeping the search engine responsive. For CNP we propose an algorithm that determines the new size of the search engine by re-evaluating its load. We tested the performance of the solution using a custom-built prototype search engine deployed in the Amazon EC2 cloud. Our experiments show that, compared with computing the index from scratch, the incremental NGP algorithm speeds up the index computation 2 to 10 times while maintaining similar search performance. The chosen redistribution method is 25% to 50% faster than other methods and reduces the network load by around 30%. For CNP we present a deterministic algorithm that shows a good ability to determine a new size for the search engine. When combined, these algorithms give an adaptive algorithm that is able to adjust the search engine size under a variable workload.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This book constitutes the refereed proceedings of the 14th International Conference on Parallel Problem Solving from Nature, PPSN 2016, held in Edinburgh, UK, in September 2016. The 93 revised full papers were carefully reviewed and selected from 224 submissions. The meeting began with four workshops, which offered an ideal opportunity to explore specific topics in intelligent transportation, landscape-aware heuristic search, natural computing in scheduling and timetabling, and advances in multi-modal optimization. PPSN XIV also included sixteen free tutorials to give us all the opportunity to learn about new aspects: gray box optimization in theory; theory of evolutionary computation; graph-based and cartesian genetic programming; theory of parallel evolutionary algorithms; promoting diversity in evolutionary optimization: why and how; evolutionary multi-objective optimization; intelligent systems for smart cities; advances on multi-modal optimization; evolutionary computation in cryptography; evolutionary robotics - a practical guide to experiment with real hardware; evolutionary algorithms and hyper-heuristics; a bridge between optimization over manifolds and evolutionary computation; implementing evolutionary algorithms in the cloud; the attainment function approach to performance evaluation in EMO; runtime analysis of evolutionary algorithms: basic introduction; meta-model assisted (evolutionary) optimization. The papers are organized in topical sections on adaption, self-adaption and parameter tuning; differential evolution and swarm intelligence; dynamic, uncertain and constrained environments; genetic programming; multi-objective, many-objective and multi-level optimization; parallel algorithms and hardware issues; real-world applications and modeling; theory; diversity and landscape analysis.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

A parallel pipelined array of cells suitable for real-time computation of histograms is proposed. The cell architecture builds on previous work and now allows operating on a stream of data at 1 pixel per clock cycle. This new cell is more suitable for interfacing to camera sensors or to microprocessors with 8-bit data buses, which are common in consumer digital cameras. Arrays using the newly proposed cells are obtained via C-slow retiming techniques and can be clocked at a 65% higher frequency than previous arrays. This achieves over 80% of the performance of two-pixel-per-clock-cycle parallel pipelined arrays.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop in data-parallel applications, streaming applications, or a combination of both. The pattern makes it possible to deploy a single stencil computation kernel on different GPUs. We discuss the implementation of Loop-of-stencil-reduce in FastFlow, a framework for the implementation of applications based on parallel patterns. Experiments are presented to illustrate the use of Loop-of-stencil-reduce in developing data-parallel kernels running on heterogeneous systems.
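To make the shape of the pattern concrete, here is a minimal sequential sketch in Python. It is not FastFlow's API: the function names (`loop_of_stencil_reduce`, `average_stencil`), the one-dimensional grid and the convergence test are all illustrative assumptions. Each iteration applies a stencil to every cell (the data-parallel step), reduces the per-cell changes to a single value, and uses that value as the loop condition.

```python
def loop_of_stencil_reduce(grid, stencil, reduce_fn, keep_going, max_iters=1000):
    """Hypothetical helper: repeat stencil -> reduce until keep_going is False."""
    iterations = 0
    while iterations < max_iters:
        # Stencil step: every output cell depends only on a neighbourhood of
        # the previous grid, so all cells could be computed in parallel.
        new_grid = [stencil(grid, i) for i in range(len(grid))]
        # Reduce step: collapse the per-cell changes into one summary value.
        summary = reduce_fn(abs(new - old) for new, old in zip(new_grid, grid))
        grid = new_grid
        iterations += 1
        # Loop step: the reduced value decides whether to iterate again.
        if not keep_going(summary):
            break
    return grid, iterations


def average_stencil(grid, i):
    """Three-point averaging stencil with fixed boundary cells (assumed example)."""
    if i == 0 or i == len(grid) - 1:
        return grid[i]
    return 0.5 * (grid[i - 1] + grid[i + 1])


result, steps = loop_of_stencil_reduce(
    grid=[0.0, 0.0, 0.0, 0.0, 1.0],
    stencil=average_stencil,
    reduce_fn=max,
    keep_going=lambda change: change > 1e-6,
)
print(steps, [round(v, 3) for v in result])
```

With `reduce_fn=max` and a tolerance test, the loop is a Jacobi-style relaxation that settles on a linear profile between the two fixed boundary values; swapping the stencil for the identity and dropping the loop would recover a plain map-reduce.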

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The increasing use of microstrip technology requires more accurate analysis methods, such as the full-wave method of moments. However, this involves a great computational effort. To reduce the computation time, this paper presents an alternative parallel method for analyzing irregular microstrip structures. The method calculates the unknown surface current on the planar structure through an irregular rectangular division, using basis and weighting functions. The parallel algorithm computes a [Z] matrix and then solves the system using the current densities as the unknowns. The parallel program was implemented on the IBM SP2 using the MPI library.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

We present Dithen, a novel computation-as-a-service (CaaS) cloud platform specifically tailored to the parallel execution of large-scale multimedia tasks. Dithen handles the upload/download of both multimedia data and executable items, the assignment of compute units to multimedia workloads, and the reactive control of the available compute units to minimize the cloud infrastructure cost under deadline-abiding execution. Dithen combines three key properties: (i) the reactive assignment of individual multimedia tasks to available computing units according to availability and predetermined time-to-completion constraints; (ii) optimal resource estimation based on Kalman-filter estimates; (iii) the use of additive increase multiplicative decrease (AIMD) algorithms (famous as the resource-management mechanism of the Transmission Control Protocol) to control the number of units servicing workloads. The deployment of Dithen over Amazon EC2 spot instances is shown to be capable of processing more than 80,000 video transcoding, face detection and image processing tasks (equivalent to the processing of more than 116 GB of compressed data) for less than $1 in billing cost from EC2. Moreover, the proposed AIMD-based control mechanism, in conjunction with the Kalman estimates, is shown to provide more than a 27% reduction in EC2 spot instance cost against methods based on reactive resource estimation. Finally, Dithen is shown to offer a 38% to 500% reduction of the billing cost against the current state of the art in CaaS platforms on Amazon EC2 (Amazon Lambda and Amazon Autoscale). A baseline version of Dithen is currently available at dithen.com.
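The AIMD idea in property (iii) can be sketched in a few lines of Python. This is a generic illustration, not Dithen's controller: the function name `aimd_step`, the `behind_deadline` signal, the direction of the rule (grow additively when behind, shrink multiplicatively when there is slack) and all parameter values are assumptions made for the example.

```python
def aimd_step(units, behind_deadline, alpha=1, beta=0.5, min_units=1, max_units=64):
    """One AIMD update of the number of compute units (hypothetical policy).

    behind_deadline: assumed boolean signal saying the workload needs more capacity.
    alpha: additive increase, in units per control step.
    beta: multiplicative decrease factor, 0 < beta < 1.
    """
    if behind_deadline:
        proposed = units + alpha       # additive increase
    else:
        proposed = int(units * beta)   # multiplicative decrease (rounded down)
    # Clamp to the allowed range so the pool never vanishes or grows unbounded.
    return max(min_units, min(max_units, proposed))


# Usage: trace the pool size over a made-up sequence of deadline signals.
units = 4
trace = []
for signal in [True, True, True, False, True]:
    units = aimd_step(units, signal)
    trace.append(units)
print(trace)  # [5, 6, 7, 3, 4]
```

The asymmetry is the point of the scheme: capacity is added cautiously, one step at a time, and released quickly, which keeps the number of paid units low when the demand signal clears.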

Relevance:

30.00% 30.00%

Publisher:

Abstract:

As the complexity of parallel applications increases, the performance limitations resulting from computational load imbalance become dominant. Mapping the problem space to the processors of a parallel machine in a manner that balances the workload of each processor will typically reduce the run-time. In many cases the computation time required for a given calculation cannot be predetermined even at run-time, and so a static partition of the problem yields poor performance. For problems in which the computational load across the discretisation is dynamic and inhomogeneous, for example multi-physics problems involving fluid and solid mechanics with phase changes, the workload for a static subdomain will change over the course of a computation and cannot be estimated beforehand. For such applications the mapping of loads to processors must change dynamically, at run-time, in order to maintain reasonable efficiency. The issues of dynamic load balancing are examined in the context of PHYSICA, a three-dimensional unstructured-mesh multi-physics continuum mechanics computational modelling code.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Computers employing some degree of data flow organisation are now well established as providing a possible vehicle for concurrent computation. Although data-driven computation frees the architecture from the constraints of the single program counter, processor and global memory, inherent in the classic von Neumann computer, there can still be problems with the unconstrained generation of fresh result tokens if a pure data flow approach is adopted. The advantages of allowing serial processing for those parts of a program which are inherently serial, and of permitting a demand-driven, as well as data-driven, mode of operation are identified and described. The MUSE machine described here is a structured architecture supporting both serial and parallel processing which allows the abstract structure of a program to be mapped onto the machine in a logical way.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This thesis focuses on the dynamics of underactuated cable-driven parallel robots (UACDPRs), including various aspects of robotic theory and practice, such as workspace computation, parameter identification, and trajectory planning. After a brief introduction to CDPRs, UACDPR kinematic and dynamic models are analyzed, under the relevant assumption of inextensible cables. The free oscillatory motion of the end-effector (EE), which is a unique feature of underactuated mechanisms, is studied in detail, from both a kinematic and a dynamic perspective. The free (small) oscillations of the EE around equilibria are proved to be harmonic and the corresponding natural oscillation frequencies are analytically computed. UACDPR workspace computation and analysis are then performed. A new performance index is proposed for the analysis of the influence of actuator errors on cable tensions around equilibrium configurations, and a new type of workspace, called tension-error-insensitive, is defined as the set of poses that a UACDPR EE can statically attain even in the presence of actuation errors, while preserving tensions between assigned (positive) bounds. EE free oscillations are then employed to conceive a novel procedure aimed at identifying the EE inertial parameters. This approach does not require the use of force or torque measurements. Moreover, a self-calibration procedure for the experimental determination of UACDPR initial cable lengths is developed, which consequently enables the robot to automatically infer the EE initial pose at machine start-up. Lastly, trajectory planning of UACDPRs is investigated. Two alternative methods are proposed, which aim at (i) reducing EE oscillations even when model parameters are uncertain or (ii) eliminating EE oscillations when model parameters are perfectly known. EE oscillations are reduced in real time by dynamically scaling a nominal trajectory and filtering it with an input shaper, whereas they can be eliminated if an off-line trajectory is computed that accounts for the system's internal dynamics.
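As an illustration of what filtering a trajectory with an input shaper involves, the sketch below builds a standard two-impulse zero-vibration (ZV) shaper for a single oscillation mode and convolves it with a sampled command. The abstract does not say which shaper the thesis uses, so the choice of ZV, the function names (`zv_shaper`, `shape`) and the single-mode model are assumptions.

```python
import math


def zv_shaper(natural_freq, damping_ratio):
    """Two-impulse ZV shaper for one mode (assumed shaper type).

    natural_freq: undamped natural frequency in rad/s.
    damping_ratio: damping ratio, 0 <= damping_ratio < 1.
    Returns a list of (delay_seconds, amplitude) pairs whose amplitudes sum to 1.
    """
    if not 0.0 <= damping_ratio < 1.0:
        raise ValueError("damping_ratio must be in [0, 1)")
    root = math.sqrt(1.0 - damping_ratio ** 2)
    k = math.exp(-damping_ratio * math.pi / root)
    half_damped_period = math.pi / (natural_freq * root)
    return [(0.0, 1.0 / (1.0 + k)), (half_damped_period, k / (1.0 + k))]


def shape(command, dt, impulses):
    """Convolve a sampled command with the shaper; delays are rounded to samples."""
    shaped = [0.0] * len(command)
    for delay, amplitude in impulses:
        shift = round(delay / dt)
        for i in range(shift, len(command)):
            shaped[i] += amplitude * command[i - shift]
    return shaped


# Usage: an undamped 1 Hz mode turns a unit step into a two-stage staircase.
impulses = zv_shaper(natural_freq=2.0 * math.pi, damping_ratio=0.0)
print(shape([1.0, 1.0, 1.0, 1.0], dt=0.5, impulses=impulses))  # [0.5, 1.0, 1.0, 1.0]
```

The second impulse arrives half a damped period after the first, so the oscillation it excites cancels the one excited by the first. The price is a command that is delayed by that half period, which is why shaper design depends on knowing the natural frequencies.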

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Underactuated cable-driven parallel robots (UACDPRs) move a 6-degree-of-freedom end-effector (EE) with fewer than 6 cables. This thesis proposes a new automatic calibration technique applicable to underactuated cable-driven parallel robots. The purpose of this work is to develop a method that uses free motion as an exciting trajectory for the acquisition of calibration data. The key point of this approach is to find a relationship between the unknown parameters to be calibrated (the lengths of the cables) and the parameters that can be measured by sensors (the swivel pulley angles measured by the encoders and the roll and pitch angles measured by inclinometers on the platform). The equations involved are the geometrical-closure equations and the finite-difference velocity equations, solved using the least-squares algorithm. Simulations are performed on a parallel robot driven by 4 cables for validation. The ultimate purpose of the calibration method remains the determination of the platform's initial pose. As a consequence of underactuation, the EE is underconstrained and, for assigned cable lengths, the EE pose cannot be obtained by means of forward kinematics only. Hence, a direct-kinematics algorithm for a 4-cable UACDPR using redundant sensor measurements is proposed. The proposed method measures two orientation parameters of the EE besides the cable lengths, in order to determine the other four pose variables, namely 3 position coordinates and one additional orientation parameter. We then study the performance of the direct-kinematics algorithm by computing the sensitivity of the direct-kinematics solution to measurement errors. Furthermore, upper limits on the position and orientation errors are computed for bounded cable-length errors resulting from the calibration procedure and for roll and pitch angle errors due to inclinometer inaccuracies.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Cable-driven parallel robots offer significant advantages in terms of workspace dimensions and payload capability. They are attractive for many industrial tasks to be performed on a large scale, such as handling and manufacturing, without a substantial increase in costs and mechanical complexity with respect to a small-scale application. However, since cables can only sustain tensile stresses, cable tensions must be kept within positive limits during the end-effector motion. This problem can be managed by overconstraining the end-effector and controlling cable tensions. Tension control is typically achieved by mounting a load sensor on every cable and using specific control algorithms to avoid cable slackness or breakage while the end-effector is controlled to a desired position. These algorithms require multiple cascaded control loops and can be complex and computationally demanding. To simplify the control of overconstrained cable-driven parallel robots, this thesis proposes a suitable mechanical design and hybrid control strategies. It is shown how a convenient design of the cable guidance system allows kinematic modeling to be simplified without introducing geometric approximations. This guidance system employs swiveling pulleys equipped with position and tension sensors and provides a parallelogram arrangement of cables. Furthermore, a hybrid force/position control in the robot joint space is adopted. According to this strategy, a particular set of cables is chosen to be tension-controlled, whereas the other cables are length-controlled. The force-controlled cables are selected based on the computation of a novel index called force-distribution sensitivity to cable-tension errors. This index aims to evaluate the maximum expected cable-tension error in the length-controlled cables if a unit tension error is committed in the force-controlled cables. In practice, computing the force-distribution sensitivity makes it possible to determine which cables are best force-controlled, so as to ensure the lowest error in the overall force distribution when a hybrid force/position joint-space strategy is used.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Continuum parallel robots (CPRs) are manipulators employing multiple flexible beams arranged in parallel and connected to a rigid end-effector. CPRs promise higher payload and accuracy than serial CRs while keeping great flexibility. Because their flexibility should reduce the risk of injury during accidental contacts between a human and a CPR, CPRs may be used in large-scale collaborative tasks or assisted robotic surgery. Various CPR designs exist, but prototype conception is rarely based on performance considerations, and CPR realization is mainly based on intuition or on rigid-link parallel manipulator architectures. This thesis focuses on the performance analysis of CPRs and on the tools needed for such evaluation, such as workspace computation algorithms. In particular, workspace computation strategies for CPRs are essential for performance assessment, since the CPR workspace may be used as a performance index or serve optimal-design tools. Two new workspace computation algorithms are proposed in this manuscript, the former focusing on the computation of the workspace volume and the certification of its numerical results, the latter on computing the workspace boundary only. Due to the elastic nature of CPRs, a key performance indicator for these robots is the stability of their equilibrium configurations. This thesis proposes the experimental validation of the equilibrium stability assessment on a real prototype, demonstrating the limitations of some commonly used assumptions. Additionally, an original performance index measuring the distance to instability is proposed in this manuscript. Unlike the majority of existing approaches, the proposed index has a sound physical meaning; accordingly, it can be used for a more straightforward performance quantification and to derive robot specifications.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Modern High-Performance Computing (HPC) systems are gradually increasing in size and complexity, driven by the corresponding demand for larger simulations that require more complicated tasks and higher accuracy. However, as a side effect of Dennard scaling approaching its ultimate power limit, software efficiency also plays an important role in increasing the overall performance of a computation. Tools that measure application performance in these increasingly complex environments provide insights into the intricate ways in which software and hardware interact. Power consumption can be monitored, in order to save energy, through processor interfaces such as Intel Running Average Power Limit (RAPL). Given the low level of these interfaces, they are often paired with an application-level tool such as the Performance Application Programming Interface (PAPI). Since many problems in heterogeneous fields can be represented as a complex linear system, an optimized and scalable linear system solver can significantly decrease the time spent computing its solution. One of the most widely used algorithms for solving large simulations is Gaussian Elimination, whose most popular implementation for HPC systems is in the Scalable Linear Algebra PACKage (ScaLAPACK) library. Another relevant algorithm, which is increasing in popularity in the academic field, is the Inhibition Method. This thesis compares the energy consumption of the Inhibition Method and of Gaussian Elimination from ScaLAPACK, profiling their execution during the solution of linear systems on the HPC architecture offered by CINECA. Moreover, it also collates the energy and power values for different rank, node, and socket configurations. The monitoring tools employed to track the energy consumption of these algorithms are PAPI and RAPL, which are integrated with the parallel execution of the algorithms managed with the Message Passing Interface (MPI).
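For readers unfamiliar with the baseline algorithm, the following is a minimal sequential sketch of Gaussian elimination with partial pivoting in pure Python. It is only a reference model of the numerical method: it is not ScaLAPACK's distributed implementation, it does not cover the Inhibition Method, and the function name `gaussian_elimination` and the singularity tolerance are assumptions for the example.

```python
def gaussian_elimination(a, b, tol=1e-12):
    """Solve a x = b for a square system given as lists of floats (sketch)."""
    n = len(b)
    # Work on an augmented copy [a | b] so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]

    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


# Usage on a small 3x3 system whose exact solution is x = (2, 3, -1).
solution = gaussian_elimination(
    [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]],
    [8.0, -11.0, -3.0],
)
print([round(v, 6) for v in solution])
```

The triple loop in the elimination phase gives the method its cubic cost in the system size, which is why time and energy per solve become a concern at HPC scale.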