Biblioteca Digital

777 results for parallel-machine

ADAM: A Decentralized Parallel Computer Architecture Featuring Fast Thread and Data Migration and a Uniform Hardware Abstraction

Relevance:

70.00%

Publisher:

Abstract:

The furious pace of Moore's Law is driving computer architecture into a realm where the speed of light is the dominant factor in system latencies. The number of clock cycles needed to span a chip is increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has so far made migration an unserviceable solution. I present an architecture, implementation, and mechanisms that reduce the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide programmers to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles, an improvement of more than a factor of 1000 over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. 
Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.

Topological analysis of singularity loci for serial and parallel manipulators

Relevance:

70.00%

Publisher:

Abstract:

Singularities of robot manipulators have been intensely studied in recent decades by researchers in many fields. Serial singularities produce some local loss of dexterity of the manipulator, so it may be desirable to search for singularity-free trajectories in the joint space. Parallel singularities, on the other hand, are very dangerous for parallel manipulators, because they may cause a local loss of platform control and jeopardize the structural integrity of links or actuators. It is therefore of the utmost importance to avoid parallel singularities while operating a parallel machine. Furthermore, there may be configurations of a parallel manipulator that are allowed by the constraints but are nevertheless unreachable by any feasible path. The present work proposes a numerical procedure based upon Morse theory, an important branch of differential topology. This procedure counts and identifies the singularity-free regions that are cut by the singularity locus out of the configuration space, and the disjoint regions composing the configuration space of a parallel manipulator. Moreover, given any two configurations of a manipulator, a feasible or a singularity-free path connecting them can always be found, or it can be proved that none exists. Examples of applications to 3R and 6R serial manipulators, to 3UPS and 3UPU parallel wrists, to 3UPU parallel translational manipulators, and to 3RRR planar manipulators are reported in the work.

Dynamic load balancing of distributed memory parallel computational mechanics using unstructured meshes for multi-physical modelling

Relevance:

70.00%

Publisher:

Abstract:

As the complexity of parallel applications increases, the performance limitations resulting from computational load imbalance become dominant. Mapping the problem space to the processors in a parallel machine in a manner that balances the workload of each processor will typically reduce the run-time. In many cases the computation time required for a given calculation cannot be predetermined even at run-time, and so a static partition of the problem gives poor performance. For problems in which the computational load across the discretisation is dynamic and inhomogeneous, for example multi-physics problems involving fluid and solid mechanics with phase changes, the workload for a static subdomain will change over the course of a computation and cannot be estimated beforehand. For such applications the mapping of loads to processors must change dynamically at run-time in order to maintain reasonable efficiency. The issues of dynamic load balancing are examined in the context of PHYSICA, a three-dimensional unstructured mesh multi-physics continuum mechanics computational modelling code.

An O(N) Algorithm for Three-Dimensional N-Body Simulations

Relevance:

60.00%

Publisher:

Abstract:

We develop an algorithm that computes the gravitational potentials and forces on N point-masses interacting in three-dimensional space. The algorithm, based on analytical techniques developed by Rokhlin and Greengard, runs in order N time. In contrast to other fast N-body methods such as tree codes, which only approximate the interaction potentials and forces, this method is exact: it computes the potentials and forces to within any prespecified tolerance up to machine precision. We present an implementation of the algorithm for a sequential machine. We numerically verify the algorithm, and compare its speed with that of an O(N^2) direct force computation. We also describe a parallel version of the algorithm that runs on the Connection Machine in order O(log N) time. We compare experimental results with those of the sequential implementation and discuss how to minimize communication overhead on the parallel machine.
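For orientation, the O(N^2) direct force computation used as the baseline in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `direct_forces`, the unit gravitational constant `G=1.0`, and the tuple-based data layout are all assumptions made here.

```python
import math

def direct_forces(positions, masses, G=1.0):
    """Direct summation over all pairs: O(N^2) work for N point masses.

    positions: list of (x, y, z) tuples; masses: list of floats.
    Returns (potentials, forces), where forces[i] is the force vector on body i.
    """
    n = len(positions)
    potentials = [0.0] * n
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Separation vector pointing from body i to body j.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            # Each body's potential gets the contribution of the other body.
            potentials[i] -= G * masses[j] / r
            potentials[j] -= G * masses[i] / r
            # Newtonian attraction: equal and opposite forces on the pair.
            f = G * masses[i] * masses[j] / r**3
            for k in range(3):
                forces[i][k] += f * d[k]
                forces[j][k] -= f * d[k]
    return potentials, forces
```

The nested loop visits each of the N(N-1)/2 pairs once, which is the quadratic cost the order-N algorithm is compared against.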

Lower bounds for the lot-sizing problem on parallel machines

Relevance:

60.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Lot production size problem and simulation of urban transport

Relevance:

60.00%

Publisher:

Abstract:

OBJECTIVES AND STUDY METHOD: There are two subjects in this thesis: “Lot production size for a parallel machine scheduling problem with auxiliary equipment” and “Bus holding for a simulated traffic network”. Although these two themes seem unrelated, the main idea of both is the optimization of complex systems. The “Lot production size for a parallel machine scheduling problem with auxiliary equipment” deals with a manufacturing setting where sets of pieces form finished products. The aim is to maximize the profit of the finished products. Each piece may be processed in more than one mold. Molds must be mounted on machines with their corresponding installation setup times. The key point of our methodology is to solve the single period lot-sizing decisions for the finished products together with the piece-mold and the mold-machine assignments, relaxing the constraint that a single mold may not be used in two machines at the same time. The “Bus holding for a simulated traffic network” deals with bus bunching, one of the most annoying problems in urban bus operations, which happens when two or more buses arrive at a stop nose to tail. Bus bunching reflects an unreliable service that affects transit operations by increasing passenger-waiting times. This work proposes a linear mathematical programming model that establishes bus holding times at certain stops along a transit corridor to avoid bus bunching. Our approach needs real-time input, so we simulate a transit corridor and apply our mathematical model to the data generated. Thus, the inherent variability of a transit system is considered by the simulation, while the optimization model takes into account the key variables and constraints of the bus operation. 
CONTRIBUTIONS AND CONCLUSIONS: For the “Lot production size for a parallel machine scheduling problem with auxiliary equipment” the relaxation we propose is able to find solutions more efficiently; moreover, our experimental results show that in most of the solutions the molds do not overlap even if they are installed on several machines. We propose an exact integer linear programming model, a Relax&Fix heuristic, and a multistart greedy algorithm to solve this problem. Experimental results on instances based on real-world data show the efficiency of our approaches. The mathematical model and the algorithm for the lot production size problem presented in this research can be used by production planners to help schedule manufacturing. For the “Bus holding for a simulated traffic network” most of the literature considers quadratic models that minimize passenger-waiting times, but these are harder to solve and therefore difficult to operate in real-time systems. In contrast, our methodology reduces passenger-waiting times efficiently given our linear programming model, with the characteristic of applying control intervals just every 5 minutes.

Permanent Magnet Synchronous Machine for Parallel Hybrid Vehicle

Relevance:

40.00%

Publisher:

Abstract:

The aim of this thesis is to describe hybrid drive design problems and the advantages and difficulties related to the drive. Possible hybrid constructions and the benefits of parallel, series and series-parallel hybrids are reviewed. Analytical and finite element calculations of permanent magnet synchronous machines with embedded magnets were carried out; the finite element calculations used Cedrat’s Flux 2D software. The machine is planned to be used as a motor-generator in a low-power parallel hybrid vehicle. The boundary conditions for the design were obtained from Lucas-TVS Ltd., India. Design requirements, briefly:
• The system DC voltage level is 120 V, which implies Uphase = 49 V (RMS) in a three-phase system.
• A power output of 10 kW at a base speed of 1500 rpm (torque of 65 Nm) is desired.
• The maximum outer diameter should not exceed 250 mm, and the maximum core length should not exceed 40 mm.
The main difficulties the author met were the dimensional restrictions. Several possible constructions were designed, analyzed and compared, and the final design was selected. A dimensioned and detailed design is presented. The effects of different parameters, such as the number of poles, the number of turns and the magnetic geometry, are discussed. The best modification offers a considerable reduction of volume.
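The two electrical figures in the design requirements can be checked with a short calculation. The sketch below rests on assumptions not stated in the abstract: that the peak line-to-line voltage equals the DC link voltage (so the RMS phase voltage is U_DC/√6), and that torque is the ideal shaft value P/ω. The function names are hypothetical.

```python
import math

def phase_voltage_rms(u_dc):
    """RMS phase voltage, assuming the peak line-to-line voltage equals u_dc.

    Peak phase voltage is u_dc / sqrt(3); dividing by sqrt(2) gives the RMS value.
    """
    return u_dc / math.sqrt(3) / math.sqrt(2)

def shaft_torque(power_w, speed_rpm):
    """Ideal shaft torque in N*m from mechanical power and rotational speed."""
    omega = 2 * math.pi * speed_rpm / 60  # angular speed in rad/s
    return power_w / omega

u_phase = phase_voltage_rms(120.0)    # roughly 49 V
torque = shaft_torque(10_000, 1500)   # roughly 63.7 N*m
```

Under these assumptions the phase voltage matches the quoted 49 V, while the ideal torque comes out slightly below the 65 Nm stated in the abstract, which presumably is a rounded or margin-inclusive figure.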

An abstract machine for restricted and-parallel execution of logic programs

Relevance:

40.00%

Publisher:

Abstract:

Although the sequential execution speed of logic programs has been greatly improved by the concepts introduced in the Warren Abstract Machine (WAM), parallel execution represents the only way to increase this speed beyond the natural limits of sequential systems. However, most proposed parallel logic programming execution models lack the performance optimizations and storage efficiency of sequential systems. This paper presents a parallel abstract machine which is an extension of the WAM and is thus capable of supporting AND-Parallelism without giving up the optimizations present in sequential implementations. A suitable instruction set, which can be used as a target by a variety of logic programming languages, is also included. Special instructions are provided to support a generalized version of "Restricted AND-Parallelism" (RAP), a technique which reduces the overhead traditionally associated with the run-time management of variable binding conflicts to a series of simple run-time checks, which select one out of a series of compiled execution graphs.

An abstract machine based execution model for computer architecture design and efficient implementation of logic programs in parallel.

Relevance:

40.00%

Publisher:

Abstract:

The term "Logic Programming" refers to a variety of computer languages and execution models which are based on the traditional concept of Symbolic Logic. The expressive power of these languages offers promise to be of great assistance in facing the programming challenges of present and future symbolic processing applications in Artificial Intelligence, Knowledge-based systems, and many other areas of computing. The sequential execution speed of logic programs has been greatly improved since the advent of the first interpreters. However, higher inference speeds are still required in order to meet the demands of applications such as those contemplated for next generation computer systems. The execution of logic programs in parallel is currently considered a promising strategy for attaining such inference speeds. Logic Programming in turn appears as a suitable programming paradigm for parallel architectures because of the many opportunities for parallel execution present in the implementation of logic programs. This dissertation presents an efficient parallel execution model for logic programs. The model is described from the source language level down to an "Abstract Machine" level suitable for direct implementation on existing parallel systems or for the design of special purpose parallel architectures. Few assumptions are made at the source language level and therefore the techniques developed and the general Abstract Machine design are applicable to a variety of logic (and also functional) languages. These techniques offer efficient solutions to several areas of parallel Logic Programming implementation previously considered problematic or a source of considerable overhead, such as the detection and handling of variable binding conflicts in AND-Parallelism, the specification of control and management of the execution tree, the treatment of distributed backtracking, and goal scheduling and memory management issues, etc. 
A parallel Abstract Machine design is offered, specifying data areas, operation, and a suitable instruction set. This design is based on extending to a parallel environment the techniques introduced by the Warren Abstract Machine, which have already made very fast and space efficient sequential systems a reality. Therefore, the model herein presented is capable of retaining sequential execution speed similar to that of high performance sequential systems, while extracting additional gains in speed by efficiently implementing parallel execution. These claims are supported by simulations of the Abstract Machine on sample programs.

Integrating structural and input design of a 2-DOF high-speed parallel manipulator: A flexible model-based approach

Relevance:

30.00%

Publisher:

Abstract:

This paper discusses the integrated design of parallel manipulators, which exhibit varying dynamics. This characteristic affects machine stability and performance. The design methodology consists of four main steps: (i) system modeling using a flexible multibody technique, (ii) the synthesis of reduced-order models suitable for control design, (iii) systematic flexible model-based input signal design, and (iv) the evaluation of some possible machine designs. The novelty of this methodology is that it takes structural flexibilities into consideration during the input signal design, thereby enhancing the standard design process, which mainly considers rigid-body dynamics. The potential of the proposed strategy is exploited for the design evaluation of a two degree-of-freedom high-speed parallel manipulator. The results are experimentally validated. (C) 2010 Elsevier Ltd. All rights reserved.

Parallel processing and image analysis in the eyes of mantis shrimps

Relevance:

30.00%

Publisher:

Abstract:

The compound eyes of mantis shrimps, a group of tropical marine crustaceans, incorporate principles of serial and parallel processing of visual information that may be applicable to artificial imaging systems. Their eyes include numerous specializations for analysis of the spectral and polarizational properties of light, and include more photoreceptor classes for analysis of ultraviolet light, color, and polarization than occur in any other known visual system. This is possible because receptors in different regions of the eye are anatomically diverse and incorporate unusual structural features, such as spectral filters, not seen in other compound eyes. Unlike eyes of most other animals, eyes of mantis shrimps must move to acquire some types of visual information and to integrate color and polarization with spatial vision. Information leaving the retina appears to be processed into numerous parallel data streams leading into the central nervous system, greatly reducing the analytical requirements at higher levels. Many of these unusual features of mantis shrimp vision may inspire new sensor designs for machine vision.

Parallel texts alignment

Relevance:

30.00%

Publisher:

Abstract:

Work presented as part of the Master's programme in Computer Science Engineering (Mestrado em Engenharia Informática), in partial fulfilment of the requirements for the degree of Master in Computer Science Engineering.

Including the workload effect in the parallel program signature

Relevance:

30.00%

Publisher:

Abstract:

Performance prediction and application behavior modeling have been the subject of extensive research that aims to estimate application performance with acceptable precision. A novel approach to predicting the performance of parallel applications is based on the concept of Parallel Application Signatures, which consists of extracting an application's most relevant parts (phases) and the number of times they repeat (weights). By executing these phases on a target machine and multiplying each phase's execution time by its weight, an estimate of the application's total execution time can be made. One of the problems is that the performance of an application depends on the program workload. Every type of workload affects differently how an application performs on a given system, and so affects the signature execution time. Since the workloads used in most scientific parallel applications have well-known dimensions and data ranges, and the behavior of these applications is mostly deterministic, a model of how a program's workload affects its performance can be obtained. We create a new methodology to model how a program’s workload affects the parallel application signature. Using regression analysis we are able to generalize each phase's execution time and weight function to predict an application's performance on a target system for any type of workload within a predefined range. We validate our methodology using a synthetic program, benchmark applications and well-known real scientific applications.
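The signature-based estimate described in this abstract amounts to a weighted sum of phase times. The sketch below illustrates that sum and adds a one-variable least-squares fit as a stand-in for the regression step. The abstract does not specify the regression form, so the linear model, the function names, and the sample numbers are assumptions for illustration only.

```python
def predict_execution_time(phases):
    """Estimate total run time as the sum of phase_time * weight.

    phases: iterable of (phase_time_seconds, weight) pairs, where weight is
    the number of times the phase repeats in the full application.
    """
    return sum(time * weight for time, weight in phases)

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (assumed model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical measurements: one phase's time at three workload sizes.
slope, intercept = fit_line([100, 200, 300], [0.5, 1.0, 1.5])
phase_time_at_250 = slope * 250 + intercept

# Predicted total for a signature with two phases at that workload.
total = predict_execution_time([(phase_time_at_250, 10), (2.0, 3)])
```

A fitted function per phase time and per weight lets the same signature be re-evaluated for a new workload without re-running the whole application, which is the generalization the abstract describes.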

Torque vibration model of axial-flux surface-mounted permanent magnet synchronous machine

Relevance:

30.00%

Publisher:

Abstract:

Because the radius, and thus the non-uniform structure of the teeth and the other electrical and magnetic parts of the machine, must be taken into consideration, the calculation of an axial flux permanent magnet machine is conventionally done by means of 3D FEM methods. This calculation procedure, however, requires a lot of time and computer resources. This study shows that analytical methods can also be applied to perform the calculation successfully. The procedure of the analytical calculation can be summarized in the following steps: first the magnet is divided into slices, so that the calculation is made for each section individually, and then the partial results are combined into the final results. Using this method can save a lot of design and calculation time. The calculation program is designed to model the magnetic and electrical circuits of surface-mounted axial flux permanent magnet synchronous machines in such a way that it takes into account possible magnetic saturation of the iron parts. The result of the calculation is the torque of the motor, including the vibrations. The motor geometry and the materials, and either the torque or the pole angle, are defined, and the motor can be fed with three-phase currents of arbitrary shape and amplitude. There are no limits on the size and number of the pole pairs, nor on many other factors. The calculation steps and the number of different sections of the magnet are selectable, but the calculation time depends strongly on these. The results are compared to measurements of real prototypes. The permanent magnet creates part of the flux in the magnetic circuit. The form and amplitude of the flux density in the air-gap depend on the geometry and material of the magnetic circuit, on the length of the air-gap, and on the remanence flux density of the magnet. Slotting is taken into account by using the Carter factor in the slot opening area. 
The calculation is simple and fast if the shape of the magnet is a square and has no skew in relation to the stator slots. With a more complicated magnet shape the calculation has to be done in several sections. As the number of sections increases, the result becomes more accurate. In a radial flux motor all sections of the magnets create force at the same radius. In an axial flux motor, each radial section creates force at a different radius, and the torque is the sum of these contributions. The magnetic circuit of the motor, consisting of the stator iron, rotor iron, air-gap, magnet and the slot, is modelled with a reluctance net, which considers the saturation of the iron. This means that several iterations, in which the permeability is updated, have to be done in order to get the final results. The motor torque is calculated using the instantaneous flux linkage and stator currents. Flux linkage is the part of the flux, created by the permanent magnets and the stator currents, that passes through the coils in the stator teeth. The angle between this flux and the phase currents defines the torque created by the magnetic circuit. Due to the winding structure of the stator, and in order to limit the leakage flux, the slot openings of the stator are normally not made of ferromagnetic material even though, in some cases, semimagnetic slot wedges are used. At the slot opening faces the flux enters the iron almost normally (tangentially with respect to the rotor flux), creating tangential forces in the rotor. This phenomenon is called cogging. The flux in the slot opening area on the different sides of the opening and in the different slot openings is not equal, and so these forces do not compensate each other. In the calculation it is assumed that the flux entering the left side of the opening is the component left of the geometrical centre of the slot. 
This torque component, together with the torque component calculated using the Lorentz force, makes up the total torque of the motor. When all the magnet edges, where the derivative of the magnet flux density is at its highest, enter the slot openings at the same time, the result is a considerable cogging torque. To reduce the cogging torque, the magnet edges can be shaped so that they are not parallel to the stator slots, which is the common way to solve the problem. In doing so, the edge may be spread along the whole slot pitch, and thus the high derivative component is also spread evenly along the rotation. Besides shaping the magnets, they may also be placed somewhat asymmetrically on the rotor surface. The asymmetric distribution can be made in many different ways. All the magnets may have a different deflection from the symmetrical centre point, or they can be, for example, shifted in pairs. There are some factors that limit the deflection. The first is that the magnets cannot overlap. The magnet shape and the relative width compared to the pole define the deflection in this case. The other factor is that shifting the poles limits the maximum torque of the motor. If the edges of adjacent magnets are very close to each other, the leakage flux from one pole to the other increases, thus reducing the air-gap magnetization. The asymmetric model needs some assumptions and simplifications in order to limit the size of the model and the calculation time. The reluctance net is made for a symmetric distribution. If the magnets are distributed asymmetrically, the flux in the different pole pairs will not be exactly the same. Therefore, the assumption that the flux flows from the edges of the model to the next pole pairs (in the calculation model, from one edge to the other) is not correct. 
To take this fact into account in multi-pole-pair machines, all the poles, in other words the whole machine, would have to be modelled in the reluctance net. The error resulting from this incorrect assumption is, nevertheless, negligible.
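The slice-based torque summation for an axial flux machine mentioned in this abstract can be illustrated with a short sketch. It is not the thesis's program: the function name, the midpoint-radius rule, and the idea of supplying the tangential force per unit radial length as a callable are all assumptions chosen to show how per-slice forces at different radii add up to a torque.

```python
def axial_flux_torque(force_per_length, r_inner, r_outer, n_slices):
    """Sum the torque contributions of radial slices of the magnet.

    force_per_length: callable giving tangential force per unit radial
    length (N/m) at radius r (an assumed input for this sketch).
    Each slice contributes force * radius, evaluated at its mid radius.
    """
    dr = (r_outer - r_inner) / n_slices
    torque = 0.0
    for i in range(n_slices):
        r_mid = r_inner + (i + 0.5) * dr
        slice_force = force_per_length(r_mid) * dr  # force on this slice
        torque += slice_force * r_mid               # lever arm is the radius
    return torque

# Uniform 100 N/m between radii 0.1 m and 0.2 m, split into 10 slices.
example_torque = axial_flux_torque(lambda r: 100.0, 0.1, 0.2, 10)
```

With a force density that varies along the radius, increasing `n_slices` refines the sum, which mirrors the abstract's remark that more sections give a more accurate result at the cost of calculation time.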

Error Modeling and Accuracy Analysis of a Novel Mobile Hybrid Parallel Robot

Relevance:

30.00%

Publisher:

Abstract:

Over the last decades, calibration techniques have been widely used to improve the accuracy of robots and machine tools, since they only involve software modification instead of changes to the design and manufacture of the hardware. Traditionally, four steps are required for a calibration: error modeling, measurement, parameter identification and compensation. The objective of this thesis is to propose a method for the kinematics analysis and error modeling of a newly developed hybrid redundant robot, the IWR (Intersector Welding Robot), which possesses ten degrees of freedom (DOF): six in a parallel mechanism and an additional four in a serial mechanism. The problems of kinematics modeling and error modeling of the proposed IWR robot are discussed. Based on the vector arithmetic method, the kinematics model and the sensitivity model of the end-effector with respect to the structure parameters are derived and analyzed. The relations between the pose (position and orientation) accuracy and manufacturing tolerances, actuation errors, and connection errors are formulated. Computer simulation is performed to examine the validity and effectiveness of the proposed method.
