813 resultados para Parallel Algorithms


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis defines Pi, a parallel architecture interface that separates model and machine issues, allowing them to be addressed independently. This provides greater flexibility for both the model and machine builder. Pi addresses a set of common parallel model requirements including low latency communication, fast task switching, low cost synchronization, efficient storage management, the ability to exploit locality, and efficient support for sequential code. Since Pi provides generic parallel operations, it can efficiently support many parallel programming models including hybrids of existing models. Pi also forms a basis of comparison for architectural components.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This report addresses the problem of acquiring objects using articulated robotic hands. Standard grasps are used to make the problem tractable, and a technique is developed for generalizing these standard grasps to increase their flexibility to variations in the problem geometry. A generalized grasp description is applied to a new problem situation using a parallel search through hand configuration space, and the result of this operation is a global overview of the space of good solutions. The techniques presented in this report have been implemented, and the results are verified using the Salisbury three-finger robotic hand.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Scheduling tasks to efficiently use the available processor resources is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long latency operation occurs, thus increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference effects among the multiple contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. This thesis proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread either statically or dynamically and is used by the thread scheduler to decide which threads to load in the contexts, and to decide which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects, and shows how thread prioritization can both maintain high processor utilization, and limit increases in critical path runtime caused by multithreading. The model also shows that in order to be effective in bandwidth limited applications, thread prioritization must be extended to prioritize memory requests. We show how simple hardware can prioritize the running of threads in the multiple contexts, and the issuing of requests to both the local memory and the network. Simulation experiments show how thread prioritization is used in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the number of processor cycles wasted in spinning and devoting more cycles to critical threads. Thread prioritization can be used in combination with other techniques to improve cache performance and minimize cache interference between different working sets in the cache. For applications that are critical path limited, thread prioritization can improve performance by allowing processor resources to be devoted preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can be used to implement a wide range of scheduling policies.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis presents a new actuator system consisting of a micro-actuator and a macro-actuator coupled in parallel via a compliant transmission. The system is called the Parallel Coupled Micro-Macro Actuator, or PaCMMA. In this system, the micro-actuator is capable of high bandwidth force control due to its low mass and direct-drive connection to the output shaft. The compliant transmission of the macro-actuator reduces the impedance (stiffness) at the output shaft and increases the dynamic range of force. Performance improvement over single actuator systems was expected in force control, impedance control, force distortion and reduction of transient impact forces. A set of quantitative measures is proposed and the actuator system is evaluated against them: Force Control Bandwidth, Position Bandwidth, Dynamic Range, Impact Force, Impedance ("Backdriveability'"), Force Distortion and Force Performance Space. Several theoretical performance limits are derived from the saturation limits of the system. A control law is proposed and control system performance is compared to the theoretical limits. A prototype testbed was built using permanenent magnet motors and an experimental comparison was performed between this actuator concept and two single actuator systems. The following performance was observed: Force bandwidth of 56Hz, Torque Dynamic Range of 800:1, Peak Torque of 1040mNm, Minimum Torque of 1.3mNm. Peak Impact Force was reduced by an order of magnitude. Distortion at small amplitudes was reduced substantially. Backdriven impedance was reduced by 2-3 orders of magnitude. This actuator system shows promise for manipulator design as well as psychophysical tests of human performance.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The work described in this thesis began as an inquiry into the nature and use of optimization programs based on "genetic algorithms." That inquiry led, eventually, to three powerful heuristics that are broadly applicable in gradient-ascent programs: First, remember the locations of local maxima and restart the optimization program at a place distant from previously located local maxima. Second, adjust the size of probing steps to suit the local nature of the terrain, shrinking when probes do poorly and growing when probes do well. And third, keep track of the directions of recent successes, so as to probe preferentially in the direction of most rapid ascent. These algorithms lie at the core of a novel optimization program that illustrates the power to be had from deploying them together. The efficacy of this program is demonstrated on several test problems selected from a variety of fields, including De Jong's famous test-problem suite, the traveling salesman problem, the problem of coordinate registration for image guided surgery, the energy minimization problem for determining the shape of organic molecules, and the problem of assessing the structure of sedimentary deposits using seismic data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The furious pace of Moore's Law is driving computer architecture into a realm where the the speed of light is the dominant factor in system latencies. The number of clock cycles to span a chip are increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has previously made migration an unserviceable solution so far. I present an architecture, implementation, and mechanisms that reduces the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We consider the often-studied problem of sorting, for a parallel computer. Given an input array distributed evenly over p processors, the task is to compute the sorted output array, also distributed over the p processors. Many existing algorithms take the approach of approximately load-balancing the output, leaving each processor with Θ(n/p) elements. However, in many cases, approximate load-balancing leads to inefficiencies in both the sorting itself and in further uses of the data after sorting. We provide a deterministic parallel sorting algorithm that uses parallel selection to produce any output distribution exactly, particularly one that is perfectly load-balanced. Furthermore, when using a comparison sort, this algorithm is 1-optimal in both computation and communication. We provide an empirical study that illustrates the efficiency of exact data splitting, and shows an improvement over two sample sort algorithms.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Bibliography: p. 22-24.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents the research and development of a 3-legged micro Parallel Kinematic Manipulator (PKM) for positioning in micro-machining and assembly operations. The structural characteristics associated with parallel manipulators are evaluated and the PKMs with translational and rotational movements are identified. Based on these identifications, a hybrid 3-UPU (Universal Joint-Prismatic Joint-Universal Joint) parallel manipulator is designed and fabricated. The principles of the operation and modeling of this micro PKM is largely similar to a normal size Stewart Platform (SP). A modular design methodology is introduced for the construction of this micro PKM. Calibration results of this hybrid 3-UPU PKM are discussed in this paper.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present an optimal methodology for synchronized scheduling of production assembly with air transportation to achieve accurate delivery with minimized cost in consumer electronics supply chain (CESC). This problem was motivated by a major PC manufacturer in consumer electronics industry, where it is required to schedule the delivery requirements to meet the customer needs in different parts of South East Asia. The overall problem is decomposed into two sub-problems which consist of an air transportation allocation problem and an assembly scheduling problem. The air transportation allocation problem is formulated as a Linear Programming Problem with earliness tardiness penalties for job orders. For the assembly scheduling problem, it is basically required to sequence the job orders on the assembly stations to minimize their waiting times before they are shipped by flights to their destinations. Hence the second sub-problem is modelled as a scheduling problem with earliness penalties. The earliness penalties are assumed to be independent of the job orders.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper a novel methodology aimed at minimizing the probability of network failure and the failure impact (in terms of QoS degradation) while optimizing the resource consumption is introduced. A detailed study of MPLS recovery techniques and their GMPLS extensions are also presented. In this scenario, some features for reducing the failure impact and offering minimum failure probabilities at the same time are also analyzed. Novel two-step routing algorithms using this methodology are proposed. Results show that these methods offer high protection levels with optimal resource consumption

Relevância:

20.00% 20.00%

Publicador:

Resumo:

IP based networks still do not have the required degree of reliability required by new multimedia services, achieving such reliability will be crucial in the success or failure of the new Internet generation. Most of existing schemes for QoS routing do not take into consideration parameters concerning the quality of the protection, such as packet loss or restoration time. In this paper, we define a new paradigm to develop new protection strategies for building reliable MPLS networks, based on what we have called the network protection degree (NPD). This NPD consists of an a priori evaluation, the failure sensibility degree (FSD), which provides the failure probability and an a posteriori evaluation, the failure impact degree (FID), to determine the impact on the network in case of failure. Having mathematical formulated these components, we point out the most relevant components. Experimental results demonstrate the benefits of the utilization of the NPD, when used to enhance some current QoS routing algorithms to offer a certain degree of protection

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes a multicast implementation based on adaptive routing with anticipated calculation. Three different cost measures for a point-to-multipoint connection: bandwidth cost, connection establishment cost and switching cost can be considered. The application of the method based on pre-evaluated routing tables makes possible the reduction of bandwidth cost and connection establishment cost individually

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In image segmentation, clustering algorithms are very popular because they are intuitive and, some of them, easy to implement. For instance, the k-means is one of the most used in the literature, and many authors successfully compare their new proposal with the results achieved by the k-means. However, it is well known that clustering image segmentation has many problems. For instance, the number of regions of the image has to be known a priori, as well as different initial seed placement (initial clusters) could produce different segmentation results. Most of these algorithms could be slightly improved by considering the coordinates of the image as features in the clustering process (to take spatial region information into account). In this paper we propose a significant improvement of clustering algorithms for image segmentation. The method is qualitatively and quantitative evaluated over a set of synthetic and real images, and compared with classical clustering approaches. Results demonstrate the validity of this new approach