Digital Library

10 results for structured parallel computations

at Massachusetts Institute of Technology

Pi: A Parallel Architecture Interface for Multi-Model Execution

Relevance: 20.00%

Abstract:

This thesis defines Pi, a parallel architecture interface that separates model and machine issues, allowing them to be addressed independently. This provides greater flexibility for both the model and machine builder. Pi addresses a set of common parallel model requirements including low latency communication, fast task switching, low cost synchronization, efficient storage management, the ability to exploit locality, and efficient support for sequential code. Since Pi provides generic parallel operations, it can efficiently support many parallel programming models including hybrids of existing models. Pi also forms a basis of comparison for architectural components.

Parallel Methods for Synthesizing Whole-Hand Grasps from Generalized Prototypes

Relevance: 20.00%

Abstract:

This report addresses the problem of acquiring objects using articulated robotic hands. Standard grasps are used to make the problem tractable, and a technique is developed for generalizing these standard grasps to increase their flexibility to variations in the problem geometry. A generalized grasp description is applied to a new problem situation using a parallel search through hand configuration space, and the result of this operation is a global overview of the space of good solutions. The techniques presented in this report have been implemented, and the results are verified using the Salisbury three-finger robotic hand.

Thread Scheduling Mechanisms for Multiple-Context Parallel Processors

Relevance: 20.00%

Abstract:

Scheduling tasks to efficiently use the available processor resources is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long latency operation occurs, thus increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference effects among the multiple contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. This thesis proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread either statically or dynamically and is used by the thread scheduler to decide which threads to load in the contexts, and to decide which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects, and shows how thread prioritization can both maintain high processor utilization, and limit increases in critical path runtime caused by multithreading. The model also shows that in order to be effective in bandwidth limited applications, thread prioritization must be extended to prioritize memory requests. 
We show how simple hardware can prioritize the running of threads in the multiple contexts, and the issuing of requests to both the local memory and the network. Simulation experiments show how thread prioritization is used in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the number of processor cycles wasted in spinning and devoting more cycles to critical threads. Thread prioritization can be used in combination with other techniques to improve cache performance and minimize cache interference between different working sets in the cache. For applications that are critical path limited, thread prioritization can improve performance by allowing processor resources to be devoted preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can be used to implement a wide range of scheduling policies.
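The abstract's core idea, using a per-thread priority both to decide which threads are loaded into hardware contexts and which context to switch to, can be illustrated with a small software sketch. This is only a toy model under stated assumptions: the class `PriorityScheduler`, its method names, and the convention that a larger number means higher priority are all hypothetical and are not taken from the thesis, which describes a hardware mechanism.

```python
import heapq
from itertools import count


class PriorityScheduler:
    """Toy model of priority-directed scheduling on a multiple-context processor.

    Assumptions (not from the thesis): larger numbers mean higher priority,
    and thread names are unique strings.
    """

    def __init__(self, num_contexts):
        self.num_contexts = num_contexts
        self._ready = []      # min-heap of (-priority, arrival order, name)
        self._seq = count()   # tie-breaker so equal priorities stay FIFO
        self.loaded = {}      # name -> priority for threads held in contexts

    def add_thread(self, name, priority):
        """Queue a thread that is ready but not yet loaded in a context."""
        heapq.heappush(self._ready, (-priority, next(self._seq), name))

    def fill_contexts(self):
        """Load the highest-priority ready threads into free contexts."""
        while self._ready and len(self.loaded) < self.num_contexts:
            neg_priority, _, name = heapq.heappop(self._ready)
            self.loaded[name] = -neg_priority

    def switch_target(self, stalled):
        """On a context switch, pick the highest-priority loaded thread
        that is not waiting on a long-latency operation."""
        runnable = [(p, n) for n, p in self.loaded.items() if n not in stalled]
        if not runnable:
            return None
        return max(runnable, key=lambda item: item[0])[1]


sched = PriorityScheduler(num_contexts=2)
sched.add_thread("background", 1)
sched.add_thread("critical", 5)
sched.add_thread("worker", 3)
sched.fill_contexts()
next_thread = sched.switch_target(stalled={"critical"})
```

With two contexts, the two highest-priority threads are loaded, and when the critical thread stalls on a remote reference the scheduler falls back to the next-highest loaded thread rather than idling.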

Parallel Coupled Micro-Macro Actuators

Relevance: 20.00%

Abstract:

This thesis presents a new actuator system consisting of a micro-actuator and a macro-actuator coupled in parallel via a compliant transmission. The system is called the Parallel Coupled Micro-Macro Actuator, or PaCMMA. In this system, the micro-actuator is capable of high bandwidth force control due to its low mass and direct-drive connection to the output shaft. The compliant transmission of the macro-actuator reduces the impedance (stiffness) at the output shaft and increases the dynamic range of force. Performance improvement over single actuator systems was expected in force control, impedance control, force distortion and reduction of transient impact forces. A set of quantitative measures is proposed and the actuator system is evaluated against them: Force Control Bandwidth, Position Bandwidth, Dynamic Range, Impact Force, Impedance ("Backdriveability"), Force Distortion and Force Performance Space. Several theoretical performance limits are derived from the saturation limits of the system. A control law is proposed and control system performance is compared to the theoretical limits. A prototype testbed was built using permanent magnet motors and an experimental comparison was performed between this actuator concept and two single actuator systems. The following performance was observed: Force bandwidth of 56 Hz, Torque Dynamic Range of 800:1, Peak Torque of 1040 mNm, Minimum Torque of 1.3 mNm. Peak Impact Force was reduced by an order of magnitude. Distortion at small amplitudes was reduced substantially. Backdriven impedance was reduced by 2-3 orders of magnitude. This actuator system shows promise for manipulator design as well as psychophysical tests of human performance.

ADAM: A Decentralized Parallel Computer Architecture Featuring Fast Thread and Data Migration and a Uniform Hardware Abstraction

Relevance: 20.00%

Abstract:

The furious pace of Moore's Law is driving computer architecture into a realm where the speed of light is the dominant factor in system latencies. The number of clock cycles needed to span a chip is increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has so far made migration an unserviceable solution. I present an architecture, implementation, and mechanisms that reduce the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size.
Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.

Mobile P2Ping: A Super-Peer based Structured P2P System Using a Fleet of City Buses

Relevance: 20.00%

Abstract:

Recently, researchers have introduced the notion of super-peers to improve signaling efficiency as well as lookup performance of peer-to-peer (P2P) systems. In a separate development, recent works on applications of mobile ad hoc networks (MANET) have seen several proposals on utilizing mobile fleets such as city buses to deploy a mobile backbone infrastructure for communication and Internet access in a metropolitan environment. This paper further explores the possibility of deploying P2P applications such as content sharing and distributed computing, over this mobile backbone infrastructure. Specifically, we study how city buses may be deployed as a mobile system of super-peers. We discuss the main motivations behind our proposal, and outline in detail the design of a super-peer based structured P2P system using a fleet of city buses.

On-the-Fly Maintenance of Series-Parallel Relationships in Fork-Join Multithreaded Programs

Relevance: 20.00%

Abstract:

A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two algorithms, one serial and one parallel, to maintain series-parallel (SP) relationships "on the fly" for fork-join multithreaded programs. The serial SP-order algorithm runs in O(1) amortized time per operation. In contrast, the previously best algorithm requires a time per operation that is proportional to Tarjan’s functional inverse of Ackermann’s function. SP-order employs an order-maintenance data structure that allows us to implement a more efficient "English-Hebrew" labeling scheme than was used in earlier race detectors, which immediately yields an improved determinacy-race detector. In particular, any fork-join program running in T₁ time on a single processor can be checked on the fly for determinacy races in O(T₁) time. Corresponding improved bounds can also be obtained for more sophisticated data-race detectors, for example, those that use locks. By combining SP-order with Feng and Leiserson’s serial SP-bags algorithm, we obtain a parallel SP-maintenance algorithm, called SP-hybrid. Suppose that a fork-join program has n threads, T₁ work, and a critical-path length of T∞. When executed on P processors, we prove that SP-hybrid runs in O((T₁/P + PT∞) lg n) expected time. To understand this bound, consider that the original program obtains linear speed-up over a 1-processor execution when P = O(T₁/T∞). In contrast, SP-hybrid obtains linear speed-up when P = O(√(T₁/T∞)), but the work is increased by a factor of O(lg n).
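The "English-Hebrew" labeling idea mentioned in this abstract can be shown with a small static sketch. It assumes the program is given up front as a binary series-parallel parse tree whose leaves are uniquely named threads; the English order visits every node's children left to right, while the Hebrew order visits the children of parallel nodes right to left. Two threads are in series when both orders agree, and logically parallel when they disagree. This is an offline illustration only, not the paper's on-the-fly SP-order algorithm with its order-maintenance data structure, and all names (`Node`, `label`, `precedes`, `in_parallel`) are hypothetical.

```python
class Node:
    """Node of a binary SP parse tree: kind is "S", "P", or "leaf"."""

    def __init__(self, kind, left=None, right=None, name=None):
        self.kind = kind
        self.left = left
        self.right = right
        self.name = name  # only meaningful for leaves; assumed unique


def leaf(name):
    return Node("leaf", name=name)


def series(left, right):
    return Node("S", left, right)


def parallel(left, right):
    return Node("P", left, right)


def label(root):
    """Return (english, hebrew): dicts mapping leaf name -> visit index."""
    english, hebrew = {}, {}

    def walk(node, order, swap_parallel):
        if node.kind == "leaf":
            order[node.name] = len(order)
            return
        first, second = node.left, node.right
        # The Hebrew order reverses the children of P-nodes only.
        if swap_parallel and node.kind == "P":
            first, second = second, first
        walk(first, order, swap_parallel)
        walk(second, order, swap_parallel)

    walk(root, english, False)
    walk(root, hebrew, True)
    return english, hebrew


def precedes(u, v, english, hebrew):
    """True if thread u must finish before v starts (series relationship)."""
    return english[u] < english[v] and hebrew[u] < hebrew[v]


def in_parallel(u, v, english, hebrew):
    """True if u and v are logically parallel (the two orders disagree)."""
    return u != v and (english[u] < english[v]) != (hebrew[u] < hebrew[v])


# a; then b and c in parallel; then d
tree = series(leaf("a"), series(parallel(leaf("b"), leaf("c")), leaf("d")))
english, hebrew = label(tree)
```

In this example `b` and `c` are reported as parallel, while `a` precedes both and `d` follows both. A race detector would flag conflicting accesses only between threads for which `in_parallel` holds.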

Design and Development of 3-DOF Modular Micro Parallel Kinematic Manipulator

Relevance: 20.00%

Abstract:

This paper presents the research and development of a 3-legged micro Parallel Kinematic Manipulator (PKM) for positioning in micro-machining and assembly operations. The structural characteristics associated with parallel manipulators are evaluated and the PKMs with translational and rotational movements are identified. Based on these identifications, a hybrid 3-UPU (Universal Joint-Prismatic Joint-Universal Joint) parallel manipulator is designed and fabricated. The principles of operation and modeling of this micro PKM are largely similar to those of a normal-size Stewart Platform (SP). A modular design methodology is introduced for the construction of this micro PKM. Calibration results of this hybrid 3-UPU PKM are discussed in this paper.

Optimal Methodology for Synchronized Scheduling of Parallel Station Assembly with Air Transportation

Relevance: 20.00%

Abstract:

We present an optimal methodology for synchronized scheduling of production assembly with air transportation to achieve accurate delivery with minimized cost in the consumer electronics supply chain (CESC). This problem was motivated by a major PC manufacturer in the consumer electronics industry, which must schedule deliveries to meet customer needs in different parts of South East Asia. The overall problem is decomposed into two sub-problems: an air transportation allocation problem and an assembly scheduling problem. The air transportation allocation problem is formulated as a Linear Programming Problem with earliness-tardiness penalties for job orders. The assembly scheduling problem requires sequencing the job orders on the assembly stations to minimize their waiting times before they are shipped by flights to their destinations. Hence the second sub-problem is modelled as a scheduling problem with earliness penalties. The earliness penalties are assumed to be independent of the job orders.

Molecular computations for reactions and phase transitions: applications to protein stabilization, hydrates and catalysis

Relevance: 20.00%

Abstract:

In this work we have made significant contributions in three different areas of interest: therapeutic protein stabilization, thermodynamics of natural gas clathrate-hydrates, and zeolite catalysis. In all three fields, using our various computational techniques, we have been able to elucidate phenomena that are difficult or impossible to explain experimentally. More specifically, in mixed solvent systems for proteins we developed a statistical-mechanical method to model the thermodynamic effects of additives in molecular-level detail. It was the first method demonstrated to have truly predictive (no adjustable parameters) capability for real protein systems. We also describe a novel mechanism that slows protein association reactions, called the “gap effect.” We developed a comprehensive picture of methionine oxidation by hydrogen peroxide that allows for accurate prediction of protein oxidation and provides a rationale for developing strategies to control oxidation. The method of solvent accessible area (SAA) was shown not to correlate well with oxidation rates. A new property, averaged two-shell water coordination number (2SWCN), was identified and shown to correlate well with oxidation rates. Reference parameters for the van der Waals-Platteeuw model of clathrate-hydrates were found for structure I and structure II. These reference parameters are independent of the potential form (unlike the commonly used parameters) and have been validated by calculating phase behavior and structural transitions for mixed hydrate systems. These calculations are validated with experimental data for both structures and for systems that undergo transitions from one structure to another. This is the first method of calculating hydrate thermodynamics to demonstrate predictive capability for phase equilibria, structural changes, and occupancy in pure and mixed hydrate systems.
We have computed a new mechanism for the methanol coupling reaction to form ethanol and water in the zeolite chabazite. The mechanism at 400°C proceeds via stable intermediates of water, methane, and protonated formaldehyde.
