855 resultados para parallel computation
Resumo:
This thesis presents a new actuator system consisting of a micro-actuator and a macro-actuator coupled in parallel via a compliant transmission. The system is called the Parallel Coupled Micro-Macro Actuator, or PaCMMA. In this system, the micro-actuator is capable of high bandwidth force control due to its low mass and direct-drive connection to the output shaft. The compliant transmission of the macro-actuator reduces the impedance (stiffness) at the output shaft and increases the dynamic range of force. Performance improvement over single actuator systems was expected in force control, impedance control, force distortion and reduction of transient impact forces. A set of quantitative measures is proposed and the actuator system is evaluated against them: Force Control Bandwidth, Position Bandwidth, Dynamic Range, Impact Force, Impedance ("Backdriveability'"), Force Distortion and Force Performance Space. Several theoretical performance limits are derived from the saturation limits of the system. A control law is proposed and control system performance is compared to the theoretical limits. A prototype testbed was built using permanenent magnet motors and an experimental comparison was performed between this actuator concept and two single actuator systems. The following performance was observed: Force bandwidth of 56Hz, Torque Dynamic Range of 800:1, Peak Torque of 1040mNm, Minimum Torque of 1.3mNm. Peak Impact Force was reduced by an order of magnitude. Distortion at small amplitudes was reduced substantially. Backdriven impedance was reduced by 2-3 orders of magnitude. This actuator system shows promise for manipulator design as well as psychophysical tests of human performance.
Resumo:
The furious pace of Moore's Law is driving computer architecture into a realm where the the speed of light is the dominant factor in system latencies. The number of clock cycles to span a chip are increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has previously made migration an unserviceable solution so far. I present an architecture, implementation, and mechanisms that reduces the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.
Resumo:
We consider the often-studied problem of sorting, for a parallel computer. Given an input array distributed evenly over p processors, the task is to compute the sorted output array, also distributed over the p processors. Many existing algorithms take the approach of approximately load-balancing the output, leaving each processor with Θ(n/p) elements. However, in many cases, approximate load-balancing leads to inefficiencies in both the sorting itself and in further uses of the data after sorting. We provide a deterministic parallel sorting algorithm that uses parallel selection to produce any output distribution exactly, particularly one that is perfectly load-balanced. Furthermore, when using a comparison sort, this algorithm is 1-optimal in both computation and communication. We provide an empirical study that illustrates the efficiency of exact data splitting, and shows an improvement over two sample sort algorithms.
Resumo:
A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two algorithms, one serial and one parallel, to maintain series-parallel (SP) relationships "on the fly" for fork-join multithreaded programs. The serial SP-order algorithm runs in O(1) amortized time per operation. In contrast, the previously best algorithm requires a time per operation that is proportional to Tarjan’s functional inverse of Ackermann’s function. SP-order employs an order-maintenance data structure that allows us to implement a more efficient "English-Hebrew" labeling scheme than was used in earlier race detectors, which immediately yields an improved determinacy-race detector. In particular, any fork-join program running in T₁ time on a single processor can be checked on the fly for determinacy races in O(T₁) time. Corresponding improved bounds can also be obtained for more sophisticated data-race detectors, for example, those that use locks. By combining SP-order with Feng and Leiserson’s serial SP-bags algorithm, we obtain a parallel SP-maintenance algorithm, called SP-hybrid. Suppose that a fork-join program has n threads, T₁ work, and a critical-path length of T[subscript â]. When executed on P processors, we prove that SP-hybrid runs in O((T₁/P + PT[subscript â]) lg n) expected time. To understand this bound, consider that the original program obtains linear speed-up over a 1-processor execution when P = O(T₁/T[subscript â]). In contrast, SP-hybrid obtains linear speed-up when P = O(√T₁/T[subscript â]), but the work is increased by a factor of O(lg n).
Resumo:
This paper presents the research and development of a 3-legged micro Parallel Kinematic Manipulator (PKM) for positioning in micro-machining and assembly operations. The structural characteristics associated with parallel manipulators are evaluated and the PKMs with translational and rotational movements are identified. Based on these identifications, a hybrid 3-UPU (Universal Joint-Prismatic Joint-Universal Joint) parallel manipulator is designed and fabricated. The principles of the operation and modeling of this micro PKM is largely similar to a normal size Stewart Platform (SP). A modular design methodology is introduced for the construction of this micro PKM. Calibration results of this hybrid 3-UPU PKM are discussed in this paper.
Optimal Methodology for Synchronized Scheduling of Parallel Station Assembly with Air Transportation
Resumo:
We present an optimal methodology for synchronized scheduling of production assembly with air transportation to achieve accurate delivery with minimized cost in consumer electronics supply chain (CESC). This problem was motivated by a major PC manufacturer in consumer electronics industry, where it is required to schedule the delivery requirements to meet the customer needs in different parts of South East Asia. The overall problem is decomposed into two sub-problems which consist of an air transportation allocation problem and an assembly scheduling problem. The air transportation allocation problem is formulated as a Linear Programming Problem with earliness tardiness penalties for job orders. For the assembly scheduling problem, it is basically required to sequence the job orders on the assembly stations to minimize their waiting times before they are shipped by flights to their destinations. Hence the second sub-problem is modelled as a scheduling problem with earliness penalties. The earliness penalties are assumed to be independent of the job orders.
Resumo:
Omnidirectional cameras offer a much wider field of view than the perspective ones and alleviate the problems due to occlusions. However, both types of cameras suffer from the lack of depth perception. A practical method for obtaining depth in computer vision is to project a known structured light pattern on the scene avoiding the problems and costs involved by stereo vision. This paper is focused on the idea of combining omnidirectional vision and structured light with the aim to provide 3D information about the scene. The resulting sensor is formed by a single catadioptric camera and an omnidirectional light projector. It is also discussed how this sensor can be used in robot navigation applications
Resumo:
Most network operators have considered reducing Label Switched Routers (LSR) label spaces (i.e. the number of labels that can be used) as a means of simplifying management of underlaying Virtual Private Networks (VPNs) and, hence, reducing operational expenditure (OPEX). This letter discusses the problem of reducing the label spaces in Multiprotocol Label Switched (MPLS) networks using label merging - better known as MultiPoint-to-Point (MP2P) connections. Because of its origins in IP, MP2P connections have been considered to have tree- shapes with Label Switched Paths (LSP) as branches. Due to this fact, previous works by many authors affirm that the problem of minimizing the label space using MP2P in MPLS - the Merging Problem - cannot be solved optimally with a polynomial algorithm (NP-complete), since it involves a hard- decision problem. However, in this letter, the Merging Problem is analyzed, from the perspective of MPLS, and it is deduced that tree-shapes in MP2P connections are irrelevant. By overriding this tree-shape consideration, it is possible to perform label merging in polynomial time. Based on how MPLS signaling works, this letter proposes an algorithm to compute the minimum number of labels using label merging: the Full Label Merging algorithm. As conclusion, we reclassify the Merging Problem as Polynomial-solvable, instead of NP-complete. In addition, simulation experiments confirm that without the tree-branch selection problem, more labels can be reduced
Resumo:
The design of control, estimation or diagnosis algorithms most often assumes that all available process variables represent the system state at the same instant of time. However, this is never true in current network systems, because of the unknown deterministic or stochastic transmission delays introduced by the communication network. During the diagnosing stage, this will often generate false alarms. Under nominal operation, the different transmission delays associated with the variables that appear in the computation form produce discrepancies of the residuals from zero. A technique aiming at the minimisation of the resulting false alarms rate, that is based on the explicit modelling of communication delays and on their best-case estimation is proposed
Resumo:
Crowdsourcing. Social Machines. Human computation. Co-construction Made Real
Resumo:
the introduction of this research paper (especially pg 2-4) and its list of references may be useful to clarify the notions of Bayesian learning applied to trust as explained in the lectures. This is optional reading
Resumo:
Speaker(s): Prof. David Evans Organiser: Dr Tim Chown Time: 22/05/2014 10:45-11:45 Location: B53/4025 Abstract Secure multi-party computation enables two (or more) participants to reliably compute a function that depends on both of their inputs, without revealing those inputs to the other party or needing to trust any other party. It could enable two people who meet at a conference to learn who they known in common without revealing any of their other contacts, or allow a pharmaceutical company to determine the correct dosage of a medication based on a patient’s genome without compromising the privacy of the patient. A general solution to this problem has been known since Yao's pioneering work in the 1980s, but only recently has it become conceivable to use this approach in practice. Over the past few years, my research group has worked towards making secure computation practical for real applications. In this talk, I'll provide a brief introduction to secure computation protocols, describe the techniques we have developed to design scalable and efficient protocols, and share some recent results on improving efficiency and how secure computing applications are developed.