970 resultados para Parallel programming
Resumo:
Incorporating the possibility of attaching attributes to variables in a logic programming system has been shown to allow the addition of general constraint solving capabilities to it. This approach is very attractive in that by adding a few primitives any logic programming system can be turned into a generic constraint logic programming system in which constraint solving can be user defined, and at source level - an extreme example of the "glass box" approach. In this paper we propose a different and novel use for the concept of attributed variables: developing a generic parallel/concurrent (constraint) logic programming system, using the same "glass box" flavor. We argüe that a system which implements attributed variables and a few additional primitives can be easily customized at source level to implement many of the languages and execution models of parallelism and concurrency currently proposed, in both shared memory and distributed systems. We illustrate this through examples.
Resumo:
Thesis (M.S.)--University of Illinois at Urbana-Champaign, 1966.
Resumo:
Symmetric multi-processor (SMP) systems, or multiple-CPU servers, are suitable for implementing parallel algorithms because they employ dedicated communication devices to enhance the inter-processor communication bandwidth, so that a better performance can be obtained. However, the cost for a multiple-CPU server is high and therefore, the server is usually shared among many users. The work-load due to other users will certainly affect the performance of the parallel programs so it is desirable to derive a method to optimize parallel programs under different loading conditions. In this paper, we present a simple method, which can be applied in SPMD type parallel programs, to improve the speedup by controlling the number of threads within the programs.
Resumo:
The Streaming SIMD extension (SSE) is a special feature that is available in the Intel Pentium III and P4 classes of microprocessors. As its name implies, SSE enables the execution of SIMD (Single Instruction Multiple Data) operations upon 32-bit floating-point data therefore, performance of floating-point algorithms can be improved. In electrified railway system simulation, the computation involves the solving of a huge set of simultaneous linear equations, which represent the electrical characteristic of the railway network at a particular time-step and a fast solution for the equations is desirable in order to simulate the system in real-time. In this paper, we present how SSE is being applied to the railway network simulation.
Resumo:
Abstract Computer simulation is a versatile and commonly used tool for the design and evaluation of systems with different degrees of complexity. Power distribution systems and electric railway network are areas for which computer simulations are being heavily applied. A dominant factor in evaluating the performance of a software simulator is its processing time, especially in the cases of real-time simulation. Parallel processing provides a viable mean to reduce the computing time and is therefore suitable for building real-time simulators. In this paper, we present different issues related to solving the power distribution system with parallel computing based on a multiple-CPU server and we will concentrate, in particular, on the speedup performance of such an approach.
Resumo:
A simple but efficient algorithm is presented for linear programming. The algorithm computes the projection matrix exactly once throughout the computation unlike that of Karmarkar’s algorithm where in the projection matrix is computed at each and every iteration. The algorithm is best suitable to be implemented on a parallel architecture. Complexity of the algorithm is being studied.
Resumo:
Negabinary is a component of the positional number system. A complete set of negabinary arithmetic operations are presented, including the basic addition/subtraction logic, the two-step carry-free addition/subtraction algorithm based on negabinary signed-digit (NSD) representation, parallel multiplication, and the fast conversion from NSD to the normal negabinary in the carry-look-ahead mode. All the arithmetic operations can be performed with binary logic. By programming the binary reference bits, addition and subtraction can be realized in parallel with the same binary logic functions. This offers a technique to perform space-variant arithmetic-logic functions with space-invariant instructions. Multiplication can be performed in the tree structure and it is simpler than the modified signed-digit (MSD) counterpart. The parallelism of the algorithms is very suitable for optical implementation. Correspondingly, a general-purpose optical logic system using an electron trapping device is suggested. Various complex logic functions can be performed by programming the illumination of the data arrays without additional temporal latency of the intermediate results. The system can be compact. These properties make the proposed negabinary arithmetic-logic system a strong candidate for future applications in digital optical computing with the development of smart pixel arrays. (C) 1999 Society of Photo-Optical Instrumentation Engineers. [S0091-3286(99)00803-X].
Resumo:
The proliferation of inexpensive workstations and networks has prompted several researchers to use such distributed systems for parallel computing. Attempts have been made to offer a shared-memory programming model on such distributed memory computers. Most systems provide a shared-memory that is coherent in that all processes that use it agree on the order of all memory events. This dissertation explores the possibility of a significant improvement in the performance of some applications when they use non-coherent memory. First, a new formal model to describe existing non-coherent memories is developed. I use this model to prove that certain problems can be solved using asynchronous iterative algorithms on shared-memory in which the coherence constraints are substantially relaxed. In the course of the development of the model I discovered a new type of non-coherent behavior called Local Consistency. Second, a programming model, Mermera, is proposed. It provides programmers with a choice of hierarchically related non-coherent behaviors along with one coherent behavior. Thus, one can trade-off the ease of programming with coherent memory for improved performance with non-coherent memory. As an example, I present a program to solve a linear system of equations using an asynchronous iterative algorithm. This program uses all the behaviors offered by Mermera. Third, I describe the implementation of Mermera on a BBN Butterfly TC2000 and on a network of workstations. The performance of a version of the equation solving program that uses all the behaviors of Mermera is compared with that of a version that uses coherent behavior only. For a system of 1000 equations the former exhibits at least a 5-fold improvement in convergence time over the latter. The version using coherent behavior only does not benefit from employing more than one workstation to solve the problem while the program using non-coherent behavior continues to achieve improved performance as the number of workstations is increased from 1 to 6. This measurement corroborates our belief that non-coherent shared memory can be a performance boon for some applications.
Resumo:
Predictability -- the ability to foretell that an implementation will not violate a set of specified reliability and timeliness requirements -- is a crucial, highly desirable property of responsive embedded systems. This paper overviews a development methodology for responsive systems, which enhances predictability by eliminating potential hazards resulting from physically-unsound specifications. The backbone of our methodology is the Time-constrained Reactive Automaton (TRA) formalism, which adopts a fundamental notion of space and time that restricts expressiveness in a way that allows the specification of only reactive, spontaneous, and causal computation. Using the TRA model, unrealistic systems – possessing properties such as clairvoyance, caprice, infinite capacity, or perfect timing -- cannot even be specified. We argue that this "ounce of prevention" at the specification level is likely to spare a lot of time and energy in the development cycle of responsive systems -- not to mention the elimination of potential hazards that would have gone, otherwise, unnoticed. The TRA model is presented to system developers through the Cleopatra programming language. Cleopatra features a C-like imperative syntax for the description of computation, which makes it easier to incorporate in applications already using C. It is event-driven, and thus appropriate for embedded process control applications. It is object-oriented and compositional, thus advocating modularity and reusability. Cleopatra is semantically sound; its objects can be transformed, mechanically and unambiguously, into formal TRA automata for verification purposes, which can be pursued using model-checking or theorem proving techniques. Since 1989, an ancestor of Cleopatra has been in use as a specification and simulation language for embedded time-critical robotic processes.
Resumo:
Programmers of parallel processes that communicate through shared globally distributed data structures (DDS) face a difficult choice. Either they must explicitly program DDS management, by partitioning or replicating it over multiple distributed memory modules, or be content with a high latency coherent (sequentially consistent) memory abstraction that hides the DDS' distribution. We present Mermera, a new formalism and system that enable a smooth spectrum of noncoherent shared memory behaviors to coexist between the above two extremes. Our approach allows us to define known noncoherent memories in a new simple way, to identify new memory behaviors, and to characterize generic mixed-behavior computations. The latter are useful for programming using multiple behaviors that complement each others' advantages. On the practical side, we show that the large class of programs that use asynchronous iterative methods (AIM) can run correctly on slow memory, one of the weakest, and hence most efficient and fault-tolerant, noncoherence conditions. An example AIM program to solve linear equations, is developed to illustrate: (1) the need for concurrently mixing memory behaviors, and, (2) the performance gains attainable via noncoherence. Other program classes tolerate weak memory consistency by synchronizing in such a way as to yield executions indistinguishable from coherent ones. AIM computations on noncoherent memory yield noncoherent, yet correct, computations. We report performance data that exemplifies the potential benefits of noncoherence, in terms of raw memory performance, as well as application speed.
Resumo:
We consider a problem of scheduling jobs on m parallel machines. The machines are dedicated, i.e., for each job the processing machine is known in advance. We mainly concentrate on the model in which at any time there is one unit of an additional resource. Any job may be assigned the resource and this reduces its processing time. A job that is given the resource uses it at each time of its processing. No two jobs are allowed to use the resource simultaneously. The objective is to minimize the makespan. We prove that the two-machine problem is NP-hard in the ordinary sense, describe a pseudopolynomial dynamic programming algorithm and convert it into an FPTAS. For the problem with an arbitrary number of machines we present an algorithm with a worst-case ratio close to 3/2, and close to 3, if a job can be given several units of the resource. For the problem with a fixed number of machines we give a PTAS. Virtually all algorithms rely on a certain variant of the linear knapsack problem (maximization, minimization, multiple-choice, bicriteria). © 2008 Wiley Periodicals, Inc. Naval Research Logistics, 2008
Resumo:
The cellular prion protein (PrPC) is widely expressed in neural and non-neural tissues, but its function is unknown. Elucidation of the part played by PrPC in adaptive immunity has been a particular conundrum: increased expression of cell surface PrPC has been documented during T-cell activation, yet the functional significance of this activation remains unclear, with conflicting data on the effects of Prnp gene knockout on various parameters of T-cell immunity. We show here that Prnp mRNA is highly inducible within 8–24 h of T-cell activation, with surface protein levels rising from 24 h. When measured in parallel with CD69 and CD25, PrPC is a late activation antigen. Consistent with its up-regulation being a late activation event, PrP deletion did not alter T-cell-antigen presenting cell conjugate formation. Most important, activated PrP0/0 T cells demonstrated much reduced induction of several T helper (Th) 1, Th2, and Th17 cytokines, whereas others, such as TNF- and IL-9, were unaffected. These changes were investigated in the context of an autoimmune model and a bacterial challenge model. In experimental autoimmune encephalomyelitis, PrP-knockout mice showed enhanced disease in the face of reduced IL-17 responses. In a streptococcal sepsis model, this constrained cytokine program was associated with poorer local control of infection, although with reduced bacteremia. The findings indicate that PrPC is a potentially important molecule influencing T-cell activation and effector function.
Resumo:
A BSP superstep is a distributed computation comprising a number of simultaneously executing processes which may generate asynchronous messages. A superstep terminates with a barrier which enforces a global synchronisation and delivers all ongoing communications. Multilevel supersteps can utilise barriers in which subsets of processes, interacting through shared memories, are locally synchronised (partitioned synchronisation). In this paper a state-based semantics, closely related to the classical sequential programming model, is derived for distributed BSP with partitioned synchronisation.