977 resultados para Parallel Programming


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Programmers of parallel processes that communicate through shared globally distributed data structures (DDS) face a difficult choice. Either they must explicitly program DDS management, by partitioning or replicating it over multiple distributed memory modules, or be content with a high latency coherent (sequentially consistent) memory abstraction that hides the DDS' distribution. We present Mermera, a new formalism and system that enable a smooth spectrum of noncoherent shared memory behaviors to coexist between the above two extremes. Our approach allows us to define known noncoherent memories in a new simple way, to identify new memory behaviors, and to characterize generic mixed-behavior computations. The latter are useful for programming using multiple behaviors that complement each others' advantages. On the practical side, we show that the large class of programs that use asynchronous iterative methods (AIM) can run correctly on slow memory, one of the weakest, and hence most efficient and fault-tolerant, noncoherence conditions. An example AIM program to solve linear equations, is developed to illustrate: (1) the need for concurrently mixing memory behaviors, and, (2) the performance gains attainable via noncoherence. Other program classes tolerate weak memory consistency by synchronizing in such a way as to yield executions indistinguishable from coherent ones. AIM computations on noncoherent memory yield noncoherent, yet correct, computations. We report performance data that exemplifies the potential benefits of noncoherence, in terms of raw memory performance, as well as application speed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We consider a problem of scheduling jobs on m parallel machines. The machines are dedicated, i.e., for each job the processing machine is known in advance. We mainly concentrate on the model in which at any time there is one unit of an additional resource. Any job may be assigned the resource and this reduces its processing time. A job that is given the resource uses it at each time of its processing. No two jobs are allowed to use the resource simultaneously. The objective is to minimize the makespan. We prove that the two-machine problem is NP-hard in the ordinary sense, describe a pseudopolynomial dynamic programming algorithm and convert it into an FPTAS. For the problem with an arbitrary number of machines we present an algorithm with a worst-case ratio close to 3/2, and close to 3, if a job can be given several units of the resource. For the problem with a fixed number of machines we give a PTAS. Virtually all algorithms rely on a certain variant of the linear knapsack problem (maximization, minimization, multiple-choice, bicriteria). © 2008 Wiley Periodicals, Inc. Naval Research Logistics, 2008

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The cellular prion protein (PrPC) is widely expressed in neural and non-neural tissues, but its function is unknown. Elucidation of the part played by PrPC in adaptive immunity has been a particular conundrum: increased expression of cell surface PrPC has been documented during T-cell activation, yet the functional significance of this activation remains unclear, with conflicting data on the effects of Prnp gene knockout on various parameters of T-cell immunity. We show here that Prnp mRNA is highly inducible within 8–24 h of T-cell activation, with surface protein levels rising from 24 h. When measured in parallel with CD69 and CD25, PrPC is a late activation antigen. Consistent with its up-regulation being a late activation event, PrP deletion did not alter T-cell-antigen presenting cell conjugate formation. Most important, activated PrP0/0 T cells demonstrated much reduced induction of several T helper (Th) 1, Th2, and Th17 cytokines, whereas others, such as TNF- and IL-9, were unaffected. These changes were investigated in the context of an autoimmune model and a bacterial challenge model. In experimental autoimmune encephalomyelitis, PrP-knockout mice showed enhanced disease in the face of reduced IL-17 responses. In a streptococcal sepsis model, this constrained cytokine program was associated with poorer local control of infection, although with reduced bacteremia. The findings indicate that PrPC is a potentially important molecule influencing T-cell activation and effector function.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A BSP superstep is a distributed computation comprising a number of simultaneously executing processes which may generate asynchronous messages. A superstep terminates with a barrier which enforces a global synchronisation and delivers all ongoing communications. Multilevel supersteps can utilise barriers in which subsets of processes, interacting through shared memories, are locally synchronised (partitioned synchronisation). In this paper a state-based semantics, closely related to the classical sequential programming model, is derived for distributed BSP with partitioned synchronisation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes the design, application, and evaluation of a user friendly, flexible, scalable and inexpensive Advanced Educational Parallel (AdEPar) digital signal processing (DSP) system based on TMS320C25 digital processors to implement DSP algorithms. This system will be used in the DSP laboratory by graduate students to work on advanced topics such as developing parallel DSP algorithms. The graduating senior students who have gained some experience in DSP can also use the system. The DSP laboratory has proved to be a useful tool in the hands of the instructor to teach the mathematically oriented topics of DSP that are often difficult for students to grasp. The DSP laboratory with assigned projects has greatly improved the ability of the students to understand such complex topics as the fast Fourier transform algorithm, linear and circular convolution, the theory and design of infinite impulse response (IIR) and finite impulse response (FIR) filters. The user friendly PC software support of the AdEPar system makes it easy to develop DSP programs for students. This paper gives the architecture of the AdEPar DSP system. The communication between processors and the PC-DSP processor communication are explained. The parallel debugger kernels and the restrictions of the system are described. The programming in the AdEPar is explained, and two benchmarks (parallel FFT and DES) are presented to show the system performance.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Many scientific applications are programmed using hybrid programming models that use both message passing and shared memory, due to the increasing prevalence of large-scale systems with multicore, multisocket nodes. Previous work has shown that energy efficiency can be improved using software-controlled execution schemes that consider both the programming model and the power-aware execution capabilities of the system. However, such approaches have focused on identifying optimal resource utilization for one programming model, either shared memory or message passing, in isolation. The potential solution space, thus the challenge, increases substantially when optimizing hybrid models since the possible resource configurations increase exponentially. Nonetheless, with the accelerating adoption of hybrid programming models, we increasingly need improved energy efficiency in hybrid parallel applications on large-scale systems. In this work, we present new software-controlled execution schemes that consider the effects of dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) in the context of hybrid programming models. Specifically, we present predictive models and novel algorithms based on statistical analysis that anticipate application power and time requirements under different concurrency and frequency configurations. We apply our models and methods to the NPB MZ benchmarks and selected applications from the ASC Sequoia codes. Overall, we achieve substantial energy savings (8.74 percent on average and up to 13.8 percent) with some performance gain (up to 7.5 percent) or negligible performance loss.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Refactoring is the process of changing the structure of a program without changing its behaviour. Refactoring has so far only really been deployed effectively for sequential programs. However, with the increased availability of multicore (and, soon, manycore) systems, refactoring can play an important role in helping both expert and non-expert parallel programmers structure and implement their parallel programs. This paper describes the design of a new refactoring tool that is aimed at increasing the programmability of parallel systems. To motivate our design, we refactor a number of examples in C, C++ and Erlang into good parallel implementations, using a set of formal pattern rewrite rules. © 2013 Springer-Verlag Berlin Heidelberg.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes the ParaPhrase project, a new 3-year targeted research project funded under EU Framework 7 Objective 3.4 (Computer Systems), starting in October 2011. ParaPhrase aims to follow a new approach to introducing parallelism using advanced refactoring techniques coupled with high-level parallel design patterns. The refactoring approach will use these design patterns to restructure programs defined as networks of software components into other forms that are more suited to parallel execution. The programmer will be aided by high-level cost information that will be integrated into the refactoring tools. The implementation of these patterns will then use a well-understood algorithmic skeleton approach to achieve good parallelism. A key ParaPhrase design goal is that parallel components are intended to match heterogeneous architectures, defined in terms of CPU/GPU combinations, for example. In order to achieve this, the ParaPhrase approach will map components at link time to the available hardware, and will then re-map them during program execution, taking account of multiple applications, changes in hardware resource availability, the desire to reduce communication costs etc. In this way, we aim to develop a new approach to programming that will be able to produce software that can adapt to dynamic changes in the system environment. Moreover, by using a strong component basis for parallelism, we can achieve potentially significant gains in terms of reducing sharing at a high level of abstraction, and so in reducing or even eliminating the costs that are usually associated with cache management, locking, and synchronisation. © 2013 Springer-Verlag Berlin Heidelberg.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Embedded real-time applications increasingly present high computation requirements, which need to be completed within specific deadlines, but that present highly variable patterns, depending on the set of data available in a determined instant. The current trend to provide parallel processing in the embedded domain allows providing higher processing power; however, it does not address the variability in the processing pattern. Dimensioning each device for its worst-case scenario implies lower average utilization, and increased available, but unusable, processing in the overall system. A solution for this problem is to extend the parallel execution of the applications, allowing networked nodes to distribute the workload, on peak situations, to neighbour nodes. In this context, this report proposes a framework to develop parallel and distributed real-time embedded applications, transparently using OpenMP and Message Passing Interface (MPI), within a programming model based on OpenMP. The technical report also devises an integrated timing model, which enables the structured reasoning on the timing behaviour of these hybrid architectures.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Genetic Programming (GP) is a widely used methodology for solving various computational problems. GP's problem solving ability is usually hindered by its long execution times. In this thesis, GP is applied toward real-time computer vision. In particular, object classification and tracking using a parallel GP system is discussed. First, a study of suitable GP languages for object classification is presented. Two main GP approaches for visual pattern classification, namely the block-classifiers and the pixel-classifiers, were studied. Results showed that the pixel-classifiers generally performed better. Using these results, a suitable language was selected for the real-time implementation. Synthetic video data was used in the experiments. The goal of the experiments was to evolve a unique classifier for each texture pattern that existed in the video. The experiments revealed that the system was capable of correctly tracking the textures in the video. The performance of the system was on-par with real-time requirements.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Distributed systems are one of the most vital components of the economy. The most prominent example is probably the internet, a constituent element of our knowledge society. During the recent years, the number of novel network types has steadily increased. Amongst others, sensor networks, distributed systems composed of tiny computational devices with scarce resources, have emerged. The further development and heterogeneous connection of such systems imposes new requirements on the software development process. Mobile and wireless networks, for instance, have to organize themselves autonomously and must be able to react to changes in the environment and to failing nodes alike. Researching new approaches for the design of distributed algorithms may lead to methods with which these requirements can be met efficiently. In this thesis, one such method is developed, tested, and discussed in respect of its practical utility. Our new design approach for distributed algorithms is based on Genetic Programming, a member of the family of evolutionary algorithms. Evolutionary algorithms are metaheuristic optimization methods which copy principles from natural evolution. They use a population of solution candidates which they try to refine step by step in order to attain optimal values for predefined objective functions. The synthesis of an algorithm with our approach starts with an analysis step in which the wanted global behavior of the distributed system is specified. From this specification, objective functions are derived which steer a Genetic Programming process where the solution candidates are distributed programs. The objective functions rate how close these programs approximate the goal behavior in multiple randomized network simulations. The evolutionary process step by step selects the most promising solution candidates and modifies and combines them with mutation and crossover operators. This way, a description of the global behavior of a distributed system is translated automatically to programs which, if executed locally on the nodes of the system, exhibit this behavior. In our work, we test six different ways for representing distributed programs, comprising adaptations and extensions of well-known Genetic Programming methods (SGP, eSGP, and LGP), one bio-inspired approach (Fraglets), and two new program representations called Rule-based Genetic Programming (RBGP, eRBGP) designed by us. We breed programs in these representations for three well-known example problems in distributed systems: election algorithms, the distributed mutual exclusion at a critical section, and the distributed computation of the greatest common divisor of a set of numbers. Synthesizing distributed programs the evolutionary way does not necessarily lead to the envisaged results. In a detailed analysis, we discuss the problematic features which make this form of Genetic Programming particularly hard. The two Rule-based Genetic Programming approaches have been developed especially in order to mitigate these difficulties. In our experiments, at least one of them (eRBGP) turned out to be a very efficient approach and in most cases, was superior to the other representations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present an optimal methodology for synchronized scheduling of production assembly with air transportation to achieve accurate delivery with minimized cost in consumer electronics supply chain (CESC). This problem was motivated by a major PC manufacturer in consumer electronics industry, where it is required to schedule the delivery requirements to meet the customer needs in different parts of South East Asia. The overall problem is decomposed into two sub-problems which consist of an air transportation allocation problem and an assembly scheduling problem. The air transportation allocation problem is formulated as a Linear Programming Problem with earliness tardiness penalties for job orders. For the assembly scheduling problem, it is basically required to sequence the job orders on the assembly stations to minimize their waiting times before they are shipped by flights to their destinations. Hence the second sub-problem is modelled as a scheduling problem with earliness penalties. The earliness penalties are assumed to be independent of the job orders.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Present operating systems are not built to support parallel computing on clusters - they do not provide services to manage parallelism, i.e., to manage parallel processes and cluster resources. They do not provide support for both programming paradigms, Message Passing (MP) or Distributed Shared Memory (DSM). Due to poor operating systems, users must deal with computers of a cluster rather than to see this cluster as a single powerful computer. There is a need for cluster operating systems. We claim that it is possible to develop a cluster operating system that is able to efficiently manage parallelism, support MP and DSM and offer transparency. To substantiate this claim the first version of a cluster operating system managing parallelism and offering transparency, called GENESIS, has been developed.