995 resultados para Parallel execution


Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper, we examine the issue of memory management in the parallel execution of logic programs. We concentrate on non-deterministic and-parallel schemes which we believe present a relatively general set of problems to be solved, including most of those encountered in the memory management of or-parallel systems. We present a distributed stack memory management model which allows flexible scheduling of goals. Previously proposed models (based on the "Marker model") are lacking in that they impose restrictions on the selection of goals to be executed or they may require consume a large amount of virtual memory. This paper first presents results which imply that the above mentioned shortcomings can have significant performance impacts. An extension of the Marker Model is then proposed which allows flexible scheduling of goals while keeping (virtual) memory consumption down. Measurements are presented which show the advantage of this solution. Methods for handling forward and backward execution, cut and roll back are discussed in the context of the proposed scheme. In addition, the paper shows how the same mechanism for flexible scheduling can be applied to allow the efficient handling of the very general form of suspension that can occur in systems which combine several types of and-parallelism and more sophisticated methods of executing logic programs. We believe that the results are applicable to many and- and or-parallel systems.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper presents an approximation to the study of parallel systems using sequential tools. The Independent And-parallelism in Prolog is an example of parallel processing paradigm in the framework of logic programming, and implementations like parallel processing. But this potential can also be explored using only sequential systems. Being the spirit of this paper to show how this can be done with a standard system, only standard Prolog will be used in the implementations included. Such implementations include tests for parallelism in And-Prolog, a correctnesschecking meta-interpreter of parallel execution for

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper presents an approximation to the study of parallel systems using sequential tools. The Independent And-parallelism in Prolog is an example of parallel processing paradigm in the framework of logic programming, and implementations like parallel processing. But this potential can also be explored using only sequential systems. Being the spirit of this paper to show how this can be done with a standard system, only standard Prolog will be used in the implementations included. Such implementations include tests for parallelism in And-Prolog, a correctnesschecking meta-interpreter of parallel execution for

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We describe a compiler for the Flat Concurrent Prolog language on a message passing multiprocessor architecture. This compiler permits symbolic and declarative programming in the syntax of Guarded Horn Rules, The implementation has been verified and tested on the 64-node PARAM parallel computer developed by C-DAC (Centre for the Development of Advanced Computing, India), Flat Concurrent Prolog (FCP) is a logic programming language designed for concurrent programming and parallel execution, It is a process oriented language, which embodies dataflow synchronization and guarded-command as its basic control mechanisms. An identical algorithm is executed on every processor in the network, We assume regular network topologies like mesh, ring, etc, Each node has a local memory, The algorithm comprises of two important parts: reduction and communication, The most difficult task is to integrate the solutions of problems that arise in the implementation in a coherent and efficient manner. We have tested the efficacy of the compiler on various benchmark problems of the ICOT project that have been reported in the recent book by Evan Tick, These problems include Quicksort, 8-queens, and Prime Number Generation, The results of the preliminary tests are favourable, We are currently examining issues like indexing and load balancing to further optimize our compiler.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Accurate and timely prediction of weather phenomena, such as hurricanes and flash floods, require high-fidelity compute intensive simulations of multiple finer regions of interest within a coarse simulation domain. Current weather applications execute these nested simulations sequentially using all the available processors, which is sub-optimal due to their sub-linear scalability. In this work, we present a strategy for parallel execution of multiple nested domain simulations based on partitioning the 2-D processor grid into disjoint rectangular regions associated with each domain. We propose a novel combination of performance prediction, processor allocation methods and topology-aware mapping of the regions on torus interconnects. Experiments on IBM Blue Gene systems using WRF show that the proposed strategies result in performance improvement of up to 33% with topology-oblivious mapping and up to additional 7% with topology-aware mapping over the default sequential strategy.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

With the availability of a wide range of cloud Virtual Machines (VMs) it is difficult to determine which VMs can maximise the performance of an application. Benchmarking is commonly used to this end for capturing the performance of VMs. Most cloud benchmarking techniques are typically heavyweight - time consuming processes which have to benchmark the entire VM in order to obtain accurate benchmark data. Such benchmarks cannot be used in real-time on the cloud and incur extra costs even before an application is deployed.

In this paper, we present lightweight cloud benchmarking techniques that execute quickly and can be used in near real-time on the cloud. The exploration of lightweight benchmarking techniques are facilitated by the development of DocLite - Docker Container-based Lightweight Benchmarking. DocLite is built on the Docker container technology which allows a user-defined portion (such as memory size and the number of CPU cores) of the VM to be benchmarked. DocLite operates in two modes, in the first mode, containers are used to benchmark a small portion of the VM to generate performance ranks. In the second mode, historic benchmark data is used along with the first mode as a hybrid to generate VM ranks. The generated ranks are evaluated against three scientific high-performance computing applications. The proposed techniques are up to 91 times faster than a heavyweight technique which benchmarks the entire VM. It is observed that the first mode can generate ranks with over 90% and 86% accuracy for sequential and parallel execution of an application. The hybrid mode improves the correlation slightly but the first mode is sufficient for benchmarking cloud VMs.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Existing benchmarking methods are time consuming processes as they typically benchmark the entire Virtual Machine (VM) in order to generate accurate performance data, making them less suitable for real-time analytics. The research in this paper is aimed to surmount the above challenge by presenting DocLite - Docker Container-based Lightweight benchmarking tool. DocLite explores lightweight cloud benchmarking methods for rapidly executing benchmarks in near real-time. DocLite is built on the Docker container technology, which allows a user-defined memory size and number of CPU cores of the VM to be benchmarked. The tool incorporates two benchmarking methods - the first referred to as the native method employs containers to benchmark a small portion of the VM and generate performance ranks, and the second uses historic benchmark data along with the native method as a hybrid to generate VM ranks. The proposed methods are evaluated on three use-cases and are observed to be up to 91 times faster than benchmarking the entire VM. In both methods, small containers provide the same quality of rankings as a large container. The native method generates ranks with over 90% and 86% accuracy for sequential and parallel execution of an application compared against benchmarking the whole VM. The hybrid method did not improve the quality of the rankings significantly.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper surveys control architectures proposed in the literature and describes a control architecture that is being developed for a semi-autonomous underwater vehicle for intervention missions (SAUVIM) at the University of Hawaii. Conceived as hybrid, this architecture has been organized in three layers: planning, control and execution. The mission is planned with a sequence of subgoals. Each subgoal has a related task supervisor responsible for arranging a set of pre-programmed task modules in order to achieve the subgoal. Task modules are the key concept of the architecture. They are the main building blocks and can be dynamically re-arranged by the task supervisor. In our architecture, deliberation takes place at the planning layer while reaction is dealt through the parallel execution of the task modules. Hence, the system presents both a hierarchical and an heterarchical decomposition, being able to show a predictable response while keeping rapid reactivity to the dynamic environment

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We examine numerical performance of various methods of calculation of the Conditional Value-at-risk (CVaR), and portfolio optimization with respect to this risk measure. We concentrate on the method proposed by Rockafellar and Uryasev in (Rockafellar, R.T. and Uryasev, S., 2000, Optimization of conditional value-at-risk. Journal of Risk, 2, 21-41), which converts this problem to that of convex optimization. We compare the use of linear programming techniques against a non-smooth optimization method of the discrete gradient, and establish the supremacy of the latter. We show that non-smooth optimization can be used efficiently for large portfolio optimization, and also examine parallel execution of this method on computer clusters.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The future of computing lies with distributed systems, i.e. a network of workstations controlled by a modern distributed operating system. By supporting load balancing and parallel execution, the overall performance of a distributed system can be improved dramatically. Process migration, the act of moving a running process from a highly loaded machine to a lightly loaded machine, could be used to support load balancing, parallel execution, reliability etc. This thesis identifies the problems past process migration facilities have had and determines the possible differing strategies that can be used to resolve these problems. The result of this analysis has led to a new design philosophy. This philosophy requires the design of a process migration facility and the design of an operating system to be conducted in parallel. Modern distributed operating systems follow the microkernel and client/server paradigms. Applying these design paradigms, in conjunction with the requirements of both process migration and a distributed operating system, results in a system where each resource is controlled by a separate server process. However, a process is a complex resource composed of simple resources such as data structures, an address space and communication state. For this reason, a process migration facility does not directly migrate the resources of a process. Instead, it requests the appropriate servers to transfer the resources. This novel solution yields a modular, high performance facility that is easy to create, debug and maintain. Furthermore, the design easily incorporates providing multiple migration strategies. In order to verify the validity of this design, a process migration facility was developed and tested within RHODOS (ResearcH Oriented Distributed Operating System). RHODOS is a modern microkernel and client/server based distributed operating system. In RHODOS, a process is composed of at least three separate resources: process state - maintained by a process manager, address space - maintained by a memory manager and communication state - maintained by an InterProcess Communication Manager (IPCM). The RHODOS multiple strategy migration manager utilises the services of the process, memory and IPC Managers to migrate the resources of a process. Performance testing of this facility indicates that this design is as fast or better than existing systems which use faster hardware. Furthermore, by studying the results of the performance test ing, the conditions under which a particular strategy should be employed have been identified. This thesis also addresses heterogeneous process migration. The current trend is to have islands of homogeneous workstations amid a sea of heterogeneity. From this situation and the current literature on the topic, heterogeneous process migration can be seen as too inefficient for general use. Instead, only homogeneous workstations should be used for process migration. This implies a need to locate homogeneous workstations. Entities called traders, which store and disseminate knowledge about the resources of several workstations, should be used to provide resource discovery. Resource discovery will enable the detection of homogeneous workstations to which processes can be migrated.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Distributed Shared Memory (DSM) provides programmers with a shared memory environment in systems where memory is not physically shared. Clusters of Workstations (COWs), an often untapped source of computing power, are characterised by a very low cost/performance ratio. The combination of Clusters of Workstations (COWs) with DSM provides an environment in which the programmer can use the well known approaches and methods of programming for physically shared memory systems and parallel processing can be carried out to make full use of the computing power and cost advantages of the COW. The aim of this research is to synthesise and develop a distributed shared memory system as an integral part of an operating system in order to provide application programmers with a convenient environment in which the development and execution of parallel applications can be done easily and efficiently, and which does this in a transparent manner. Furthermore, in order to satisfy our challenging design requirements we want to demonstrate that the operating system into which the DSM system is integrated should be a distributed operating system. In this thesis a study into the synthesis of a DSM system within a microkernel and client-server based distributed operating system which uses both strict and weak consistency models, with a write-invalidate and write-update based approach for consistency maintenance is reported. Furthermore a unique automatic initialisation system which allows the programmer to start the parallel execution of a group of processes with a single library call is reported. The number and location of these processes are determined by the operating system based on system load information. The DSM system proposed has a novel approach in that it provides programmers with a complete programming environment in which they are easily able to develop and run their code or indeed run existing shared memory code. A set of demanding DSM system design requirements are presented and the incentives for the placement of the DSM system with a distributed operating system and in particular in the memory management server have been reported. The new DSM system concentrated on an event-driven set of cooperating and distributed entities, and a detailed description of the events and reactions to these events that make up the operation of the DSM system is then presented. This is followed by a pseudocode form of the detailed design of the main modules and activities of the primitives used in the proposed DSM system. Quantitative results of performance tests and qualitative results showing the ease of programming and use of the RHODOS DSM system are reported. A study of five different application is given and the results of tests carried out on these applications together with a discussion of the results are given. A discussion of how RHODOS’ DSM allows programmers to write shared memory code in an easy to use and familiar environment and a comparative evaluation of RHODOS DSM with other DSM systems is presented. In particular, the ease of use and transparency of the DSM system have been demonstrated through the description of the ease with which a moderately inexperienced undergraduate programmer was able to convert, write and run applications for the testing of the DSM system. Furthermore, the description of the tests performed using physically shared memory shows that the latter is indistinguishable from distributed shared memory; this is further evidence that the DSM system is fully transparent. This study clearly demonstrates that the aim of the research has been achieved; it is possible to develop a programmer friendly and efficient DSM system fully integrated within a distributed operating system. It is clear from this research that client-server and microkernel based distributed operating system integrated DSM makes shared memory operations transparent and almost completely removes the involvement of the programmer beyond classical activities needed to deal with shared memory. The conclusion can be drawn that DSM, when implemented within a client-server and microkernel based distributed operating system, is one of the most encouraging approaches to parallel processing since it guarantees performance improvements with minimal programmer involvement.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Software corpora facilitate reproducibility of analyses, however, static analysis for an entire corpus still requires considerable effort, often duplicated unnecessarily by multiple users. Moreover, most corpora are designed for single languages increasing the effort for cross-language analysis. To address these aspects we propose Pangea, an infrastructure allowing fast development of static analyses on multi-language corpora. Pangea uses language-independent meta-models stored as object model snapshots that can be directly loaded into memory and queried without any parsing overhead. To reduce the effort of performing static analyses, Pangea provides out-of-the box support for: creating and refining analyses in a dedicated environment, deploying an analysis on an entire corpus, using a runner that supports parallel execution, and exporting results in various formats. In this tool demonstration we introduce Pangea and provide several usage scenarios that illustrate how it reduces the cost of analysis.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Several types of parallelism can be exploited in logic programs while preserving correctness and efficiency, i.e. ensuring that the parallel execution obtains the same results as the sequential one and the amount of work performed is not greater. However, such results do not take into account a number of overheads which appear in practice, such as process creation and scheduling, which can induce a slow-down, or, at least, limit speedup, if they are not controlled in some way. This paper describes a methodology whereby the granularity of parallel tasks, i.e. the work available under them, is efficiently estimated and used to limit parallelism so that the effect of such overheads is controlled. The run-time overhead associated with the approach is usually quite small, since as much work is done at compile time as possible. Also,a number of run-time optimizations are proposed. Moreover, a static analysis of the overhead associated with the granularity control process is performed in order to decide its convenience. The performance improvements resulting from the incorporation of grain size control are shown to be quite good, specially for systems with medium to large parallel execution overheads.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Studying independence of goals has proven very useful in the context of logic programming. In particular, it has provided a formal basis for powerful automatic parallelization tools, since independence ensures that two goals may be evaluated in parallel while preserving correctness and eciency. We extend the concept of independence to constraint logic programs (CLP) and prove that it also ensures the correctness and eciency of the parallel evaluation of independent goals. Independence for CLP languages is more complex than for logic programming as search space preservation is necessary but no longer sucient for ensuring correctness and eciency. Two additional issues arise. The rst is that the cost of constraint solving may depend upon the order constraints are encountered. The second is the need to handle dynamic scheduling. We clarify these issues by proposing various types of search independence and constraint solver independence, and show how they can be combined to allow dierent optimizations, from parallelism to intelligent backtracking. Sucient conditions for independence which can be evaluated \a priori" at run-time are also proposed. Our study also yields new insights into independence in logic programming languages. In particular, we show that search space preservation is not only a sucient but also a necessary condition for ensuring correctness and eciency of parallel execution.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Traditional schemes for abstract interpretation-based global analysis of logic programs generally focus on obtaining procedure argument mode and type information. Variable sharing information is often given only the attention needed to preserve the correctness of the analysis. However, such sharing information can be very useful. In particular, it can be used for predicting runtime goal independence, which can eliminate costly run-time checks in and-parallel execution. In this paper, a new algorithm for doing abstract interpretation in logic programs is described which concentrates on inferring the dependencies of the terms bound to program variables with increased precisión and at all points in the execution of the program, rather than just at a procedure level. Algorithms are presented for computing abstract entry and success substitutions which extensively keep track of variable aliasing and term dependence information. In addition, a new, abstract domain independent ñxpoint algorithm is presented and described in detail. The algorithms are illustrated with examples. Finally, results from an implementation of the abstract interpreter are presented.