995 resultados para Parallel execution


Relevância:

60.00% 60.00%

Publicador:

Resumo:

We consider the problem of supporting goal-level, independent andparallelism (IAP) in the presence of non-determinism. IAP is exploited when two or more goals which will not interfere at run time are scheduled for simultaneous execution. Backtracking over non-deterministic parallel goals runs into the wellknown trapped goal and garbage slot problems. The proposed solutions for these problems generally require complex low-level machinery which makes systems difficult to maintain and extend, and in some cases can even affect sequential execution performance. In this paper we propose a novel solution to the problem of trapped nondeterministic goals and garbage slots which is based on a single stack reordering operation and offers several advantages over previous proposals. While the implementation of this operation itself is not simple, in return it does not impose constraints on the scheduler. As a result, the scheduler and the rest of the run-time machinery can safely ignore the trapped goal and garbage slot problems and their implementation is greatly simplified. Also, standard sequential execution remains unaffected. In addition to describing the solution we report on an implementation and provide performance results. We also suggest other possible applications of the proposed approach beyond parallel execution.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The Andorra Kernel language scheme was aimed, in principle, at simultaneously supporting the programming styles of Prolog and committed choice languages. Within the constraint programming paradigm, this family of languages could also in principle support the concurrent constraint paradigm. This happens for the Agents Kernel Language (AKL). On the other hand, AKL requires a somewhat detailed specification of control by the user. This could be avoided by programming in CLP to run on AKL. However, CLP programs cannot be executed directly on AKL. This is due to a number of factors, from more or less trivial syntactic differences to more involved issues such as the treatment of cut and making the exploitation of certain types of parallelism possible. This paper provides a translation scheme which is a basis of an automatic compiler of CLP programs into AKL, which can bridge those differences. In addition to supporting CLP, our style of translation achieves independent and-parallel execution where possible, which is relevant since this type of parallel execution preserves, through the translation, the user-perceived "complexity" of the original program.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper performs a further generalization of the notion of independence in constraint logic programs to the context of constraint logic programs with dynamic scheduling. The complexity of this new environment made necessary to first formally define the relationship between independence and search space preservation in the context of CLP languages. In particular, we show that search space preservation is, in the context of CLP languages, not only a sufficient but also a necessary condition for ensuring that both the intended solutions and the number of transitions performed do not change. These results are then extended to dynamically scheduled languages and used as the basis for the extension of the concepts of independence. We also propose several a priori sufficient conditions for independence and also give correctness and efficiency results for parallel execution of constraint logic programs based on the proposed notions of independence.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

El objetivo de este proyecto es evaluar la mejora de rendimiento que aporta la paralelización de algoritmos de procesamiento de imágenes, para su ejecución en una tarjeta gráfica. Para ello, una vez seleccionados los algoritmos a estudio, fueron desarrollados en lenguaje C++ bajo el paradigma secuencial. A continuación, tomando como base estas implementaciones, se paralelizaron siguiendo las directivas de la tecnología CUDA (Compute Unified Device Architecture) desarrollada por NVIDIA. Posteriormente, se desarrolló un interfaz gráfico de usuario en Visual C#, para una utilización más sencilla de la herramienta. Por último, se midió el rendimiento de cada uno de los algoritmos, en términos de tiempo de ejecución paralela y speedup, mediante el procesamiento de una serie de imágenes de distintos tamaños.---ABSTRACT---The aim of this Project is to evaluate the performance improvement provided by the parallelization of image processing algorithms, which will be executed on a graphics processing unit. In order to do this, once the algorithms to study were selected, each of them was developed in C++ under sequential paradigm. Then, based on these implementations, these algorithms were implemented using the compute unified device architecture (CUDA) programming model provided by NVIDIA. After that, a graphical user interface (GUI) was developed to increase application’s usability. Finally, performance of each algorithm was measured in terms of parallel execution time and speedup by processing a set of images of different sizes.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents a new dynamic visual control system for redundant robots with chaos compensation. In order to implement the visual servoing system, a new architecture is proposed that improves the system maintainability and traceability. Furthermore, high performance is obtained as a result of parallel execution of the different tasks that compose the architecture. The control component of the architecture implements a new visual servoing technique for resolving the redundancy at the acceleration level in order to guarantee the correct motion of both end-effector and joints. The controller generates the required torques for the tracking of image trajectories. However, in order to guarantee the applicability of this technique, a repetitive path tracked by the robot-end must produce a periodic joint motion. A chaos controller is integrated in the visual servoing system and the correct performance is observed in low and high velocities. Furthermore, a method to adjust the chaos controller is proposed and validated using a real three-link robot.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A hard combinatorial problem is investigated which has useful application in design of discrete devices: the two-block decomposition of a partial Boolean function. The key task is regarded: finding such a weak partition on the set of arguments, at which the considered function can be decomposed. Solving that task is essentially speeded up by the way of preliminary discovering traces of the sought-for partition. Efficient combinatorial operations are used by that, based on parallel execution of operations above adjacent units in the Boolean space.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Cumulon is a system aimed at simplifying the development and deployment of statistical analysis of big data in public clouds. Cumulon allows users to program in their familiar language of matrices and linear algebra, without worrying about how to map data and computation to specific hardware and cloud software platforms. Given user-specified requirements in terms of time, monetary cost, and risk tolerance, Cumulon automatically makes intelligent decisions on implementation alternatives, execution parameters, as well as hardware provisioning and configuration settings -- such as what type of machines and how many of them to acquire. Cumulon also supports clouds with auction-based markets: it effectively utilizes computing resources whose availability varies according to market conditions, and suggests best bidding strategies for them. Cumulon explores two alternative approaches toward supporting such markets, with different trade-offs between system and optimization complexity. Experimental study is conducted to show the efficiency of Cumulon's execution engine, as well as the optimizer's effectiveness in finding the optimal plan in the vast plan space.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents the Accurate Google Cloud Simulator (AGOCS) – a novel high-fidelity Cloud workload simulator based on parsing real workload traces, which can be conveniently used on a desktop machine for day-to-day research. Our simulation is based on real-world workload traces from a Google Cluster with 12.5K nodes, over a period of a calendar month. The framework is able to reveal very precise and detailed parameters of the executed jobs, tasks and nodes as well as to provide actual resource usage statistics. The system has been implemented in Scala language with focus on parallel execution and an easy-to-extend design concept. The paper presents the detailed structural framework for AGOCS and discusses our main design decisions, whilst also suggesting alternative and possibly performance enhancing future approaches. The framework is available via the Open Source GitHub repository.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In today's fast-paced and interconnected digital world, the data generated by an increasing number of applications is being modeled as dynamic graphs. The graph structure encodes relationships among data items, while the structural changes to the graphs as well as the continuous stream of information produced by the entities in these graphs make them dynamic in nature. Examples include social networks where users post status updates, images, videos, etc.; phone call networks where nodes may send text messages or place phone calls; road traffic networks where the traffic behavior of the road segments changes constantly, and so on. There is a tremendous value in storing, managing, and analyzing such dynamic graphs and deriving meaningful insights in real-time. However, a majority of the work in graph analytics assumes a static setting, and there is a lack of systematic study of the various dynamic scenarios, the complexity they impose on the analysis tasks, and the challenges in building efficient systems that can support such tasks at a large scale. In this dissertation, I design a unified streaming graph data management framework, and develop prototype systems to support increasingly complex tasks on dynamic graphs. In the first part, I focus on the management and querying of distributed graph data. I develop a hybrid replication policy that monitors the read-write frequencies of the nodes to decide dynamically what data to replicate, and whether to do eager or lazy replication in order to minimize network communication and support low-latency querying. In the second part, I study parallel execution of continuous neighborhood-driven aggregates, where each node aggregates the information generated in its neighborhoods. I build my system around the notion of an aggregation overlay graph, a pre-compiled data structure that enables sharing of partial aggregates across different queries, and also allows partial pre-computation of the aggregates to minimize the query latencies and increase throughput. Finally, I extend the framework to support continuous detection and analysis of activity-based subgraphs, where subgraphs could be specified using both graph structure as well as activity conditions on the nodes. The query specification tasks in my system are expressed using a set of active structural primitives, which allows the query evaluator to use a set of novel optimization techniques, thereby achieving high throughput. Overall, in this dissertation, I define and investigate a set of novel tasks on dynamic graphs, design scalable optimization techniques, build prototype systems, and show the effectiveness of the proposed techniques through extensive evaluation using large-scale real and synthetic datasets.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Workstation clusters equipped with high performance interconnect having programmable network processors facilitate interesting opportunities to enhance the performance of parallel application run on them. In this paper, we propose schemes where certain application level processing in parallel database query execution is performed on the network processor. We evaluate the performance of TPC-H queries executing on a high end cluster where all tuple processing is done on the host processor, using a timed Petri net model, and find that tuple processing costs on the host processor dominate the execution time. These results are validated using a small cluster. We therefore propose 4 schemes where certain tuple processing activity is offloaded to the network processor. The first 2 schemes offload the tuple splitting activity - computation to identify the node on which to process the tuples, resulting in an execution time speedup of 1.09 relative to the base scheme, but with I/O bus becoming the bottleneck resource. In the 3rd scheme in addition to offloading tuple processing activity, the disk and network interface are combined to avoid the I/O bus bottleneck, which results in speedups up to 1.16, but with high host processor utilization. Our 4th scheme where the network processor also performs apart of join operation along with the host processor, gives a speedup of 1.47 along with balanced system resource utilizations. Further we observe that the proposed schemes perform equally well even in a scaled architecture i.e., when the number of processors is increased from 2 to 64

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This thesis defines Pi, a parallel architecture interface that separates model and machine issues, allowing them to be addressed independently. This provides greater flexibility for both the model and machine builder. Pi addresses a set of common parallel model requirements including low latency communication, fast task switching, low cost synchronization, efficient storage management, the ability to exploit locality, and efficient support for sequential code. Since Pi provides generic parallel operations, it can efficiently support many parallel programming models including hybrids of existing models. Pi also forms a basis of comparison for architectural components.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Currently, coordinated scheduling of multiple parallel applications across computers has been considered as the critical factor to achieve high execution performance. We claim in this report that the performance and costs of the execution of parallel applications could be improved if not only dedicated clusters but also non-dedicated clusters were used and several parallel applications were executed concurreontly. To support this claim we carried out experimental study into the performance of multiple NAS parallel programs executing concurrently on a non-dedicated cluster.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

An enterprise has not only a single cluster but a set of geographically distributed clusters – they could be used to form an enterprise grid. In this paper we show based on our case study that enterprise grids could be efficiently used as parallel computers to carry out high-performance computing.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Computers of a non-dedicated cluster are often idle (users attend meetings, have lunch or coffee breaks) or lightly loaded (users carry out simple computations). These underutilized computers can be employed to execute parallel applications not only during weekends and at nights but also during office hours. Thus, they have to be shared by parallel and sequential applications which could lead to the improvement of their execution performance. However, there is a lack of experimental study showing the behavior and performance of parallel and sequential applications executing concurrently on clusters. We present here the result of an experimental study into load balancing based scheduling of a mixture of parallel and sequential applications on a non-dedicated cluster.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Dedicated clusters are becoming commonly used for high performance parallel processing. Computers of a non-dedicated cluster are often idle or lightly loaded. These under utilised computers can be employed to execute parallel applications. Thus, they have to be shared by parallel and sequential applications, which could lead to the improvement of their execution performance. There is a lack of experimental study showing the behaviour and performance of executing parallel and sequential applications concurrently on a non-dedicated cluster. We present the result of an experimental study into load balancing of a mixture of parallel and sequential applications on a non-dedicated cluster.