958 resultados para Parallel programming (computer science)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The InteGrade project is a multi-university effort to build a novel grid computing middleware based on the opportunistic use of resources belonging to user workstations. The InteGrade middleware currently enables the execution of sequential, bag-of-tasks, and parallel applications that follow the BSP or the MPI programming models. This article presents the lessons learned over the last five years of the InteGrade development and describes the solutions achieved concerning the support for robust application execution. The contributions cover the related fields of application scheduling, execution management, and fault tolerance. We present our solutions, describing their implementation principles and evaluation through the analysis of several experimental results. (C) 2010 Elsevier Inc. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The InteGrade middleware intends to exploit the idle time of computing resources in computer laboratories. In this work we investigate the performance of running parallel applications with communication among processors on the InteGrade grid. As costly communication on a grid can be prohibitive, we explore the so-called systolic or wavefront paradigm to design the parallel algorithms in which no global communication is used. To evaluate the InteGrade middleware we considered three parallel algorithms that solve the matrix chain product problem, the 0-1 Knapsack Problem, and the local sequence alignment problem, respectively. We show that these three applications running under the InteGrade middleware and MPI take slightly more time than the same applications running on a cluster with only LAM-MPI support. The results can be considered promising and the time difference between the two is not substantial. The overhead of the InteGrade middleware is acceptable, in view of the benefits obtained to facilitate the use of grid computing by the user. These benefits include job submission, checkpointing, security, job migration, etc. Copyright (C) 2009 John Wiley & Sons, Ltd.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Relevant results for (sub-)distribution functions related to parallel systems are discussed. The reverse hazard rate is defined using the product integral. Consequently, the restriction of absolute continuity for the involved distributions can be relaxed. The only restriction is that the sets of discontinuity points of the parallel distributions have to be disjointed. Nonparametric Bayesian estimators of all survival (sub-)distribution functions are derived. Dual to the series systems that use minimum life times as observations, the parallel systems record the maximum life times. Dirichlet multivariate processes forming a class of prior distributions are considered for the nonparametric Bayesian estimation of the component distribution functions, and the system reliability. For illustration, two striking numerical examples are presented.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A programming style can be seen as a particular model of shaping thought or a special way of codifying language to solve a problem. Adaptive languages have the basic feature of allowing the expression of programs which self-modifying through adaptive actions at runtime. The conception of such languages calls for a new programming style, since the application of adaptive technology in the field of programming languages suggests a new way of thinking. With the adaptive style, programming language codes can be structured in such a way that the codified program therein modifies or adapts itself towards the needs of the problem. The adaptive programming style may be a feasible alternate way to obtain self-modifying consistent codes, which allow its use in modern applications for self-modifying code.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Software Transactional Memory (STM) systems have poor performance under high contention scenarios. Since many transactions compete for the same data, most of them are aborted, wasting processor runtime. Contention management policies are typically used to avoid that, but they are passive approaches as they wait for an abort to happen so they can take action. More proactive approaches have emerged, trying to predict when a transaction is likely to abort so its execution can be delayed. Such techniques are limited, as they do not replace the doomed transaction by another or, when they do, they rely on the operating system for that, having little or no control on which transaction should run. In this paper we propose LUTS, a Lightweight User-Level Transaction Scheduler, which is based on an execution context record mechanism. Unlike other techniques, LUTS provides the means for selecting another transaction to run in parallel, thus improving system throughput. Moreover, it avoids most of the issues caused by pseudo parallelism, as it only launches as many system-level threads as the number of available processor cores. We discuss LUTS design and present three conflict-avoidance heuristics built around LUTS scheduling capabilities. Experimental results, conducted with STMBench7 and STAMP benchmark suites, show LUTS efficiency when running high contention applications and how conflict-avoidance heuristics can improve STM performance even more. In fact, our transaction scheduling techniques are capable of improving program performance even in overloaded scenarios. 2011 Springer-Verlag.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Software transaction memory (STM) systems have been used as an approach to improve performance, by allowing the concurrent execution of atomic blocks. However, under high-contention workloads, STM-based systems can considerably degrade performance, as transaction conflict rate increases. Contention management policies have been used as a way to select which transaction to abort when a conflict occurs. In general, contention managers are not capable of avoiding conflicts, as they can only select which transaction to abort and the moment it should restart. Since contention managers act only after a conflict is detected, it becomes harder to effectively increase transaction throughput. More proactive approaches have emerged, aiming at predicting when a transaction is likely to abort, postponing its execution. Nevertheless, most of the proposed proactive techniques are limited, as they do not replace the doomed transaction by another or, when they do, they rely on the operating system for that, having little or no control on which transaction to run. This article proposes LUTS, a lightweight user-level transaction scheduler. Unlike other techniques, LUTS provides the means for selecting another transaction to run in parallel, thus improving system throughput. We discuss LUTS design and propose a dynamic conflict-avoidance heuristic built around its scheduling capabilities. Experimental results, conducted with the STAMP and STMBench7 benchmark suites, running on TinySTM and SwissTM, show how our conflict-avoidance heuristic can effectively improve STM performance on high contention applications. 2012 Springer Science+Business Media, LLC.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Research on the micro-structural characterization of metal-matrix composites uses X-ray computed tomography to collect information about the interior features of the samples, in order to elucidate their exhibited properties. The tomographic raw data needs several steps of computational processing in order to eliminate noise and interference. Our experience with a program (Tritom) that handles these questions has shown that in some cases the processing steps take a very long time and that it is not easy for a Materials Science specialist to interact with Tritom in order to define the most adequate parameter values and the proper sequence of the available processing steps. For easing the use of Tritom, a system was built which addresses the aspects described before and that is based on the OpenDX visualization system. OpenDX visualization facilities constitute a great benefit to Tritom. The visual programming environment of OpenDX allows an easy definition of a sequence of processing steps thus fulfilling the requirement of an easy use by non-specialists on Computer Science. Also the possibility of incorporating external modules in a visual OpenDX program allows the researchers to tackle the aspect of reducing the long execution time of some processing steps. The longer processing steps of Tritom have been parallelized in two different types of hardware architectures (message-passing and shared-memory); the corresponding parallel programs can be easily incorporated in a sequence of processing steps defined in an OpenDX program. The benefits of our system are illustrated through an example where the tool is applied in the study of the sensitivity to crushing and the implications thereof of the reinforcements used in a functionally graded syntactic metallic foam.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Consider the NP-hard problem of, given a simple graph G, to find a series-parallel subgraph of G with the maximum number of edges. The algorithm that, given a connected graph G, outputs a spanning tree of G, is a 1/2-approximation. Indeed, if n is the number of vertices in G, any spanning tree in G has n-1 edges and any series-parallel graph on n vertices has at most 2n-3 edges. We present a 7/12 -approximation for this problem and results showing the limits of our approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Field-Programmable Gate Arrays (FPGAs) are becoming increasingly important in embedded and high-performance computing systems. They allow performance levels close to the ones obtained with Application-Specific Integrated Circuits, while still keeping design and implementation flexibility. However, to efficiently program FPGAs, one needs the expertise of hardware developers in order to master hardware description languages (HDLs) such as VHDL or Verilog. Attempts to furnish a high-level compilation flow (e.g., from C programs) still have to address open issues before broader efficient results can be obtained. Bearing in mind an FPGA available resources, it has been developed LALP (Language for Aggressive Loop Pipelining), a novel language to program FPGA-based accelerators, and its compilation framework, including mapping capabilities. The main ideas behind LALP are to provide a higher abstraction level than HDLs, to exploit the intrinsic parallelism of hardware resources, and to allow the programmer to control execution stages whenever the compiler techniques are unable to generate efficient implementations. Those features are particularly useful to implement loop pipelining, a well regarded technique used to accelerate computations in several application domains. This paper describes LALP, and shows how it can be used to achieve high-performance computing solutions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Interactive theorem provers are tools designed for the certification of formal proofs developed by means of man-machine collaboration. Formal proofs obtained in this way cover a large variety of logical theories, ranging from the branches of mainstream mathematics, to the field of software verification. The border between these two worlds is marked by results in theoretical computer science and proofs related to the metatheory of programming languages. This last field, which is an obvious application of interactive theorem proving, poses nonetheless a serious challenge to the users of such tools, due both to the particularly structured way in which these proofs are constructed, and to difficulties related to the management of notions typical of programming languages like variable binding. This thesis is composed of two parts, discussing our experience in the development of the Matita interactive theorem prover and its use in the mechanization of the metatheory of programming languages. More specifically, part I covers: - the results of our effort in providing a better framework for the development of tactics for Matita, in order to make their implementation and debugging easier, also resulting in a much clearer code; - a discussion of the implementation of two tactics, providing infrastructure for the unification of constructor forms and the inversion of inductive predicates; we point out interactions between induction and inversion and provide an advancement over the state of the art. In the second part of the thesis, we focus on aspects related to the formalization of programming languages. We describe two works of ours: - a discussion of basic issues we encountered in our formalizations of part 1A of the Poplmark challenge, where we apply the extended inversion principles we implemented for Matita; - a formalization of an algebraic logical framework, posing more complex challenges, including multiple binding and a form of hereditary substitution; this work adopts, for the encoding of binding, an extension of Masahiko Sato's canonical locally named representation we designed during our visit to the Laboratory for Foundations of Computer Science at the University of Edinburgh, under the supervision of Randy Pollack.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Complex networks analysis is a very popular topic in computer science. Unfortunately this networks, extracted from different contexts, are usually very large and the analysis may be very complicated: computation of metrics on these structures could be very complex. Among all metrics we analyse the extraction of subnetworks called communities: they are groups of nodes that probably play the same role within the whole structure. Communities extraction is an interesting operation in many different fields (biology, economics,...). In this work we present a parallel community detection algorithm that can operate on networks with huge number of nodes and edges. After an introduction to graph theory and high performance computing, we will explain our design strategies and our implementation. Then, we will show some performance evaluation made on a distributed memory architectures i.e. the supercomputer IBM-BlueGene/Q "Fermi" at the CINECA supercomputing center, Italy, and we will comment our results.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In vielen Bereichen der industriellen Fertigung, wie zum Beispiel in der Automobilindustrie, wer- den digitale Versuchsmodelle (sog. digital mock-ups) eingesetzt, um die Entwicklung komplexer Maschinen m oglichst gut durch Computersysteme unterstu tzen zu k onnen. Hierbei spielen Be- wegungsplanungsalgorithmen eine wichtige Rolle, um zu gew ahrleisten, dass diese digitalen Pro- totypen auch kollisionsfrei zusammengesetzt werden k onnen. In den letzten Jahrzehnten haben sich hier sampling-basierte Verfahren besonders bew ahrt. Diese erzeugen eine groe Anzahl von zuf alligen Lagen fu r das ein-/auszubauende Objekt und verwenden einen Kollisionserken- nungsmechanismus, um die einzelnen Lagen auf Gu ltigkeit zu u berpru fen. Daher spielt die Kollisionserkennung eine wesentliche Rolle beim Design effizienter Bewegungsplanungsalgorith- men. Eine Schwierigkeit fu r diese Klasse von Planern stellen sogenannte narrow passages dar, schmale Passagen also, die immer dort auftreten, wo die Bewegungsfreiheit der zu planenden Objekte stark eingeschr ankt ist. An solchen Stellen kann es schwierig sein, eine ausreichende Anzahl von kollisionsfreien Samples zu finden. Es ist dann m oglicherweise n otig, ausgeklu geltere Techniken einzusetzen, um eine gute Performance der Algorithmen zu erreichen.rnDie vorliegende Arbeit gliedert sich in zwei Teile: Im ersten Teil untersuchen wir parallele Kollisionserkennungsalgorithmen. Da wir auf eine Anwendung bei sampling-basierten Bewe- gungsplanern abzielen, w ahlen wir hier eine Problemstellung, bei der wir stets die selben zwei Objekte, aber in einer groen Anzahl von unterschiedlichen Lagen auf Kollision testen. Wir im- plementieren und vergleichen verschiedene Verfahren, die auf Hu llk operhierarchien (BVHs) und hierarchische Grids als Beschleunigungsstrukturen zuru ckgreifen. Alle beschriebenen Verfahren wurden auf mehreren CPU-Kernen parallelisiert. Daru ber hinaus vergleichen wir verschiedene CUDA Kernels zur Durchfu hrung BVH-basierter Kollisionstests auf der GPU. Neben einer un- terschiedlichen Verteilung der Arbeit auf die parallelen GPU Threads untersuchen wir hier die Auswirkung verschiedener Speicherzugriffsmuster auf die Performance der resultierenden Algo- rithmen. Weiter stellen wir eine Reihe von approximativen Kollisionstests vor, die auf den beschriebenen Verfahren basieren. Wenn eine geringere Genauigkeit der Tests tolerierbar ist, kann so eine weitere Verbesserung der Performance erzielt werden.rnIm zweiten Teil der Arbeit beschreiben wir einen von uns entworfenen parallelen, sampling- basierten Bewegungsplaner zur Behandlung hochkomplexer Probleme mit mehreren narrow passages. Das Verfahren arbeitet in zwei Phasen. Die grundlegende Idee ist hierbei, in der er- sten Planungsphase konzeptionell kleinere Fehler zuzulassen, um die Planungseffizienz zu erh ohen und den resultierenden Pfad dann in einer zweiten Phase zu reparieren. Der hierzu in Phase I eingesetzte Planer basiert auf sogenannten Expansive Space Trees. Zus atzlich haben wir den Planer mit einer Freidru ckoperation ausgestattet, die es erlaubt, kleinere Kollisionen aufzul osen und so die Effizienz in Bereichen mit eingeschr ankter Bewegungsfreiheit zu erh ohen. Optional erlaubt unsere Implementierung den Einsatz von approximativen Kollisionstests. Dies setzt die Genauigkeit der ersten Planungsphase weiter herab, fu hrt aber auch zu einer weiteren Perfor- mancesteigerung. Die aus Phase I resultierenden Bewegungspfade sind dann unter Umst anden nicht komplett kollisionsfrei. Um diese Pfade zu reparieren, haben wir einen neuartigen Pla- nungsalgorithmus entworfen, der lokal beschr ankt auf eine kleine Umgebung um den bestehenden Pfad einen neuen, kollisionsfreien Bewegungspfad plant.rnWir haben den beschriebenen Algorithmus mit einer Klasse von neuen, schwierigen Metall- Puzzlen getestet, die zum Teil mehrere narrow passages aufweisen. Unseres Wissens nach ist eine Sammlung vergleichbar komplexer Benchmarks nicht offentlich zug anglich und wir fan- den auch keine Beschreibung von vergleichbar komplexen Benchmarks in der Motion-Planning Literatur.