193 results for parallelization


Relevance:

20.00%

Publisher:

Abstract:

[EN] Longest edge (nested) algorithms for triangulation refinement in two dimensions are able to produce hierarchies of quality, nested, irregular triangulations as needed both for adaptive finite element methods and for multigrid methods. They can be formulated in terms of the longest edge propagation path (Lepp) and terminal edge concepts, to refine the target triangles and some related neighbors. We discuss a parallel multithreaded algorithm in which every thread is in charge of refining a triangle t and its associated Lepp neighbors. The thread manages a changing Lepp(t) (an ordered set of triangles with increasing longest edges), both to find a last longest (terminal) edge and to refine the pair of triangles sharing this edge...
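
To make the traversal concrete, here is a minimal Python sketch of a Lepp computation; the mesh helpers `longest_edge` and `neighbor_across` are hypothetical placeholders, not part of the paper.

```python
# A sketch of the Lepp traversal: follow longest edges from triangle t until
# reaching a terminal edge (an edge that is the longest edge of both triangles
# sharing it, or a boundary longest edge). Helpers are assumed, not real APIs.

def lepp(t, longest_edge, neighbor_across):
    """Return the ordered list of triangles Lepp(t)."""
    path = [t]
    while True:
        e = longest_edge(path[-1])           # longest edge of current triangle
        nb = neighbor_across(path[-1], e)    # triangle on the other side of e
        if nb is None:                       # e lies on the boundary: terminal
            return path
        if longest_edge(nb) == e:            # longest for both: terminal edge
            path.append(nb)
            return path
        path.append(nb)                      # keep propagating
```

In the multithreaded algorithm, each thread would run such a traversal for its own target triangle and then refine the pair of triangles sharing the terminal edge.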

Relevance:

20.00%

Publisher:

Abstract:

The aim of my thesis is to parallelize the Weighted Histogram Analysis Method (WHAM), a popular algorithm used to calculate the free energy of a molecular system in molecular dynamics simulations. WHAM works in post-processing in cooperation with another algorithm called Umbrella Sampling, whose purpose is to add a bias to the potential energy of the system in order to force the system to sample a specific region of configurational space. N independent simulations are performed in order to sample the entire region of interest. Subsequently, the WHAM algorithm is used to estimate the original system energy starting from the N atomic trajectories. The parallelization of WHAM has been performed with CUDA, a language that allows programs to run on the GPUs of NVIDIA graphics cards, which have a parallel architecture. The parallel implementation can appreciably speed up WHAM execution compared with previous serial CPU implementations; the CPU code, in particular, becomes time-critical for very large numbers of iterations. The algorithm has been written in C++ and executed on UNIX systems equipped with NVIDIA graphics cards. The results were satisfactory, showing a performance increase when the model was executed on graphics cards of higher compute capability. Nonetheless, the GPU used to test the algorithm is quite old and not designed for scientific computing. It is likely that a further performance increase would be obtained if the algorithm were executed on GPU clusters with high computational efficiency. The thesis is organized as follows: I first describe the mathematical formulation of Umbrella Sampling and the WHAM algorithm, with their applications to the study of ionic channels and to molecular docking (Chapter 1); I then present the CUDA architectures used to implement the model (Chapter 2); finally, the results obtained on model systems are presented (Chapter 3).
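
For reference, the core of WHAM is a self-consistent iteration over the biased histograms; below is a minimal serial Python/NumPy sketch of that iteration (array names, shapes, and the convergence test are illustrative assumptions, not the thesis code). The per-bin sums are independent, which is what makes the method amenable to the GPU parallelization described above.

```python
# A minimal serial sketch of the WHAM self-consistent equations, assuming
# hist[i, b] is the biased histogram of window i over bins b, bias[i, b] the
# umbrella bias energy w_i at each bin, and n[i] the sample count of window i.
import numpy as np

def wham(hist, bias, n, beta, tol=1e-7, max_iter=10000):
    f = np.zeros(len(n))                     # free-energy shifts f_i per window
    num = hist.sum(axis=0)                   # total counts per bin
    for _ in range(max_iter):
        # denominator: sum_j N_j * exp(beta * (f_j - w_j(x))) for each bin
        denom = (n[:, None] * np.exp(beta * (f[:, None] - bias))).sum(axis=0)
        p = num / denom                      # unbiased probability estimate
        f_new = -np.log((p[None, :] * np.exp(-beta * bias)).sum(axis=1)) / beta
        f_new -= f_new[0]                    # fix the gauge: f_0 = 0
        delta = np.max(np.abs(f_new - f))
        f = f_new
        if delta < tol:                      # stop when the f_i are stable
            break
    return p / p.sum(), f                    # normalized P(x) and the f_i
```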

Relevance:

20.00%

Publisher:

Abstract:

Long-term electrocardiogram (ECG) recordings often suffer from relevant noise. Baseline wander in particular is pronounced in ECG recordings using dry or esophageal electrodes, which are dedicated to prolonged registration. While analog high-pass filters introduce phase distortions, reliable offline filtering of the baseline wander implies a computational burden that has to be put in relation to the increase in signal-to-baseline ratio (SBR). Here we present a graphics processing unit (GPU) based parallelization method to speed up offline baseline wander filter algorithms, namely the wavelet, finite impulse response, infinite impulse response, moving-mean, and moving-median filters. Individual filter parameters were optimized with respect to the SBR increase based on ECGs from the Physionet database superimposed on auto-regressive modeled, real baseline wander. A Monte Carlo simulation showed that for low input SBR the moving-median filter outperforms any other method but negatively affects ECG wave detection. In contrast, the infinite impulse response filter is preferred in case of high input SBR. However, the parallelized wavelet filter is processed 500 and 4 times faster than these two algorithms on the GPU, respectively, and offers superior baseline wander suppression in low-SBR situations. Using a signal segment of 64 megasamples that is filtered as an entire unit, wavelet filtering of a 7-day high-resolution ECG is computed in less than 3 seconds. Taking the high filtering speed into account, the GPU wavelet filter is the most efficient method to remove baseline wander present in long-term ECGs, strongly reducing the computational burden.
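
As an illustration of the simplest of the compared filters, the following Python sketch removes baseline wander with a moving median; the window length is an illustrative assumption, not a value from the study.

```python
# A minimal sketch of moving-median baseline wander removal: estimate the
# slowly varying baseline with a median filter and subtract it from the ECG.
from scipy.signal import medfilt

def remove_baseline(ecg, fs, win_seconds=0.6):
    """Subtract a moving-median baseline estimate from an ECG signal array."""
    k = max(3, int(win_seconds * fs)) | 1    # kernel length must be odd
    baseline = medfilt(ecg, kernel_size=k)   # slowly varying baseline estimate
    return ecg - baseline
```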

Relevance:

20.00%

Publisher:

Abstract:

In this paper we show how the efficiency of MBS simulations can be improved in two different ways, considering both an explicit and an implicit semi-recursive formulation. The explicit method is based on a double velocity transformation that involves the solution of a redundant but compatible system of equations. The high computational cost of this operation has been drastically reduced by taking into account the sparsity pattern of the system; to this end, the method introduces MA48, a high-performance mathematical library provided by the Harwell Subroutine Library. The second method proposed in this paper is motivated by the observation that, depending on the case, between 70% and 85% of the computation time is devoted to the evaluation of force derivatives with respect to the relative position and velocity vectors. Since the evaluation of these derivatives can be decomposed into concurrent tasks, the main goal of this paper is a successful and straightforward parallel implementation that has led to a substantial improvement: a speedup of 3.2 on a quad-core processor, achieved by keeping all cores busy and distributing the workload among them, thus obtaining a large time reduction through near-ideal CPU usage.
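
A minimal Python sketch of the kind of task decomposition described (the `derivative_block` function and the partition of the derivative evaluation into blocks are hypothetical, standing in for the paper's implementation):

```python
# Independent force-derivative blocks can be evaluated concurrently, keeping
# all cores of a quad-core processor busy.
from concurrent.futures import ProcessPoolExecutor

def eval_derivatives(blocks, derivative_block, workers=4):
    """Evaluate independent force-derivative blocks on a pool of workers."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # each block is independent, so they can be mapped in any order
        return list(pool.map(derivative_block, blocks))
```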

Relevance:

20.00%

Publisher:

Abstract:

We report on a detailed study of the application and effectiveness of program analysis based on abstract interpretation in automatic program parallelization. We study the case of parallelizing logic programs using the notion of strict independence. We first propose, and prove correct, a methodology for applying the information inferred by abstract interpretation to the parallelization task, using a parametric domain. The methodology is generic in the sense of allowing the use of different analysis domains. A number of well-known approximation domains are then studied, and their transformation into the parametric domain is defined. The transformation directly illustrates the relevance and applicability of each abstract domain for the application. Both local and global analyzers are then built using these domains and embedded in a complete parallelizing compiler. Then, the performance of the domains in this context is assessed through a number of experiments. A comparatively wide range of aspects is studied, from the resources needed by the analyzers in terms of time and memory to the actual benefits obtained from the information inferred. Such benefits are evaluated both in terms of the characteristics of the parallelized code and of the actual speedups obtained from it. The results show that data-flow analysis plays an important role in achieving efficient parallelizations, and that the cost of such analysis can be reasonable even for quite sophisticated abstract domains. Furthermore, the results also offer significant insight into the characteristics of the domains, the demands of the application, and the trade-offs involved.
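
To fix ideas, the run-time condition that strict independence imposes, and that the analysis tries to prove statically so the test can be removed, can be sketched in Python as follows (goals are reduced to the sets of variables they reach; a deliberate simplification).

```python
# Two goals are strictly independent when every variable they share is ground,
# i.e. they have no unbound variables in common at run time.

def strictly_independent(vars_g1, vars_g2, ground_vars):
    """True if the goals share no variable that is still unbound."""
    shared = vars_g1 & vars_g2
    return shared <= ground_vars             # every shared variable is ground
```

Groundness and sharing domains approximate `ground_vars` and the variable sets at compile time, so in many cases the test can be decided without any run-time cost.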

Relevance:

20.00%

Publisher:

Abstract:

Program specialization optimizes programs for known values of the input. It is often the case that the set of possible input values is unknown, or this set is infinite. However, a form of specialization can still be performed in such cases by means of abstract interpretation, specialization then being with respect to abstract values (substitutions) rather than concrete ones. We study the multiple specialization of logic programs based on abstract interpretation. This involves, in principle and based on information from global analysis, generating several versions of a program predicate for different uses of such predicate, optimizing these versions, and, finally, producing a new, "multiply specialized" program. While multiple specialization has received theoretical attention, little previous evidence exists on its practicality. In this paper we report on the incorporation of multiple specialization in a parallelizing compiler and quantify its effects. A novel approach to the design and implementation of the specialization system is proposed. The proposed implementation techniques yield specializations identical to those of the best previously proposed techniques but require little or no modification of some existing abstract interpreters. Our results show that, using the proposed techniques, the resulting "abstract multiple specialization" is indeed a relevant technique in practice. In particular, in the parallelizing compiler application, a good number of run-time tests are eliminated and invariants extracted automatically from loops, resulting generally in lower overheads and in several cases in increased speedups.
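
A minimal Python sketch of the underlying idea, abstract executability: run-time tests that the inferred abstract value already decides are removed from the specialized version (the string/tuple encoding is purely illustrative, not the system's representation).

```python
# Tests are pairs like ("ground", "X"); env maps variables to abstract values
# inferred by global analysis for one version of the predicate.
GROUND, FREE, ANY = "ground", "free", "any"

def simplify_test(test, env):
    """Abstractly execute a groundness test under inferred abstract values."""
    kind, var = test
    if kind == "ground" and env.get(var, ANY) == GROUND:
        return "true"                        # always succeeds: removable
    if kind == "ground" and env.get(var, ANY) == FREE:
        return "fail"                        # always fails
    return test                              # undecided: keep run-time test

def specialize(tests, env):
    """One specialized version: drop the tests decided to succeed."""
    return [t for t in (simplify_test(t, env) for t in tests) if t != "true"]
```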

Relevance:

20.00%

Publisher:

Abstract:

A framework for the automatic parallelization of (constraint) logic programs is proposed and proved correct. Intuitively, the parallelization process replaces conjunctions of literals with parallel expressions. Such expressions trigger at run time the exploitation of restricted, goal-level, independent and-parallelism. The parallelization process performs two steps. The first one builds a conditional dependency graph (which can be simplified using compile-time analysis information), while the second transforms the resulting graph into linear conditional expressions, the parallel expressions of the &-Prolog language. Several heuristic algorithms for the latter ("annotation") process are proposed and proved correct. Algorithms are also given which determine if there is any loss of parallelism in the linearization process with respect to a proposed notion of maximal parallelism. Finally, a system is presented which implements the proposed approach. The performance of the different annotation algorithms is compared experimentally in this system by studying the time spent in parallelization and the effectiveness of the results in terms of speedups.
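
The two steps can be sketched in Python as follows (goals reduced to name/variable-set pairs, with independence approximated by variable disjointness, as under strict independence; a simplification of the actual framework).

```python
# Step 1: dependency graph over the literals of a clause body.
def build_dep_graph(goals):
    """Edge (i, j) when goal i shares a variable with a later goal j."""
    edges = set()
    for i, (_, vi) in enumerate(goals):
        for j in range(i + 1, len(goals)):
            if vi & goals[j][1]:
                edges.add((i, j))
    return edges

# Step 2: greedy linearization into an &-Prolog-style expression, where '&'
# joins goals that may run in parallel and ',' sequences the groups.
def annotate(goals, edges):
    out, group = [], [0]
    for j in range(1, len(goals)):
        if any((i, j) in edges for i in group):
            out.append(" & ".join(goals[i][0] for i in group))
            group = [j]
        else:
            group.append(j)
    out.append(" & ".join(goals[i][0] for i in group))
    return ", ".join(out)
```

For goals p(X), q(Y), r(X,Y) this produces the expression "p(X) & q(Y), r(X,Y)": the first two goals run in parallel, and r waits for both.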

Relevance:

20.00%

Publisher:

Abstract:

Program specialization optimizes programs for known values of the input. It is often the case that the set of possible input values is unknown, or this set is infinite. However, a form of specialization can still be performed in such cases by means of abstract interpretation, specialization then being with respect to abstract values (substitutions), rather than concrete ones. This paper reports on the application of abstract multiple specialization to automatic program parallelization in the &-Prolog compiler. Abstract executability, the main concept underlying abstract specialization, is formalized, the design of the specialization system presented, and a non-trivial example of specialization in automatic parallelization is given.

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a study of the effectiveness of global analysis in the parallelization of logic programs using strict independence. A number of well-known approximation domains are selected and their usefulness for the application at hand is explained. Also, methods for using the information provided by such domains to improve parallelization are proposed. Local and global analyses are built using these domains, and such analyses are embedded in a complete parallelizing compiler. Then, the performance of the domains (and the system in general) is assessed for this application through a number of experiments. We argue that the results offer significant insight into the characteristics of these domains, the demands of the application, and the trade-offs involved.

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a study of the effectiveness of three different algorithms for the parallelization of logic programs based on compile-time detection of independence among goals. The algorithms are embedded in a complete parallelizing compiler, which incorporates different abstract interpretation-based program analyses. The complete system shows the task of automatic program parallelization to be practical. The trade-offs involved in using each of the algorithms in this task are studied experimentally, their weaknesses identified, and possible improvements discussed.

Relevance:

20.00%

Publisher:

Abstract:

There has been significant interest in parallel execution models for logic programs which exploit Independent And-Parallelism (IAP). In these models, it is necessary to determine which goals are independent and therefore eligible for parallel execution, and which goals have to wait for which others during execution. Although this can be done at run time, it can imply a very heavy overhead. In this paper, we present three algorithms for automatic compile-time parallelization of logic programs using IAP. This is done by converting a clause into a graph-based computational form and then transforming this graph into linear expressions based on &-Prolog, a language for IAP. We also present an algorithm which, given a clause, determines if there is any loss of parallelism due to linearization, for the case in which only unconditional parallelism is desired. Finally, the performance of these annotation algorithms is discussed for some benchmark programs.
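
When independence cannot be established at compile time, the linear expression keeps a run-time test; below is a minimal Python sketch of emitting such a conditional &-Prolog-style expression (the concrete syntax emitted is indicative only).

```python
def conditional_parallel(g1, g2, var_pairs):
    """Emit an &-Prolog-style expression for two goals.

    var_pairs: variable pairs whose independence must be checked at run time
    (empty when analysis already proved the goals independent).
    """
    if not var_pairs:
        return f"{g1} & {g2}"                # unconditional parallelism
    test = ", ".join(f"indep({a},{b})" for a, b in var_pairs)
    # run in parallel only if the run-time independence test succeeds
    return f"( {test} -> {g1} & {g2} ; {g1}, {g2} )"
```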

Relevance:

20.00%

Publisher:

Abstract:

Irregular computations pose some of the most interesting and challenging problems in automatic parallelization. Irregularity appears in certain kinds of numerical problems and is pervasive in symbolic applications. Such computations often use dynamic data structures which make heavy use of pointers. This complicates all the steps of a parallelizing compiler, from independence detection to task partitioning and placement. In the past decade there has been significant progress in the development of parallelizing compilers for logic programming and, more recently, constraint programming. The typical applications of these paradigms frequently involve irregular computations, which arguably makes the techniques used in these compilers potentially interesting. In this paper we introduce in a tutorial way some of the problems faced by parallelizing compilers for logic and constraint programs. These include the need for inter-procedural pointer aliasing analysis for independence detection and having to manage speculative and irregular computations through task granularity control and dynamic task allocation. We also provide pointers to some of the progress made in these areas. In the associated talk we demonstrate representatives of several generations of these parallelizing compilers.
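
As one concrete instance of the granularity control mentioned above, a scheduler can compare a cost estimate against a spawn threshold before creating a parallel task; a minimal Python sketch (the threshold and the cost model are illustrative assumptions):

```python
# Spawn a thread only when the estimated task cost amortizes the spawn
# overhead; otherwise run the task sequentially in the current thread.
import threading

THRESHOLD = 1000                             # tune to the spawn overhead

def maybe_spawn(task, cost_estimate, results, idx):
    """Run `task` inline if it is too small to pay for a thread spawn."""
    if cost_estimate < THRESHOLD:
        results[idx] = task()                # run sequentially, no overhead
        return None
    t = threading.Thread(target=lambda: results.__setitem__(idx, task()))
    t.start()                                # large enough: run in parallel
    return t
```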

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a conditional parallelization process for and-parallelism based on the notion of non-strict independence, a more relaxed notion than the traditional one of strict independence. By using this notion, a parallelism annotator can extract more parallelism from programs. On the other hand, the intrinsic complexity of non-strict independence poses new challenges for this task. We report here on our implementation of an annotator for non-strict independence, capable of producing both static and dynamic execution graphs. This implementation, along with the independence checker we also implemented and their integration in our system, has resulted in what is, to the best of our knowledge, the first parallelizing compiler based on non-strict independence which produces dynamic execution graphs. The paper also presents a preliminary assessment of the implemented tools, comparing them with the existing ones for strict independence, which shows encouraging results.
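
A much-simplified Python sketch of the relaxed condition: goals may share variables as long as at most one of them can bind each shared variable. The `binders` map would come from sharing/freeness analysis in a real system, and actual non-strict independence involves further conditions omitted here.

```python
def non_strictly_independent(vars_g1, vars_g2, binders):
    """Goals may share variables if at most one of them can bind each one."""
    for v in vars_g1 & vars_g2:              # every shared variable
        if len(binders.get(v, set()) & {"g1", "g2"}) > 1:
            return False                     # both goals may bind v: dependent
    return True
```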

Relevance:

20.00%

Publisher:

Abstract:

The concept of independence has recently been generalized to the constraint logic programming (CLP) paradigm. Also, several abstract domains specifically designed for CLP languages, and whose information can be used to detect the generalized independence conditions, have recently been defined. As a result we are now in a position where automatic parallelization of CLP programs is feasible. In this paper we study the task of automatically parallelizing CLP programs based on such analyses, by transforming them to explicitly concurrent programs in our parallel CC platform (CIAO) as well as to AKL. We describe the analysis and transformation process and study its efficiency, accuracy, and effectiveness in program parallelization. The information gathered by the analyzers is evaluated not only in terms of its accuracy, i.e., its ability to determine the actual dependencies among the program variables, but also in terms of its effectiveness, measured by the code reduction in the resulting parallelized programs. Given that only a few abstract domains have so far been defined for CLP, and that none of them were specifically designed for dependency detection, the aim of the evaluation is not only to assess the effectiveness of the available domains, but also to study what additional information it would be desirable to infer, and what domains would be appropriate for further improving the parallelization process.