117 resultados para Parallelizing Compilers


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Pós-graduação em Ciência e Tecnologia de Materiais - FC

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Sao Paulo State Research Foundation-FAPESP

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Field-Programmable Gate Arrays (FPGAs) are becoming increasingly important in embedded and high-performance computing systems. They allow performance levels close to the ones obtained with Application-Specific Integrated Circuits, while still keeping design and implementation flexibility. However, to efficiently program FPGAs, one needs the expertise of hardware developers in order to master hardware description languages (HDLs) such as VHDL or Verilog. Attempts to furnish a high-level compilation flow (e.g., from C programs) still have to address open issues before broader efficient results can be obtained. Bearing in mind an FPGA available resources, it has been developed LALP (Language for Aggressive Loop Pipelining), a novel language to program FPGA-based accelerators, and its compilation framework, including mapping capabilities. The main ideas behind LALP are to provide a higher abstraction level than HDLs, to exploit the intrinsic parallelism of hardware resources, and to allow the programmer to control execution stages whenever the compiler techniques are unable to generate efficient implementations. Those features are particularly useful to implement loop pipelining, a well regarded technique used to accelerate computations in several application domains. This paper describes LALP, and shows how it can be used to achieve high-performance computing solutions.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Coupled-cluster theory provides one of the most successful concepts in electronic-structure theory. This work covers the parallelization of coupled-cluster energies, gradients, and second derivatives and its application to selected large-scale chemical problems, beside the more practical aspects such as the publication and support of the quantum-chemistry package ACES II MAB and the design and development of a computational environment optimized for coupled-cluster calculations. The main objective of this thesis was to extend the range of applicability of coupled-cluster models to larger molecular systems and their properties and therefore to bring large-scale coupled-cluster calculations into day-to-day routine of computational chemistry. A straightforward strategy for the parallelization of CCSD and CCSD(T) energies, gradients, and second derivatives has been outlined and implemented for closed-shell and open-shell references. Starting from the highly efficient serial implementation of the ACES II MAB computer code an adaptation for affordable workstation clusters has been obtained by parallelizing the most time-consuming steps of the algorithms. Benchmark calculations for systems with up to 1300 basis functions and the presented applications show that the resulting algorithm for energies, gradients and second derivatives at the CCSD and CCSD(T) level of theory exhibits good scaling with the number of processors and substantially extends the range of applicability. Within the framework of the ’High accuracy Extrapolated Ab initio Thermochemistry’ (HEAT) protocols effects of increased basis-set size and higher excitations in the coupled- cluster expansion were investigated. The HEAT scheme was generalized for molecules containing second-row atoms in the case of vinyl chloride. This allowed the different experimental reported values to be discriminated. In the case of the benzene molecule it was shown that even for molecules of this size chemical accuracy can be achieved. Near-quantitative agreement with experiment (about 2 ppm deviation) for the prediction of fluorine-19 nuclear magnetic shielding constants can be achieved by employing the CCSD(T) model together with large basis sets at accurate equilibrium geometries if vibrational averaging and temperature corrections via second-order vibrational perturbation theory are considered. Applying a very similar level of theory for the calculation of the carbon-13 NMR chemical shifts of benzene resulted in quantitative agreement with experimental gas-phase data. The NMR chemical shift study for the bridgehead 1-adamantyl cation at the CCSD(T) level resolved earlier discrepancies of lower-level theoretical treatment. The equilibrium structure of diacetylene has been determined based on the combination of experimental rotational constants of thirteen isotopic species and zero-point vibrational corrections calculated at various quantum-chemical levels. These empirical equilibrium structures agree to within 0.1 pm irrespective of the theoretical level employed. High-level quantum-chemical calculations on the hyperfine structure parameters of the cyanopolyynes were found to be in excellent agreement with experiment. Finally, the theoretically most accurate determination of the molecular equilibrium structure of ferrocene to date is presented.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The central topic of this thesis is the study of algorithms for type checking, both from the programming language and from the proof-theoretic point of view. A type checking algorithm takes a program or a proof, represented as a syntactical object, and checks its validity with respect to a specification or a statement. It is a central piece of compilers and proof assistants. We postulate that since type checkers are at the interface between proof theory and program theory, their study can let these two fields mutually enrich each other. We argue by two main instances: first, starting from the problem of proof reuse, we develop an incremental type checker; secondly, starting from a type checking program, we evidence a novel correspondence between natural deduction and the sequent calculus.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Coupled-cluster theory in its single-reference formulation represents one of the most successful approaches in quantum chemistry for the description of atoms and molecules. To extend the applicability of single-reference coupled-cluster theory to systems with degenerate or near-degenerate electronic configurations, multireference coupled-cluster methods have been suggested. One of the most promising formulations of multireference coupled cluster theory is the state-specific variant suggested by Mukherjee and co-workers (Mk-MRCC). Unlike other multireference coupled-cluster approaches, Mk-MRCC is a size-extensive theory and results obtained so far indicate that it has the potential to develop to a standard tool for high-accuracy quantum-chemical treatments. This work deals with developments to overcome the limitations in the applicability of the Mk-MRCC method. Therefore, an efficient Mk-MRCC algorithm has been implemented in the CFOUR program package to perform energy calculations within the singles and doubles (Mk-MRCCSD) and singles, doubles, and triples (Mk-MRCCSDT) approximations. This implementation exploits the special structure of the Mk-MRCC working equations that allows to adapt existing efficient single-reference coupled-cluster codes. The algorithm has the correct computational scaling of d*N^6 for Mk-MRCCSD and d*N^8 for Mk-MRCCSDT, where N denotes the system size and d the number of reference determinants. For the determination of molecular properties as the equilibrium geometry, the theory of analytic first derivatives of the energy for the Mk-MRCC method has been developed using a Lagrange formalism. The Mk-MRCC gradients within the CCSD and CCSDT approximation have been implemented and their applicability has been demonstrated for various compounds such as 2,6-pyridyne, the 2,6-pyridyne cation, m-benzyne, ozone and cyclobutadiene. The development of analytic gradients for Mk-MRCC offers the possibility of routinely locating minima and transition states on the potential energy surface. It can be considered as a key step towards routine investigation of multireference systems and calculation of their properties. As the full inclusion of triple excitations in Mk-MRCC energy calculations is computational demanding, a parallel implementation is presented in order to circumvent limitations due to the required execution time. The proposed scheme is based on the adaption of a highly efficient serial Mk-MRCCSDT code by parallelizing the time-determining steps. A first application to 2,6-pyridyne is presented to demonstrate the efficiency of the current implementation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Domain-specific languages (DSLs) are increasingly used as embedded languages within general-purpose host languages. DSLs provide a compact, dedicated syntax for specifying parts of an application related to specialized domains. Unfortunately, such language extensions typically do not integrate well with the development tools of the host language. Editors, compilers and debuggers are either unaware of the extensions, or must be adapted at a non-trivial cost. We present a novel approach to embed DSLs into an existing host language by leveraging the underlying representation of the host language used by these tools. Helvetia is an extensible system that intercepts the compilation pipeline of the Smalltalk host language to seamlessly integrate language extensions. We validate our approach by case studies that demonstrate three fundamentally different ways to extend or adapt the host language syntax and semantics.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

P-GENESIS is an extension to the GENESIS neural simulator that allows users to take advantage of parallel machines to speed up the simulation of their network models or concurrently simulate multiple models. P-GENESIS adds several commands to the GENESIS script language that let a script running on one processor execute remote procedure calls on other processors, and that let a script synchronize its execution with the scripts running on other processors. We present here some brief comments on the mechanisms underlying parallel script execution. We also offer advice on parallelizing parameter searches, partitioning network models, and selecting suitable parallel hardware on which to run P-GENESIS.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Despite the fact that input–output (IO) tables form a central part of the System of National Accounts, each individual country's national IO table exhibits more or less different features and characteristics, reflecting the country's socioeconomic idiosyncrasies. Consequently, the compilers of a multi-regional input–output table (MRIOT) are advised to thoroughly examine the conceptual as well as methodological differences among countries in the estimation of basic statistics for national IO tables and, if necessary, to carry out pre-adjustment of these tables into a common format prior to the MRIOT compilation. The objective of this study is to provide a practical guide for harmonizing national IO tables to construct a consistent MRIOT, referring to the adjustment practices used by the Institute of Developing Economies, JETRO (IDE-JETRO) in compiling the Asian International Input–Output Table.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We show a method for parallelizing top down dynamic programs in a straightforward way by a careful choice of a lock-free shared hash table implementation and randomization of the order in which the dynamic program computes its subproblems. This generic approach is applied to dynamic programs for knapsack, shortest paths, and RNA structure alignment, as well as to a state-of-the-art solution for minimizing the máximum number of open stacks. Experimental results are provided on three different modern multicore architectures which show that this parallelization is effective and reasonably scalable. In particular, we obtain over 10 times speedup for 32 threads on the open stacks problem.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We report on a detailed study of the application and effectiveness of program analysis based on abstract interpretation to automatic program parallelization. We study the case of parallelizing logic programs using the notion of strict independence. We first propose and prove correct a methodology for the application in the parallelization task of the information inferred by abstract interpretation, using a parametric domain. The methodology is generic in the sense of allowing the use of different analysis domains. A number of well-known approximation domains are then studied and the transformation into the parametric domain defined. The transformation directly illustrates the relevance and applicability of each abstract domain for the application. Both local and global analyzers are then built using these domains and embedded in a complete parallelizing compiler. Then, the performance of the domains in this context is assessed through a number of experiments. A comparatively wide range of aspects is studied, from the resources needed by the analyzers in terms of time and memory to the actual benefits obtained from the information inferred. Such benefits are evaluated both in terms of the characteristics of the parallelized code and of the actual speedups obtained from it. The results show that data flow analysis plays an important role in achieving efficient parallelizations, and that the cost of such analysis can be reasonable even for quite sophisticated abstract domains. Furthermore, the results also offer significant insight into the characteristics of the domains, the demands of the application, and the trade-offs involved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Program specialization optimizes programs for known valúes of the input. It is often the case that the set of possible input valúes is unknown, or this set is infinite. However, a form of specialization can still be performed in such cases by means of abstract interpretation, specialization then being with respect to abstract valúes (substitutions), rather than concrete ones. We study the múltiple specialization of logic programs based on abstract interpretation. This involves in principie, and based on information from global analysis, generating several versions of a program predicate for different uses of such predicate, optimizing these versions, and, finally, producing a new, "multiply specialized" program. While múltiple specialization has received theoretical attention, little previous evidence exists on its practicality. In this paper we report on the incorporation of múltiple specialization in a parallelizing compiler and quantify its effects. A novel approach to the design and implementation of the specialization system is proposed. The resulting implementation techniques result in identical specializations to those of the best previously proposed techniques but require little or no modification of some existing abstract interpreters. Our results show that, using the proposed techniques, the resulting "abstract múltiple specialization" is indeed a relevant technique in practice. In particular, in the parallelizing compiler application, a good number of run-time tests are eliminated and invariants extracted automatically from loops, resulting generally in lower overheads and in several cases in increased speedups.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Modeling the evolution of the state of program memory during program execution is critical to many parallehzation techniques. Current memory analysis techniques either provide very accurate information but run prohibitively slowly or produce very conservative results. An approach based on abstract interpretation is presented for analyzing programs at compile time, which can accurately determine many important program properties such as aliasing, logical data structures and shape. These properties are known to be critical for transforming a single threaded program into a versión that can be run on múltiple execution units in parallel. The analysis is shown to be of polynomial complexity in the size of the memory heap. Experimental results for benchmarks in the Jolden suite are given. These results show that in practice the analysis method is efflcient and is capable of accurately determining shape information in programs that créate and manipúlate complex data structures.