999 resultados para AQVAL programs (Computer programs)


Relevância:

50.00% 50.00%

Publicador:

Resumo:

Dynamic Voltage and Frequency Scaling (DVFS) offers a huge potential for designing trade-offs involving energy, power, temperature and performance of computing systems. In this paper, we evaluate three different DVFS schemes - our enhancement of a Petri net performance model based DVFS method for sequential programs to stream programs, a simple profile based Linear Scaling method, and an existing hardware based DVFS method for multithreaded applications - using multithreaded stream applications, in a full system Chip Multiprocessor (CMP) simulator. From our evaluation, we find that the software based methods achieve significant Energy/Throughput2(ET−2) improvements. The hardware based scheme degrades performance heavily and suffers ET−2 loss. Our results indicate that the simple profile based scheme achieves the benefits of the complex Petri net based scheme for stream programs, and present a strong case for the need for independent voltage/frequency control for different cores of CMPs, which is lacking in most of the state-of-the-art CMPs. This is in contrast to the conclusions of a recent evaluation of per-core DVFS schemes for multithreaded applications for CMPs.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Memory models for shared-memory concurrent programming languages typically guarantee sequential consistency (SC) semantics for datarace-free (DRF) programs, while providing very weak or no guarantees for non-DRF programs. In effect programmers are expected to write only DRF programs, which are then executed with SC semantics. With this in mind, we propose a novel scalable solution for dataflow analysis of concurrent programs, which is proved to be sound for DRF programs with SC semantics. We use the synchronization structure of the program to propagate dataflow information among threads without requiring to consider all interleavings explicitly. Given a dataflow analysis that is sound for sequential programs and meets certain criteria, our technique automatically converts it to an analysis for concurrent programs.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

With proliferation of chip multicores (CMPs) on desktops and embedded platforms, multi-threaded programs have become ubiquitous. Existence of multiple threads may cause resource contention, such as, in on-chip shared cache and interconnects, depending upon how they access resources. Hence, we propose a tool - Thread Contention Predictor (TCP) to help quantify the number of threads sharing data and their sharing pattern. We demonstrate its use to predict a more profitable shared, last level on-chip cache (LLC) access policy on CMPs. Our cache configuration predictor is 2.2 times faster compared to the cycle-accurate simulations. We also demonstrate its use for identifying hot data structures in a program which may cause performance degradation due to false data sharing. We fix layout of such data structures and show up-to 10% and 18% improvement in execution time and energy-delay product (EDP), respectively.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Large software systems are developed by composing multiple programs. If the programs manip-ulate and exchange complex data, such as network packets or files, it is essential to establish that they follow compatible data formats. Most of the complexity of data formats is associated with the headers. In this paper, we address compatibility of programs operating over headers of network packets, files, images, etc. As format specifications are rarely available, we infer the format associated with headers by a program as a set of guarded layouts. In terms of these formats, we define and check compatibility of (a) producer-consumer programs and (b) different versions of producer (or consumer) programs. A compatible producer-consumer pair is free of type mismatches and logical incompatibilities such as the consumer rejecting valid outputs gen-erated by the producer. A backward compatible producer (resp. consumer) is guaranteed to be compatible with consumers (resp. producers) that were compatible with its older version. With our prototype tool, we identified 5 known bugs and 1 potential bug in (a) sender-receiver modules of Linux network drivers of 3 vendors and (b) different versions of a TIFF image library.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

We propose a new approach for producing precise constrained slices of programs in a language such as C. We build upon a previous approach for this problem, which is based on term-rewriting, which primarily targets loop-free fragments and is fully precise in this setting. We incorporate abstract interpretation into term-rewriting, using a given arbitrary abstract lattice, resulting in a novel technique for slicing loops whose precision is linked to the power of the given abstract lattice. We address pointers in a first-class manner, including when they are used within loops to traverse and update recursive data structures. Finally, we illustrate the comparative precision of our slices over those of previous approaches using representative examples.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Task-parallel languages are increasingly popular. Many of them provide expressive mechanisms for intertask synchronization. For example, OpenMP 4.0 will integrate data-driven execution semantics derived from the StarSs research language. Compared to the more restrictive data-parallel and fork-join concurrency models, the advanced features being introduced into task-parallelmodels in turn enable improved scalability through load balancing, memory latency hiding, mitigation of the pressure on memory bandwidth, and, as a side effect, reduced power consumption. In this article, we develop a systematic approach to compile loop nests into concurrent, dynamically constructed graphs of dependent tasks. We propose a simple and effective heuristic that selects the most profitable parallelization idiom for every dependence type and communication pattern. This heuristic enables the extraction of interband parallelism (cross-barrier parallelism) in a number of numerical computations that range from linear algebra to structured grids and image processing. The proposed static analysis and code generation alleviates the burden of a full-blown dependence resolver to track the readiness of tasks at runtime. We evaluate our approach and algorithms in the PPCG compiler, targeting OpenStream, a representative dataflow task-parallel language with explicit intertask dependences and a lightweight runtime. Experimental results demonstrate the effectiveness of the approach.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

CLADP is an engineering software program developed at Cambridge University for the interactive computer aided design of feedback control systems. CLADP contains a wide range of tools for the analysis of complex systems, and the assessment of their performance when feedback control is applied, thus enabling control systems to be designed to meet difficult performance objectives. The range of tools within CLADP include the latest techniques in the field whose central theme is the extension of classical frequency domain concepts (well known and well proven for single loop systems) to multivariable or multiloop systems, and by making extensive use of graphical presentation information is provided in a readily understood form.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Chinese Assoc Cryptol Res, State Key Lab Informat Secur, Inst Software, Grad Univ Chinese Acad Sci, Natl Nat Sci Fdn China

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Sanders, K. and Thomas, L. 2007. Checklists for grading object-oriented CS1 programs: concepts and misconceptions. In Proceedings of the 12th Annual SIGCSE Conference on innovation and Technology in Computer Science Education (Dundee, Scotland, June 25 - 27, 2007). ITiCSE '07. ACM, New York, NY, 166-170

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Predictability -- the ability to foretell that an implementation will not violate a set of specified reliability and timeliness requirements -- is a crucial, highly desirable property of responsive embedded systems. This paper overviews a development methodology for responsive systems, which enhances predictability by eliminating potential hazards resulting from physically-unsound specifications. The backbone of our methodology is the Time-constrained Reactive Automaton (TRA) formalism, which adopts a fundamental notion of space and time that restricts expressiveness in a way that allows the specification of only reactive, spontaneous, and causal computation. Using the TRA model, unrealistic systems – possessing properties such as clairvoyance, caprice, infinite capacity, or perfect timing -- cannot even be specified. We argue that this "ounce of prevention" at the specification level is likely to spare a lot of time and energy in the development cycle of responsive systems -- not to mention the elimination of potential hazards that would have gone, otherwise, unnoticed. The TRA model is presented to system developers through the Cleopatra programming language. Cleopatra features a C-like imperative syntax for the description of computation, which makes it easier to incorporate in applications already using C. It is event-driven, and thus appropriate for embedded process control applications. It is object-oriented and compositional, thus advocating modularity and reusability. Cleopatra is semantically sound; its objects can be transformed, mechanically and unambiguously, into formal TRA automata for verification purposes, which can be pursued using model-checking or theorem proving techniques. Since 1989, an ancestor of Cleopatra has been in use as a specification and simulation language for embedded time-critical robotic processes.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Generic object-oriented programming languages combine parametric polymorphism and nominal subtype polymorphism, thereby providing better data abstraction, greater code reuse, and fewer run-time errors. However, most generic object-oriented languages provide a straightforward combination of the two kinds of polymorphism, which prevents the expression of advanced type relationships. Furthermore, most generic object-oriented languages have a type-erasure semantics: instantiations of type parameters are not available at run time, and thus may not be used by type-dependent operations. This dissertation shows that two features, which allow the expression of many advanced type relationships, can be added to a generic object-oriented programming language without type erasure: 1. type variables that are not parameters of the class that declares them, and 2. extension that is dependent on the satisfiability of one or more constraints. We refer to the first feature as hidden type variables and the second feature as conditional extension. Hidden type variables allow: covariance and contravariance without variance annotations or special type arguments such as wildcards; a single type to extend, and inherit methods from, infinitely many instantiations of another type; a limited capacity to augment the set of superclasses after that class is defined; and the omission of redundant type arguments. Conditional extension allows the properties of a collection type to be dependent on the properties of its element type. This dissertation describes the semantics and implementation of hidden type variables and conditional extension. A sound type system is presented. In addition, a sound and terminating type checking algorithm is presented. Although designed for the Fortress programming language, hidden type variables and conditional extension can be incorporated into other generic object-oriented languages. Many of the same problems would arise, and solutions analogous to those we present would apply.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

In this work we show how automatic relative debugging can be used to find differences in computation between a correct serial program and an OpenMP parallel version of that program that does not yield correct results. Backtracking and re-execution are used to determine the first OpenMP parallel region that produces a difference in computation that may lead to an incorrect value the user has indicated. Our approach also lends itself to finding differences between parallel computations, where executing with M threads produces expected results but an N thread execution does not (M, N > 1, M ≠ N). OpenMP programs created using a parallelization tool are addressed by utilizing static analysis and directive information from the tool. Hand-parallelized programs, where OpenMP directives are inserted by the user, are addressed by performing data dependence and directive analysis.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Traditional static analysis fails to auto-parallelize programs with a complex control and data flow. Furthermore, thread-level parallelism in such programs is often restricted to pipeline parallelism, which can be hard to discover by a programmer. In this paper we propose a tool that, based on profiling information, helps the programmer to discover parallelism. The programmer hand-picks the code transformations from among the proposed candidates which are then applied by automatic code transformation techniques.

This paper contributes to the literature by presenting a profiling tool for discovering thread-level parallelism. We track dependencies at the whole-data structure level rather than at the element level or byte level in order to limit the profiling overhead. We perform a thorough analysis of the needs and costs of this technique. Furthermore, we present and validate the belief that programs with complex control and data flow contain significant amounts of exploitable coarse-grain pipeline parallelism in the program’s outer loops. This observation validates our approach to whole-data structure dependencies. As state-of-the-art compilers focus on loops iterating over data structure members, this observation also explains why our approach finds coarse-grain pipeline parallelism in cases that have remained out of reach for state-of-the-art compilers. In cases where traditional compilation techniques do find parallelism, our approach allows to discover higher degrees of parallelism, allowing a 40% speedup over traditional compilation techniques. Moreover, we demonstrate real speedups on multiple hardware platforms.