131 resultados para parallel programs


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Dynamic Voltage and Frequency Scaling (DVFS) offers a huge potential for designing trade-offs involving energy, power, temperature and performance of computing systems. In this paper, we evaluate three different DVFS schemes - our enhancement of a Petri net performance model based DVFS method for sequential programs to stream programs, a simple profile based Linear Scaling method, and an existing hardware based DVFS method for multithreaded applications - using multithreaded stream applications, in a full system Chip Multiprocessor (CMP) simulator. From our evaluation, we find that the software based methods achieve significant Energy/Throughput2(ET−2) improvements. The hardware based scheme degrades performance heavily and suffers ET−2 loss. Our results indicate that the simple profile based scheme achieves the benefits of the complex Petri net based scheme for stream programs, and present a strong case for the need for independent voltage/frequency control for different cores of CMPs, which is lacking in most of the state-of-the-art CMPs. This is in contrast to the conclusions of a recent evaluation of per-core DVFS schemes for multithreaded applications for CMPs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Memory models for shared-memory concurrent programming languages typically guarantee sequential consistency (SC) semantics for datarace-free (DRF) programs, while providing very weak or no guarantees for non-DRF programs. In effect programmers are expected to write only DRF programs, which are then executed with SC semantics. With this in mind, we propose a novel scalable solution for dataflow analysis of concurrent programs, which is proved to be sound for DRF programs with SC semantics. We use the synchronization structure of the program to propagate dataflow information among threads without requiring to consider all interleavings explicitly. Given a dataflow analysis that is sound for sequential programs and meets certain criteria, our technique automatically converts it to an analysis for concurrent programs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Identical parallel-connected converters with unequal load sharing have unequal terminal voltages. The difference in terminal voltages is more pronounced in case of back-to-back connected converters, operated in power-circulation mode for the purpose of endurance tests. In this paper, a synchronous reference frame based analysis is presented to estimate the grid current distortion in interleaved, grid-connected converters with unequal terminal voltages. Influence of carrier interleaving angle on rms grid current ripple is studied theoretically as well as experimentally. Optimum interleaving angle to minimize the rms grid current ripple is investigated for different applications of parallel converters. The applications include unity power factor rectifiers, inverters for renewable energy sources, reactive power compensators, and circulating-power test set-up used for thermal testing of high-power converters. Optimum interleaving angle is shown to be a strong function of the average of the modulation indices of the two converters, irrespective of the application. The findings are verified experimentally on two parallel-connected converters, circulating reactive power of up to 150 kVA between them.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We show that every graph of maximum degree 3 can be represented as the intersection graph of axis parallel boxes in three dimensions, that is, every vertex can be mapped to an axis parallel box such that two boxes intersect if and only if their corresponding vertices are adjacent. In fact, we construct a representation in which any two intersecting boxes just touch at their boundaries. Further, this construction can be realized in linear time.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we study the diversity-multiplexing-gain tradeoff (DMT) of wireless relay networks under the half-duplex constraint. It is often unclear what penalty if any, is imposed by the half-duplex constraint on the DMT of such networks. We study two classes of networks; the first class, called KPP(I) networks, is the class of networks with the relays organized in K parallel paths between the source and the destination. While we assume that there is no direct source-destination path, the K relaying paths can interfere with each other. The second class, termed as layered networks, is comprised of relays organized in layers, where links exist only between adjacent layers. We present a communication scheme based on static schedules and amplify-and-forward relaying for these networks. We also show that for KPP(I) networks with K >= 3, the proposed schemes can achieve full-duplex DMT performance, thus demonstrating that there is no performance hit on the DMT due to the half-duplex constraint. We also show that, for layered networks, a linear DMT of d(max)(1 - r)(+) between the maximum diversity d(max) and the maximum MG, r(max) = 1 is achievable. We adapt existing DMT optimal coding schemes to these networks, thus specifying the end-to-end communication strategy explicitly.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The contour tree is a topological abstraction of a scalar field that captures evolution in level set connectivity. It is an effective representation for visual exploration and analysis of scientific data. We describe a work-efficient, output sensitive, and scalable parallel algorithm for computing the contour tree of a scalar field defined on a domain that is represented using either an unstructured mesh or a structured grid. A hybrid implementation of the algorithm using the GPU and multi-core CPU can compute the contour tree of an input containing 16 million vertices in less than ten seconds with a speedup factor of upto 13. Experiments based on an implementation in a multi-core CPU environment show near-linear speedup for large data sets.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we consider the inference for the component and system lifetime distribution of a k-unit parallel system with independent components based on system data. The components are assumed to have identical Weibull distribution. We obtain the maximum likelihood estimates of the unknown parameters based on system data. The Fisher information matrix has been derived. We propose -expectation tolerance interval and -content -level tolerance interval for the life distribution of the system. Performance of the estimators and tolerance intervals is investigated via simulation study. A simulated dataset is analyzed for illustration.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We experimentally study the effect of having hinged leaflets at the jet exit on the formation of a two-dimensional counter-rotating vortex pair. A piston-cylinder mechanism is used to generate a starting jet from a high-aspect-ratio channel into a quiescent medium. For a rigid exit, with no leaflets at the channel exit, the measurements at a central plane show that the trailing jet in the present case is never detached from the vortex pair, and keeps feeding into the latter, unlike in the axisymmetric case. Passive flexibility is introduced in the form of rigid leaflets or flaps that are hinged at the exit of the channel, with the flaps initially parallel to the channel walls. The experimental arrangement closely approximates the limiting case of a free-to-rotate rigid flap with negligible structural stiffness, damping and flap inertia, as these limiting structural properties permit the largest flap openings. Using this arrangement, we start the flow and measure the flap kinematics and the vorticity fields for different flap lengths and piston velocity programs. The typical motion of the flaps involves a rapid opening and a subsequent more gradual return to its initial position, both of which occur when the piston is still moving. The initial opening of the flaps can be attributed to an excess pressure that develops in the channel when the flow starts, due to the acceleration that has to be imparted to the fluid slug between the flaps. In the case with flaps, two additional pairs of vortices are formed because of the motion of the flaps, leading to the ejection of a total of up to three vortex pairs from the hinged exit. The flaps' length (L-f) is found to significantly affect flap motions when plotted using the conventional time scale L/d, where L is the piston stroke and d is the channel width. However, with a newly defined time scale based on the flap length (L/L-f), we find a good collapse of all the measured flap motions irrespective of flap length and piston velocity for an impulsively started piston motion. The maximum opening angle in all these impulsive velocity program cases, irrespective of the flap length, is found to be close to 15 degrees. Even though the flap kinematics collapses well with L/L-f, there are differences in the distribution of the ejected vorticity even for the same L/L-f. Such a redistribution of vorticity can lead to important changes in the overall properties of the flow, and it gives us a better understanding of the importance of exit flexibility in such flows.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Large software systems are developed by composing multiple programs. If the programs manip-ulate and exchange complex data, such as network packets or files, it is essential to establish that they follow compatible data formats. Most of the complexity of data formats is associated with the headers. In this paper, we address compatibility of programs operating over headers of network packets, files, images, etc. As format specifications are rarely available, we infer the format associated with headers by a program as a set of guarded layouts. In terms of these formats, we define and check compatibility of (a) producer-consumer programs and (b) different versions of producer (or consumer) programs. A compatible producer-consumer pair is free of type mismatches and logical incompatibilities such as the consumer rejecting valid outputs gen-erated by the producer. A backward compatible producer (resp. consumer) is guaranteed to be compatible with consumers (resp. producers) that were compatible with its older version. With our prototype tool, we identified 5 known bugs and 1 potential bug in (a) sender-receiver modules of Linux network drivers of 3 vendors and (b) different versions of a TIFF image library.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a study of the nature of the degrees-of-freedom of spatial manipulators based on the concept of partition of degrees-of-freedom. In particular, the partitioning of degrees-of-freedom is studied in five lower-mobility spatial parallel manipulators possessing different combinations of degrees-of-freedom. An extension of the existing theory is introduced so as to analyse the nature of the gained degree(s)-of-freedom at a gain-type singularity. The gain of one- and two-degrees-of-freedom is analysed in several well-studied, as well as newly developed manipulators. The formulations also present a basis for the analysis of the velocity kinematics of manipulators of any architecture. (C) 2013 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Exploiting the performance potential of GPUs requires managing the data transfers to and from them efficiently which is an error-prone and tedious task. In this paper, we develop a software coherence mechanism to fully automate all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to identify potential stale accesses and uses a runtime to initiate transfers as necessary. This allows us to avoid redundant transfers that are exhibited by all other existing automatic memory management proposals. We integrate our automatic memory manager into the X10 compiler and runtime, and find that it not only results in smaller and simpler programs, but also eliminates redundant memory transfers. Tested on eight programs ported from the Rodinia benchmark suite it achieves (i) a 1.06x speedup over hand-tuned manual memory management, and (ii) a 1.29x speedup over another recently proposed compiler--runtime automatic memory management system. Compared to other existing runtime-only and compiler-only proposals, it also transfers 2.2x to 13.3x less data on average.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a nonequilibrium strong-coupling approach to inhomogeneous systems of ultracold atoms in optical lattices. We demonstrate its application to the Mott-insulating phase of a two-dimensional Fermi-Hubbard model in the presence of a trap potential. Since the theory is formulated self-consistently, the numerical implementation relies on a massively parallel evaluation of the self-energy and the Green's function at each lattice site, employing thousands of CPUs. While the computation of the self-energy is straightforward to parallelize, the evaluation of the Green's function requires the inversion of a large sparse 10(d) x 10(d) matrix, with d > 6. As a crucial ingredient, our solution heavily relies on the smallness of the hopping as compared to the interaction strength and yields a widely scalable realization of a rapidly converging iterative algorithm which evaluates all elements of the Green's function. Results are validated by comparing with the homogeneous case via the local-density approximation. These calculations also show that the local-density approximation is valid in nonequilibrium setups without mass transport.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Knowledge of protein-ligand interactions is essential to understand several biological processes and important for applications ranging from understanding protein function to drug discovery and protein engineering. Here, we describe an algorithm for the comparison of three-dimensional ligand-binding sites in protein structures. A previously described algorithm, PocketMatch (version 1.0) is optimised, expanded, and MPI-enabled for parallel execution. PocketMatch (version 2.0) rapidly quantifies binding-site similarity based on structural descriptors such as residue nature and interatomic distances. Atomic-scale alignments may also be obtained from amino acid residue pairings generated. It allows an end-user to compute database-wide, all-to-all comparisons in a matter of hours. The use of our algorithm on a sample dataset, performance-analysis, and annotated source code is also included.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Precise pointer analysis is a problem of interest to both the compiler and the program verification community. Flow-sensitivity is an important dimension of pointer analysis that affects the precision of the final result computed. Scaling flow-sensitive pointer analysis to millions of lines of code is a major challenge. Recently, staged flow-sensitive pointer analysis has been proposed, which exploits a sparse representation of program code created by staged analysis. In this paper we formulate the staged flow-sensitive pointer analysis as a graph-rewriting problem. Graph-rewriting has already been used for flow-insensitive analysis. However, formulating flow-sensitive pointer analysis as a graph-rewriting problem adds additional challenges due to the nature of flow-sensitivity. We implement our parallel algorithm using Intel Threading Building Blocks and demonstrate considerable scaling (upto 2.6x) for 8 threads on a set of 10 benchmarks. Compared to the sequential implementation of staged flow-sensitive analysis, a single threaded execution of our implementation performs better in 8 of the benchmarks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose a new approach for producing precise constrained slices of programs in a language such as C. We build upon a previous approach for this problem, which is based on term-rewriting, which primarily targets loop-free fragments and is fully precise in this setting. We incorporate abstract interpretation into term-rewriting, using a given arbitrary abstract lattice, resulting in a novel technique for slicing loops whose precision is linked to the power of the given abstract lattice. We address pointers in a first-class manner, including when they are used within loops to traverse and update recursive data structures. Finally, we illustrate the comparative precision of our slices over those of previous approaches using representative examples.