79 resultados para Parallel or distributed processing


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present a fully-distributed self-healing algorithm DEX, that maintains a constant degree expander network in a dynamic setting. To the best of our knowledge, our algorithm provides the first efficient distributed construction of expanders - whose expansion properties hold deterministically - that works even under an all-powerful adaptive adversary that controls the dynamic changes to the network (the adversary has unlimited computational power and knowledge of the entire network state, can decide which nodes join and leave and at what time, and knows the past random choices made by the algorithm). Previous distributed expander constructions typically provide only probabilistic guarantees on the network expansion which rapidly degrade in a dynamic setting, in particular, the expansion properties can degrade even more rapidly under adversarial insertions and deletions. Our algorithm provides efficient maintenance and incurs a low overhead per insertion/deletion by an adaptive adversary: only O(log n) rounds and O(log n) messages are needed with high probability (n is the number of nodes currently in the network). The algorithm requires only a constant number of topology changes. Moreover, our algorithm allows for an efficient implementation and maintenance of a distributed hash table (DHT) on top of DEX, with only a constant additional overhead. Our results are a step towards implementing efficient self-healing networks that have guaranteed properties (constant bounded degree and expansion) despite dynamic changes.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The use of counterpropagating laser pulses to suppress high harmonic generation (HHG) is investigated experimentally for pulses polarized parallel or perpendicular to the driving laser pulse. It is shown for the first time that perpendicularly polarized pulses can suppress HHG. The intensity of the counterpropagating pulse required for harmonic suppression is found to be much larger for perpendicular polarization than for parallel polarization, in good agreement with simple models of the harmonic suppression. These results have applications to quasi-phase-matching of HHG with trains of counterpropagating pulses. (C) 2007 Optical Society of America.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many scientific applications are programmed using hybrid programming models that use both message passing and shared memory, due to the increasing prevalence of large-scale systems with multicore, multisocket nodes. Previous work has shown that energy efficiency can be improved using software-controlled execution schemes that consider both the programming model and the power-aware execution capabilities of the system. However, such approaches have focused on identifying optimal resource utilization for one programming model, either shared memory or message passing, in isolation. The potential solution space, thus the challenge, increases substantially when optimizing hybrid models since the possible resource configurations increase exponentially. Nonetheless, with the accelerating adoption of hybrid programming models, we increasingly need improved energy efficiency in hybrid parallel applications on large-scale systems. In this work, we present new software-controlled execution schemes that consider the effects of dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) in the context of hybrid programming models. Specifically, we present predictive models and novel algorithms based on statistical analysis that anticipate application power and time requirements under different concurrency and frequency configurations. We apply our models and methods to the NPB MZ benchmarks and selected applications from the ASC Sequoia codes. Overall, we achieve substantial energy savings (8.74 percent on average and up to 13.8 percent) with some performance gain (up to 7.5 percent) or negligible performance loss.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Processor architectures has taken a turn towards many-core processors, which integrate multiple processing cores on a single chip to increase overall performance, and there are no signs that this trend will stop in the near future. Many-core processors are harder to program than multi-core and single-core processors due to the need of writing parallel or concurrent programs with high degrees of parallelism. Moreover, many-cores have to operate in a mode of strong scaling because of memory bandwidth constraints. In strong scaling increasingly finer-grain parallelism must be extracted in order to keep all processing cores busy.

Task dataflow programming models have a high potential to simplify parallel program- ming because they alleviate the programmer from identifying precisely all inter-task de- pendences when writing programs. Instead, the task dataflow runtime system detects and enforces inter-task dependences during execution based on the description of memory each task accesses. The runtime constructs a task dataflow graph that captures all tasks and their dependences. Tasks are scheduled to execute in parallel taking into account dependences specified in the task graph.

Several papers report important overheads for task dataflow systems, which severely limits the scalability and usability of such systems. In this paper we study efficient schemes to manage task graphs and analyze their scalability. We assume a programming model that supports input, output and in/out annotations on task arguments, as well as commutative in/out and reductions. We analyze the structure of task graphs and identify versions and generations as key concepts for efficient management of task graphs. Then, we present three schemes to manage task graphs building on graph representations, hypergraphs and lists. We also consider a fourth edge-less scheme that synchronizes tasks using integers. Analysis using micro-benchmarks shows that the graph representation is not always scalable and that the edge-less scheme introduces least overhead in nearly all situations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Massive multiple-input multiple-output (MIMO) systems are cellular networks where the base stations (BSs) are equipped with unconventionally many antennas, deployed on colocated or distributed arrays. Huge spatial degrees-of-freedom are achieved by coherent processing over these massive arrays, which provide strong signal gains, resilience to imperfect channel knowledge, and low interference. This comes at the price of more infrastructure; the hardware cost and circuit power consumption scale linearly/affinely with the number of BS antennas N. Hence, the key to cost-efficient deployment of large arrays is low-cost antenna branches with low circuit power, in contrast to today’s conventional expensive and power-hungry BS antenna branches. Such low-cost transceivers are prone to hardware imperfections, but it has been conjectured that the huge degrees-of-freedom would bring robustness to such imperfections. We prove this claim for a generalized uplink system with multiplicative phasedrifts, additive distortion noise, and noise amplification. Specifically, we derive closed-form expressions for the user rates and a scaling law that shows how fast the hardware imperfections can increase with N while maintaining high rates. The connection between this scaling law and the power consumption of different transceiver circuits is rigorously exemplified. This reveals that one can make the circuit power increase as p N, instead of linearly, by careful circuit-aware system design.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Graphics Processing Units (GPUs) are becoming popular accelerators in modern High-Performance Computing (HPC) clusters. Installing GPUs on each node of the cluster is not efficient resulting in high costs and power consumption as well as underutilisation of the accelerator. The research reported in this paper is motivated towards the use of few physical GPUs by providing cluster nodes access to remote GPUs on-demand for a financial risk application. We hypothesise that sharing GPUs between several nodes, referred to as multi-tenancy, reduces the execution time and energy consumed by an application. Two data transfer modes between the CPU and the GPUs, namely concurrent and sequential, are explored. The key result from the experiments is that multi-tenancy with few physical GPUs using sequential data transfers lowers the execution time and the energy consumed, thereby improving the overall performance of the application.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The triple-differential cross section for ionization of a heavy atom is shown to depend on the spin of the incident electron even if this is polarized entirely parallel or antiparallel to its direction of propagation, the atom is unpolarized, and the spins of the ejected electrons are not resolved. Quantitative predictions for the spin asymmetry are presented in a relativistic distorted-wave Born approximation. Simple physical models are introduced to understand both these results and further symmetry properties involving the reversal of a spatial momentum component also.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a matrix inversion architecture based on the novel Modified Squared Givens Rotations (MSGR) algorithm, which extends the original SGR method to complex valued data, and also corrects erroneous results in the original SGR method when zeros occur on the diagonal of the matrix either initially or during processing. The MSGR algorithm also avoids complex dividers in the matrix inversion, thus minimising the complexity of potential real-time implementations. A systolic array architecture is implemented and FPGA synthesis results indicate a high-throughput low-latency complex matrix inversion solution. © 2008 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Collagen-related peptide is a selective agonist for the platelet collagen receptor Glycoprotein VI. The triple helical peptide contains ten GPO triplets/strand (single letter amino acid nomenclature, where O is hydroxyproline) and so over-represents GPO compared with native collagen sequence. To investigate the ability of Glycoprotein VI to recognize GPO triplets in a setting more representative of the collagens, we synthesized a set of triple helical peptides containing fewer GPO triplets, varying their number and spacing within an inert (GPP)(n) backbone. The adhesion of recombinant human Glycoprotein VI ectodomain, like that of human platelets, to these peptides increased with their GPO content, and platelet adhesion was abolished by the specific anti-Glycoprotein VI-blocking antibody, 10B12. Platelet aggregation and protein tyrosine phosphorylation were induced only by cross-linked peptides and only those that contained two or more GPO triplets. Such peptides were less potent than cross-linked collagen-related peptide. Our data suggest that both the sequences GPOGPO and GPO center dot center dot center dot center dot center dot center dot center dot center dot center dot GPO represent functional Glycoprotein VI recognition motifs within collagen. Furthermore, we propose that the (GPO)(4) motif can support simultaneous binding of two glycoprotein VI molecules, in either a parallel or anti-parallel stacking arrangement, which could play an important role in activation of signaling.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Multicore computational accelerators such as GPUs are now commodity components for highperformance computing at scale. While such accelerators have been studied in some detail as stand-alone computational engines, their integration in large-scale distributed systems raises new challenges and trade-offs. In this paper, we present an exploration of resource management alternatives for building asymmetric accelerator-based distributed systems. We present these alternatives in the context of a capabilities-aware framework for data-intensive computing, which uses an enhanced implementation of the MapReduce programming model for accelerator-based clusters, compared to the state of the art. The framework can transparently utilize heterogeneous accelerators for deriving high performance with low programming effort. Our work is the first to compare heterogeneous types of accelerators, GPUs and a Cell processors, in the same environment and the first to explore the trade-offs between compute-efficient and control-efficient accelerators on data-intensive systems. Our investigation shows that our framework scales well with the number of different compute nodes. Furthermore, it runs simultaneously on two different types of accelerators, successfully adapts to the resource capabilities, and performs 26.9% better on average than a static execution approach.