106 resultados para scalable


Relevância:

20.00% 20.00%

Publicador:

Resumo:

We address the problem of designing distributed algorithms for large scale networks that are robust to Byzantine faults. We consider a message passing, full information model: the adversary is malicious, controls a constant fraction of processors, and can view all messages in a round before sending out its own messages for that round. Furthermore, each bad processor may send an unlimited number of messages. The only constraint on the adversary is that it must choose its corrupt processors at the start, without knowledge of the processors’ private random bits.

A good quorum is a set of O(logn) processors, which contains a majority of good processors. In this paper, we give a synchronous algorithm which uses polylogarithmic time and Õ(vn) bits of communication per processor to bring all processors to agreement on a collection of n good quorums, solving Byzantine agreement as well. The collection is balanced in that no processor is in more than O(logn) quorums. This yields the first solution to Byzantine agreement which is both scalable and load-balanced in the full information model.

The technique which involves going from situation where slightly more than 1/2 fraction of processors are good and and agree on a short string with a constant fraction of random bits to a situation where all good processors agree on n good quorums can be done in a fully asynchronous model as well, providing an approach for extending the Byzantine agreement result to this model.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a scalable, statistical ‘black-box’ model for predicting the performance of parallel programs on multi-core non-uniform memory access (NUMA) systems. We derive a model with low overhead, by reducing data collection and model training time. The model can accurately predict the behaviour of parallel applications in response to changes in their concurrency, thread layout on NUMA nodes, and core voltage and frequency. We present a framework that applies the model to achieve significant energy and energy-delay-square (ED2) savings (9% and 25%, respectively) along with performance improvement (10% mean) on an actual 16-core NUMA system running realistic application workloads. Our prediction model proves substantially more accurate than previous efforts.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper introduces hybrid address spaces as a fundamental design methodology for implementing scalable runtime systems on many-core architectures without hardware support for cache coherence. We use hybrid address spaces for an implementation of MapReduce, a programming model for large-scale data processing, and the implementation of a remote memory access (RMA) model. Both implementations are available on the Intel SCC and are portable to similar architectures. We present the design and implementation of HyMR, a MapReduce runtime system whereby different stages and the synchronization operations between them alternate between a distributed memory address space and a shared memory address space, to improve performance and scalability. We compare HyMR to a reference implementation and we find that HyMR improves performance by a factor of 1.71× over a set of representative MapReduce benchmarks. We also compare HyMR with Phoenix++, a state-of-art implementation for systems with hardware-managed cache coherence in terms of scalability and sustained to peak data processing bandwidth, where HyMR demon- strates improvements of a factor of 3.1× and 3.2× respectively. We further evaluate our hybrid remote memory access (HyRMA) programming model and assess its performance to be superior of that of message passing.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The aluminum complex Alq(3) (q = 8-hydroxyquinolinate), which has important applications in organic light-emitting diode materials, is shown to be readily synthesized as a pure phase under solvent-free mechanochemical conditions from Al(OAc)(2)OH and 8-hydroxyquinoline by ball milling. The initial product of the mechanochemical synthesis is a novel acetic acid solvate of Alq(3), and the alpha polymorph of Alq(3) is obtained on subsequent heating/desolvation of this phase. The structure of the mechanochemically prepared acetic acid solvate of Alq(3) has been determined directly from powder X-ray diffraction data and is shown to be a different polymorph from the corresponding acetic acid solvate prepared by solution-state crystallization of Alq(3) from acetic acid. Significantly, the mechanochemical synthesis of Alq(3) is shown to be fully scalable across two orders of magnitude from 0.5 to 50 g scale. The Alq(3) sample obtained from the solvent-free mechanochemical synthesis is analytically pure and exhibits identical photoluminescence behavior to that of a sample prepared by the conventional synthetic route.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

With the ability to engineer ferroelectricity in HfO2 thin films, manufacturable and highly scaled MFM capacitors and MFIS-FETs can be implemented into a CMOS-environment. NVM properties of the resulting devices are discussed and contrasted to existing perovskite based FRAM.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we present a novel discrete cosine transform (DCT) architecture that allows aggressive voltage scaling for low-power dissipation, even under process parameter variations with minimal overhead as opposed to existing techniques. Under a scaled supply voltage and/or variations in process parameters, any possible delay errors appear only from the long paths that are designed to be less contributive to output quality. The proposed architecture allows a graceful degradation in the peak SNR (PSNR) under aggressive voltage scaling as well as extreme process variations. Results show that even under large process variations (±3σ around mean threshold voltage) and aggressive supply voltage scaling (at 0.88 V, while the nominal voltage is 1.2 V for a 90-nm technology), there is a gradual degradation of image quality with considerable power savings (71% at PSNR of 23.4 dB) for the proposed architecture, when compared to existing implementations in a 90-nm process technology. © 2006 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we explore various arithmetic units for possible use in high-speed, high-yield ALUs operated at scaled supply voltage with adaptive clock stretching. We demonstrate that careful logic optimization of the existing arithmetic units (to create hybrid units) indeed make them further amenable to supply voltage scaling. Such hybrid units result from mixing right amount of fast arithmetic into the slower ones. Simulations on different hybrid adder and multipliers in BPTM 70 nm technology show 18%-50% improvements in power compared to standard adders with only 2%-8% increase in die-area at iso-yield. These optimized datapath units can be used to construct voltage scalable robust ALUs that can operate at high clock frequency with minimal performance degradation due to occasional clock stretching. © 2009 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we present a design methodology for algorithm/architecture co-design of a voltage-scalable, process variation aware motion estimator based on significance driven computation. The fundamental premise of our approach lies in the fact that all computations are not equally significant in shaping the output response of video systems. We use a statistical technique to intelligently identify these significant/not-so-significant computations at the algorithmic level and subsequently change the underlying architecture such that the significant computations are computed in an error free manner under voltage over-scaling. Furthermore, our design includes an adaptive quality compensation (AQC) block which "tunes" the algorithm and architecture depending on the magnitude of voltage over-scaling and severity of process variations. Simulation results show average power savings of similar to 33% for the proposed architecture when compared to conventional implementation in the 90 nm CMOS technology. The maximum output quality loss in terms of Peak Signal to Noise Ratio (PSNR) was similar to 1 dB without incurring any throughput penalty.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we propose a system level design approach considering voltage over-scaling (VOS) that achieves error resiliency using unequal error protection of different computation elements, while incurring minor quality degradation. Depending on user specifications and severity of process variations/channel noise, the degree of VOS in each block of the system is adaptively tuned to ensure minimum system power while providing "just-the-right" amount of quality and robustness. This is achieved, by taking into consideration block level interactions and ensuring that under any change of operating conditions, only the "less-crucial" computations, that contribute less to block/system output quality, are affected. The proposed approach applies unequal error protection to various blocks of a system-logic and memory-and spans multiple layers of design hierarchy-algorithm, architecture and circuit. The design methodology when applied to a multimedia subsystem shows large power benefits ( up to 69% improvement in power consumption) at reasonable image quality while tolerating errors introduced due to VOS, process variations, and channel noise.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Today there is a growing interest in the integration of health monitoring applications in portable devices necessitating the development of methods that improve the energy efficiency of such systems. In this paper, we present a systematic approach that enables energy-quality trade-offs in spectral analysis systems for bio-signals, which are useful in monitoring various health conditions as those associated with the heart-rate. To enable such trade-offs, the processed signals are expressed initially in a basis in which significant components that carry most of the relevant information can be easily distinguished from the parts that influence the output to a lesser extent. Such a classification allows the pruning of operations associated with the less significant signal components leading to power savings with minor quality loss since only less useful parts are pruned under the given requirements. To exploit the attributes of the modified spectral analysis system, thresholding rules are determined and adopted at design- and run-time, allowing the static or dynamic pruning of less-useful operations based on the accuracy and energy requirements. The proposed algorithm is implemented on a typical sensor node simulator and results show up-to 82% energy savings when static pruning is combined with voltage and frequency scaling, compared to the conventional algorithm in which such trade-offs were not available. In addition, experiments with numerous cardiac samples of various patients show that such energy savings come with a 4.9% average accuracy loss, which does not affect the system detection capability of sinus-arrhythmia which was used as a test case.