9 resultados para run performance

em Indian Institute of Science - Bangalore - Índia


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Rapid advancements in multi-core processor architectures coupled with low-cost, low-latency, high-bandwidth interconnects have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but are yet to be adopted widely to run performance-critical code mainly due to the relatively immature software framework and the effort involved in re-writing existing code in the new language. In this paper, we motivate and describe our initial study in exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel, the growing popularity, support and stability of the CUDA software stack collectively make CUDA a good candidate to be considered as a programming language for a cluster. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime and a light-weight software distributed shared memory to manage parallel executions. Initial results on running several standard CUDA benchmark programs achieve impressive speedups of up to 7.5X on a cluster with 8 nodes, thereby opening up an interesting direction of research for further investigation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Friction characteristics of journal bearings made from cast graphic aluminum particulate composite alloy were determined under mixed lubrication and compared with those of the base alloy (without graphite) and leaded phosphor bronze. All three materials ran without seizure while the performance of the particulate composite and leaded phosphor bronze improved with running. Temperature rise in the journal bearing under mixed/boundary lubrication was also measured. It was found that with 0.3D/1000 to 1.5D/1000 clearance and a low lubrication rate (typical value for a bearing of diameter 35 mm × length 35 mm is 80 mm3/min) and at a PV value of 73 × 106 Nm m−2 min−1 graphitic aluminium alloy journal bearings operate satisfactorily without seizure and excessive temperature rise. In comparison, the bronze bearings, with all the other parameters remaining the same, could not run without excessive temperature rise at clearances below D/1000 at lubrication rates lower than 200 mm3/min

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Surface treatment alters the frictional behaviour of pistons in I.C. engines and can be used to improve engine performance. Surface treatments applied to aluminium alloy pistons of a high speed diesel engine and their effect on the engine performance are described. Certain piston surface treatments improve engine performance and also reduce the run-in period.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

H.264 video standard achieves high quality video along with high data compression when compared to other existing video standards. H.264 uses context-based adaptive variable length coding (CAVLC) to code residual data in Baseline profile. In this paper we describe a novel architecture for CAVLC decoder including coeff-token decoder, level decoder total-zeros decoder and run-before decoder UMC library in 0.13 mu CMOS technology is used to synthesize the proposed design. The proposed design reduces chip area and improves critical path performance of CAVLC decoder in comparison with [1]. Macroblock level (including luma and chroma) pipeline processing for CAVLC is implemented with an average of 141 cycles (including pipeline buffering) per macroblock at 250MHz clock frequency. To compare our results with [1] clock frequency is constrained to 125MHz. The area required for the proposed architecture is 17586 gates, which is 22.1% improvement in comparison to [1]. We obtain a throughput of 1.73 * 10(6) macroblocks/second, which is 28% higher than that reported in [1]. The proposed design meets the processing requirement of 1080HD [5] video at 30frames/seconds.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Modern wireline and wireless communication devices are multimode and multifunctional communication devices. In order to support multiple standards on a single platform, it is necessary to develop a reconfigurable architecture that can provide the required flexibility and performance. The Channel decoder is one of the most compute intensive and essential elements of any communication system. Most of the standards require a reconfigurable Channel decoder that is capable of performing Viterbi decoding and Turbo decoding. Furthermore, the Channel decoder needs to support different configurations of Viterbi and Turbo decoders. In this paper, we propose a reconfigurable Channel decoder that can be reconfigured for standards such as WCDMA, CDMA2000, IEEE802.11, DAB, DVB and GSM. Different parameters like code rate, constraint length, polynomials and truncation length can be configured to map any of the above mentioned standards. A multiprocessor approach has been followed to provide higher throughput and scalable power consumption in various configurations of the reconfigurable Viterbi decoder and Turbo decoder. We have proposed A Hybrid register exchange approach for multiprocessor architecture to minimize power consumption.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This letter investigates the influence of a corrugated gate on the transfer characteristics of thin-film transistors. Corrugations that run parallel to the length of the channel from source to drain are patterned on the gate. The author finds that these corrugations result in higher currents as compared to conventional planar-gate transistors.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Numerical Linear Algebra (NLA) kernels are at the heart of all computational problems. These kernels require hardware acceleration for increased throughput. NLA Solvers for dense and sparse matrices differ in the way the matrices are stored and operated upon although they exhibit similar computational properties. While ASIC solutions for NLA Solvers can deliver high performance, they are not scalable, and hence are not commercially viable. In this paper, we show how NLA kernels can be accelerated on REDEFINE, a scalable runtime reconfigurable hardware platform. Compared to a software implementation, Direct Solver (Modified Faddeev's algorithm) on REDEFINE shows a 29X improvement on an average and Iterative Solver (Conjugate Gradient algorithm) shows a 15-20% improvement. We further show that solution on REDEFINE is scalable over larger problem sizes without any notable degradation in performance.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Before installation, a voltage source converter is usually subjected to heat-run test to verify its thermal design and performance under load. For heat-run test, the converter needs to be operated at rated voltage and rated current for a substantial length of time. Hence, such tests consume huge amount of energy in case of high-power converters. Also, the capacities of the source and loads available in the research and development (R&D) centre or the production facility could be inadequate to conduct such tests. This paper proposes a method to conduct heat-run tests on high-power, pulse width modulated (PWM) converters with low energy consumption. The experimental set-up consists of the converter under test and another converter (of similar or higher rating), both connected in parallel on the ac side and open on the dc side. Vector-control or synchronous reference frame control is employed to control the converters such that one draws certain amount of reactive power and the other supplies the same; only the system losses are drawn from the mains. The performance of the controller is validated through simulation and experiments. Experimental results, pertaining to heat-run tests on a high-power PWM converter, are presented at power levels of 25 kVA to 150 kVA.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Elasticity in cloud systems provides the flexibility to acquire and relinquish computing resources on demand. However, in current virtualized systems resource allocation is mostly static. Resources are allocated during VM instantiation and any change in workload leading to significant increase or decrease in resources is handled by VM migration. Hence, cloud users tend to characterize their workloads at a coarse grained level which potentially leads to under-utilized VM resources or under performing application. A more flexible and adaptive resource allocation mechanism would benefit variable workloads, such as those characterized by web servers. In this paper, we present an elastic resources framework for IaaS cloud layer that addresses this need. The framework provisions for application workload forecasting engine, that predicts at run-time the expected demand, which is input to the resource manager to modulate resource allocation based on the predicted demand. Based on the prediction errors, resources can be over-allocated or under-allocated as compared to the actual demand made by the application. Over-allocation leads to unused resources and under allocation could cause under performance. To strike a good trade-off between over-allocation and under-performance we derive an excess cost model. In this model excess resources allocated are captured as over-allocation cost and under-allocation is captured as a penalty cost for violating application service level agreement (SLA). Confidence interval for predicted workload is used to minimize this excess cost with minimal effect on SLA violations. An example case-study for an academic institute web server workload is presented. Using the confidence interval to minimize excess cost, we achieve significant reduction in resource allocation requirement while restricting application SLA violations to below 2-3%.