878 results for advanced compiler optimizations
Abstract:
An advanced design of the solid-state cell incorporating a buffer electrode has been developed for high temperature thermodynamic measurements. The function of the buffer electrode, placed between the reference and working electrodes, was to absorb the electrochemical flux of the mobile species through the solid electrolyte caused by trace electronic conductivity. The buffer electrode prevented polarization of the measuring electrode and ensured accurate data. The application of the novel design and its advantages have been demonstrated by measuring the standard Gibbs energies of formation of ternary oxides of the system Sm–Pd–O. Yttria-stabilized zirconia was used as the solid electrolyte and pure oxygen gas at a pressure of 0.1 MPa as the reference electrode. For the design of appropriate working electrodes, phase relations in the ternary system Sm–Pd–O were investigated at 1273 K. The two ternary oxides, Sm4PdO7 and Sm2Pd2O5, compositions of which fall on the Sm2O3–PdO join, were found to coexist with pure metal Pd. The thermodynamic properties of the ternary oxides were measured using three-phase electrodes in the temperature range 950–1425 K. During electrochemical measurements a third ternary oxide, Sm2PdO4, was found to be stable at low temperature. The standard Gibbs energies of formation, Δf(ox)G°, of the compounds from their component binary oxides Sm2O3 and PdO can be represented by the equations:
Sm4PdO7: Δf(ox)G° (J mol−1) = −34,220 + 0.84 T(K) (±280);
Sm2PdO4: Δf(ox)G° (J mol−1) = −33,350 + 2.49 T(K) (±230);
Sm2Pd2O5: Δf(ox)G° (J mol−1) = −59,955 + 1.80 T(K) (±320).
Based on the thermodynamic information, three-dimensional P–T–C and chemical potential diagrams for the system Sm–Pd–O were developed.
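The three linear fits quoted in the abstract can be evaluated directly. The sketch below is only a convenience for doing that arithmetic; the names GIBBS_FITS and gibbs_from_oxides are hypothetical, and the abstract does not say over which sub-range each compound (in particular Sm2PdO4, described only as stable at low temperature) is actually stable, so the range check uses the overall measurement window of 950–1425 K as an assumption.

```python
# Coefficients copied from the abstract: dG (J/mol) = a + b * T(K), with the
# quoted uncertainty in J/mol. The dictionary and function names are
# illustrative, not taken from the paper.
GIBBS_FITS = {
    "Sm4PdO7": (-34220.0, 0.84, 280.0),
    "Sm2PdO4": (-33350.0, 2.49, 230.0),
    "Sm2Pd2O5": (-59955.0, 1.80, 320.0),
}

# Overall measurement window stated in the abstract (assumed to apply to all
# three fits; the per-compound stability ranges are not given).
T_MIN_K, T_MAX_K = 950.0, 1425.0


def gibbs_from_oxides(compound, temperature_k):
    """Return the fitted Gibbs energy of formation from the binary oxides, in J/mol."""
    if not T_MIN_K <= temperature_k <= T_MAX_K:
        raise ValueError("temperature outside the 950-1425 K measurement window")
    a, b, _uncertainty = GIBBS_FITS[compound]
    return a + b * temperature_k


for name, (_, _, err) in GIBBS_FITS.items():
    value = gibbs_from_oxides(name, 1000.0)
    print(f"{name}: {value:.0f} J/mol (+/- {err:.0f}) at 1000 K")
```

At 1000 K the fits give about −33,380, −30,860 and −58,155 J/mol for Sm4PdO7, Sm2PdO4 and Sm2Pd2O5 respectively.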
Abstract:
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving the clock speed, reducing the energy consumption of the logic, and making the design simpler, it introduces extra overheads by way of inter-cluster communication. This communication happens over long global wires with high load capacitance, which leads to delays in execution and significantly higher energy consumption. Inter-cluster communication also introduces many short idle cycles, thereby significantly increasing the overall leakage energy consumption in the functional units. The trend towards miniaturization of devices (and the associated reduction in threshold voltage) makes energy consumption in interconnects and functional units even worse, and limits the usability of clustered architectures in smaller technologies. However, technological advancements now permit the design of interconnects and functional units with varying performance and power modes. In this paper, we propose scheduling algorithms that aggregate the scheduling slack of instructions and the communication slack of data values to exploit the low-power modes of functional units and interconnects. Finally, we present a synergistic combination of these algorithms that simultaneously saves energy in functional units and interconnects to improve the usability of clustered architectures by achieving better overall energy-performance trade-offs. Even with conservative estimates of the contribution of the functional units and interconnects to the overall processor energy consumption, the proposed combined scheme obtains on average 8% and 10% improvement in overall energy-delay product with 3.5% and 2% performance degradation for a 2-clustered and a 4-clustered machine, respectively. We present a detailed experimental evaluation of the proposed schemes. Our test bed uses the Trimaran compiler infrastructure. (C) 2012 Elsevier Inc. All rights reserved.
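As a rough reading aid for the energy-delay figures above, the following sketch back-computes the energy change they would imply. It rests on two assumptions that the abstract does not spell out: that the energy-delay product is simply energy multiplied by delay, and that "performance degradation" means a fractional increase in execution time. The function name implied_energy_ratio is hypothetical.

```python
def implied_energy_ratio(edp_improvement, slowdown):
    """Return E_new / E_old implied by an EDP improvement and a slowdown.

    Assumes EDP = energy * delay, so
    (E_new * D_new) / (E_old * D_old) = 1 - edp_improvement
    and D_new / D_old = 1 + slowdown.
    """
    edp_ratio = 1.0 - edp_improvement
    delay_ratio = 1.0 + slowdown
    return edp_ratio / delay_ratio


# Figures quoted in the abstract for the 2-clustered and 4-clustered machines.
for clusters, edp_gain, slowdown in [(2, 0.08, 0.035), (4, 0.10, 0.02)]:
    ratio = implied_energy_ratio(edp_gain, slowdown)
    print(f"{clusters}-clustered: implied energy saving of {100 * (1 - ratio):.1f}%")
```

Under these assumptions the quoted numbers correspond to energy savings of roughly 11.1% and 11.8%; a different definition of performance degradation would shift these values.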
Abstract:
Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs. For STMs to be adopted widely for performance critical software, understanding and improving the cache performance of applications running on STM becomes increasingly crucial, as the performance gap between processor and memory continues to grow. In this paper, we present the most detailed experimental evaluation to date of the cache behavior of STM applications and quantify the impact of the different STM factors on the cache misses experienced by the applications. We find that STMs are not cache friendly, with data cache stall cycles contributing more than 50% of the execution cycles in a majority of the benchmarks. We find that, on average, misses occurring inside the STM account for 62% of the total data cache miss latency cycles experienced by the applications, and that cache performance is adversely affected by certain inherent characteristics of the STM itself. These observations motivate us to propose a set of specific compiler transformations targeted at making STMs cache friendly. We find that the STM's fine-grained and application-unaware locking is a major contributor to its poor cache behavior. Hence we propose selective Lock Data Co-location (LDC) and Redundant Lock Access Removal (RLAR) to address the lock access misses. We find that even transactions that are completely disjoint-access parallel suffer from costly coherence misses caused by the centralized global time stamp updates, and hence we propose the Selective Per-Partition Time Stamp (SPTS) transformation to address this. We show that our transformations are effective in improving the cache behavior of STM applications, reducing the data cache miss latency by 20.15% to 37.14% and improving execution time by 18.32% to 33.12% in five of the eight STAMP applications.
Abstract:
Voltage source inverters (VSIs) supply nonsinusoidal voltages to induction motor drives, leading to line current distortion and torque pulsation. Conventional space vector pulsewidth modulation (PWM) techniques are widely used in VSIs on account of their good waveform quality and high dc bus utilization. In a conventional space vector PWM technique, the switching sequence begins with one zero state and ends with the other zero state in a subcycle. Some novel switching sequences have been proposed, which employ only one zero state but apply one of the two active states twice in a subcycle. One pair of such special switching sequences has recently been shown to reduce the pulsating torque considerably. In this paper, the conventional and special switching sequences are compared experimentally in terms of acoustic noise. In the low- and medium-speed ranges, the special switching sequence is seen to reduce the amplitude of the tonal component of noise at the switching frequency considerably and is also found to result in a spread spectrum.
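The difference between the two kinds of switching sequence can be made concrete with the textbook volt-second balance for sector I of a two-level inverter. The sketch below is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, v_ref is taken as the magnitude of the reference space vector with active vectors of magnitude (2/3)·v_dc, and the sequence 0-1-2-1 is just one example that fits the description "one zero state, one active state applied twice"; the abstract does not say which pair of sequences was tested.

```python
import math


def dwell_times(v_ref, v_dc, alpha, t_s):
    """Dwell times (t1, t2, tz) in sector I from volt-second balance.

    v_ref : magnitude of the reference space vector (assumed convention)
    v_dc  : dc bus voltage
    alpha : angle of the reference vector in radians, 0 <= alpha <= pi/3
    t_s   : subcycle duration
    """
    k = math.sqrt(3.0) * v_ref / v_dc * t_s
    t1 = k * math.sin(math.pi / 3.0 - alpha)  # active state 1
    t2 = k * math.sin(alpha)                  # active state 2
    tz = t_s - t1 - t2                        # total zero-state time
    if tz < 0:
        raise ValueError("reference vector is outside the linear modulation range")
    return t1, t2, tz


def conventional_sequence(t1, t2, tz):
    """0-1-2-7: starts with one zero state and ends with the other."""
    return [("0", tz / 2), ("1", t1), ("2", t2), ("7", tz / 2)]


def special_sequence_0121(t1, t2, tz):
    """0-1-2-1: a single zero state, with active state 1 applied twice (assumed example)."""
    return [("0", tz), ("1", t1 / 2), ("2", t2), ("1", t1 / 2)]


t1, t2, tz = dwell_times(v_ref=0.5, v_dc=1.0, alpha=math.radians(30), t_s=1.0)
print(conventional_sequence(t1, t2, tz))
print(special_sequence_0121(t1, t2, tz))
```

Both sequences apply the same average voltage over the subcycle because the total time spent in each active state is unchanged; only the ordering and division of the states differ, which is what changes the ripple and noise spectrum.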
Abstract:
Ensuring reliable operation over an extended period of time is one of the biggest challenges facing present-day electronic systems. The increased vulnerability of components to atmospheric particle strikes poses a serious threat to attaining the reliability required for various mission-critical applications. Various soft error mitigation methodologies exist to address this reliability challenge. A general solution to this problem is to arrive at a soft error mitigation methodology with an acceptable implementation overhead and error tolerance level. This implementation overhead can then be reduced by taking advantage of various derating effects such as logical derating, electrical derating and timing window derating, and/or by making use of application redundancy, e.g. redundancy in the firmware/software executing on the robust hardware so designed. In this paper, we analyze the impact of various derating factors and show how they can be profitably employed to reduce the hardware overhead needed to implement a given level of soft error robustness. This analysis is performed on a set of benchmark circuits using the delayed capture methodology. Experimental results show up to 23% reduction in the hardware overhead when considering individual and combined derating factors.
Abstract:
Points-to analysis is a key compiler analysis. Several memory-related optimizations use points-to information to improve their effectiveness. Points-to analysis is performed by building a constraint graph of pointer variables and dynamically updating it to propagate more and more points-to information across its subset edges. So far, the structure of the constraint graph has been only trivially exploited for efficient propagation of information, e.g., in identifying cyclic components or in propagating information in topological order. We perform a careful study of its structure and propose a new inclusion-based flow-insensitive context-sensitive points-to analysis algorithm based on the notion of dominant pointers. We also propose a new kind of pointer-equivalence based on dominant pointers, which provides significantly more opportunities for reducing the number of pointers tracked during the analysis. Based on this hitherto unexplored form of pointer-equivalence, we develop a new context-sensitive flow-insensitive points-to analysis algorithm which uses incremental dominator update to efficiently compute points-to information. Using a large suite of programs consisting of SPEC 2000 benchmarks and five large open source programs, we show that our points-to analysis is 88% faster than BDD-based Lazy Cycle Detection and 2x faster than Deep Propagation. We argue that our approach of detecting dominator-based pointer-equivalence is key to improving points-to analysis efficiency.
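For readers unfamiliar with the constraint-graph formulation mentioned above, here is a minimal sketch of a plain inclusion-based (Andersen-style), flow- and context-insensitive worklist solver. It is the baseline technique only: it does not implement dominant pointers, pointer-equivalence, or incremental dominator updates, and the function name and tuple-based constraint encoding are assumptions made for illustration.

```python
from collections import defaultdict, deque


def solve_points_to(address_of, copies, loads, stores):
    """Baseline inclusion-based points-to solver.

    address_of : (p, a) pairs for p = &a
    copies     : (p, q) pairs for p = q
    loads      : (p, q) pairs for p = *q
    stores     : (p, q) pairs for *p = q
    """
    pts = defaultdict(set)    # points-to sets
    succ = defaultdict(set)   # subset edges: q -> p means pts(q) is a subset of pts(p)
    loads_by_ptr = defaultdict(list)
    stores_by_ptr = defaultdict(list)

    for p, a in address_of:
        pts[p].add(a)
    for p, q in copies:
        succ[q].add(p)
    for p, q in loads:
        loads_by_ptr[q].append(p)
    for p, q in stores:
        stores_by_ptr[p].append(q)

    worklist = deque(list(pts))
    while worklist:
        n = worklist.popleft()
        # Complex constraints add new subset edges as pts(n) grows.
        for a in list(pts[n]):
            for p in loads_by_ptr[n]:      # p = *n  gives edge a -> p
                if p not in succ[a]:
                    succ[a].add(p)
                    worklist.append(a)
            for q in stores_by_ptr[n]:     # *n = q  gives edge q -> a
                if a not in succ[q]:
                    succ[q].add(a)
                    worklist.append(q)
        # Propagate pts(n) along its outgoing subset edges.
        for m in list(succ[n]):
            before = len(pts[m])
            pts[m] |= pts[n]
            if len(pts[m]) != before:
                worklist.append(m)
    return {v: s for v, s in pts.items() if s}


# p = &a; q = &b; r = p; *r = q; s = *p
result = solve_points_to(
    address_of=[("p", "a"), ("q", "b")],
    copies=[("r", "p")],
    loads=[("s", "p")],
    stores=[("r", "q")],
)
print(result)
```

In this small program the store through r makes a point to b, and the load through p then makes s point to b as well. The repeated re-propagation visible in the loop is the cost that techniques exploiting graph structure aim to reduce.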
Abstract:
ADVANCED MULTIFUNCTIONAL INORGANIC NANOSTRUCTURED OXIDES FOR CONTROLLED RELEASE AND SENSING. We demonstrate here certain examples of multifunctional nanostructured oxide materials for biotechnological and environmental applications. Various in-house synthesized homogeneous nanostructures, viz. mesoporous and nanotubular silica and titania, have been employed for controlled drug delivery and electrochemical biosensing applications. Confinement of macromolecules such as proteins, studied via electrochemical, thermal and spectroscopic methods, showed no detrimental effect on native protein structure and function, thus suggesting that oxide nanostructures are effective bio-encapsulators. Multi-functionality was demonstrated by employing similar nanostructures for sensing organic water pollutants, e.g. textile dyes.
Abstract:
Exploiting the performance potential of GPUs requires managing the data transfers to and from them efficiently, which is an error-prone and tedious task. In this paper, we develop a software coherence mechanism to fully automate all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to identify potential stale accesses and uses a runtime to initiate transfers as necessary. This allows us to avoid the redundant transfers that are exhibited by all other existing automatic memory management proposals. We integrate our automatic memory manager into the X10 compiler and runtime, and find that it not only results in smaller and simpler programs, but also eliminates redundant memory transfers. Tested on eight programs ported from the Rodinia benchmark suite, it achieves (i) a 1.06x speedup over hand-tuned manual memory management, and (ii) a 1.29x speedup over another recently proposed compiler-runtime automatic memory management system. Compared to other existing runtime-only and compiler-only proposals, it also transfers 2.2x to 13.3x less data on average.
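The general idea of initiating a transfer only when an access would otherwise see stale data can be sketched with a per-buffer validity flag for each device. This is a toy model written for illustration: the class CoherentBuffer and its methods are hypothetical, they are not part of X10 or of the paper's runtime, and the model conservatively fetches a valid copy before every write because it assumes writes may be partial.

```python
class CoherentBuffer:
    """Toy model of one array that may have copies on the CPU and the GPU."""

    def __init__(self, name):
        self.name = name
        # Data is assumed to start out valid on the CPU only.
        self.valid = {"cpu": True, "gpu": False}
        self.transfers = 0  # number of simulated CPU<->GPU copies

    def _fetch_if_stale(self, device):
        if not self.valid[device]:
            self.transfers += 1        # simulate copying from the other device
            self.valid[device] = True

    def before_read(self, device):
        """Call before code on `device` reads the buffer."""
        self._fetch_if_stale(device)

    def before_write(self, device):
        """Call before code on `device` writes the buffer."""
        self._fetch_if_stale(device)   # conservative: the write may be partial
        other = "gpu" if device == "cpu" else "cpu"
        self.valid[other] = False      # the other copy is now stale


buf = CoherentBuffer("grid")
buf.before_write("cpu")   # initialise on the host, no transfer
buf.before_read("gpu")    # first kernel needs the data: one transfer
buf.before_read("gpu")    # second kernel reuses the valid copy
buf.before_write("gpu")   # kernel updates the data, host copy becomes stale
buf.before_read("cpu")    # host reads the result: second transfer
print(buf.transfers)
```

In the mechanism described in the abstract, compiler analysis decides where such checks are needed; the sketch simply places them by hand before every access.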
Abstract:
Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs. While STM offers the promise of being less error-prone and more programmer friendly than traditional lock-based synchronization, it also needs to be competitive in performance in order to be adopted in mainstream software. A major source of performance overhead in STM is transactional aborts. Conflict resolution and aborting a transaction typically happen at the transaction level, which has the advantage of being automatic and application agnostic. However, it has a substantial disadvantage in that the STM declares the entire transaction as conflicting and hence aborts it and re-executes it fully, instead of partially re-executing only those parts of the transaction which have been affected by the conflict. This "Re-execute Everything" approach has a significant adverse impact on STM performance. In order to mitigate the abort overheads, we propose a compiler-aided Selective Reconciliation STM (SR-STM) scheme, wherein certain transactional conflicts can be reconciled by performing partial re-execution of the transaction. Ours is a selective hybrid approach which uses compiler analysis to identify those data accesses which are legal and profitable candidates for reconciliation and applies partial re-execution only to these candidates, while other conflicting data accesses are handled by the default STM approach of abort and full re-execution. We describe the compiler analysis and code transformations required for supporting selective reconciliation. We find that SR-STM is effective in reducing the transactional abort overheads, improving the performance of a set of five STAMP benchmarks by 12.58% on average and by up to 22.34%.
Abstract:
Space-vector-based pulse width modulation (PWM) for a voltage source inverter (VSI) offers flexibility in terms of different switching sequences. Numerical simulation is helpful to assess the performance of a PWM method before actual implementation. A quick-simulation tool to simulate a variety of space-vector-based PWM strategies for a two-level VSI-fed squirrel cage induction motor drive is presented. The simulator is developed using the C and Python programming languages, and also has a graphical user interface (GUI). With its prime focus on PWM strategies, the simulator is 40 times faster than MATLAB in terms of the actual time taken for a simulation. Simulation and experimental results are presented for a 5-hp ac motor drive.
Abstract:
Structural Health Monitoring (SHM) is an effective extension of NDE to reduce the downtime and cost of inspection of structural components. On-line monitoring is an essential part of SHM. Acoustic Emission Techniques (AET) meet most of the requirements of an effective SHM tool. The advances seen over the last couple of decades in electronics, computers and signal processing technologies can only help in obtaining better and more meaningful quantitative results, which can further enhance the potential of AET for this purpose. Advanced composite materials, owing to their specific high performance characteristics, are finding a wide range of engineering applications. Testing and evaluation of this category of materials and SHM of composite structures have been very challenging problems due to the very nature of these materials. The mechanical behaviour of fiber composite materials under different loading conditions is complex and involves different types of failure mechanisms. This is where the potential of AET can be exploited effectively. This paper presents an overview of some relevant studies where AET has been utilised to test, evaluate and monitor the health of composite structures.
Abstract:
Advanced bus-clamping pulse width modulation (ABCPWM) techniques are advantageous in terms of line current distortion and inverter switching loss in voltage source inverter-fed applications. However, the PWM waveforms corresponding to these techniques are not amenable to carrier-based generation. The modulation process in ABCPWM methods is analyzed here from a "per-phase" perspective. It is shown that three sets of descendant modulating functions (or modified modulating functions) can be generated from the three-phase sinusoidal signals. Each set of the modified modulating functions can be used to produce the PWM waveform of a given phase in a computationally efficient manner. Theoretical results and experimental investigations on a 5-hp motor drive are presented.
Abstract:
Dead-time is introduced between the gating signals to the top and bottom switches in a voltage source inverter (VSI) leg to prevent shoot-through faults due to the finite turn-off times of IGBTs. The dead-time results in a delay when the incoming device is an IGBT, resulting in error voltage pulses in the inverter output voltage. This paper presents the design, fabrication and testing of an advanced gate driver, which eliminates dead-time and the consequent output distortion. Here, the gating pulses are generated such that the incoming IGBT transition is not delayed and shoot-through is also prevented. The various logic units of the driver card and the fault tolerance of the driver are verified through extensive tests on different topologies such as chopper, half-bridge and full-bridge inverters, and also under different load conditions. Experimental results demonstrate the improvement in the load current waveform quality with the proposed circuit, on account of the elimination of dead-time.
Abstract:
A few advanced bus-clamping pulse width modulation (ABCPWM) methods have been proposed recently for a three-phase inverter. With these methods, each phase is clamped, switched at nominal frequency, and switched at twice the nominal frequency in different regions of the fundamental cycle. This study proposes a generalised ABCPWM scheme, encompassing the few ABCPWM schemes that have been proposed and many more ABCPWM schemes that have not been reported yet. Furthermore, an analytical closed-form expression is derived for the harmonic distortion factor corresponding to the generalised ABCPWM. This factor is independent of load parameters. The analytical expression derived here brings out the dependence of root-mean-square (RMS) current ripple on modulation index, and can be used to evaluate the RMS current ripple corresponding to any ABCPWM scheme. The analytical closed-form expression is validated experimentally in terms of measured weighted total harmonic distortion (THD) in line voltage (V-WTHD) and measured THD in line current (I-THD) on a 6 kW induction motor drive.
Abstract:
Turbine inlet pressures of approximately 300 bar in the case of CO2-based cycles call for redesigning the cycle in such a way that the optimum high side pressures are restricted to the discharge pressure limits imposed by currently available commercial compressors (approximately 150 bar) for distributed power generation. This leads to a cycle which is a combination of a transcritical condensing cycle and a subcritical cycle, with an intercooler and a bifurcation system in it. Using a realistic thermodynamic model, it is predicted that the cycle with a non-flammable mixture of 48.5% propane and the rest CO2 as the working fluid delivers approximately 37.2% efficiency at 873 K with high and low side pressures of 150 and 26 bar, respectively. This is in contrast to the best efficiency of approximately 36.1% offered by a transcritical condensing cycle with the same working fluid at a high side pressure of approximately 300 bar.