47 results for advanced compiler optimizations


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Software transactional memory (STM) is a promising programming paradigm for shared-memory multithreaded programs and an alternative to traditional lock-based synchronization. However, adoption of STM in mainstream software has been quite low due to its considerable overheads and its poor cache/memory performance. In this paper, we perform a detailed study of the cache behavior of STM applications and quantify the impact of different STM factors on the cache misses experienced by the applications. Based on our analysis, we propose a compiler-driven Lock-Data Colocation (LDC) technique, targeted at reducing the cache overheads of STM. We show that LDC is effective in improving the cache behavior of STM applications by reducing the dcache miss latency and improving execution time.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The ability of a metal to resist strain localisation, and hence reduction in local thickness, is one of the most important forming properties in stretching. The uniform strain is in this regard a critical factor for describing stretching ability, especially when the material under consideration exhibits negative strain rate sensitivity and dynamic strain ageing (DSA). A newly developed Laser Speckle Technique (LST) (see, e.g., [1]) was used in situ during tensile testing with two extensometers. The applied technique provides quantitative information on the propagating plasticity (i.e. the so-called PLC bands) known to take place during deformation where DSA is active. The band velocity (V-band) and the bandwidth (W-band) were monitored with increasing accumulated strain. The knowledge obtained with the LST was useful for understanding the underlying mechanisms of the formability limit when DSA and negative strain rate sensitivity operate. The goal was to understand the relationship between PLC/DSA phenomena and the formability limit, physically manifested as shear band formation. Two principally different alloys were used to discover alloying effects.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Just-in-Time (JIT) compilers for Java can be augmented by making use of runtime profile information to produce better quality code and hence achieve higher performance. In a JIT compilation environment, the profile information obtained can be readily exploited in the same run to aid recompilation and optimization of frequently executed (hot) methods. This paper discusses a low-overhead path profiling scheme for dynamically profiling JIT-produced native code. The profile information is used in recompilation during a subsequent invocation of the hot method. During recompilation, tree regions along the hot paths are enlarged and instruction scheduling at the superblock level is performed. We have used the open-source LaTTe JIT compiler framework for our implementation. Our results on a SPARC platform for SPEC JVM98 benchmarks indicate that (i) there is a significant reduction in the number of tree regions along the hot paths, and (ii) profile-aided recompilation in LaTTe achieves performance comparable to that of adaptive LaTTe in spite of retranslation and profiling overheads.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We measure the non-axisymmetry in the luminosity distribution in the central few kpc of a sample of advanced mergers of galaxies by analyzing their 2MASS images. All mergers show a high central asymmetry: the centres of isophotes show a striking sloshing pattern with a spatial variation of up to 30% within the central 1 kpc, and the Fourier amplitude for lopsidedness (m = 1) shows high values of up to 0.2 within the central 5 kpc. The central asymmetry is estimated to be long-lived, lasting for roughly a few Gyr, or ~100 local dynamical timescales. This will significantly affect the dynamical evolution of this region by helping fuel the central active galactic nucleus, and also by causing secular growth of the bulge driven by lopsidedness.
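As an illustration of the m = 1 Fourier amplitude mentioned above, the following is a minimal sketch rather than the authors' procedure. It assumes intensities sampled at evenly spaced azimuthal angles around a ring of fixed radius, and it assumes the amplitude is normalized by the mean (m = 0) intensity, a common convention that the abstract does not state. The function name `lopsidedness_amplitude` is hypothetical.

```python
import cmath
import math


def lopsidedness_amplitude(intensities):
    """Return the m = 1 Fourier amplitude normalized by the mean intensity.

    `intensities` holds samples taken at evenly spaced azimuthal angles
    around one ring (assumed sampling; not specified in the abstract).
    """
    n = len(intensities)
    mean = sum(intensities) / n  # m = 0 term
    # Discrete Fourier coefficient for the m = 1 mode.
    c1 = sum(
        value * cmath.exp(-1j * 2.0 * math.pi * k / n)
        for k, value in enumerate(intensities)
    ) / n
    # For I(phi) = mean * (1 + A1 * cos(phi - phi0)), |c1| equals mean * A1 / 2.
    return 2.0 * abs(c1) / mean


# Synthetic ring with a 20% lopsided component and an arbitrary phase.
ring = [1.0 + 0.2 * math.cos(2.0 * math.pi * k / 64 - 0.7) for k in range(64)]
a1 = lopsidedness_amplitude(ring)
```

On the synthetic ring, the recovered amplitude matches the injected value of 0.2 to within floating-point error; real images would also need isophote centring and noise handling, which this sketch omits.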

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A sample of 27 disturbed galaxies that show signs of interaction but have a single nucleus was selected from the Arp and the Arp-Madore catalogues. For these, the Ks band images from the Two Micron All Sky Survey (2MASS) are analysed to obtain their radial luminosity profiles and other structural parameters. We find that in spite of their similar optical appearance, the sample galaxies vary in their dynamical properties, and fall into two distinct classes. The first class consists of galaxies which can be described by a single r^{1/4} law and the second class consists of galaxies that show an outer exponential disk. A few galaxies that have disturbed profiles cannot be fit into either of the above classes. However, all the galaxies are similar in all other parameters such as the far-infrared colours, the molecular hydrogen content and the central velocity dispersion. Thus, the dynamical parameters of these sets seem to be determined by the ratio of the initial masses of the colliding galaxies. We propose that the galaxies in the first class result from a merger of spiral galaxies of equal masses whereas the second class of galaxies results from a merger of unequal mass galaxies. The few objects that do not fall into either category show a disturbed luminosity profile and a wandering centre, which is indicative of these being unrelaxed mergers. Of the 27 galaxies in our sample, 9 show elliptical-like profiles and 13 show an outer exponential. Interestingly, Arp 224, the second oldest merger remnant of the Toomre sequence, shows an exponential disk in the outer parts.
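For reference, the two profile families named in the abstract are conventionally written as below. The symbols (effective radius r_e, intensity I_e at that radius, central disk intensity I_0, disk scale length r_d) and the constant 7.669 follow the standard de Vaucouleurs and exponential-disk forms; they are not taken from the abstract, so the paper's exact parameterization may differ.

```latex
I_{\mathrm{bulge}}(r) = I_e \exp\!\left\{-7.669\left[\left(\frac{r}{r_e}\right)^{1/4} - 1\right]\right\},
\qquad
I_{\mathrm{disk}}(r) = I_0 \exp\!\left(-\frac{r}{r_d}\right)
```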

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Multiple Clock Domain (MCD) processors provide an attractive solution to the increasingly challenging problems of clock distribution and power dissipation. They allow their chips to be partitioned into different clock domains, and each domain’s frequency (voltage) to be independently configured. This flexibility adds new dimensions to the Dynamic Voltage and Frequency Scaling (DVFS) problem, while providing better scope for saving energy and meeting performance demands. In this paper, we propose a compiler-directed approach for MCD-DVFS. We build a formal Petri net based program performance model, parameterized by settings of microarchitectural components and resource configurations, and integrate it with our compiler passes for frequency selection. Our model estimates the performance impact of a frequency setting, unlike the existing best techniques, which rely on weaker indicators of domain performance such as queue occupancies (used by online methods) and slack manifestation for a particular frequency setting (software-based methods). We evaluate our method with subsets of the SPECFP2000, Mediabench and Mibench benchmarks. Our mean energy savings are 60.39% (versus 33.91% for the best software technique) in a memory-constrained system for cache-miss-dominated benchmarks, and we meet the performance demands. Our ED2 improves by 22.11% (versus 18.34%) for other benchmarks. For a CPU with restricted frequency settings, our energy consumption is within 4.69% of the optimal.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

An earlier CFD analysis of the performance of the gas-dynamically controlled laser cavity [1] found that the geometry of the diffuser could be optimized, reducing both the size and cost of the system, by examining the critical dimensional requirements of the diffuser. Consequently, an extensive CFD analysis has been carried out for a range of diffuser configurations by simulating the supersonic flow through the arrangement, including the laser cavity driven by a bank of converging-diverging nozzles and the diffuser. The numerical investigations with a 3D RANS code capture the flow patterns through the diffusers past the cavity, where multiple supersonic jets interact with shocks and produce a complex flow pattern. The length of the diffuser plates is the basic parameter of the study. The analysis reveals that the pressure recovery pattern through the diffuser, which is critical for the performance of the laser device, depends only weakly on the diffuser length beyond a critical lower limit; evaluating this limit would provide a design guideline for a more efficient system configuration. The parametric study shows that the pressure recovery transients in the near vicinity of the cavity are not affected when the length of the diffuser plates is reduced down to 10% of its initial size, indicating that the first configuration, which was tested experimentally, has a large design margin. The flow stability in the laser cavity is found to be unaffected, since a strong and stable shock is located at the leading edge of the diffuser plates, while the downstream shock and flow patterns change, as one would expect.
Results of the study for diffuser lengths ranging from 10% to the full length are presented, with the experimentally tested configuration used in the earlier study [1] as the reference length. The conclusions drawn from the analysis are significant because they provide new design considerations based on an understanding of the intricacies of the flow, allowing for a hardware optimization that can lead to substantial size reduction of the device with no loss of performance.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Energy consumption has become a major constraint in providing increased functionality for devices with small form factors. Dynamic voltage and frequency scaling has been identified as an effective approach for reducing the energy consumption of embedded systems. Earlier works on dynamic voltage scaling focused mainly on performing voltage scaling when the CPU is waiting for the memory subsystem, or concentrated chiefly on loop nests and/or subroutine calls having a sufficient number of dynamic instructions. This paper concentrates on coarser program regions and, for the first time, uses program phase behavior for performing dynamic voltage scaling. Program phases are annotated at compile time with mode switch instructions. Further, we relate the Dynamic Voltage Scaling Problem to the Multiple Choice Knapsack Problem, and use well-known heuristics to solve it efficiently. We also develop a simple integer linear program (ILP) formulation for this problem. Experimental evaluation on a set of media applications reveals that our heuristic method obtains a 38% reduction in energy consumption on average with a performance degradation of 1%, and up to a 45% reduction in energy with a performance degradation of 5%. Further, the energy consumed by the heuristic solution is within 1% of the optimal solution obtained from the ILP approach.
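The mapping to the Multiple Choice Knapsack Problem can be sketched as follows. This is a hedged illustration, not the paper's heuristic or its ILP: it assumes each program phase offers a few (time, energy) operating modes, that exactly one mode is chosen per phase, and that total time must not exceed a deadline. It solves small instances exactly with dynamic programming over integer time units. The function name `min_energy_schedule` and the sample numbers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Mode = Tuple[int, float]  # (execution time in integer units, energy)


def min_energy_schedule(
    phases: List[List[Mode]], deadline: int
) -> Optional[Tuple[float, List[int]]]:
    """Pick one mode per phase to minimize energy with total time <= deadline.

    Returns (total energy, chosen mode index per phase), or None if no
    combination meets the deadline.
    """
    # best[t] = cheapest (energy, picks) that uses exactly t time units so far.
    best: Dict[int, Tuple[float, List[int]]] = {0: (0.0, [])}
    for modes in phases:
        nxt: Dict[int, Tuple[float, List[int]]] = {}
        for used, (energy, picks) in best.items():
            for idx, (dt, de) in enumerate(modes):
                total = used + dt
                if total > deadline:
                    continue  # this mode would miss the deadline
                cand = (energy + de, picks + [idx])
                if total not in nxt or cand[0] < nxt[total][0]:
                    nxt[total] = cand
        best = nxt
    if not best:
        return None
    return min(best.values(), key=lambda entry: entry[0])


# Two phases, each with a fast/high-energy mode and a slow/low-energy mode.
example = [[(2, 10.0), (4, 6.0)], [(3, 12.0), (5, 7.0)]]
tight = min_energy_schedule(example, deadline=7)
loose = min_energy_schedule(example, deadline=9)
```

With a deadline of 7, only one phase can run in its slow mode; relaxing the deadline to 9 lets both phases slow down. The exact table grows with the deadline, which is one reason a heuristic may be preferred for large programs.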

Relevance:

20.00% 20.00%

Publisher:

Abstract:

When hosting XML information on relational backends, a mapping has to be established between the schemas of the information source and the target storage repositories. A rich body of recent literature exists for mapping isolated components of XML Schema to their relational counterparts, especially with regard to table configurations. In this paper, we present the Elixir system for designing industrial-strength mappings for real-world applications. Specifically, it produces an information-preserving holistic mapping that transforms the complete XML world-view (XML schema with constraints, XML documents, XQuery queries including triggers and views) into a full-scale relational mapping (table definitions, integrity constraints, indices, triggers and views) that is tuned to the application workload. A key design feature of Elixir is that it performs all its mapping-related optimizations in the XML source space, rather than in the relational target space. Further, unlike the XML mapping tools of commercial database systems, which rely heavily on user inputs, Elixir takes a principled cost-based approach to automatically find an efficient relational mapping. A prototype of Elixir is operational and we quantitatively demonstrate its functionality and efficacy on a variety of real-life XML schemas.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Miniaturization of devices and the ensuing decrease in the threshold voltage have led to a substantial increase in the leakage component of the total processor energy consumption. Because VLIW and clustered VLIW architectures have relatively simple issue logic and a large number of functional units, a large fraction of this leakage energy is consumed in the functional units. However, functional units are not fully utilized in VLIW architectures because of the inherent variations in the ILP of programs. This underutilization is even more pronounced in clustered VLIW architectures because of contention for the limited number of slow intercluster communication channels, which leads to many short idle cycles. In the past, some architectural schemes have been proposed to obtain leakage energy benefits by aggressively exploiting the idleness of functional units. However, the presence of many short idle cycles causes frequent transitions between the active mode and the sleep mode, and adversely affects the energy benefits of a purely hardware-based scheme. In this paper, we propose and evaluate a compiler instruction scheduling algorithm that assists such a hardware-based scheme in the context of VLIW and clustered VLIW architectures. The proposed scheme exploits the scheduling slack of instructions to orchestrate the functional unit mapping with the objective of reducing the number of transitions in functional units, thereby keeping them off for a longer duration. The proposed compiler-assisted scheme obtains a further 12% reduction in the energy consumption of functional units, with negligible performance degradation, over a hardware-only scheme for a VLIW architecture. The benefits are 15% and 17% in the context of a 2-clustered and a 4-clustered VLIW architecture, respectively. Our test bed uses the Trimaran compiler infrastructure.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

An isothermal section of the phase diagram for the system Nd-Pd-O at 1350 K has been established by equilibration of samples representing 13 different compositions and phase identification after quenching by optical and scanning electron microscopy, x-ray diffraction, and energy dispersive analysis of x-rays. The binary oxides PdO and NdO were not stable at 1350 K. Two ternary oxides, Nd4PdO7 and Nd2Pd2O5, were identified. Solid and liquid alloys, as well as the intermetallics NdPd3 and NdPd5, were found to be in equilibrium with Nd2O3. Based on the phase relations, three solid-state cells were designed to measure the Gibbs energies of formation of PdO and the two ternary oxides. An advanced version of the solid-state cell incorporating a buffer electrode was used for high-temperature thermodynamic measurements. The function of the buffer electrode, placed between reference and working electrodes, was to absorb the electrochemical flux of the mobile species through the solid electrolyte caused by trace electronic conductivity. The buffer electrode prevented polarization of the measuring electrode and ensured accurate data. Yttria-stabilized zirconia was used as the solid electrolyte and pure oxygen gas at a pressure of 0.1 MPa as the reference electrode. Electromotive force measurements, conducted from 950 to 1425 K, indicated the presence of a third ternary oxide, Nd2PdO4, stable below 1135 (±10) K. Additional cells were designed to study this compound. The standard Gibbs energy of formation of PdO (Δ_f G°) was measured from 775 to 1125 K using two separate cell designs against the primary reference standard for oxygen chemical potential. Based on the thermodynamic information, chemical potential diagrams for the system Nd-Pd-O were also developed.