42 resultados para industrial optimizing compiler

em Indian Institute of Science - Bangalore - Índia


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Data-flow analysis is an integral part of any aggressive optimizing compiler. We propose a framework for improving the precision of data-flow analysis in the presence of complex control-flow. W initially perform data-flow analysis to determine those control-flow merges which cause the loss in data-flow analysis precision. The control-flow graph of the program is then restructured such that performing data-flow analysis on the resulting restructured graph gives more precise results. The proposed framework is both simple, involving the familiar notion of product automata, and also general, since it is applicable to any forward data-flow analysis. Apart from proving that our restructuring process is correct, we also show that restructuring is effective in that it necessarily leads to more optimization opportunities. Furthermore, the framework handles the trade-off between the increase in data-flow precision and the code size increase inherent in the restructuring. We show that determining an optimal restructuring is NP-hard, and propose and evaluate a greedy strategy. The framework has been implemented in the Scale research compiler, and instantiated for the specific problem of Constant Propagation. On the SPECINT 2000 benchmark suite we observe an average speedup of 4% in the running times over Wegman-Zadeck conditional constant propagation algorithm and 2% over a purely path profile guided approach.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Instruction reuse is a microarchitectural technique that improves the execution time of a program by removing redundant computations at run-time. Although this is the job of an optimizing compiler, they do not succeed many a time due to limited knowledge of run-time data. In this paper we examine instruction reuse of integer ALU and load instructions in network processing applications. Specifically, this paper attempts to answer the following questions: (1) How much of instruction reuse is inherent in network processing applications?, (2) Can reuse be improved by reducing interference in the reuse buffer?, (3) What characteristics of network applications can be exploited to improve reuse?, and (4) What is the effect of reuse on resource contention and memory accesses? We propose an aggregation scheme that combines the high-level concept of network traffic i.e. "flows" with a low level microarchitectural feature of programs i.e. repetition of instructions and data along with an architecture that exploits temporal locality in incoming packet data to improve reuse. We find that for the benchmarks considered, 1% to 50% of instructions are reused while the speedup achieved varies between 1% and 24%. As a side effect, instruction reuse reduces memory traffic and can therefore be considered as a scheme for low power.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Traditionally, an instruction decoder is designed as a monolithic structure that inhibit the leakage energy optimization. In this paper, we consider a split instruction decoder that enable the leakage energy optimization. We also propose a compiler scheduling algorithm that exploits instruction slack to increase the simultaneous active and idle duration in instruction decoder. The proposed compiler-assisted scheme obtains a further 14.5% reduction of energy consumption of instruction decoder over a hardware-only scheme for a VLIW architecture. The benefits are 17.3% and 18.7% in the context of a 2-clustered and a 4-clustered VLIW architecture respectively.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An estimate of the groundwater budget at the catchment scale is extremely important for the sustainable management of available water resources. Water resources are generally subjected to over-exploitation for agricultural and domestic purposes in agrarian economies like India. The double water-table fluctuation method is a reliable method for calculating the water budget in semi-arid crystalline rock areas. Extensive measurements of water levels from a dense network before and after the monsoon rainfall were made in a 53 km(2)atershed in southern India and various components of the water balance were then calculated. Later, water level data underwent geostatistical analyses to determine the priority and/or redundancy of each measurement point using a cross-validation method. An optimal network evolved from these analyses. The network was then used in re-calculation of the water-balance components. It was established that such an optimized network provides far fewer measurement points without considerably changing the conclusions regarding groundwater budget. This exercise is helpful in reducing the time and expenditure involved in exhaustive piezometric surveys and also in determining the water budget for large watersheds (watersheds greater than 50 km(2)).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes the use of empirical modeling techniques for building microarchitecture sensitive models for compiler optimizations. The models we build relate program performance to settings of compiler optimization flags, associated heuristics and key microarchitectural parameters. Unlike traditional analytical modeling methods, this relationship is learned entirely from data obtained by measuring performance at a small number of carefully selected compiler/microarchitecture configurations. We evaluate three different learning techniques in this context viz. linear regression, adaptive regression splines and radial basis function networks. We use the generated models to a) predict program performance at arbitrary compiler/microarchitecture configurations, b) quantify the significance of complex interactions between optimizations and the microarchitecture, and c) efficiently search for'optimal' settings of optimization flags and heuristics for any given microarchitectural configuration. Our evaluation using benchmarks from the SPEC CPU2000 suits suggests that accurate models (< 5% average error in prediction) can be generated using a reasonable number of simulations. We also find that using compiler settings prescribed by a model-based search can improve program performance by as much as 19% (with an average of 9.5%) over highly optimized binaries.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present an algorithm for testing the suitability of an affix grammar for deterministic, one-pass, bottom-up parsing which is an improvement over the one suggested by Pohlmann [1]. The space requirements of the new algorithm are considerably less than that of Pohlmann's. We also describe an implementation of Pohlmann's algorithm and methods for improving its space requirements.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Multilevel converters have been under research and development for more than three decades and have found successful industrial application. However, this is still a technology under development, and many new contributions and new commercial topologies have been reported in the last few years. The aim of this paper is to group and review these recent contributions, in order to establish the current state of the art and trends of the technology, to provide readers with a comprehensive and insightful review of where multilevel converter technology stands and is heading. This paper first presents a brief overview of well-established multilevel converters strongly oriented to their current state in industrial applications to then center the discussion on the new converters that have made their way into the industry. In addition, new promising topologies are discussed. Recent advances made in modulation and control of multilevel converters are also addressed. A great part of this paper is devoted to show nontraditional applications powered by multilevel converters and how multilevel converters are becoming an enabling technology in many industrial sectors. Finally, some future trends and challenges in the further development of this technology are discussed to motivate future contributions that address open problems and explore new possibilities.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Clustered VLIW architectures solve the scalability problem associated with flat VLIW architectures by partitioning the register file and connecting only a subset of the functional units to a register file. However, inter-cluster communication in clustered architectures leads to increased leakage in functional components and a high number of register accesses. In this paper, we propose compiler scheduling algorithms targeting two previously ignored power-hungry components in clustered VLIW architectures, viz., instruction decoder and register file. We consider a split decoder design and propose a new energy-aware instruction scheduling algorithm that provides 14.5% and 17.3% benefit in the decoder power consumption on an average over a purely hardware based scheme in the context of 2-clustered and 4-clustered VLIW machines. In the case of register files, we propose two new scheduling algorithms that exploit limited register snooping capability to reduce extra register file accesses. The proposed algorithms reduce register file power consumption on an average by 6.85% and 11.90% (10.39% and 17.78%), respectively, along with performance improvement of 4.81% and 5.34% (9.39% and 11.16%) over a traditional greedy algorithm for 2-clustered (4-clustered) VLIW machine. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs as an alternative to traditional lock based synchronization. However adoption of STM in mainstream software has been quite low due to its considerable overheads and its poor cache/memory performance. In this paper, we perform a detailed study of the cache behavior of STM applications and quantify the impact of different STM factors on the cache misses experienced by the applications. Based on our analysis, we propose a compiler driven Lock-Data Colocation (LDC), targeted at reducing the cache overheads on STM. We show that LDC is effective in improving the cache behavior of STM applications by reducing the dcache miss latency and improving execution time performance.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We describe a compiler for the Flat Concurrent Prolog language on a message passing multiprocessor architecture. This compiler permits symbolic and declarative programming in the syntax of Guarded Horn Rules, The implementation has been verified and tested on the 64-node PARAM parallel computer developed by C-DAC (Centre for the Development of Advanced Computing, India), Flat Concurrent Prolog (FCP) is a logic programming language designed for concurrent programming and parallel execution, It is a process oriented language, which embodies dataflow synchronization and guarded-command as its basic control mechanisms. An identical algorithm is executed on every processor in the network, We assume regular network topologies like mesh, ring, etc, Each node has a local memory, The algorithm comprises of two important parts: reduction and communication, The most difficult task is to integrate the solutions of problems that arise in the implementation in a coherent and efficient manner. We have tested the efficacy of the compiler on various benchmark problems of the ICOT project that have been reported in the recent book by Evan Tick, These problems include Quicksort, 8-queens, and Prime Number Generation, The results of the preliminary tests are favourable, We are currently examining issues like indexing and load balancing to further optimize our compiler.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The development of microstructure in 316L stainless steel during industrial hot forming operations including press forging (strain rate of 0 . 15 s(-1)), rolling/extrusion (strain rate of 2-8 . 8 s(-1)), and hammer forging (strain rate of 100 s(-1)) at different temperatures in the range 600-1200 degrees C was studied with a view to validating the predictions of the processing map. The results showed that good col relation existed between the regimes indicated in the map and the product microstructures. The 316L stainless steel exhibited unstable flow in the form of flow localisation when hammer forged at temperatures above 900 degrees C, rolled below 1000 degrees C, or press forged below 900 degrees C. All these conditions must therefore be avoided in mechanical processing of the material. Conversely, in order to obtain defect free microstructures, ideally the material should be rolled at temperatures above 1100 degrees C, press forged at temperatures above 1000 degrees C, or hammer forged in the temperature range 600-900 degrees C. (C) 1996 The Institute of Materials.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The prime focus of this study is to design a 50 mm internal diameter diaphragmless shock tube that can be used in an industrial facility for repeated loading of shock waves. The instantaneous rise in pressure and temperature of a medium can be used in a variety of industrial applications. We designed, fabricated and tested three different shock wave generators of which one system employs a highly elastic rubber membrane and the other systems use a fast acting pneumatic valve instead of conventional metal diaphragms. The valve opening speed is obtained with the help of a high speed camera. For shock generation systems with a pneumatic cylinder, it ranges from 0.325 to 1.15 m/s while it is around 8.3 m/s for the rubber membrane. Experiments are conducted using the three diaphragmless systems and the results obtained are analyzed carefully to obtain a relation between the opening speed of the valve and the amount of gas that is actually utilized in the generation of the shock wave for each system. The rubber membrane is not suitable for industrial applications because it needs to be replaced regularly and cannot withstand high driver pressures. The maximum shock Mach number obtained using the new diaphragmless system that uses the pneumatic valve is 2.125 +/- 0.2%. This system shows much promise for automation in an industrial environment.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Just-in-Time (JIT) compilers for Java can be augmented by making use of runtime profile information to produce better quality code and hence achieve higher performance. In a JIT compilation environment, the profile information obtained can be readily exploited in the same run to aid recompilation and optimization of frequently executed (hot) methods. This paper discusses a low overhead path profiling scheme for dynamically profiling AT produced native code. The profile information is used in recompilation during a subsequent invocation of the hot method. During recompilation tree regions along the hot paths are enlarged and instruction scheduling at the superblock level is performed. We have used the open source LaTTe AT compiler framework for our implementation. Our results on a SPARC platform for SPEC JVM98 benchmarks indicate that (i) there is a significant reduction in the number of tree regions along the hot paths, and (ii) profile aided recompilation in LaTTe achieves performance comparable to that of adaptive LaTTe in spite of retranslation and profiling overheads.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A mathematical model has been developed for the gas carburising (diffusion) process using finite volume method. The computer simulation has been carried out for an industrial gas carburising process. The model's predictions are in good agreement with industrial experimental data and with data collected from the literature. A study of various mass transfer and diffusion coefficients has been carried out in order to suggest which correlations should be used for the gas carburising process. The model has been interfaced in a Windows environment using a graphical user interface. In this way, the model is extremely user friendly. The sensitivity analysis of various parameters such as initial carbon concentration in the specimen, carbon potential of the atmosphere, temperature of the process, etc. has been carried out using the model.