Digital Library

26 results for instruction program

Enhancing Speedup in Network Processing Applications by Exploiting Instruction Reuse with Flow Aggregation

Relevance:

30.00%

Abstract:

Instruction reuse is a microarchitectural technique that improves the execution time of a program by removing redundant computations at run-time. Although removing such redundancy is nominally the job of an optimizing compiler, compilers often fail to do so because they have limited knowledge of run-time data. In this paper we examine instruction reuse of integer ALU and load instructions in network processing applications. Specifically, this paper attempts to answer the following questions: (1) How much instruction reuse is inherent in network processing applications? (2) Can reuse be improved by reducing interference in the reuse buffer? (3) What characteristics of network applications can be exploited to improve reuse? (4) What is the effect of reuse on resource contention and memory accesses? We propose an aggregation scheme that combines a high-level concept of network traffic, i.e. "flows", with a low-level microarchitectural feature of programs, i.e. repetition of instructions and data, along with an architecture that exploits temporal locality in incoming packet data to improve reuse. We find that, for the benchmarks considered, 1% to 50% of instructions are reused, while the speedup achieved varies between 1% and 24%. As a side effect, instruction reuse reduces memory traffic and can therefore be considered as a scheme for low power.
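
The idea of a reuse buffer can be illustrated with a minimal sketch. It is not the architecture proposed in the paper: the key layout (program counter plus operand values), the FIFO eviction policy, the buffer capacity, and the stand-in "header hash" instruction are all assumptions made for illustration.

```python
class ReuseBuffer:
    """Toy value-based reuse buffer: maps (pc, operands) to a cached result."""

    def __init__(self, capacity=4):
        self.capacity = capacity   # hypothetical size, chosen for the example
        self.entries = {}          # insertion-ordered; oldest entry is evicted first
        self.hits = 0
        self.lookups = 0

    def execute(self, pc, op, a, b):
        """Return op(a, b), skipping the computation when the same
        instruction (pc) was already executed with the same operands."""
        self.lookups += 1
        key = (pc, a, b)
        if key in self.entries:
            self.hits += 1
            return self.entries[key]
        result = op(a, b)
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # FIFO eviction
        self.entries[key] = result
        return result


def header_hash(src, dst):
    """Stand-in for an integer ALU instruction applied to packet header fields."""
    return (src ^ dst) & 0xFF


def process_packets(packets, buffer):
    """Run the same instruction (pc 0x40) on every packet's header fields."""
    return [buffer.execute(0x40, header_hash, src, dst) for src, dst in packets]


# Packets of the same flow carry the same source and destination fields,
# so repeated packets of a flow hit in the buffer.
flow_a = (10, 20)
flow_b = (30, 40)
buf = ReuseBuffer()
results = process_packets([flow_a, flow_a, flow_b, flow_a, flow_b], buf)
reuse_rate = buf.hits / buf.lookups
```

In this toy trace only the first packet of each flow misses, which is the kind of temporal locality in packet data that the abstract refers to.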

Implications of program phase behavior on timing analysis

Relevance:

30.00%

Abstract:

Knowledge of a program's worst-case execution time (WCET) is essential in validating real-time systems and helps in effective scheduling. One popular approach used in industry is to measure the execution time of program components on the target architecture and combine the measurements using static analysis of the program. Measurements need to be taken in the least intrusive way to avoid affecting the accuracy of the estimated WCET. Several programs exhibit phase behavior, wherein the dynamic execution of the program is observed to be composed of phases. Each phase is distinct from the others and exhibits homogeneous behavior with respect to cycles per instruction (CPI), data cache misses, etc. In this paper, we show that phase behavior has important implications for timing analysis. We make use of the homogeneity of a phase to reduce instrumentation overhead while ensuring that the accuracy of the WCET is not significantly affected. We propose a model for estimating WCET using static worst-case instruction counts of individual phases and a function of measured average CPI. We describe a WCET analyzer built on this model which targets two different architectures. The WCET analyzer is observed to give safe estimates for most benchmarks considered in this paper. The tightness of the WCET estimates is observed to be improved for most benchmarks compared to Chronos, a well-known static WCET analyzer.

Relative roles of instruction count and cycles per instruction in WCET estimation

Relevance:

30.00%

Abstract:

Most existing WCET estimation methods directly estimate execution time, ET, in cycles. We propose to study ET as a product of two factors, ET = IC * CPI, where IC is instruction count and CPI is cycles per instruction. Estimating ET directly may lead to a highly pessimistic estimate, since these methods may implicitly be using worst-case IC and worst-case CPI. We hypothesize that there exists a functional relationship between CPI and IC such that CPI = f(IC). This is ascertained by computing the covariance matrix and studying the scatter plots of CPI versus IC. IC and CPI values are obtained by running benchmarks with a large number of inputs using the cycle-accurate architectural simulator SimpleScalar on two different architectures. It is shown that the benchmarks can be grouped into different classes based on the CPI versus IC relationship. For some benchmarks, such as FFT and FIR, both IC and CPI are almost constant irrespective of the input. There are other benchmarks that exhibit a direct or an inverse relationship between CPI and IC. In such a case, one can predict CPI for a given IC as CPI = f(IC). We derive the theoretical worst-case IC for a program, denoted as SWIC, using integer linear programming (ILP) and estimate WCET as SWIC * f(SWIC). However, if CPI decreases sharply with IC, then the measured maximum cycle count is observed to be a better estimate. For certain other benchmarks, it is observed that the CPI versus IC relationship is either random or CPI remains constant with varying IC. In such cases, WCET is estimated as the product of SWIC and the measured maximum CPI. It is observed that use of the proposed method results in tighter WCET estimates than Chronos, a static WCET analyzer, for most benchmarks for the two architectures considered in this paper.
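
The two estimation rules described above, SWIC * f(SWIC) and SWIC times the measured maximum CPI, can be sketched as follows. The abstract does not state the form of f, so the least-squares straight line used here is an assumption, as are the function names and the sample measurements.

```python
def fit_linear(ics, cpis):
    """Least-squares line CPI = slope * IC + intercept (an assumed form of f)."""
    n = len(ics)
    mean_ic = sum(ics) / n
    mean_cpi = sum(cpis) / n
    sxx = sum((ic - mean_ic) ** 2 for ic in ics)
    sxy = sum((ic - mean_ic) * (cpi - mean_cpi) for ic, cpi in zip(ics, cpis))
    slope = sxy / sxx
    intercept = mean_cpi - slope * mean_ic
    return lambda ic: slope * ic + intercept


def estimate_wcet(swic, ics, cpis, cpi_depends_on_ic):
    """WCET in cycles from the static worst-case instruction count (SWIC).

    If CPI follows IC, use SWIC * f(SWIC); otherwise fall back to
    SWIC * (maximum measured CPI), as the abstract describes.
    """
    if cpi_depends_on_ic:
        f = fit_linear(ics, cpis)
        return swic * f(swic)
    return swic * max(cpis)


# Hypothetical measurements from three runs of one benchmark.
measured_ic = [1000, 2000, 3000]
measured_cpi = [1.0, 1.2, 1.4]
swic = 4000  # hypothetical ILP-derived worst-case instruction count

wcet_fitted = estimate_wcet(swic, measured_ic, measured_cpi, True)
wcet_max_cpi = estimate_wcet(swic, measured_ic, measured_cpi, False)
```

The sketch leaves out the third case from the abstract, where CPI falls sharply with IC and the measured maximum cycle count is used directly.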

Compiler-assisted instruction decoder energy optimization for clustered VLIW architectures

Relevance:

20.00%

Abstract:

Traditionally, an instruction decoder is designed as a monolithic structure that inhibits leakage energy optimization. In this paper, we consider a split instruction decoder that enables leakage energy optimization. We also propose a compiler scheduling algorithm that exploits instruction slack to increase the simultaneous active and idle durations in the instruction decoder. The proposed compiler-assisted scheme obtains a further 14.5% reduction in the energy consumption of the instruction decoder over a hardware-only scheme for a VLIW architecture. The benefits are 17.3% and 18.7% in the context of a 2-clustered and a 4-clustered VLIW architecture, respectively.

A microcomputer program to analyze the CD spectrum of proteins and nucleic acids—Use of LOTUS 1-2-3 spread sheet

Relevance:

20.00%

Abstract:

A user-friendly interactive computer program, CIRDIC, has been developed which calculates the molar ellipticity and molar circular dichroic absorption coefficients from the CD spectrum. In combination with a LOTUS 1-2-3 spreadsheet, it gives the spectra of these parameters versus wavelength. The code is implemented in Microsoft FORTRAN 77 and runs on any IBM-compatible PC under MS-DOS.
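
The abstract does not give the formulas CIRDIC uses. As a rough sketch, the standard textbook conversions from an observed ellipticity to the two quantities named above look like this; the function names, parameter names, and sample values are hypothetical.

```python
def molar_ellipticity(theta_mdeg, conc_mol_per_l, path_cm):
    """Molar ellipticity in deg*cm^2/dmol.

    theta_mdeg:      observed ellipticity in millidegrees
    conc_mol_per_l:  sample concentration in mol/L
    path_cm:         optical path length in cm
    """
    return theta_mdeg / (10.0 * conc_mol_per_l * path_cm)


def molar_cd_absorption(molar_theta):
    """Molar circular dichroic absorption (delta epsilon, in L/(mol*cm)),
    using the standard relation [theta] = 3298.2 * delta_epsilon."""
    return molar_theta / 3298.2


# Hypothetical reading: 10 mdeg at 0.1 mM in a 1 mm cell.
theta = molar_ellipticity(10.0, 1e-4, 0.1)
delta_eps = molar_cd_absorption(theta)
```

Applying the two functions at each wavelength of a measured spectrum gives the parameter-versus-wavelength columns that the abstract describes plotting in the spreadsheet.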

Development of a structured program for conversion to prenex normal form

Relevance:

20.00%

Abstract:

The method of structured programming or program development using a top-down, stepwise refinement technique provides a systematic approach for the development of programs of considerable complexity. The aim of this paper is to present the philosophy of structured programming through a case study of a nonnumeric programming task. The problem of converting a well-formed formula in first-order logic into prenex normal form is considered. The program has been coded in the programming language PASCAL and implemented on a DEC-10 system. The program has about 500 lines of code and comprises 11 procedures.
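
As a sketch of the task itself, and not of the paper's 11-procedure PASCAL program, prenex conversion can be split into two refinement steps: push negations inward, then pull quantifiers to the front. The tuple encoding of formulas is an assumption, and the sketch further assumes that implications have already been eliminated and that every bound variable has a distinct name.

```python
DUAL_CONNECTIVE = {"and": "or", "or": "and"}
DUAL_QUANTIFIER = {"forall": "exists", "exists": "forall"}


def push_negations(f, negate=False):
    """Step 1: rewrite f so that 'not' appears only directly on atoms."""
    tag = f[0]
    if tag == "atom":
        return ("not", f) if negate else f
    if tag == "not":
        return push_negations(f[1], not negate)
    if tag in DUAL_CONNECTIVE:
        new_tag = DUAL_CONNECTIVE[tag] if negate else tag
        return (new_tag, push_negations(f[1], negate), push_negations(f[2], negate))
    # Quantifier: negation swaps forall and exists.
    new_tag = DUAL_QUANTIFIER[tag] if negate else tag
    return (new_tag, f[1], push_negations(f[2], negate))


def pull_quantifiers(f):
    """Step 2: return (prefix, matrix) for a negation-normal formula.

    Safe only because bound variables are assumed to be distinct."""
    tag = f[0]
    if tag in DUAL_QUANTIFIER:
        prefix, matrix = pull_quantifiers(f[2])
        return [(tag, f[1])] + prefix, matrix
    if tag in DUAL_CONNECTIVE:
        left_prefix, left = pull_quantifiers(f[1])
        right_prefix, right = pull_quantifiers(f[2])
        return left_prefix + right_prefix, (tag, left, right)
    return [], f  # atom or negated atom


def prenex(f):
    """Convert f to prenex normal form: quantifier prefix, then matrix."""
    prefix, matrix = pull_quantifiers(push_negations(f))
    for quantifier, var in reversed(prefix):
        matrix = (quantifier, var, matrix)
    return matrix


# not (forall x. P(x) and exists y. Q(y))
formula = ("not", ("and",
                   ("forall", "x", ("atom", "P", "x")),
                   ("exists", "y", ("atom", "Q", "y"))))
result = prenex(formula)
```

For this input the result is the encoding of exists x. forall y. (not P(x) or not Q(y)).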

HELANAL: A program to characterise helix geometry in proteins

Relevance:

20.00%

Abstract:

A detailed analysis of structural and position dependent characteristic features of helices will give a better understanding of the secondary structure formation in globular proteins. Here we describe an algorithm that quantifies the geometry of helices in proteins on the basis of their C-alpha atoms alone. The Fortran program HELANAL can extract the helices from the PDB files and then characterises the overall geometry of each helix as being linear, curved or kinked, in terms of its local structural features, viz. local helical twist and rise, virtual torsion angle, local helix origins and bending angles between successive local helix axes. Even helices with large radius of curvature are unambiguously identified as being linear or curved. The program can also be used to differentiate a kinked helix and other motifs, such as helix-loop-helix or a helix-turn-helix (with a single residue linker) with the help of local bending angles. In addition to these, the program can also be used to characterise the helix start and end as well as other types of secondary structures.

Energy-Efficient Fault Tolerance in Chip Multiprocessors Using Critical Value Forwarding

Relevance:

20.00%

Abstract:

Relentless CMOS scaling coupled with lower design tolerances is making ICs increasingly susceptible to wear-out related permanent faults and transient faults, necessitating on-chip fault tolerance in future chip multiprocessors (CMPs). In this paper we introduce a new energy-efficient fault-tolerant CMP architecture known as Redundant Execution using Critical Value Forwarding (RECVF). RECVF is based on two observations: (i) forwarding critical instruction results from the leading to the trailing core enables the latter to execute faster, and (ii) this speedup can be exploited to reduce energy consumption by operating the trailing core at a lower voltage-frequency level. Our evaluation shows that RECVF consumes 37% less energy than conventional dual modular redundant (DMR) execution of a program. It consumes only 1.26 times the energy of a non-fault-tolerant baseline and has a performance overhead of just 1.2%.

Toward Effective Scalar Hardware for Highly Vectorizable Applications

Relevance:

20.00%

Abstract:

The performance of a program will ultimately be limited by its serial (scalar) portion, as pointed out by Amdahl's Law. Reported studies thus far of instruction-level parallelism have mixed data-parallel program portions with scalar program portions, often leading to contradictory and controversial results. We report an instruction-level behavioral characterization of scalar code containing minimal data-parallelism, extracted from highly vectorized programs of the PERFECT benchmark suite running on a Cray Y-MP system. We classify scalar basic blocks according to their instruction mix, characterize the data dependencies seen in each class, and, as a first step, measure the maximum intrablock instruction-level parallelism available. We observe skewed rather than balanced instruction distributions in scalar code and in individual basic block classes of scalar code; nonuniform distribution of parallelism across instruction classes; and, as expected, limited available intrablock parallelism. We identify frequently occurring data-dependence patterns and discuss new instructions to reduce latency. Toward effective scalar hardware, we study latency-pipelining trade-offs and restricted multiple instruction issue mechanisms.
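
The limit set by the scalar portion can be made concrete with Amdahl's Law. The sketch below uses hypothetical fractions and speedup factors; none of the numbers come from the paper.

```python
def amdahl_speedup(vector_fraction, vector_speedup):
    """Overall speedup when only `vector_fraction` of the run time
    is accelerated by a factor of `vector_speedup`."""
    scalar_fraction = 1.0 - vector_fraction
    return 1.0 / (scalar_fraction + vector_fraction / vector_speedup)


def speedup_limit(vector_fraction):
    """Upper bound on speedup as the vector speedup grows without limit."""
    return 1.0 / (1.0 - vector_fraction)


# Hypothetical program that is 90% vectorizable.
modest = amdahl_speedup(0.9, 10)     # about 5.26
extreme = amdahl_speedup(0.9, 1000)  # just under the limit
limit = speedup_limit(0.9)           # about 10
```

Even with a thousandfold vector speedup the program stays below the bound set by its 10% scalar portion, which is why the abstract focuses on scalar hardware.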

Indian Nanoelectronics Users Program: An Outreach Vehicle to Expedite Nanoelectronics Research in India

Relevance:

20.00%

Abstract:

Worldwide research in nanoelectronics is motivated by the fact that scaling of MOSFETs by the conventional top-down approach will not continue forever, owing to fundamental limits imposed by physics, even if the end of scaling is delayed by a few more years. The research community in this domain has largely become multidisciplinary, trying to discover novel transistor structures built with novel materials so that the semiconductor industry can continue to follow its projected roadmap. However, setting up and running a nanoelectronics facility for research is hugely expensive. It is therefore a common model to set up a central networked facility that can be shared with a large number of users across the research community. The Centres for Excellence in Nanoelectronics (CEN) at the Indian Institute of Science, Bangalore (IISc) and the Indian Institute of Technology, Bombay (IITB) are such central networked facilities, set up in 2005 with funding of about USD 20 million from the Department of Information Technology (DIT), Ministry of Communications and Information Technology (MCIT), Government of India. The Indian Nanoelectronics Users Program (INUP) is a missionary program that aims not only to spread awareness and provide training in nanoelectronics but also to provide easy access to the latest facilities at CEN in IISc and at IITB for the wider nanoelectronics research community in India. This program, also funded by MCIT, aims to train researchers by conducting workshops and hands-on training programs and by providing access to CEN facilities. This is a unique program aiming to expedite nanoelectronics research in the country, as the funding required for projects proposed by researchers from around India has prior financial approval from the government and requires only technical approval by the IISc/IITB team.
This paper discusses the objectives of INUP, gives brief descriptions of the CEN facilities and the training programs conducted by INUP, and lists various research activities currently under way in the program.

Improved preprocessing methods for modulo scheduling algorithms

Relevance:

20.00%

Abstract:

Instruction scheduling with an automaton-based resource conflict model is well-established for normal scheduling. Such models have been generalized to software pipelining in the modulo-scheduling framework. One weakness with existing methods is that a distinct automaton must be constructed for each combination of a reservation table and initiation interval. In this work, we present a different approach to model conflicts. We construct one automaton for each reservation table which acts as a compact encoding of all the conflict automata for this table, which can be recovered for use in modulo-scheduling. The basic premise of the construction is to move away from the Proebsting-Fraser model of conflict automaton to the Muller model of automaton modelling issue sequences. The latter turns out to be useful and efficient in this situation. Having constructed this automaton, we show how to improve the estimate of resource constrained initiation interval. Such a bound is always better than the average-use estimate. We show that our bound is safe: it is always lower than the true initiation interval. This use of the automaton is orthogonal to its use in modulo-scheduling. Once we generate the required information during pre-processing, we can compute the lower bound for a program without any further reference to the automaton.
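
For context, the sketch below shows the baseline that the abstract calls the average-use estimate, assuming it refers to the usual resource-constrained lower bound on the initiation interval: for each resource, the total number of uses in one loop iteration divided by the number of available units, rounded up. It does not reproduce the automaton-based bound proposed in the paper, and the resource names and counts are hypothetical.

```python
def average_use_bound(uses_per_iteration, units_available):
    """Lower bound on the initiation interval from resource usage alone.

    uses_per_iteration: resource name -> cycles of use per loop iteration
    units_available:    resource name -> number of identical units
    """
    bound = 1
    for resource, uses in uses_per_iteration.items():
        units = units_available[resource]
        per_unit = -(-uses // units)  # integer ceiling of uses / units
        bound = max(bound, per_unit)
    return bound


# Hypothetical loop body: 5 ALU cycles on 2 ALUs, 4 memory cycles on 1 port.
uses = {"alu": 5, "mem": 4}
units = {"alu": 2, "mem": 1}
res_mii = average_use_bound(uses, units)
```

Here the single memory port is the bottleneck, so no schedule can start a new iteration more often than once every 4 cycles. The estimate ignores when in the reservation table each use occurs, which is the information the automaton-based bound exploits.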

A Status Report on R&D Program on the Preparation of Single Crystals of YIG and Minispheres

Relevance:

20.00%

FREP: A Soft-Error Resilient Pipelined RISC Architecture

Relevance:

20.00%

Abstract:

Soft errors have become a major area of attention with device scaling and large-scale integration. Many variants of superscalar architectures have been proposed that focus on program re-execution, thread re-execution and instruction re-execution. In this paper we propose a fault-tolerant micro-architecture for a pipelined RISC processor. The proposed architecture, Floating Resources Extended Pipeline (FREP), re-executes instructions using extended pipeline stages. The instructions are re-executed by a hybrid architecture with a suitable combination of space and time redundancy.

A Scalable Low Power Store Queue For Large Instruction Window Superscalar Processors

Relevance:

20.00%

Register Allocation and Optimal Spill code Scheduling in Software Pipelined Loops using 0-1 Integer Linear Programming Formulation

Relevance:

20.00%

Abstract:

In achieving higher instruction level parallelism, software pipelining increases the register pressure in the loop. The usefulness of the generated schedule may be restricted to cases where the register pressure is less than the available number of registers. Spill instructions need to be introduced otherwise. But scheduling these spill instructions in the compact schedule is a difficult task. Several heuristics have been proposed to schedule spill code. These heuristics may generate more spill code than necessary, and scheduling them may necessitate increasing the initiation interval. We model the problem of register allocation with spill code generation and scheduling in software pipelined loops as a 0-1 integer linear program. The formulation minimizes the increase in initiation interval (II) by optimally placing spill code and simultaneously minimizes the amount of spill code produced. To the best of our knowledge, this is the first integrated formulation for register allocation, optimal spill code generation and scheduling for software pipelined loops. The proposed formulation performs better than the existing heuristics by preventing an increase in II in 11.11% of the loops and generating 18.48% less spill code on average among the loops extracted from Perfect Club and SPEC benchmarks with a moderate increase in compilation time.
