Digital Library

269 results for dynamic execution

in Indian Institute of Science - Bangalore - India

Implications of program phase behavior on timing analysis

Relevance: 60.00%

Abstract:

Knowledge of a program's worst-case execution time (WCET) is essential for validating real-time systems and helps in effective scheduling. One popular approach used in industry is to measure the execution time of program components on the target architecture and combine the measurements using static analysis of the program. Measurements need to be taken in the least intrusive way to avoid affecting the accuracy of the estimated WCET. Several programs exhibit phase behavior, wherein the program's dynamic execution is observed to be composed of phases. Each phase is distinct from the others and exhibits homogeneous behavior with respect to cycles per instruction (CPI), data cache misses, etc. In this paper, we show that phase behavior has important implications for timing analysis. We make use of the homogeneity of a phase to reduce instrumentation overhead while ensuring that the accuracy of the WCET is not significantly affected. We propose a model for estimating WCET using the static worst-case instruction counts of individual phases and a function of the measured average CPI. We describe a WCET analyzer built on this model that targets two different architectures. The WCET analyzer is observed to give safe estimates for most benchmarks considered in this paper. The tightness of the WCET estimates is observed to improve for most benchmarks compared to Chronos, a well-known static WCET analyzer.
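The phase-based model described above can be illustrated with a small sketch. It assumes each phase has a statically derived worst-case instruction count and a single measured average CPI, and it uses a hypothetical f(CPI) equal to the average CPI times a safety margin; the abstract does not state which function of the measured CPI the paper actually uses. Because a phase is homogeneous, one CPI figure per phase can stand in for many fine-grained measurements, which is one way to read the reduced instrumentation overhead.

```python
# Hypothetical sketch of a phase-based WCET estimate:
#   WCET ~= sum over phases of worst_case_instructions(phase) * f(average_cpi(phase))
# f() below (average CPI times a safety margin) is an assumption for illustration;
# the abstract does not say which function of the measured CPI the paper uses.
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Phase:
    name: str
    worst_case_instructions: int  # static worst-case instruction count for the phase
    average_cpi: float            # average CPI measured on the target for the phase


def cpi_bound(average_cpi: float, margin: float = 1.2) -> float:
    """Hypothetical f(CPI): inflate the measured average CPI by a safety margin."""
    return average_cpi * margin


def estimate_wcet_cycles(phases: Sequence[Phase], margin: float = 1.2) -> float:
    """Sum each phase's worst-case instruction count weighted by its CPI bound."""
    return sum(p.worst_case_instructions * cpi_bound(p.average_cpi, margin) for p in phases)


# Two hypothetical phases: a short initialization phase and a long loop phase.
phases = [Phase("init", 10_000, 1.5), Phase("loop", 200_000, 0.9)]
print(f"estimated WCET: {estimate_wcet_cycles(phases):.0f} cycles")
```

Whether such an estimate is safe depends entirely on the margin chosen for f(); the abstract reports safe estimates for most, not all, benchmarks.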

RETHROTTLE: Execution Throttling in the REDEFINE SoC Architecture

Relevance: 30.00%

Abstract:

REDEFINE is a reconfigurable SoC architecture that provides a unique platform for high-performance, low-power computing by exploiting the synergistic interaction between a coarse-grain dynamic dataflow model of computation (to expose abundant parallelism in applications) and runtime composition of efficient compute structures (on the reconfigurable computation resources). We propose and study the throttling of execution in REDEFINE to maximize architecture efficiency. To make exploration of the huge design space practical, we develop and implement a feature-specific, fast hybrid (mixed-level) simulation framework for study early in the design phase. We model performance by selecting important performance criteria and ranking the explored throttling schemes, and we investigate the effectiveness of the design space exploration using statistical hypothesis testing. We find throttling schemes that simultaneously give an appreciable overall performance gain (24.8%) in the architecture and a 37% resource usage gain in the throttling unit.

Molecular Caches: A caching structure for dynamic creation of application-specific Heterogeneous cache regions

Relevance: 30.00%

Abstract:

Chip multiprocessors (CMPs) enable simultaneous execution of multiple applications that share cache resources on the same platform. Diversity in the cache access patterns of these simultaneously executing applications can trigger inter-application interference, leading to cache pollution. Whereas a large cache can ameliorate this problem, the higher power consumption that comes with increasing cache size, amplified at sub-100nm technologies, makes this solution prohibitive. To address power-aware cache performance, we propose a caching structure that provides the following: 1. Definition of application-specific cache partitions as an aggregation of caching units (molecules). The parameters of each molecule, namely size, associativity and line size, are chosen so that its power consumption and access time are optimal for the given technology. 2. Application-specific resizing of cache partitions with variable and adaptive associativity per cache line, way size and variable line size. 3. A replacement policy that is transparent to the partition in terms of size, heterogeneity in associativity and line size. Through simulation studies we establish the superiority of molecular caches (caches built as aggregations of molecules), which offer a 29% power advantage over an equivalently performing traditional cache.

Power Efficient Redundant Execution for Chip Multiprocessor

Relevance: 30.00%

Abstract:

This paper describes the design of a power-efficient microarchitecture for transient fault detection in chip multiprocessors (CMPs). We introduce a new per-core dynamic voltage and frequency scaling (DVFS) algorithm for our architecture that significantly reduces power dissipation for redundant execution with minimal performance overhead. Using cycle-accurate simulation combined with a simple first-order power model, we estimate that our architecture reduces dynamic power dissipation in the redundant core by a mean of 79% and a maximum of 85%, with an associated mean performance overhead of only 1.2%.
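The abstract does not spell out its "simple first-order power model". A common first-order form for CMOS dynamic power is P = alpha * C * V^2 * f, and the sketch below assumes that form to show why per-core DVFS pays off: lowering voltage and frequency together on the redundant core shrinks power roughly with V^2 * f. The operating points are hypothetical and are not the paper's DVFS settings.

```python
# First-order CMOS dynamic power model (assumed): P_dyn = alpha * C * V^2 * f.
# The voltage/frequency values below are hypothetical, not the paper's operating points.


def dynamic_power(alpha: float, capacitance: float, voltage: float, frequency: float) -> float:
    """Switching power in watts for activity factor alpha, capacitance (F), V (volts), f (Hz)."""
    return alpha * capacitance * voltage ** 2 * frequency


def power_reduction(v_nominal: float, f_nominal: float, v_scaled: float, f_scaled: float) -> float:
    """Fractional dynamic-power saving when a core moves from nominal to scaled V/f.

    alpha and C cancel in the ratio, so only (V_scaled/V_nominal)^2 * (f_scaled/f_nominal) matters.
    """
    ratio = (v_scaled / v_nominal) ** 2 * (f_scaled / f_nominal)
    return 1.0 - ratio


# A hypothetical redundant core run at half frequency and 70% of nominal voltage.
saving = power_reduction(1.0, 2.0e9, 0.7, 1.0e9)
print(f"dynamic power saving: {saving:.1%}")
```

This only captures dynamic power; leakage and the performance cost of running the redundant core slower are outside the model.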

Energy-efficient redundant execution for chip multiprocessors

Relevance: 30.00%

Abstract:

Relentless CMOS scaling coupled with lower design tolerances is making ICs increasingly susceptible to wear-out-related permanent faults and to transient faults, necessitating on-chip fault tolerance in future chip multiprocessors (CMPs). In this paper, we describe a power-efficient architecture for redundant execution on CMPs which, when coupled with our per-core dynamic voltage and frequency scaling (DVFS) algorithm, significantly reduces the energy overhead of redundant execution without sacrificing performance. Our evaluation shows that this architecture has a performance overhead of only 0.3% and consumes only 1.48 times the energy of a non-fault-tolerant baseline.

Improving Superscalar Instruction Dispatch And Issue By Exploiting Dynamic Code Sequences

Relevance: 30.00%

Abstract:

Superscalar processors currently have the potential to fetch multiple basic blocks per cycle by employing one of several recently proposed instruction fetch mechanisms. However, this increased fetch bandwidth cannot be exploited unless pipeline stages further downstream improve correspondingly. In particular, register renaming for a large number of instructions per cycle is difficult. A large instruction window, needed to receive multiple basic blocks per cycle, will slow down dependence resolution and instruction issue. This paper addresses these and related issues by proposing (i) partitioning of the instruction window into multiple blocks, each holding a dynamic code sequence; (ii) logical partitioning of the register file into a global file and several local files, the latter holding registers local to a dynamic code sequence; and (iii) dynamic recording and reuse of register renaming information for registers local to a dynamic code sequence. Performance studies show that these mechanisms improve performance over traditional superscalar processors by factors ranging from 1.5 to a little over 3 for the SPEC Integer programs. Next, it is observed that several of the loops in the benchmarks display vector-like behavior during execution, even though the static loop bodies are likely too complex for compile-time vectorization. A dynamic loop vectorization mechanism that builds on top of the above mechanisms is briefly outlined. The mechanism vectorizes up to 60% of the dynamic instructions for some programs, although the average number of iterations per loop is quite small.

Compiler/Runtime Framework for Dynamic Dataflow Parallelization of Tiled Programs

Relevance: 30.00%

Abstract:

Task-parallel languages are increasingly popular. Many of them provide expressive mechanisms for intertask synchronization. For example, OpenMP 4.0 will integrate data-driven execution semantics derived from the StarSs research language. Compared to the more restrictive data-parallel and fork-join concurrency models, the advanced features being introduced into task-parallel models enable improved scalability through load balancing, memory latency hiding, mitigation of the pressure on memory bandwidth and, as a side effect, reduced power consumption. In this article, we develop a systematic approach to compile loop nests into concurrent, dynamically constructed graphs of dependent tasks. We propose a simple and effective heuristic that selects the most profitable parallelization idiom for every dependence type and communication pattern. This heuristic enables the extraction of interband parallelism (cross-barrier parallelism) in a number of numerical computations that range from linear algebra to structured grids and image processing. The proposed static analysis and code generation relieve the runtime of the burden of a full-blown dependence resolver tracking the readiness of tasks. We evaluate our approach and algorithms in the PPCG compiler, targeting OpenStream, a representative dataflow task-parallel language with explicit intertask dependences and a lightweight runtime. Experimental results demonstrate the effectiveness of the approach.
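The data-driven execution model mentioned above can be pictured with a generic sketch: each task keeps a count of unsatisfied input dependences and becomes ready as soon as that count reaches zero, with no global barrier between loop bands. This is an illustration of dependence-counting dataflow only; it does not model OpenStream's streams or the code PPCG generates, and the task names and graph are hypothetical.

```python
# Generic data-driven task execution via dependence counters (Kahn-style).
# Hypothetical task graph; not OpenStream's API or PPCG's generated code.
from collections import deque


def run_dataflow(deps):
    """deps maps task -> list of tasks it depends on. Returns one valid execution order."""
    pending = {t: len(preds) for t, preds in deps.items()}  # unsatisfied inputs per task
    successors = {t: [] for t in deps}
    for t, preds in deps.items():
        for p in preds:
            successors[p].append(t)
    ready = deque(t for t, n in pending.items() if n == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)  # a real runtime would execute the task body here
        for s in successors[task]:
            pending[s] -= 1
            if pending[s] == 0:  # last input satisfied: task becomes ready
                ready.append(s)
    if len(order) != len(deps):
        raise ValueError("dependence cycle: some tasks never became ready")
    return order


# A 2x2 wavefront of hypothetical tiles: t11 needs both its left and upper neighbours.
tiles = {"t00": [], "t01": ["t00"], "t10": ["t00"], "t11": ["t01", "t10"]}
print(run_dataflow(tiles))
```

Here t01 and t10 become ready together and could run concurrently; a fork-join version of the same loop nest would typically wait at a barrier instead.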

Trace Driven Dynamic Deadlock Detection and Reproduction

Relevance: 30.00%

Abstract:

Dynamic analysis techniques have been proposed to detect potential deadlocks. Analyzing and comprehending each potential deadlock to determine whether it is feasible in a real execution requires significant programmer effort. Moreover, empirical evidence shows that existing analyses are quite imprecise, so much of this manual effort is wasted on reasoning about non-existent defects. In this paper, we address the imprecision of existing analyses and the resulting manual effort needed to reason about deadlocks. We propose a novel approach to deadlock detection by designing a dynamic analysis that intelligently leverages execution traces. To reduce manual effort, we replay the program, making the execution follow a schedule derived from the observed trace. For a real deadlock, its feasibility is automatically verified if the replay causes the execution to deadlock. We have implemented our approach as part of WOLF and have analyzed many large Java programs (up to 160 KLoC). Our experimental results show that we are able to automatically classify 74% of the reported defects as true or false positives, leaving very few defects for manual analysis. The overhead of our approach is negligible, making it a compelling tool for practical adoption.
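The kind of trace-based detection these analyses build on can be sketched with a classic lock-order check: record which locks a thread already holds when it acquires another, then flag pairs of locks taken in opposite orders by different threads. This is a generic, deliberately simplified technique (two-lock cycles only, no guard-lock filtering), not WOLF's algorithm, and the (thread, op, lock) trace format is hypothetical.

```python
# Simplified lock-order check over an execution trace (generic technique, not WOLF).
# Trace events are hypothetical (thread, op, lock) tuples with op in {"acquire", "release"}.
from collections import defaultdict


def lock_order_edges(trace):
    """Return edges (held_lock, acquired_lock, thread) observed in the trace."""
    held = defaultdict(list)  # thread -> locks currently held, in acquisition order
    edges = set()
    for thread, op, lock in trace:
        if op == "acquire":
            for h in held[thread]:
                edges.add((h, lock, thread))
            held[thread].append(lock)
        elif op == "release":
            held[thread].remove(lock)
    return edges


def potential_deadlocks(trace):
    """Report lock pairs acquired in opposite orders by different threads (2-cycles only)."""
    edges = lock_order_edges(trace)
    reports = set()
    for a, b, t1 in edges:
        for c, d, t2 in edges:
            if a == d and b == c and t1 != t2:
                reports.add(tuple(sorted((a, b))))
    return reports


# This run completed without deadlocking, yet the opposite lock orders are flagged.
trace = [
    ("T1", "acquire", "A"), ("T1", "acquire", "B"), ("T1", "release", "B"), ("T1", "release", "A"),
    ("T2", "acquire", "B"), ("T2", "acquire", "A"), ("T2", "release", "A"), ("T2", "release", "B"),
]
print(potential_deadlocks(trace))
```

Reports like this are only candidates; as the abstract notes, confirming one requires steering a replay toward the reported interleaving, which this sketch does not attempt.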

Genome-Wide Gene Expression Analysis Reveals a Dynamic Interplay between Luteotropic and Luteolytic Factors in the Regulation of Corpus Luteum Function in the Bonnet Monkey (Macaca radiata)

Relevance: 20.00%

Abstract:

Although LH is essential for the survival and function of the corpus luteum (CL) in higher primates, luteolysis occurs during nonfertile cycles without a discernible decrease in circulating LH levels. Using genome-wide expression analysis, several experiments were performed to examine the processes of luteolysis and rescue of luteal function in monkeys. Induced luteolysis with a GnRH receptor antagonist (Cetrorelix) resulted in differential regulation of 3949 genes, whereas replacement with exogenous LH (Cetrorelix plus LH) led to regulation of 4434 genes (1563 down-regulated and 2871 up-regulated). A model system for prostaglandin (PG) F2α-induced luteolysis in the monkey was standardized, and it demonstrated that PGF2α regulated the expression of 2290 genes in the CL. Analysis of the LH-regulated luteal transcriptome revealed that 120 genes were regulated in an antagonistic fashion by PGF2α. Based on the microarray data, 25 genes were selected for validation by real-time RT-PCR analysis, and the expression of these genes was also examined in the CL throughout the luteal phase and in monkeys treated with human chorionic gonadotropin (hCG) to mimic early pregnancy. The results indicated changes in the expression of genes favorable to PGF2α action during the late to very late luteal phase, and the expression of many of these genes was regulated in the opposite manner by exogenous hCG treatment. Collectively, the findings suggest that curtailment of the expression of downstream LH-target genes, possibly through PGF2α action on the CL, is among the mechanisms underlying cross talk between the luteotropic and luteolytic signaling pathways that result in the cessation of luteal function, whereas hCG is likely to abrogate the PGF2α-responsive gene expression changes, resulting in the luteal rescue crucial for the maintenance of early pregnancy. (Endocrinology 150: 1473-1484, 2009)

Dynamic initiation toughness measurement using double Grooved discs

Relevance: 20.00%

Abstract:

Among the multitude of test specimen geometries used for dynamic fracture toughness evaluation, the most widely used is the Charpy specimen, owing to its simple geometry and the availability of testing machines. The standard Charpy specimen dimensions may not always give plane strain conditions, and hence it may be necessary to conduct tests using specimens of different thicknesses to establish the plane strain K_Id. An axisymmetric specimen, on the other hand, would always give flow constraints for a nominal specimen thickness irrespective of the test material. Although the notched disk specimen proposed by Bernard et al. [1] for static and dynamic initiation toughness measurement provides plane-strain conditions, the crack propagates at an angle to the direction of the applied load. This makes interpretation of the test results difficult, as it requires radial slices to be cut from the fractured specimen to ascertain the angle of crack growth, as well as a finite element model for that particular crack orientation.

Dynamic strain ageing in Ni-base superalloy 720Li

Relevance: 20.00%

Abstract:

An experimental investigation into the dynamic strain ageing (DSA) of a wrought Ni-base superalloy 720Li was conducted. Characteristics of jerky flow have been studied at intermediate temperatures of 350, 400 and 450 °C at strain rates between 10^-3 and 10^-5 s^-1. Type C serrations are predominant within the temperature/strain-rate range explored. The major characteristics of the serrations, i.e. (a) the critical plastic strain for the onset of serrations, ε_c; (b) the average stress decrement, Δσ_avg; and (c) the strain increment between serrations, Δε_BS, have been examined at selected temperatures and strain rates. Negative strain-rate sensitivity was observed in the DSA regime. However, temperature did not influence tensile properties such as yield strength, ultimate strength, elongation, reduction in area and work hardening rate, or fracture features, in the DSA regime. Analysis of the results suggests that locking of mobile dislocations by substitutional alloying elements is responsible for DSA in alloy 720Li.

Dynamic analysis of small signal voltage instability decoupled from angle instability

Relevance: 20.00%

Abstract:

This paper presents a methodology for dynamic analysis of short-term small signal voltage instability in a multi-machine power system. The problem is formulated by decoupling angle instability from voltage instability. The method is based on the incremental reactive current flow network (IRCFN), in which the incremental reactive current injection at each bus is related to the incremental voltage magnitudes at all the buses. Small signal stability analysis using eigenvalues is illustrated with a single-machine load bus (SMLB) example and a three-machine system example. The role of a static var compensator (SVC) at the load bus is also examined.
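The eigenvalue test behind small signal stability analysis can be shown on a toy case: for a linearized model dx/dt = A x, the operating point is stable when every eigenvalue of A has a negative real part. The sketch below assumes a hypothetical 2-state model so the eigenvalues follow from the characteristic polynomial; the matrix entries are illustrative and are not derived from the paper's IRCFN formulation, and a real multi-machine model would have many more states and need a numerical eigen-solver.

```python
# Small-signal stability check for a hypothetical linearized 2-state model dx/dt = A x.
# Stable when all eigenvalues of A have negative real parts. Entries are illustrative.
import cmath


def eigenvalues_2x2(a11, a12, a21, a22):
    """Roots of lambda^2 - trace*lambda + det = 0 for the 2x2 state matrix [[a11, a12], [a21, a22]]."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = cmath.sqrt(trace * trace - 4 * det)  # complex sqrt handles oscillatory modes
    return (trace + disc) / 2, (trace - disc) / 2


def is_small_signal_stable(a11, a12, a21, a22):
    """True when both eigenvalues lie strictly in the left half-plane."""
    return all(lam.real < 0 for lam in eigenvalues_2x2(a11, a12, a21, a22))


# A lightly damped oscillatory mode (hypothetical values): eigenvalues -0.5 +/- 3j.
print(eigenvalues_2x2(-0.5, 3.0, -3.0, -0.5), is_small_signal_stable(-0.5, 3.0, -3.0, -0.5))
```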

Quasi-static and dynamic strain sensing using carbon nanotube/epoxy nanocomposite thin films

Relevance: 20.00%

Abstract:

Thin films are developed by dispersing carbon black nanoparticles and carbon nanotubes (CNTs) in an epoxy polymer. The films show a large variation in electrical resistance when subjected to quasi-static and dynamic mechanical loading. This phenomenon is attributed to the change in the band gap of the CNTs due to the applied strain, and also to the change in the volume fraction of the constituent phases in the percolation network. Under quasi-static loading, the films show a nonlinear response, which is primarily attributed to the pre-yield softening of the epoxy polymer. The electrical resistance of the films is found to depend strongly on the magnitude and frequency of the applied dynamic strain, induced by a piezoelectric substrate. Interestingly, the resistance variation is found to be a linear function of frequency and dynamic strain. Samples with a CNT concentration of just 0.57% show a sensitivity as high as 2.5% MPa^-1 under static mechanical loading. A mathematical model based on Bruggeman's effective medium theory is developed to better understand the experimental results. Dynamic mechanical loading experiments reveal a sensitivity as high as 0.007% Hz^-1 at a constant small-amplitude vibration and up to 0.13% per microstrain at 0-500 Hz vibration. Potential applications of such thin films include highly sensitive strain sensors, accelerometers, artificial neural networks, artificial skin and polymer electronics.
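For orientation, the textbook symmetric Bruggeman effective-medium relation for a two-phase composite with spherical inclusions is f1 (σ1 - σe)/(σ1 + 2σe) + f2 (σ2 - σe)/(σ2 + 2σe) = 0, which reduces to a quadratic in the effective conductivity σe. The sketch below solves that textbook form; the abstract does not give the paper's specific model, and the conductivities and fractions used are hypothetical.

```python
# Textbook symmetric Bruggeman estimate for two phases with spherical inclusions.
# Not the paper's specific model; conductivity values below are hypothetical.
import math


def bruggeman_conductivity(sigma1: float, sigma2: float, f1: float) -> float:
    """Solve f1*(s1-se)/(s1+2se) + f2*(s2-se)/(s2+2se) = 0 for the effective conductivity se.

    Clearing denominators (with f2 = 1 - f1) gives 2*se^2 - b*se - s1*s2 = 0,
    where b = (3*f1 - 1)*s1 + (3*f2 - 1)*s2; the physical solution is the positive root.
    """
    f2 = 1.0 - f1
    b = (3 * f1 - 1) * sigma1 + (3 * f2 - 1) * sigma2
    return (b + math.sqrt(b * b + 8 * sigma1 * sigma2)) / 4


# Hypothetical conductive filler (1e4 S/m) in an insulating matrix (1e-10 S/m).
for fraction in (0.2, 0.3, 0.34, 0.4, 0.5):
    print(fraction, bruggeman_conductivity(1e4, 1e-10, fraction))
```

In this spherical-inclusion form the effective conductivity rises sharply only near f1 = 1/3, far above the 0.57% CNT loading reported above, so modelling high-aspect-ratio CNT networks generally requires adapting the relation.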

Dynamic analysis and simulation of a VSC based Back-to-Back HVDC link

Relevance: 20.00%

Abstract:

This paper presents the modeling and analysis of a voltage source converter (VSC) based back-to-back (BTB) HVDC link. The case study considers the response to changes in active and reactive power and to the disturbance caused by a single line-to-ground (SLG) fault. The controllers at each terminal are designed to inject a variable (in magnitude and phase angle), sinusoidal, balanced set of voltages to regulate the active and reactive power. It is also possible to regulate the converter bus (AC) voltage by controlling the injected reactive power. The analysis is carried out using both the d-q model (neglecting harmonics in the VSC output voltages) and a detailed three-phase model of the VSC. While the eigenvalue analysis and controller design are based on the d-q model, the transient simulation considers both models.
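A d-q model rests on a frame change from the three phase quantities to a rotating reference frame, where a balanced sinusoidal set becomes a pair of constants. The sketch below uses the amplitude-invariant Park transform in one common sign convention; conventions differ between texts, and this is not taken from the paper.

```python
# Amplitude-invariant Park (abc -> dq) transform, one common sign convention.
# Illustrative only; the paper's exact d-q convention is not stated in the abstract.
import math


def abc_to_dq(a: float, b: float, c: float, theta: float):
    """Project three-phase quantities onto a d-q frame at angle theta (radians)."""
    shift = 2 * math.pi / 3
    d = (2 / 3) * (a * math.cos(theta) + b * math.cos(theta - shift) + c * math.cos(theta + shift))
    q = -(2 / 3) * (a * math.sin(theta) + b * math.sin(theta - shift) + c * math.sin(theta + shift))
    return d, q


# A balanced set of amplitude V aligned with the frame maps to (d, q) = (V, 0).
V, theta = 1.0, 0.7
phases = [V * math.cos(theta - k * 2 * math.pi / 3) for k in (0, 1, 2)]
print(abc_to_dq(*phases, theta))
```

With this convention a phase lead φ of the balanced set relative to the frame appears as (V cos φ, V sin φ), which is why magnitude and phase-angle control of the injected voltages maps naturally onto d-q controllers.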

A Proposal for Concurrent Estimation of Static and Dynamic Nonlinearity of ADC

Relevance: 20.00%

Abstract:

Despite great advances in very-large-scale integrated-circuit design and manufacturing, the performance of even the best available high-speed, high-resolution analog-to-digital converters (ADCs) is known to deteriorate while acquiring fast-rising, high-frequency, nonrepetitive waveforms. Waveform digitizers (ADCs) used in high-voltage impulse recordings and measurements are invariably subjected to such waveforms. Errors resulting from degraded ADC performance can be unacceptably high, especially when higher accuracies must be achieved (e.g., when the digitizer is part of a reference measuring system). Static and dynamic nonlinearities (estimated independently) are vital indices for evaluating the performance and suitability of ADCs used in such environments. Typically, estimating static nonlinearity takes 10-12 h or more (for a 12-bit ADC), and dynamic characterization requires acquiring millions of samples at high input frequencies. ADCs with even higher resolution and faster sampling speeds will soon become available, so there is a need to reduce the testing time for evaluating these parameters. This paper proposes a novel, time-efficient method for the simultaneous estimation of static and dynamic nonlinearity from a single test. This is achieved with a test signal composed of a high-frequency sinusoid (which addresses dynamic assessment) modulated by a low-frequency ramp (relevant to the static part). Details of the implementation and results on two digitizers are presented and compared with nonlinearities determined by the existing standardized approaches. The good agreement of the results and the achievable time savings indicate the method's suitability.
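One reading of the composite stimulus described above is amplitude modulation: a high-frequency sinusoid whose envelope grows with a slow ramp, so a single record both exercises dynamic behavior and gradually sweeps the code range. The sketch below generates such a signal and maps it through an ideal quantizer as a reference. Amplitude modulation is an assumption about what "modulated" means here, and all frequencies, lengths and the quantizer model are hypothetical; the paper's estimation procedure is not reproduced.

```python
# Sketch of a ramp-modulated sinusoid ADC test stimulus (amplitude modulation assumed).
# All frequencies and record lengths are hypothetical.
import math


def ramp_modulated_sine(n_samples: int, fs: float, f_sine: float, full_scale: float = 1.0):
    """Return samples of full_scale * r(t) * sin(2*pi*f_sine*t), with r rising linearly 0 -> 1."""
    if n_samples < 2:
        raise ValueError("need at least two samples to define a ramp")
    samples = []
    for n in range(n_samples):
        t = n / fs
        ramp = n / (n_samples - 1)  # slow envelope spanning the whole record
        samples.append(full_scale * ramp * math.sin(2 * math.pi * f_sine * t))
    return samples


def quantize(x: float, bits: int, full_scale: float = 1.0) -> int:
    """Ideal symmetric rounding quantizer mapping [-full_scale, full_scale] to signed codes."""
    levels = 2 ** (bits - 1) - 1
    return round(x / full_scale * levels)


signal = ramp_modulated_sine(n_samples=100_000, fs=10e6, f_sine=1.01e6)
codes = [quantize(x, bits=12) for x in signal]
print(min(codes), max(codes), len(set(codes)))
```

Comparing a real digitizer's output against such an ideal reference is where static and dynamic error estimates would come from; how the paper separates the two is not described in the abstract.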
