6 results for IPC
in Indian Institute of Science - Bangalore - India
Abstract:
Loads that miss in the L1 or L2 cache and wait for their data at the head of the ROB cause significant slowdown in the form of commit stalls. We find that most of these commit stalls are caused by a small set of loads, which we call LIMCOS (Loads Incurring Majority of COmmit Stalls). We propose simple history-based classifiers that track the commit stalls suffered by loads in order to identify this small set. We study one application of these classifiers: prefetching. The classifiers are used to train the prefetcher to focus on the misses suffered by LIMCOS. This focused prefetching yields a 9.8% gain in IPC over a naive GHB-based delta correlation prefetcher, along with a 20.3% reduction in memory traffic, for a set of 17 memory-intensive SPEC2000 benchmarks. Focused prefetching also improves prefetch accuracy by 61%. We demonstrate that the proposed classification criterion performs better than existing criteria such as criticality and delinquent loads. We also show that the criterion of focusing on commit stalls is robust across cache levels and can be applied to any prefetcher without modifying the prefetcher.
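The abstract does not spell out how the history-based classifiers work. As a rough illustration only, the Python sketch below assumes a per-PC table that accumulates the commit-stall cycles each static load incurs while blocking the ROB head, then selects the smallest set of load PCs that together account for a chosen share (here, a majority) of all stalls; only misses from those PCs would train the prefetcher. The class name `CommitStallClassifier`, the `coverage` parameter, and the global ranking step are hypothetical, and a hardware classifier would more likely use small saturating counters than a sorted table.

```python
from collections import defaultdict


class CommitStallClassifier:
    """Hypothetical per-PC table of commit-stall cycles caused by each static load."""

    def __init__(self, coverage=0.5):
        # Share of all commit-stall cycles the selected loads must explain (0.5 = majority).
        self.coverage = coverage
        self.stalls = defaultdict(int)  # load PC -> accumulated commit-stall cycles

    def record(self, pc, stall_cycles):
        """Charge the cycles a load spent blocking commit at the ROB head."""
        if stall_cycles > 0:
            self.stalls[pc] += stall_cycles

    def limcos(self):
        """Smallest set of load PCs, largest contributors first, covering the target share."""
        total = sum(self.stalls.values())
        target = self.coverage * total
        selected, covered = set(), 0
        for pc, cycles in sorted(self.stalls.items(), key=lambda kv: kv[1], reverse=True):
            if covered >= target:
                break
            selected.add(pc)
            covered += cycles
        return selected

    def should_train(self, pc):
        """Gate prefetcher training: only misses from selected loads are used."""
        return pc in self.limcos()
```

With stalls of 300, 250, 20, 10 and 5 cycles recorded for five loads, a 50% coverage target selects only the first load, while a 90% target selects the first two.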
Abstract:
In this paper we propose a new method of data handling for web servers, which we call Network Aware Buffering and Caching (NABC). NABC reduces data copies in the web server's data-sending path in three ways: (1) it lays out data in main memory so that protocol processing can be done without data copies; (2) it keeps a unified cache of data in the kernel and ensures safe access to it by the kernel and by multiple processes; and (3) it passes only the necessary metadata between processes, reducing the time spent handling bulk data during IPC. We realize NABC by implementing a set of system calls and a user library. The result is a set of APIs designed specifically for use by web servers. We port SWEET, an in-house web server, to the NABC APIs and evaluate its performance with a range of simulated and real workloads. Compared with the same server using the UNIX APIs, the server using the NABC APIs achieves a 12% to 21% gain in throughput for static file serving and a 1.6 to 4 times gain in throughput for lightweight dynamic content serving.
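NABC's third idea, passing only metadata between processes, can be illustrated in user space. The sketch below is not the NABC API (whose system calls and user library are not described here); it assumes a hypothetical `Descriptor` of (key, offset, length) that is the only thing handed from producer to consumer, while the consumer resolves it against a single cached copy through a `memoryview`, which slices without copying the underlying bytes. `UnifiedCache` is likewise a hypothetical stand-in for the kernel-side cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical metadata record; only this crosses the process boundary."""
    key: str
    offset: int
    length: int


class UnifiedCache:
    """Illustrative single shared copy of file data (a stand-in for a kernel cache)."""

    def __init__(self):
        self._store = {}

    def put(self, key, data):
        """Store the bulk data once and return a descriptor covering all of it."""
        self._store[key] = bytes(data)
        return Descriptor(key, 0, len(self._store[key]))

    def resolve(self, desc):
        """Map a descriptor to a zero-copy view into the cached bytes."""
        start = desc.offset
        return memoryview(self._store[desc.key])[start:start + desc.length]
```

Resolving `Descriptor("/index.html", 6, 5)` against a cached `b"<html>hello</html>"` yields a view of `b"hello"` that still refers to the cached object rather than to a fresh copy.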
Abstract:
Large instruction windows and issue queues are key to exploiting greater instruction-level parallelism in out-of-order superscalar processors. However, conventional large monolithic issue queues have high cycle time and energy consumption. Previous efforts to reduce cycle time segment the issue queue and pipeline the wakeup, but this results in a significant IPC loss. Other proposals, which address energy efficiency by avoiding only the unnecessary tag comparisons, do not reduce the number of broadcasts; these schemes also increase issue latency. To address both issues comprehensively, we propose the Scalable Lowpower Issue Queue (SLIQ). SLIQ augments a pipelined issue queue with direct indexing to mitigate the problem of delayed wakeups while reducing cycle time. The SLIQ design also naturally yields significant energy savings by reducing both the number of tag broadcasts and the number of tag comparisons required. For an 8-wide superscalar processor, a 2-segment SLIQ incurs an average IPC loss of 0.2% over the entire SPEC CPU2000 suite while achieving a 25.2% reduction in issue latency compared with a monolithic 128-entry issue queue. An 8-segment SLIQ improves scalability further, reducing issue latency by 38.3% while incurring an IPC loss of only 2.3%. The 8-segment SLIQ also reduces energy consumption and energy-delay product by 48.3% and 67.4%, respectively, on average.
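To make the broadcast-versus-direct-indexing contrast concrete, the toy model below counts the work done at wakeup. It assumes a simplified queue in which each entry tracks its outstanding source tags: conventional wakeup compares a result tag against every outstanding source operand, while direct indexing records, at dispatch, which entries consume each tag and visits only those. This is a generic illustration of direct-indexed wakeup, not SLIQ's design; it models neither segments, pipelining, nor timing, and the `IssueQueue` class and its methods are hypothetical.

```python
from collections import defaultdict


class IssueQueue:
    """Toy issue queue; each entry holds the set of source tags it still waits on."""

    def __init__(self):
        self.entries = []                    # entry index -> outstanding source tags
        self.dependents = defaultdict(list)  # producer tag -> indices of waiting entries

    def dispatch(self, src_tags):
        """Insert an instruction and record a direct pointer for each source tag."""
        idx = len(self.entries)
        self.entries.append(set(src_tags))
        for tag in src_tags:
            self.dependents[tag].append(idx)
        return idx

    def wakeup_broadcast(self, tag):
        """CAM-style wakeup: compare the tag against every outstanding source operand."""
        comparisons = 0
        for srcs in self.entries:
            for src in list(srcs):
                comparisons += 1
                if src == tag:
                    srcs.discard(src)
        return comparisons

    def wakeup_direct(self, tag):
        """Direct-indexed wakeup: visit only entries registered as consumers of the tag."""
        visited = 0
        for idx in self.dependents.pop(tag, []):
            self.entries[idx].discard(tag)
            visited += 1
        return visited

    def ready(self):
        """Indices of entries with no outstanding source operands."""
        return [i for i, srcs in enumerate(self.entries) if not srcs]
```

For four entries waiting on six source operands in total, two of which are `p1`, broadcasting `p1` costs six comparisons whereas direct indexing touches two entries; both leave entry 2 ready to issue.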
Abstract:
Data prefetchers identify and exploit any regularity present in the history (training) stream to predict future references and prefetch them into the cache. The training information is typically the primary misses seen at a particular cache level, which is a filtered version of the accesses seen by that cache. In this work we demonstrate that extending the training information to include secondary misses and hits, along with primary misses, improves prefetcher performance. In addition to empirical evaluation, we use the information-theoretic metric of entropy to quantify the regularity present in extended histories. Entropy measurements indicate that extended histories are more regular than the default training stream of primary misses only, and they also corroborate our empirical findings. With extended histories, further benefits can be obtained by also triggering prefetches on secondary misses. In this paper we explore the design space of extended prefetch histories and alternative prefetch trigger points for delta correlation prefetchers. We observe that different prefetch schemes benefit to different extents from extended histories and alternative trigger points, and that the best-performing design point varies from benchmark to benchmark. To meet these requirements, we propose a simple adaptive scheme that identifies the best-performing design point for each benchmark-prefetcher combination at runtime. On the SPEC2000 benchmarks, using all L2 accesses as the prefetcher history improves performance, in terms of both IPC and misses reduced, over techniques that use only primary misses as history. The adaptive scheme improves the performance of the CZone prefetcher over the baseline by 4.6% on average. These performance gains are accompanied by a moderate reduction in memory traffic requirements.
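The entropy metric can be computed over the delta stream a delta correlation prefetcher would learn from. The sketch below assumes that regularity is measured as the Shannon entropy, in bits per symbol, of the empirical distribution of successive address deltas; the paper's exact formulation (for example, entropy over delta pairs or over fixed-length histories) is not given in the abstract, and the helper names `deltas` and `entropy` are hypothetical. Lower entropy means a more predictable stream.

```python
import math
from collections import Counter


def deltas(addresses):
    """Successive differences between block addresses in a training stream."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


def entropy(symbols):
    """Shannon entropy, in bits per symbol, of the empirical symbol distribution."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    # Sum of p * log2(1/p) over the observed symbols.
    return sum((c / n) * math.log2(n / c) for c in counts.values())
```

A unit-stride stream such as block addresses 0, 1, ..., 15 has a single repeated delta and therefore zero entropy, while the addresses 0, 2, 3, 7, 8, 12 give the deltas 2, 1, 4, 1, 4 and about 1.52 bits per delta. These toy inputs only illustrate the metric; they are not measurements from the paper.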
Abstract:
A new naphthalene carbohydrazone-based dizinc(II) complex has been synthesized and investigated as a highly selective fluorescence and visual sensor for the pyrophosphate ion, with a low detection limit of 155 ppb; it has also been used to detect the pyrophosphate ion released during the polymerase chain reaction.
Abstract:
Hydrogen-bonded complexes formed between square pyramidal Fe(CO)5 and HX (X = F, Cl, Br), which show X-H···Fe interactions, have been investigated theoretically using density functional theory (DFT) with dispersion correction. The geometry, the interaction energy, and a large red shift of about 400 cm⁻¹ in the HX stretching frequency confirm X-H···Fe hydrogen bond formation. In the (CO)5Fe···HBr complex, following the significant red shift, the HBr stretching mode couples with the carbonyl stretching modes. This clearly affects the correlation between frequency shift and binding energy, which is a hallmark of hydrogen bonds. Atoms in Molecules (AIM) theoretical analyses show a bond critical point between the iron and the hydrogen of HX, as well as significant mutual penetration. These X-H···Fe hydrogen bonds satisfy most, but not all, of the eight criteria proposed by Koch and Popelier (J. Phys. Chem. 1995, 99, 9747) on the basis of their investigations of C-H···O hydrogen bonds. Natural bond orbital (NBO) analysis indicates charge transfer from the organometallic system to the hydrogen bond donor. However, there is no correlation between the extent of charge transfer and the interaction energy, contrary to what is proposed in the recent IUPAC recommendation (Pure Appl. Chem. 2011, 83, 1637). The "hydrogen bond radius" of iron has been determined to be 1.60 ± 0.02 Å and, not surprisingly, lies between the covalent (1.27 Å) and van der Waals (2.0 Å) radii of Fe. DFT and AIM theoretical studies reveal that Fe in square pyramidal Fe(CO)5 can also form halogen bonds with ClF and ClH acting as halogen bond donors. Both of these complexes also show mutual penetration, although the Fe···Cl distance is close to the sum of the van der Waals radii of Fe and Cl in (CO)5Fe···ClH and is about 1 Å shorter in (CO)5Fe···ClF.