55 results for Cache Memories


Relevance:

20.00%

Abstract:

Cache analysis plays a very important role in obtaining precise Worst-Case Execution Time (WCET) estimates of programs for real-time systems. While Abstract Interpretation (AI) based approaches are almost universally used for cache analysis, they fail to exploit a requirement unique to WCET analysis: it is not necessary to find the cache behavior that is guaranteed to hold across all executions of a program. We only need the cache behavior along one particular program path, namely the path with the maximum execution time. In this work, we introduce the concept of cache miss paths, which allows us to use worst-case path information to improve the precision of AI-based cache analysis. We use Abstract Interpretation to determine the cache miss paths and then integrate them into the Implicit Path Enumeration Technique (IPET) formulation. An added advantage is that this approach also allows us to use infeasible path information for cache analysis. Experimentally, our approach gives more precise WCET estimates than AI-based cache analysis, and we also provide techniques to trade off analysis time against precision for scalability.
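The sketch below shows the textbook LRU "must" analysis for a single cache set, for readers unfamiliar with AI-based cache analysis. It is the classic path-insensitive analysis this work improves on, not the authors' cache-miss-path method. Each block maps to an upper bound on its LRU age. An access resets the accessed block's age to 0 and ages the younger blocks. A control-flow join keeps only the blocks present on both incoming paths, with the larger age. The associativity of 4 and the block names are assumptions.

```python
from typing import Dict, Hashable

Block = Hashable
MustState = Dict[Block, int]  # block -> upper bound on its LRU age (0 = most recent)

ASSOC = 4  # assumed associativity of the analysed cache set


def update(state: MustState, accessed: Block) -> MustState:
    """Abstract transfer function for one access under LRU replacement."""
    old_age = state.get(accessed, ASSOC)  # ASSOC encodes "not guaranteed cached"
    new_state: MustState = {accessed: 0}
    for block, age in state.items():
        if block == accessed:
            continue
        # Blocks younger than the accessed one age by one; the rest keep their bound.
        new_age = age + 1 if age < old_age else age
        if new_age < ASSOC:  # a bound reaching ASSOC means the block may be evicted
            new_state[block] = new_age
    return new_state


def join(a: MustState, b: MustState) -> MustState:
    """Merge at control-flow joins: intersection, keeping the larger (worse) age."""
    return {blk: max(a[blk], b[blk]) for blk in a.keys() & b.keys()}


def always_hit(state: MustState, block: Block) -> bool:
    """An access is classified 'always hit' if the block is in the must state."""
    return block in state
```

The `join` is where path-insensitivity loses precision. A block evicted only on a path that is never the worst-case path still drops out of the must state, which is the pessimism the abstract targets.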

Relevance:

20.00%

Abstract:

In this paper, we present the Bi-Modal Cache, a flexible stacked DRAM cache organization that simultaneously achieves several objectives: (i) an improved cache hit ratio, (ii) moving the tag storage overhead to DRAM, (iii) lower cache hit latency than tags-in-SRAM designs, and (iv) reduced off-chip bandwidth wastage. The Bi-Modal Cache addresses the miss rate versus off-chip bandwidth dilemma by organizing data in a bi-modal fashion: blocks with high spatial locality are stored as large blocks and those with little spatial locality as small blocks. By adaptively selecting the right storage granularity for individual blocks at run time, the proposed DRAM cache organization makes judicious use of the available DRAM cache capacity while also reducing off-chip memory bandwidth consumption. Despite moving the metadata to DRAM, the Bi-Modal Cache improves cache hit latency by means of a small SRAM-based Way Locator. Further, by leveraging the high internal bandwidth and capacity that stacked DRAM organizations provide, the Bi-Modal Cache enables efficient concurrent accesses to tags and data to reduce hit time. Through detailed simulations, we demonstrate that the Bi-Modal Cache achieves overall performance improvements (in terms of Average Normalized Turnaround Time (ANTT)) of 10.8%, 13.8% and 14.0% for 4-core, 8-core and 16-core workloads, respectively.
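The results above are stated in terms of ANTT, so a short sketch of the metric may help. ANTT is commonly defined as the mean, over the n programs of a multiprogrammed workload, of each program's normalized turnaround time. That is its cycle count when co-running divided by its cycle count when running alone, so lower is better. The sketch assumes that definition. It also expresses "improvement" as the relative reduction in ANTT, which is one common convention but is not confirmed by the abstract.

```python
from typing import Sequence


def antt(multi_cycles: Sequence[float], single_cycles: Sequence[float]) -> float:
    """Average Normalized Turnaround Time: mean of per-program slowdowns.

    multi_cycles[i]  -- cycles program i needs when co-running in the workload
    single_cycles[i] -- cycles program i needs when running alone
    """
    if len(multi_cycles) != len(single_cycles) or not multi_cycles:
        raise ValueError("need one (multi, single) pair per program")
    slowdowns = [m / s for m, s in zip(multi_cycles, single_cycles)]
    return sum(slowdowns) / len(slowdowns)


def antt_improvement(baseline: float, proposed: float) -> float:
    """Relative ANTT reduction of `proposed` over `baseline` (assumed convention)."""
    return (baseline - proposed) / baseline
```

With made-up cycle counts, `antt([150, 200], [100, 100])` evaluates to 1.75: the first program runs 1.5x slower when co-scheduled and the second 2x slower.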

Relevance:

20.00%

Abstract:

Background: Animals that hoard food to mediate seasonal deficits in resource availability might be particularly vulnerable to climate-mediated reductions in the quality and accessibility of food during the caching season. Central-place foragers might be additionally impacted by climatic constraints on their already restricted foraging range. Aims: We sought evidence for these patterns in a study of the American pika (Ochotona princeps), a territorial, central-place forager sensitive to climate. Methods: Pika food caches and available forage were re-sampled using historical methods at two long-term study sites, to quantify changes over two decades. Taxa that changed in availability or use were analysed for primary and secondary metabolites. Results: Both sites trended towards warmer summers, and snowmelt trended earlier at the lower latitude site. Graminoid cover increased at each site, and caching trends appeared to reflect available forage rather than primary metabolites. Pikas at the lower latitude site preferred species higher in secondary metabolites, known to provide higher-nutrient winter forage. However, caching of lower-nutrient graminoids increased in proportion with graminoid availability at that site. Conclusions: If our results represent trends in climate, cache quality and available forage, we predict that pikas at the lower latitude site will soon face nutritional deficiencies.

Relevance:

10.00%

Abstract:

Loads that miss in the L1 or L2 cache and wait for their data at the head of the reorder buffer (ROB) cause significant slowdown in the form of commit stalls. We identify that most of these commit stalls are caused by a small set of loads, referred to as LIMCOS (Loads Incurring Majority of COmmit Stalls). We propose simple history-based classifiers that track the commit stalls suffered by loads to help identify this small set. We study an application of these classifiers to prefetching: the classifiers are used to train the prefetcher to focus on the misses suffered by LIMCOS. This approach, referred to as focused prefetching, yields a 9.8% gain in IPC over a naive Global History Buffer (GHB) based delta correlation prefetcher, along with a 20.3% reduction in memory traffic, for a set of 17 memory-intensive SPEC2000 benchmarks. Focused prefetching also improves prefetch accuracy by 61%. We demonstrate that the proposed classification criterion performs better than existing criteria such as criticality and delinquent loads. We also show that the criterion of focusing on commit stalls is robust across cache levels and can be applied to any prefetcher without modification.
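The claim that a small set of loads accounts for most commit stalls can be made concrete with a per-PC stall profile. The sketch below is hypothetical. `CommitStallProfile` simply accumulates commit-stall cycles per load PC, standing in for the paper's history-based classifiers, whose table organization and thresholds are not given here. It then returns the smallest set of PCs whose stalls cover more than half of the total.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


class CommitStallProfile:
    """Hypothetical per-PC record of commit-stall cycles caused by loads at the ROB head."""

    def __init__(self) -> None:
        self.stalls: Dict[int, int] = defaultdict(int)

    def record(self, load_pc: int, stall_cycles: int) -> None:
        # Called when a load blocks commit at the ROB head for `stall_cycles` cycles.
        self.stalls[load_pc] += stall_cycles

    def limcos(self, fraction: float = 0.5) -> Set[int]:
        """Smallest set of load PCs whose stalls exceed `fraction` of all stalls."""
        total = sum(self.stalls.values())
        if total == 0:
            return set()
        chosen: Set[int] = set()
        covered = 0
        # Taking the worst offenders first yields a minimum-size covering set.
        for pc, cycles in sorted(self.stalls.items(), key=lambda kv: kv[1], reverse=True):
            if covered > fraction * total:
                break
            chosen.add(pc)
            covered += cycles
        return chosen


def replay(events: Iterable[Tuple[int, int]]) -> CommitStallProfile:
    """Build a profile from (load_pc, stall_cycles) events, e.g. from a trace."""
    profile = CommitStallProfile()
    for pc, cycles in events:
        profile.record(pc, cycles)
    return profile
```

In a focused-prefetching setup, only misses from PCs in the returned set would be used to train the prefetcher.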

Relevance:

10.00%

Abstract:

We report here on the structural origin of the easy reversibility of Ge15Te83Si2 glass, a promising candidate for phase change random access memories. In situ Raman scattering studies on a Ge15Te83Si2 sample, undertaken during the amorphous set and reset processes, indicate that the degree of disorder in the glass is reduced from the off state to the set state. The local structure of the sample under the reset condition is also found to be similar to that in the amorphous off state. Electron microscopy of switched samples indicates the formation of nanometer-sized particles with the c-SiTe2 structure. ©2009 American Institute of Physics

Relevance:

10.00%

Abstract:

An Autonomous Line Scanning Unit (ALSU) for completely autonomous detection of call originations in the SPC (stored program control) telephone switching system is described. Using its own memories, the ALSU maintains an up-to-date record of subscribers' statuses, detects call originations, performs a 'hit timing check' and informs the switching system of the identity of calling subscribers. The ALSU needs minimal interaction with the Central Processor, resulting in increased call-handling capacity.
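The scanning cycle described above can be sketched as a loop that compares each line's sampled state with a stored status record. This is a hypothetical illustration, not the ALSU's actual logic. Two parts are assumptions: the reading of the 'hit timing check' as requiring an off-hook state to persist for `HIT_SCANS` consecutive scans before it counts as a genuine origination, and the callback name `report_origination`.

```python
from typing import Callable, Dict

HIT_SCANS = 3  # assumed: consecutive off-hook scans needed to reject transient "hits"


class LineScanner:
    """Hypothetical model of autonomous line scanning with a stored status record."""

    def __init__(self, report_origination: Callable[[int], None]) -> None:
        self.status: Dict[int, bool] = {}   # subscriber line -> recorded off-hook status
        self.pending: Dict[int, int] = {}   # subscriber line -> consecutive off-hook scans
        self.report = report_origination

    def scan(self, sampled: Dict[int, bool]) -> None:
        """Process one scan of all lines; sampled[line] is True when off-hook."""
        for line, off_hook in sampled.items():
            recorded = self.status.get(line, False)
            if off_hook and not recorded:
                # Possible origination: count how long the new state persists.
                self.pending[line] = self.pending.get(line, 0) + 1
                if self.pending[line] >= HIT_SCANS:
                    self.status[line] = True
                    del self.pending[line]
                    self.report(line)  # only the calling subscriber's identity is passed on
            else:
                # No change, or a short-lived hit that disappeared: reset the timer.
                self.pending.pop(line, None)
                if not off_hook:
                    self.status[line] = False  # keep the on-hook record up to date
```

In this model the central processor is involved only when `report_origination` fires. That corresponds to the reduced interaction the abstract credits for the higher call-handling capacity.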

Relevance:

10.00%

Abstract:

Individuals in distress emit audible vocalizations to either warn or inform conspecifics. The Indian short-nosed fruit bat, Cynopterus sphinx, emits distress calls soon after becoming entangled in mist nets, and these calls appear to attract conspecifics. Phase I of these distress calls is longer and louder than phase II and includes a secondary peak. Activity-dependent expression of egr-1 was examined in free-ranging C. sphinx following the emission of, and response to, a distress call. We found that the level of egr-1 expression was higher in bats that emitted a distress call, in adults that responded, and in pups than in silent bats. Up-regulated cDNA was amplified to identify the target gene (TOE1) of the protein Egr-1. The observed expression pattern of Toe1 was similar to that of egr-1. These findings suggest that the neuronal activity related to recognition of a distress call and an auditory feedback mechanism induce the expression of Egr-1. Co-expression of egr-1 with Toe1 may play a role in the initial triggering of a genetic mechanism that could be involved in the consolidation or stabilization of distress call memories.

Relevance:

10.00%

Abstract:

A major concern of embedded system architects is designing for low power. We address one aspect of the problem in this paper, namely the effect of executable code compression. Code compression has two benefits: first, a reduction in the memory footprint of embedded software, and second, a potential reduction in memory bus traffic and power consumption. Because decompression has to be performed at run time, it is done in hardware. We describe a tool called COMPASS, which can evaluate a range of compression strategies for any given set of benchmarks and display the resulting compression ratios. Given an execution trace, it can also compute the effect of a range of compression strategies on bus toggles and cache misses. The tool is interactive and allows the user to vary a set of parameters and observe their effect on performance. We describe an implementation of the tool and demonstrate its effectiveness. To the best of our knowledge, this is the first tool proposed for such a purpose.
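Two of the quantities COMPASS reports can be illustrated directly. The sketch assumes that the compression ratio is compressed size divided by original size. It counts bus toggles as the number of bit lines that change value between consecutive transfers on a bus of an assumed width of 32 bits. Both definitions and both function names are assumptions for illustration, not the tool's interface.

```python
from typing import Sequence

BUS_WIDTH = 32  # assumed bus width in bits
MASK = (1 << BUS_WIDTH) - 1


def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Compressed size as a fraction of the original (smaller is better)."""
    return compressed_bytes / original_bytes


def bus_toggles(words: Sequence[int], initial: int = 0) -> int:
    """Count bit transitions on the bus for a sequence of transferred words."""
    toggles = 0
    previous = initial & MASK
    for word in words:
        word &= MASK
        toggles += bin(previous ^ word).count("1")  # Hamming distance = lines that flip
        previous = word
    return toggles
```

Compressed code means fewer words cross the bus, but the toggle count also depends on the bit patterns of those words. This is presumably why a trace-driven count, rather than the compression ratio alone, is needed to estimate bus power.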

Relevance:

10.00%

Abstract:

Designing an ultrahigh-density linear superlattice array consisting of periodic blocks of different semiconductors in the strong confinement regime via a direct synthetic route remains an open challenge in nanotechnology. We report a general synthesis route for the formation of a large-area, ultrahigh-density superlattice array that involves joining multiple ZnS rods through prolate CdS particles at their tips. A single one-dimensional wire is 300-500 nm long and consists of periodic quantum wells, with a barrier width of 5 nm provided by ZnS and a well width of 1-2 nm provided by CdS, defining a superlattice structure. The synthesis route allows tailoring of ultranarrow laserlike emissions (FWHM ≈ 125 meV) originating from strong interwell energy dispersion, along with control of the width, pitch, and registry of the superlattice assembly. Such an exceptionally high-density superlattice array could form the basis of ultrahigh-density memories, in addition to offering opportunities for technological advancement in conventional heterojunction-based device applications.

Relevance:

10.00%

Abstract:

CD-ROMs have proliferated as a distribution medium for desktop machines for a large variety of multimedia applications (targeted at a single-user environment), such as encyclopedias, magazines and games. With CD-ROM capacities of up to 3 GB becoming available in the near future, they will form an integral part of Video on Demand (VoD) servers that store full-length movies and multimedia. In the first section of this paper, we look at issues related to the single-user desktop environment. Since these multimedia applications are highly interactive, we take a pragmatic approach: we trace the I/O request patterns they generate to the CD-ROM subsystem and study their behavior in detail. We discuss prefetch buffer design and seek time characteristics in the context of the analysis of these traces. We also propose an adaptive main-memory-hosted cache that receives caching hints from the application to reduce latency when the user moves from one node of the hypergraph to another. In the second section, we look at the use of CD-ROMs in a VoD server and discuss the problem of scheduling multiple request streams and buffer management in this scenario. We adapt the C-SCAN (Circular SCAN) algorithm to suit CD-ROM drive characteristics and prove that it is optimal in terms of buffer size management. We provide computationally inexpensive relations by which this algorithm can be implemented. We then propose an admission control algorithm that admits new request streams without disrupting the continuity of playback of existing request streams. The algorithm also supports operations such as fast forward and replay. Finally, in the third section, we discuss the problem of optimal placement of MPEG streams on CD-ROMs.
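C-SCAN, which the authors adapt to CD-ROM drives, serves requests in one sweep direction only. The head moves toward increasing positions and serves pending requests in order, then returns to the lowest pending position and sweeps again. The sketch shows only this textbook ordering and assumes each request is identified by a linear position, such as a sector number. The authors' CD-ROM-specific adaptation and buffer sizing relations are not reproduced.

```python
from typing import List, Sequence


def c_scan_order(head: int, requests: Sequence[int]) -> List[int]:
    """Return the order in which C-SCAN serves the pending request positions.

    The head sweeps upward from `head`; positions below it are served after the
    head wraps around to the lowest pending position.
    """
    ahead = sorted(p for p in requests if p >= head)
    behind = sorted(p for p in requests if p < head)
    return ahead + behind
```

For example, with the head at 50 and requests at 82, 170, 43, 140, 24, 16 and 190, the service order is 82, 140, 170, 190, then 16, 24, 43. Sweeping in a single direction makes the interval between successive services of a stream more uniform than bidirectional SCAN. That uniformity is one reason it lends itself to bounding per-stream buffer requirements.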

Relevance:

10.00%

Abstract:

A number of neural network models, in which fixed-point and limit-cycle attractors of the underlying dynamics are used to store and associatively recall information, are described. In the first class of models, a hierarchical structure is used to store an exponentially large number of strongly correlated memories. The second class of models uses limit cycles to store and retrieve individual memories. A neurobiologically plausible network that generates low-amplitude periodic variations of activity, similar to the oscillations observed in electroencephalographic recordings, is also described. Results obtained from analytic and numerical studies of the properties of these networks are discussed.
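The fixed-point idea can be illustrated with the simplest such model, a Hopfield network with Hebbian weights. This is a swapped-in textbook example, not one of the hierarchical or limit-cycle models described above. Patterns of ±1 values are stored in a symmetric weight matrix. Recall then updates units one at a time until the state stops changing, reaching a fixed-point attractor, ideally the stored pattern closest to the cue.

```python
from typing import List

Pattern = List[int]  # entries are +1 or -1


def train(patterns: List[Pattern]) -> List[List[float]]:
    """Hebbian storage: w_ij = (1/n) * sum over patterns of x_i * x_j, no self-coupling."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w: List[List[float]], cue: Pattern, max_sweeps: int = 20) -> Pattern:
    """Asynchronous updates until no unit changes, i.e. a fixed-point attractor."""
    state = list(cue)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(w[i][j] * state[j] for j in range(n))
            new = 1 if field >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state
```

Storing two orthogonal 8-unit patterns and cueing with one of them, one bit flipped, recovers the stored pattern. This is associative recall in its most basic form.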

Relevance:

10.00%

Abstract:

We describe the design of a directory-based shared memory architecture on a hierarchical network of hypercubes. The distributed directory scheme comprises two separate hierarchical networks for handling cache requests and transfers. The scheme assumes a single address space, and each processing element views the entire network as a contiguous memory space. The size of the individual directory stored at each node remains constant throughout the network. Although the total directory size increases with network size, the architecture is scalable. The results of analytical studies demonstrate that our scheme has superior performance characteristics compared with other schemes.

Relevance:

10.00%

Abstract:

A simple route for tailoring emissions in the visible wavelength region by chemically coupling quantum dots composed of ZnSe and CdS is reported. Coupled quantum dots offer a novel route for tuning electronic transitions via band-offset engineering at the material interface. This novel class of asymmetric, coupled quantum structures may offer a basis for a diverse set of building blocks for optoelectronic devices, ultrahigh-density memories, and quantum information processing.

Relevance:

10.00%

Abstract:

Simulation is an important means of evaluating new microarchitectures. With the advent of multi-core (CMP) platforms, simulators are becoming larger and more complex. However, with the availability of CMPs with larger caches and higher operating frequencies, the wall-clock time required for simulating an application has become comparatively shorter. Reducing this simulation time further is a great challenge, especially for multi-threaded workloads, because of the indeterminacy introduced by simultaneously executing threads. In this paper, we propose a technique for speeding up multi-core simulation. To achieve the speedup, the models of the processor core and cache are replaced with functional models. A timed Petri net model is used to estimate the execution time of the processor, and memory access latencies are estimated using hit/miss information obtained from the functional model of the cache. This model can be used to predict the performance of data-parallel applications or multiprogrammed workloads on CMP platforms with various cache hierarchies and a shared bus interconnect. The error in estimating the execution time of an application is within 6%. The average speedup achieved ranges from 2x to 4x over the cycle-accurate simulator.
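Below is a sketch of estimating memory-access latency from a functional cache model's hit/miss outcome. The set-associative LRU model is an assumption, as are its geometry (64 sets, 4 ways, 64-byte lines) and its fixed latencies (3 cycles per hit, 100 per miss), all chosen for illustration. The timed Petri net that the paper uses for the processor core is not reproduced.

```python
from collections import OrderedDict
from typing import List

LINE_BYTES = 64     # assumed line size
NUM_SETS = 64       # assumed number of sets
WAYS = 4            # assumed associativity
HIT_LATENCY = 3     # assumed cycles for a hit
MISS_LATENCY = 100  # assumed cycles for a miss (memory access)


class FunctionalCache:
    """Tracks only tags and LRU order: no data and no timing, just hit or miss."""

    def __init__(self) -> None:
        # One OrderedDict per set; insertion order doubles as LRU order.
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def access(self, addr: int) -> bool:
        line = addr // LINE_BYTES
        index, tag = line % NUM_SETS, line // NUM_SETS
        ways = self.sets[index]
        if tag in ways:
            ways.move_to_end(tag)       # mark as most recently used
            return True
        if len(ways) >= WAYS:
            ways.popitem(last=False)    # evict the least recently used tag
        ways[tag] = None
        return False


def estimated_memory_cycles(cache: FunctionalCache, trace: List[int]) -> int:
    """Sum per-access latencies chosen from the functional model's hit/miss result."""
    return sum(HIT_LATENCY if cache.access(a) else MISS_LATENCY for a in trace)
```

The functional model never simulates timing, so each access costs a dictionary lookup rather than a cycle-by-cycle pipeline. That difference is the source of the speedup; the fixed per-outcome latency is the approximation that introduces estimation error.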

Relevance:

10.00%

Abstract:

In this paper, we propose a new method of data handling for web servers, which we call Network Aware Buffering and Caching (NABC). NABC reduces data copies in a web server's data sending path by doing three things: (1) laying out data in main memory so that protocol processing can be done without data copies, (2) keeping a unified cache of data in the kernel and ensuring safe access to it by the kernel and various processes, and (3) passing only the necessary metadata between processes, so that the bulk data handling time spent during IPC is reduced. We realize NABC by implementing a set of system calls and a user library. The end product of the implementation is a set of APIs specifically designed for use by web servers. We port an in-house web server called SWEET to the NABC APIs and evaluate performance using a range of workloads, both simulated and real. The results show, for a server using the NABC APIs compared with one using the UNIX APIs, throughput gains of 12% to 21% for static file serving and of 1.6 to 4 times for lightweight dynamic content serving.
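The core idea of passing only metadata while the bulk data stays in a single shared cache can be illustrated in user space. Python's `memoryview` slices a buffer without copying it. This is an analogy under stated assumptions, not NABC's kernel interface. The `UnifiedCache` class, the `Descriptor` fields and the `send_segments` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical metadata passed between processes instead of the file bytes."""
    key: str
    offset: int
    length: int


class UnifiedCache:
    """Hypothetical single copy of cached file data, shared by all consumers."""

    def __init__(self) -> None:
        self._store: Dict[str, bytearray] = {}

    def put(self, key: str, data: bytes) -> Descriptor:
        self._store[key] = bytearray(data)  # the one copy made when data enters the cache
        return Descriptor(key, 0, len(data))

    def view(self, d: Descriptor) -> memoryview:
        # A memoryview slice references the cached buffer; no bytes are copied.
        return memoryview(self._store[d.key])[d.offset:d.offset + d.length]


def send_segments(cache: UnifiedCache, d: Descriptor, mss: int) -> List[memoryview]:
    """Split a response into segment-sized views, as protocol processing might."""
    v = cache.view(d)
    return [v[i:i + mss] for i in range(0, len(v), mss)]
```

Only the small `Descriptor` would need to cross a process boundary. Every segment produced afterwards is a view into the same cached bytes, so a change to the cached data is visible through existing segments without any copy being made.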