972 results for cache consistency


Relevance: 20.00%

Abstract:

Software transactional memory (STM) is a promising programming paradigm for shared-memory multithreaded programs. For STMs to be widely adopted in performance-critical software, understanding and improving the cache performance of applications running on STM becomes increasingly crucial as the performance gap between processor and memory continues to grow. In this paper, we present the most detailed experimental evaluation to date of the cache behavior of STM applications and quantify the impact of the different STM factors on the cache misses experienced by the applications. We find that STMs are not cache friendly: data cache stall cycles contribute more than 50% of the execution cycles in a majority of the benchmarks. On average, misses occurring inside the STM account for 62% of the total data cache miss latency cycles experienced by the applications, and cache performance is adversely affected by certain inherent characteristics of the STM itself. These observations motivate us to propose a set of specific compiler transformations aimed at making STMs cache friendly. We find that the STM's fine-grained, application-unaware locking is a major contributor to its poor cache behavior. Hence we propose selective Lock Data Co-location (LDC) and Redundant Lock Access Removal (RLAR) to address lock access misses. We also find that even transactions that are completely disjoint-access parallel suffer from costly coherence misses caused by updates to the centralized global time stamp, and hence we propose the Selective Per-Partition Time Stamp (SPTS) transformation to address this. We show that our transformations are effective in improving the cache behavior of STM applications, reducing the data cache miss latency by 20.15% to 37.14% and improving execution time by 18.32% to 33.12% in five of the eight STAMP applications.
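
To illustrate why application-unaware locking can cost an extra miss, here is a toy address-arithmetic model. It is not the paper's LDC transformation. Its constants are hypothetical: a 64-byte line, the lock-table base address, and the table size. It assumes a word-based STM that hashes each data address into a separate global lock table, and compares that layout with one where the lock sits in the header word of the object it guards.

```python
# Toy model: where an STM lock lives relative to the data it guards.
# All constants are hypothetical illustration values, not taken from any real STM.
LINE = 64                          # assumed cache-line size in bytes
LOCK_TABLE_BASE = 0x10_0000_0000   # hypothetical base address of a global lock table
LOCK_TABLE_ENTRIES = 1 << 20       # hypothetical number of 8-byte lock entries


def cache_line(addr):
    """Index of the cache line containing byte address addr."""
    return addr // LINE


def hashed_lock_addr(data_addr):
    # Application-unaware mapping: hash the word address into a separate lock
    # table, so the lock normally lands on a different cache line than the data.
    return LOCK_TABLE_BASE + ((data_addr >> 3) % LOCK_TABLE_ENTRIES) * 8


def colocated_lock_addr(obj_base):
    # Co-located layout (assumed): the lock is the object's header word, so a
    # small, line-aligned object and its lock share one cache line.
    return obj_base


def lines_touched(data_addr, lock_addr):
    """Distinct cache lines a transactional access to data_addr must bring in."""
    return len({cache_line(data_addr), cache_line(lock_addr)})
```

For a line-aligned object at `0x1000` with a field at `0x1008`, the hashed layout touches two lines per access and the co-located layout touches one.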

Relevance: 20.00%

Abstract:

Effective sharing of the last-level cache has a significant influence on the overall performance of a multicore system. We observe that existing solutions control cache occupancy at a coarser granularity, do not scale well to large core counts, and in some cases lack the flexibility to support a variety of performance goals. In this paper, we propose Probabilistic Shared Cache Management (PriSM), a framework that manages the cache occupancy of different cores at cache-block granularity by controlling their eviction probabilities. The proposed framework requires only simple hardware changes, scales to larger core counts, and is flexible enough to support a variety of performance goals. We demonstrate the flexibility of PriSM by computing the eviction probabilities needed to achieve goals such as hit maximization, fairness and QoS. PriSM-HitMax improves performance by 18.7% over LRU and by 11.8% over previously proposed schemes on a sixteen-core machine. PriSM-Fairness improves fairness over existing solutions by 23.3%, along with a performance improvement of 19.0%. PriSM-QOS successfully achieves the desired QoS targets.
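
A minimal sketch of probability-controlled victim selection in one cache set follows. It assumes, without confirmation from the abstract, that on a miss a target core is drawn according to the per-core eviction probabilities and that core's least recently used block in the set is evicted. It also assumes a plain-LRU fallback when no eligible core owns a block in the set. The function name and data layout are hypothetical, and the calculation of the probabilities themselves (HitMax, Fairness, QoS) is not modeled.

```python
import random


def prism_select_victim(cache_set, eviction_prob, rng=random):
    """Pick a victim index in one cache set (hypothetical sketch).

    cache_set: list of (block_addr, owner_core), ordered LRU (index 0) -> MRU.
    eviction_prob: dict mapping core -> probability of being chosen for eviction.
    """
    present = {core for _, core in cache_set}
    # Only cores that own at least one block here, with nonzero probability, are eligible.
    cores = [c for c in eviction_prob if c in present and eviction_prob[c] > 0]
    if not cores:
        return 0  # assumed fallback: evict the set's global LRU block
    weights = [eviction_prob[c] for c in cores]
    target = rng.choices(cores, weights=weights, k=1)[0]
    # Evict the least recently used block owned by the chosen core.
    for i, (_, core) in enumerate(cache_set):
        if core == target:
            return i
```

Because occupancy changes one eviction at a time, tuning `eviction_prob` steers each core's share of the cache at block granularity rather than whole ways.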

Relevance: 20.00%

Abstract:

Ampcalculator (AMPC) is a Mathematica©-based program that was made publicly available some time ago by Unterdorfer and Ecker. It enables the user to compute several processes at one loop (up to O(p^4)) in SU(3) chiral perturbation theory. These include matrix elements and form factors for strong and non-leptonic weak processes with at most six external states. The original authors used it to compute some novel processes and tested it against well-known results. Here we present the results of several thorough checks of the package. The exhaustive checks performed by the original authors are not publicly available, which motivates the present effort. Some new results are obtained from the software, especially in the kaon odd-intrinsic-parity non-leptonic decay sector involving the coupling G_27. We also provide an illustrative set of tree-level amplitudes for tau decays into several mesons, including quark-mass effects, of use to the BELLE experiment. All eight meson-meson scattering amplitudes have been checked. The kaon Compton amplitude has been checked, and a minor error in the published results has been pointed out. This work is tutorial-based: several input and output notebooks are also being made available as ancillary files on the arXiv. Some of the additional notebooks contain the explicit expressions we used for comparison with established results. The purpose is to encourage users to apply the software to suit their specific needs. An automatic amplitude generator of this type can provide error-free outputs that could be used as inputs for further simplification and in varied scenarios, such as applications of chiral perturbation theory at finite temperature, density and volume. It can also be used by students as a learning aid in low-energy hadron dynamics.

Relevance: 20.00%

Abstract:

The effectiveness of the last-level shared cache is crucial to the performance of a multi-core system. In this paper, we observe and exploit the DelinquentPC-Next-Use characteristic to improve shared cache performance. We propose a new PC-centric cache organization, NUcache, for the shared last-level cache of multi-cores. NUcache logically partitions the associative ways of a cache set into MainWays and DeliWays. While all lines have access to the MainWays, only lines brought in by a subset of delinquent PCs, selected by a PC selection mechanism, are allowed to enter the DeliWays. The PC selection mechanism is an intelligent, cost-benefit-analysis-based algorithm that uses Next-Use information to select the set of PCs that maximizes the hits experienced in the DeliWays. Performance evaluation reveals that NUcache improves performance over a baseline design by 9.6%, 30% and 33% for dual-, quad- and eight-core workloads composed of SPEC benchmarks, respectively. We also show that NUcache is more effective than other well-known cache-partitioning algorithms.
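
The sketch below models one NUcache set under one plausible reading of the abstract. Every line is inserted into the LRU-managed MainWays. A line evicted from the MainWays is kept in the DeliWays, managed here as FIFO, only if it was brought in by a selected delinquent PC. The FIFO policy, the hit handling in the DeliWays, and the class name are assumptions. The cost-benefit PC selection is not modeled; `selected_pcs` is simply given.

```python
from collections import OrderedDict


class NUcacheSet:
    """Hypothetical model of one NUcache set with MainWays and DeliWays."""

    def __init__(self, main_ways, deli_ways, selected_pcs):
        self.main = OrderedDict()   # tag -> pc; first entry is the LRU line
        self.deli = OrderedDict()   # tag -> pc; first entry is the oldest (FIFO, assumed)
        self.main_ways = main_ways
        self.deli_ways = deli_ways
        self.selected_pcs = set(selected_pcs)  # output of PC selection (not modeled)

    def access(self, tag, pc):
        if tag in self.main:
            self.main.move_to_end(tag)  # update LRU recency
            return "main_hit"
        if tag in self.deli:
            return "deli_hit"           # assumed: line stays in place on a DeliWays hit
        # Miss: allocate in MainWays, evicting the LRU line if the MainWays are full.
        if len(self.main) >= self.main_ways:
            victim_tag, victim_pc = self.main.popitem(last=False)
            # Only lines from selected delinquent PCs may enter the DeliWays.
            if victim_pc in self.selected_pcs and self.deli_ways > 0:
                if len(self.deli) >= self.deli_ways:
                    self.deli.popitem(last=False)
                self.deli[victim_tag] = victim_pc
        self.main[tag] = pc
        return "miss"
```

With two MainWays and one DeliWay, a line from a selected PC that is pushed out of the MainWays can still hit later, while a line from an unselected PC is simply lost.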

Relevance: 20.00%

Abstract:

Advances in technology have increased the number of cores and the size of caches present on chip multicore platforms (CMPs). As a result, leakage power consumption of on-chip caches has already become a major power-consuming component of the memory subsystem. We propose to reduce leakage power consumption in a static non-uniform cache architecture (SNUCA) on a tiled CMP by dynamically varying the number of cache slices used and switching off unused cache slices. A cache slice in a tile includes all cache banks present in that tile. Switched-off cache slices are remapped with communication costs taken into account, reducing cache usage with minimal impact on execution time. This saves the leakage power that the switched-off L2 cache slices would otherwise consume. On average, the remap policy achieves 41% and 49% higher EDP savings than static NUCA and dynamic NUCA (DNUCA) cache policies, respectively, on a scalable tiled CMP.
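
A minimal sketch of a communication-aware remap follows. It assumes that cost is measured as Manhattan hop distance on a 2D mesh, and that each switched-off slice's addresses move to the nearest slice that stays on, with ties broken by tile coordinates. The paper's actual cost model and remap policy may differ. The function names are hypothetical.

```python
def manhattan(a, b):
    """Hop distance between two tiles on a 2D mesh (assumed cost metric)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def remap_slices(mesh_w, mesh_h, active):
    """Map each tile's L2 slice to itself if active, else to the nearest active tile.

    active: set of (x, y) tiles whose cache slices remain powered on.
    Returns a dict tile -> tile that now holds that tile's address range.
    """
    if not active:
        raise ValueError("at least one cache slice must stay on")
    mapping = {}
    for x in range(mesh_w):
        for y in range(mesh_h):
            tile = (x, y)
            if tile in active:
                mapping[tile] = tile
            else:
                # Nearest active slice; ties broken deterministically by coordinates.
                mapping[tile] = min(active, key=lambda t: (manhattan(tile, t), t))
    return mapping
```

On a 4x4 mesh with only the slices at `(0, 0)` and `(3, 3)` left on, tile `(1, 0)` maps to `(0, 0)`, and the equidistant tile `(3, 0)` also maps to `(0, 0)` by the tie-break.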

Relevance: 20.00%

Abstract:

Past studies use deterministic models to evaluate the optimal cache configuration or to explore its design space. However, with the increasing number of components present on a chip multiprocessor (CMP), deterministic approaches do not scale well. Hence, we apply probabilistic genetic algorithms (GA) to determine a near-optimal cache configuration for a sixteen-tile CMP. We propose and implement a faster trace-based approach to estimate the fitness of a chromosome. It achieves up to 218x simulation speedup over cycle-accurate architectural simulation. Our methodology can be applied to other cache optimization problems, such as cache design space exploration and cache partitioning among applications or virtual machines.
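
To make the chromosome and fitness vocabulary concrete, here is a generic genetic-algorithm sketch over a small cache-configuration space. Several parts are hypothetical and not from the paper: the design space (size, associativity, block size), the operators (truncation selection, one-point crossover, per-gene mutation), and especially `toy_fitness`. That function is a made-up placeholder cost standing in for the paper's trace-based fitness estimate.

```python
import random

# Hypothetical design space for one L2 configuration (chromosome = one gene per list).
SIZES_KB = [128, 256, 512, 1024]
ASSOCS = [2, 4, 8, 16]
BLOCKS = [32, 64, 128]
GENES = [SIZES_KB, ASSOCS, BLOCKS]


def toy_fitness(cfg):
    # Placeholder cost (lower is better) standing in for a trace-based estimate.
    size, assoc, block = cfg
    miss_proxy = 1000 / size + 8 / assoc + abs(block - 64) / 16
    area_proxy = size / 256 + assoc / 8
    return miss_proxy + area_proxy


def random_cfg(rng):
    return tuple(rng.choice(g) for g in GENES)


def crossover(a, b, rng):
    cut = rng.randrange(1, len(GENES))  # one-point crossover
    return a[:cut] + b[cut:]


def mutate(cfg, rng, rate=0.2):
    # Each gene is independently resampled with probability `rate`.
    return tuple(rng.choice(g) if rng.random() < rate else v for v, g in zip(cfg, GENES))


def genetic_search(fitness=toy_fitness, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_cfg(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]  # truncation selection keeps the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = parents + children
    return min(pop, key=fitness)
```

The speedup reported in the abstract comes from making each `fitness` call cheap. The GA loop itself stays the same whatever fitness estimator is plugged in.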

Relevance: 20.00%

Abstract:

One of the challenges in accurately estimating the Worst Case Execution Time (WCET) of executables is accurately predicting their cache behaviour. Various techniques have been developed to predict the cache contents at different program points in order to estimate the execution time of memory-accessing instructions. One of the most widely used is Abstract Interpretation based Must Analysis, which determines the cache blocks guaranteed to be present in the cache and hence provides a safe estimation of cache hits and misses. However, Must Analysis is highly imprecise, and platforms using Must Analysis have been known to produce blown-up WCET estimates. In our work, we propose using May Analysis to assist the Must Analysis cache update and make it more precise. We prove the safety of our approach and provide examples where our Improved Must Analysis achieves better precision. Further, we detect a serious flaw in the original Persistence Analysis, and use Must and May Analysis to assist the Persistence Analysis cache update, making it safe and more precise than the known solutions to the problem.
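
For readers new to Must Analysis, here is a minimal sketch of the classic abstract update and join for one LRU cache set. Abstract cache states map each block to an upper bound on its age. This is the standard, unimproved Must Analysis that the abstract starts from, not the May-assisted Improved Must Analysis it proposes. The dict-based state representation is an illustrative choice.

```python
def must_update(state, block, assoc):
    """Abstract LRU must-cache update for one set after accessing `block`.

    state: dict block -> upper bound on its LRU age (0 = most recently used).
    Blocks absent from `state` are not guaranteed to be cached.
    """
    old_age = state.get(block, assoc)  # unknown block: treated as older than every line
    new = {}
    for b, age in state.items():
        if b == block:
            continue
        # Blocks younger than the accessed block age by one; others keep their bound.
        new_age = age + 1 if age < old_age else age
        if new_age < assoc:            # age bound reaching assoc means possibly evicted
            new[b] = new_age
    new[block] = 0
    return new


def must_join(s1, s2):
    """Control-flow join: keep blocks cached on both paths, with the larger age bound."""
    return {b: max(s1[b], s2[b]) for b in s1.keys() & s2.keys()}


def must_hit(state, block):
    """An access is a guaranteed hit iff the block is in the must-cache state."""
    return block in state
```

Each join can only drop blocks or raise their age bounds, so precision is lost at merge points. That loss is the imprecision the abstract sets out to reduce.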

Relevance: 20.00%

Abstract:

Cache analysis plays a very important role in obtaining precise Worst Case Execution Time (WCET) estimates of programs for real-time systems. While Abstract Interpretation (AI) based approaches are almost universally used for cache analysis, they fail to take advantage of its unique requirement: it is not necessary to find the guaranteed cache behavior that holds across all executions of a program. We only need the cache behavior along one particular program path, namely the path with the maximum execution time. In this work, we introduce the concept of cache miss paths, which allows us to use the worst-case path information to improve the precision of AI-based cache analysis. We use Abstract Interpretation to determine the cache miss paths and then integrate them into the IPET formulation. An added advantage is that this also allows us to use infeasible path information for cache analysis. Experimentally, our approach gives more precise WCETs than AI-based cache analysis, and we also provide techniques to trade off analysis time against precision for scalability.

Relevance: 20.00%

Abstract:

Current applications of statistical thermodynamic theories for clathrate hydrates do not rigorously incorporate the translational and rotational movement of the water molecules of the hydrate lattice. Previous studies have shown that the movement of water molecules has a significant effect on the properties of clathrate hydrates. In this Article, a method is presented to incorporate the effect of water movement with as much rigor as possible. This method is then used to calculate the Langmuir constant of the guest species in a clathrate hydrate. Unlike previous studies on modeling clathrate hydrate thermodynamics, the method presented in this paper does not regress either the intermolecular potentials or the properties of the empty hydrate from clathrate phase equilibria data. Also, the properties of the empty hydrate used in the theory do not depend on the nature and composition of the guest molecules. The phase equilibria predicted by the resulting theory are shown to be highly accurate and thermodynamically consistent by comparison with phase equilibria computed directly from molecular simulations.
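
For context on how a Langmuir constant is used once it is computed, here is a sketch of the standard van der Waals-Platteeuw relations for a single guest species. Cage occupancy is theta_i = C_i f / (1 + C_i f). The water chemical-potential difference between the empty lattice and the filled hydrate is Delta mu_w = -R T sum_i nu_i ln(1 - theta_i). This shows the conventional framework only, not the paper's method for computing C_i with water movement included. The units in the docstrings are assumptions.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def cage_occupancy(C, f):
    """Langmuir-type fractional occupancy of one cage type by a single guest.

    C: Langmuir constant for that cage type (1/Pa, assumed units).
    f: guest fugacity (Pa).
    """
    return C * f / (1.0 + C * f)


def delta_mu_water(T, cages):
    """mu_w(empty lattice) - mu_w(hydrate) in J/mol, van der Waals-Platteeuw form.

    cages: iterable of (nu_i, theta_i), where nu_i is the number of cages of
    type i per water molecule and theta_i is their fractional occupancy.
    """
    return -R * T * sum(nu * math.log(1.0 - theta) for nu, theta in cages)
```

For structure I hydrate, nu is 1/23 for the small cages and 3/23 for the large cages. Phase equilibrium is found where this quantity matches the corresponding difference between the empty lattice and the liquid or ice phase.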

Relevance: 20.00%

Abstract:

In this paper, we present Bi-Modal Cache, a flexible stacked DRAM cache organization that simultaneously achieves several objectives: (i) improved cache hit ratio, (ii) moving the tag storage overhead to DRAM, (iii) lower cache hit latency than tags-in-SRAM, and (iv) reduced off-chip bandwidth wastage. The Bi-Modal Cache addresses the miss rate versus off-chip bandwidth dilemma by organizing data in a bi-modal fashion: blocks with high spatial locality are organized as large blocks and those with little spatial locality as small blocks. By adaptively selecting the right storage granularity for individual blocks at run time, the proposed DRAM cache organization makes judicious use of the available DRAM cache capacity and reduces off-chip memory bandwidth consumption. The Bi-Modal Cache improves cache hit latency despite moving the metadata to DRAM, by means of a small SRAM-based Way Locator. Further, by leveraging the tremendous internal bandwidth and capacity that stacked DRAM organizations provide, the Bi-Modal Cache enables efficient concurrent accesses to tags and data to reduce hit time. Through detailed simulations, we demonstrate that the Bi-Modal Cache achieves overall performance improvements (in terms of Average Normalized Turnaround Time (ANTT)) of 10.8%, 13.8% and 14.0% in 4-core, 8-core and 16-core workloads, respectively.
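
A minimal sketch of a per-region granularity decision follows. The abstract says only that blocks with high spatial locality are stored as large blocks and the rest as small blocks. Everything concrete here is a hypothetical assumption: the 512-byte large block, the 64-byte small block, the 50% threshold, and the rule of counting distinct small sub-blocks touched during a region's last residency.

```python
BIG_BLOCK = 512    # hypothetical large-block size in bytes
SMALL_BLOCK = 64   # hypothetical small-block size in bytes
SUB_BLOCKS = BIG_BLOCK // SMALL_BLOCK


def choose_granularity(touched_offsets, threshold=0.5):
    """Decide how to store a BIG_BLOCK-sized region in the DRAM cache (sketch).

    touched_offsets: byte offsets within the region accessed during its last
    residency. Returns "big" if the fraction of distinct small sub-blocks used
    reaches `threshold` (assumed spatial-locality rule), otherwise "small".
    """
    used = {off // SMALL_BLOCK for off in touched_offsets if 0 <= off < BIG_BLOCK}
    return "big" if len(used) / SUB_BLOCKS >= threshold else "small"
```

Storing a sparsely used region as small blocks avoids fetching the unused bytes from off-chip memory. A region that is densely used benefits from the single large fetch.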

Relevance: 20.00%

Abstract:

Background: Animals that hoard food to mediate seasonal deficits in resource availability might be particularly vulnerable to climate-mediated reductions in the quality and accessibility of food during the caching season. Central-place foragers might be additionally impacted by climatic constraints on their already restricted foraging range. Aims: We sought evidence for these patterns in a study of the American pika (Ochotona princeps), a territorial, central-place forager sensitive to climate. Methods: Pika food caches and available forage were re-sampled using historical methods at two long-term study sites, to quantify changes over two decades. Taxa that changed in availability or use were analysed for primary and secondary metabolites. Results: Both sites trended towards warmer summers, and snowmelt trended earlier at the lower latitude site. Graminoid cover increased at each site, and caching trends appeared to reflect available forage rather than primary metabolites. Pikas at the lower latitude site preferred species higher in secondary metabolites, known to provide higher-nutrient winter forage. However, caching of lower-nutrient graminoids increased in proportion with graminoid availability at that site. Conclusions: If our results represent trends in climate, cache quality and available forage, we predict that pikas at the lower latitude site will soon face nutritional deficiencies.

Relevance: 20.00%

Abstract:

This dissertation aims to investigate the role of slave narratives as a powerful literary genre in denouncing African slavery and in representing Black men and Black women in the eighteenth and nineteenth centuries. This work also sets out to investigate the role of neo-slave narratives in the study of the past and the representation of Black identity in the twentieth century. Both genres challenge their own present times by discussing issues of ethnicity and human subjugation from a critical perspective. In Incidents in the Life of a Slave Girl (1861), Harriet Jacobs recounts her experience of slavery, leaving an important legacy not only for History but also for African American Literature. In Dessa Rose (1986), Sherley Anne Williams revisits the past to recover the memory of slavery and rewrite history in order to examine her own present. Moreover, both authors address gender issues, raising feminist questions in their works.