100 resultados para cache placement

em Indian Institute of Science - Bangalore - Índia


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Packet forwarding is a memory-intensive application requiring multiple accesses through a trie structure. With the requirement to process packets at line rates, high-performance routers need to forward millions of packets every second with each packet needing up to seven memory accesses. Earlier work shows that a single cache for the nodes of a trie can reduce the number of external memory accesses. It is observed that the locality characteristics of the level-one nodes of a trie are significantly different from those of lower level nodes. Hence, we propose a heterogeneously segmented cache architecture (HSCA) which uses separate caches for level-one and lower level nodes, each with carefully chosen sizes. Besides reducing misses, segmenting the cache allows us to focus on optimizing the more frequently accessed level-one node segment. We find that due to the nonuniform distribution of nodes among cache sets, the level-one nodes cache is susceptible t high conflict misses. We reduce conflict misses by introducing a novel two-level mapping-based cache placement framework. We also propose an elegant way to fit the modified placement function into the cache organization with minimal increase in access time. Further, we propose an attribute preserving trace generation methodology which emulates real traces and can generate traces with varying locality. Performanc results reveal that our HSCA scheme results in a 32 percent speedup in average memory access time over a unified nodes cache. Also, HSC outperforms IHARC, a cache for lookup results, with as high as a 10-fold speedup in average memory access time. Two-level mappin further enhances the performance of the base HSCA by up to 13 percent leading to an overall improvement of up to 40 percent over the unified scheme.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Packet forwarding is a memory-intensive application requiring multiple accesses through a trie structure. The efficiency of a cache for this application critically depends on the placement function to reduce conflict misses. Traditional placement functions use a one-level mapping that naively partitions trie-nodes into cache sets. However, as a significant percentage of trie nodes are not useful, these schemes suffer from a non-uniform distribution of useful nodes to sets. This in turn results in increased conflict misses. Newer organizations such as variable associativity caches achieve flexibility in placement at the expense of increased hit-latency. This makes them unsuitable for L1 caches.We propose a novel two-level mapping framework that retains the hit-latency of one-level mapping yet incurs fewer conflict misses. This is achieved by introducing a secondlevel mapping which reorganizes the nodes in the naive initial partitions into refined partitions with near-uniform distribution of nodes. Further as this remapping is accomplished by simply adapting the index bits to a given routing table the hit-latency is not affected. We propose three new schemes which result in up to 16% reduction in the number of misses and 13% speedup in memory access time. In comparison, an XOR-based placement scheme known to perform extremely well for general purpose architectures, can obtain up to 2% speedup in memory access time.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Based on trial interchanges, this paper develops three algorithms for the solution of the placement problem of logic modules in a circuit. A significant decrease in the computation time of such placement algorithms can be achieved by restricting the trial interchanges to only a subset of all the modules in a circuit. The three algorithms are simulated on a DEC 1090 system in Pascal and the performance of these algorithms in terms of total wirelength and computation time is compared with the results obtained by Steinberg, for the 34-module backboard wiring problem. Performance analysis of the first two algorithms reveals that algorithms based on pairwise trial interchanges (2 interchanges) achieve a desired placement faster than the algorithms based on trial N interchanges. The first two algorithms do not perform better than Steinberg's algorithm1, whereas the third algorithm based on trial pairwise interchange among unconnected pairs of modules (UPM) and connected pairs of modules (CPM) performs better than Steinberg's algorithm, both in terms of total wirelength (TWL) and computation time.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper is concerned with the development of an algorithm for pole placement in multi-input dynamic systems. The algorithm which uses a series of elementary transformations is believed to be simpler, computationally more efficient and numerically stable when compared with earlier methods. In this paper two methods have been presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper is concerned with the development of an algorithm for pole placement in multi-input dynamic systems. The algorithm which uses a series of elementary transformations is believed to be simpler, computationally more efficient and numerically stable when compared with earlier methods. In this paper two methods have been presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Printed Circuit Board (PCB) layout design is one of the most important and time consuming phases during equipment design process in all electronic industries. This paper is concerned with the development and implementation of a computer aided PCB design package. A set of programs which operate on a description of the circuit supplied by the user in the form of a data file and subsequently design the layout of a double-sided PCB has been developed. The algorithms used for the design of the PCB optimise the board area and the length of copper tracks used for the interconnections. The output of the package is the layout drawing of the PCB, drawn on a CALCOMP hard copy plotter and a Tektronix 4012 storage graphics display terminal. The routing density (the board area required for one component) achieved by this package is typically 0.8 sq. inch per IC. The package is implemented on a DEC 1090 system in Pascal and FORTRAN and SIGN(1) graphics package is used for display generation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

CMPs enable simultaneous execution of multiple applications on the same platforms that share cache resources. Diversity in the cache access patterns of these simultaneously executing applications can potentially trigger inter-application interference, leading to cache pollution. Whereas a large cache can ameliorate this problem, the issues of larger power consumption with increasing cache size, amplified at sub-100nm technologies, makes this solution prohibitive. In this paper in order to address the issues relating to power-aware performance of caches, we propose a caching structure that addresses the following: 1. Definition of application-specific cache partitions as an aggregation of caching units (molecules). The parameters of each molecule namely size, associativity and line size are chosen so that the power consumed by it and access time are optimal for the given technology. 2. Application-Specific resizing of cache partitions with variable and adaptive associativity per cache line, way size and variable line size. 3. A replacement policy that is transparent to the partition in terms of size, heterogeneity in associativity and line size. Through simulation studies we establish the superiority of molecular cache (caches built as aggregations of molecules) that offers a 29% power advantage over that of an equivalently performing traditional cache.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose a novel algorithm for placement of standard cells in VLSI circuits based on an analogy of this problem with neural networks. By employing some of the organising principles of these nets, we have attempted to improve the behaviour of the bipartitioning method as proposed by Kernighan and Lin. Our algorithm yields better quality placements compared with the above method, and also makes the final placement independent of the initial partition.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Massively parallel SIMD computing is applied to obtain an order of magnitude improvement in the executional speed of an important algorithm in VLSI design automation. The physical design of a VLSI circuit involves logic module placement as a subtask. The paper is concerned with accelerating the well known Min-cut placement technique for logic cell placement. The inherent parallelism of the Min-cut algorithm is identified, and it is shown that a parallel machine based on the efficient execution of the placement procedure.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A procedure to evaluate surface-to-air missile battery placement patterns for air defense is presented. A measure of defense effectiveness is defined as a function of kill probability of the defense missiles and the nature of the surrounding terrain features. The concept of cumulative danger index is used to select the best path for a penetrating hostile aircraft for any given pattern of placement. The aircraft is assumed to be intelligent and well-informed. The path is generated using a dynamic programming methodology. The software package so developed can be used off-line to choose the best among a number of possible battery placement patterns.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs as an alternative to traditional lock based synchronization. However adoption of STM in mainstream software has been quite low due to its considerable overheads and its poor cache/memory performance. In this paper, we perform a detailed study of the cache behavior of STM applications and quantify the impact of different STM factors on the cache misses experienced by the applications. Based on our analysis, we propose a compiler driven Lock-Data Colocation (LDC), targeted at reducing the cache overheads on STM. We show that LDC is effective in improving the cache behavior of STM applications by reducing the dcache miss latency and improving execution time performance.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this work, we propose a new organization for the last level shared cache of a rnulticore system. Our design is based on the observation that the Next-Use distance, measured in terms of intervening misses between the eviction of a line and its next use, for lines brought in by a given delinquent PC falls within a predictable range of values. We exploit this correlation to improve the performance of shared caches in multi-core architectures by proposing the NUcache organization.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Increasing network lifetime is important in wireless sensor/ad-hoc networks. In this paper, we are concerned with algorithms to increase network lifetime and amount of data delivered during the lifetime by deploying multiple mobile base stations in the sensor network field. Specifically, we allow multiple mobile base stations to be deployed along the periphery of the sensor network field and develop algorithms to dynamically choose the locations of these base stations so as to improve network lifetime. We propose energy efficient low-complexity algorithms to determine the locations of the base stations; they include i) Top-K-max algorithm, ii) maximizing the minimum residual energy (Max-Min-RE) algorithm, and iii) minimizing the residual energy difference (MinDiff-RE) algorithm. We show that the proposed base stations placement algorithms provide increased network lifetimes and amount of data delivered during the network lifetime compared to single base station scenario as well as multiple static base stations scenario, and close to those obtained by solving an integer linear program (ILP) to determine the locations of the mobile base stations. We also investigate the lifetime gain when an energy aware routing protocol is employed along with multiple base stations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we present a cache coherence protocol for multistage interconnection network (MIN)-based multiprocessors with two distinct private caches: private-blocks caches (PCache) containing blocks private to a process and shared-blocks caches (SCache) containing data accessible by all processes. The architecture is extended by a coherence control bus connecting all shared-block cache controllers. Timing problems due to variable transit delays through the MIN are dealt with by introducing Transient states in the proposed cache coherence protocol. The impact of the coherence protocol on system performance is evaluated through a performance study of three phases. Assuming homogeneity of all nodes, a single-node queuing model (phase 3) is developed to analyze system performance. This model is solved for processor and coherence bus utilizations using the mean value analysis (MVA) technique with shared-blocks steady state probabilities (phase 1) and communication delays (phase 2) as input parameters. The performance of our system is compared to that of a system with an equivalent-sized unified cache and with a multiprocessor implementing a directory-based coherence protocol. System performance measures are verified through simulation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Let M be an m-sided simple polygon and N be an n-sided polygon with holes. In this paper we consider the problem of computing the feasible region, i.e., the set of all placements by translation of M so that M lies inside N without intersecting any hole. First we propose an O (mn(2)) time algorithm for computing the feasible region for the case when M is a monotone polygon. Then we consider the general case when M is a simple polygon and propose an O(m(2)n(2)) time algorithm for computing the feasible region. Both algorithms are optimal upto a constant factor.