56 results for cache coherence protocols

at Indian Institute of Science - Bangalore - India


Relevance:

100.00%

Publisher:

Abstract:

In this paper we present a cache coherence protocol for multistage interconnection network (MIN)-based multiprocessors with two distinct private caches: private-blocks caches (PCache), containing blocks private to a process, and shared-blocks caches (SCache), containing data accessible by all processes. The architecture is extended by a coherence control bus connecting all shared-blocks cache controllers. Timing problems due to variable transit delays through the MIN are dealt with by introducing transient states in the proposed cache coherence protocol. The impact of the coherence protocol on system performance is evaluated through a three-phase performance study. Assuming homogeneity of all nodes, a single-node queuing model (phase 3) is developed to analyze system performance. This model is solved for processor and coherence bus utilizations using the mean value analysis (MVA) technique, with shared-blocks steady-state probabilities (phase 1) and communication delays (phase 2) as input parameters. The performance of our system is compared to that of a system with an equivalent-sized unified cache and to that of a multiprocessor implementing a directory-based coherence protocol. System performance measures are verified through simulation.
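As a rough illustration of the mean value analysis step mentioned in this abstract, the following is a minimal sketch of the textbook exact MVA recursion for a closed, single-class queueing network. It is not the paper's model: the function name `mva`, the station service demands, and the think time are all hypothetical placeholders, and the phase 1 and phase 2 inputs described in the abstract are not reproduced here.

```python
def mva(service_demands, think_time, customers):
    """Exact MVA for a closed, single-class network of queueing stations.

    service_demands: hypothetical total service demand per station
    think_time:      hypothetical delay spent outside the stations
    customers:       number of circulating customers
    Returns (throughput, per-station utilizations, per-station mean queue lengths).
    """
    queue_lengths = [0.0] * len(service_demands)
    throughput = 0.0
    for n in range(1, customers + 1):
        # Arrival theorem: an arriving customer sees the network as it
        # would be in steady state with n - 1 customers.
        residence = [d * (1.0 + q) for d, q in zip(service_demands, queue_lengths)]
        # Little's law applied to the whole network.
        throughput = n / (think_time + sum(residence))
        # Little's law applied to each station.
        queue_lengths = [throughput * r for r in residence]
    # Utilization law: U_k = X * D_k.
    utilizations = [throughput * d for d in service_demands]
    return throughput, utilizations, queue_lengths
```

For example, `mva([1.0, 1.0], 0.0, 2)` models two identical stations with two circulating customers; in a sketch like this, one station could stand for a processor and another for a coherence bus, but that mapping is an assumption and not taken from the paper.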

Relevance:

30.00%

Publisher:

Abstract:

Software transactional memory (STM) is a promising programming paradigm for shared-memory multithreaded programs. For STMs to be adopted widely for performance-critical software, understanding and improving the cache performance of applications running on STM becomes increasingly crucial as the performance gap between processor and memory continues to grow. In this paper, we present the most detailed experimental evaluation to date of the cache behavior of STM applications and quantify the impact of the different STM factors on the cache misses experienced by the applications. We find that STMs are not cache friendly, with data cache stall cycles accounting for more than 50% of the execution cycles in a majority of the benchmarks. We find that, on average, misses occurring inside the STM account for 62% of the total data cache miss latency cycles experienced by the applications, and that cache performance is adversely affected by certain inherent characteristics of the STM itself. These observations motivate us to propose a set of specific compiler transformations targeted at making STMs cache friendly. We find that the STM's fine-grained and application-unaware locking is a major contributor to its poor cache behavior. Hence we propose selective Lock Data Co-location (LDC) and Redundant Lock Access Removal (RLAR) to address the lock access misses. We find that even transactions that are completely disjoint-access parallel suffer from costly coherence misses caused by the centralized global time stamp updates, and hence we propose the Selective Per-Partition Time Stamp (SPTS) transformation to address this. We show that our transformations are effective in improving the cache behavior of STM applications, reducing the data cache miss latency by 20.15% to 37.14% and improving execution time by 18.32% to 33.12% in five of the eight STAMP applications.

Relevance:

20.00%

Publisher:

Abstract:

Recent measurements of the resistivity of (La-Sr)$_2$CuO$_4$ are shown to fit within the general framework of Luttinger liquid transport theory. They exhibit a crossover from the spin-charge separated "holon nondrag regime" usually observed, with $\rho_{ab} \sim T$, to a "localizing" regime dominated by impurity scattering at low temperature. The proportionality of $\rho_c$ and $\rho_{ab}$ and the giant anisotropy follow directly from the theory.

Relevance:

20.00%

Publisher:

Abstract:

A generalized pulse pair has been suggested in which the longitudinal spin order is retained and the transverse components cancelled by random variation of the interval between pulses, in successive applications of the two-dimensional NMR algorithm. This method leads to pure phases and has been exploited to provide a simpler scheme for two-spin filtering and for pure phase spectroscopy in multiple-quantum-filtered two-dimensional NMR experiments.

Relevance:

20.00%

Publisher:

Abstract:

Cooperative relay communication in a fading channel environment under the orthogonal amplify-and-forward (OAF) and the nonorthogonal and orthogonal selection decode-and-forward (NSDF and OSDF) protocols is considered here. The diversity-multiplexing gain tradeoff (DMT) of the three protocols is determined, and DMT-optimal distributed space-time (ST) code constructions are provided. The codes constructed are sphere decodable and in some instances incur the minimum possible delay. Included in our results is the perhaps surprising finding that the orthogonal and the nonorthogonal amplify-and-forward (NAF) protocols have identical DMT when the time durations of the broadcast and cooperative phases are optimally chosen to suit the respective protocol. Moreover, our code construction for the OAF protocol incurs less delay. Two variants of the NSDF protocol are considered: the fixed-NSDF and the variable-NSDF protocol. In the variable-NSDF protocol, the fraction of time occupied by the broadcast phase is allowed to vary with multiplexing gain. The variable-NSDF protocol is shown to improve on the DMT of the best previously known static protocol when the number of relays is greater than two. Also included is a DMT-optimal code construction for the NAF protocol.

Relevance:

20.00%

Publisher:

Abstract:

Packet forwarding is a memory-intensive application requiring multiple accesses through a trie structure. With the requirement to process packets at line rates, high-performance routers need to forward millions of packets every second, with each packet needing up to seven memory accesses. Earlier work shows that a single cache for the nodes of a trie can reduce the number of external memory accesses. It is observed that the locality characteristics of the level-one nodes of a trie are significantly different from those of lower-level nodes. Hence, we propose a heterogeneously segmented cache architecture (HSCA) which uses separate caches for level-one and lower-level nodes, each with carefully chosen sizes. Besides reducing misses, segmenting the cache allows us to focus on optimizing the more frequently accessed level-one node segment. We find that, due to the nonuniform distribution of nodes among cache sets, the level-one nodes cache is susceptible to high conflict misses. We reduce conflict misses by introducing a novel two-level mapping-based cache placement framework. We also propose an elegant way to fit the modified placement function into the cache organization with minimal increase in access time. Further, we propose an attribute-preserving trace generation methodology which emulates real traces and can generate traces with varying locality. Performance results reveal that our HSCA scheme results in a 32 percent speedup in average memory access time over a unified nodes cache. Also, HSCA outperforms IHARC, a cache for lookup results, with as high as a 10-fold speedup in average memory access time. Two-level mapping further enhances the performance of the base HSCA by up to 13 percent, leading to an overall improvement of up to 40 percent over the unified scheme.
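The segmentation idea in this abstract, separate caches for level-one and lower-level trie nodes, can be sketched as a toy model. The sketch below is not the HSCA design: it assumes fully associative LRU segments (so it cannot show conflict misses or the two-level mapping), and the class names `LRUCache` and `SegmentedNodeCache`, the capacities, and the node identifiers are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache that only tracks hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key):
        """Return True on a hit, False on a miss (the key is then cached)."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return False


class SegmentedNodeCache:
    """Routes trie-node accesses to a level-one or a lower-level segment."""

    def __init__(self, level_one_capacity, lower_capacity):
        self.level_one = LRUCache(level_one_capacity)
        self.lower = LRUCache(lower_capacity)

    def access(self, level, node_id):
        # Assumption: level == 1 marks a level-one node; all other
        # levels share the second segment.
        segment = self.level_one if level == 1 else self.lower
        return segment.access((level, node_id))
```

In this toy model, traffic to lower-level nodes can never evict a level-one node, which is the isolation property that segmentation provides; the sizes of the two segments would still have to be chosen from measured locality, as the abstract notes.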

Relevance:

20.00%

Publisher:

Abstract:

We address the issue of noise robustness of reconstruction techniques for frequency-domain optical-coherence tomography (FDOCT). We consider three reconstruction techniques: Fourier, iterative phase recovery, and cepstral techniques. We characterize the reconstructions in terms of their statistical bias and variance and obtain approximate analytical expressions under the assumption of small noise. We also perform Monte Carlo analyses and show that the experimental results are in agreement with the theoretical predictions. It turns out that the iterative and cepstral techniques yield reconstructions with a smaller bias than the Fourier method. The three techniques, however, have identical variance profiles, and their consistency increases linearly as a function of the signal-to-noise ratio.

Relevance:

20.00%

Publisher:

Abstract:

The effect of disorder on the electrical resistance near the superconducting transition temperature in the paracoherence region of the high-temperature superconductor YBa2Cu3O7-delta (YBCO) in thin-film form is reported. For this, c-axis oriented YBa2Cu3O7-delta thin films with superconducting transition widths varying between 0.27 K and 6 K were deposited using laser ablation and high-pressure oxygen sputtering techniques. Disorder in these films was further created by irradiation with 100 MeV oxygen and 200 MeV silver ions at varying fluences. It is observed that the critical exponent in the paracoherence region for films with high transition temperature and small transition width agrees with the theoretically predicted value (gamma = 1.33) and is not affected by disorder, while for films with lower transition temperature and larger transition width the exponent is much larger than the theoretically predicted value, varies from sample to sample, and usually changes with radiation-induced disorder. This difference in the behaviour of the exponent has been explained on the basis of differences in the strength of weak links, and the transition between temperatures T. and T, is interpreted as a percolation-like transition with disorder. (c) 2006 Elsevier B.V. All rights reserved.

Relevance:

20.00%

Publisher:

Abstract:

An analysis based on coherence theory is presented, which explains the experimentally observed rotation sensitivity of the contrast of Lau fringes obtained under spatially incoherent illumination.

Relevance:

20.00%

Publisher:

Abstract:

In this thesis, we design rigorous and efficient protocols/mechanisms for different types of wireless networks using a mechanism design [1] and game-theoretic approach [2]. Our work can broadly be viewed in two parts. In the first part, we concentrate on ad hoc wireless networks [3] and [4]. In particular, we consider broadcast in these networks, where each node is owned by an independent and selfish user. Being selfish, these nodes do not forward the broadcast packets. All existing protocols for broadcast assume that nodes forward the transit packets. So, there is a need to develop new broadcast protocols that overcome node selfishness. In our paper [5], we develop a strategy-proof pricing mechanism, which we call the immediate predecessor node pricing mechanism (IPNPM), and an efficient new broadcast protocol based on IPNPM. We show the efficacy of our proposed broadcast protocol using simulation results.

Relevance:

20.00%

Publisher:

Abstract:

CMPs enable simultaneous execution of multiple applications on the same platform with shared cache resources. Diversity in the cache access patterns of these simultaneously executing applications can potentially trigger inter-application interference, leading to cache pollution. Whereas a large cache can ameliorate this problem, the larger power consumption that comes with increasing cache size, amplified at sub-100nm technologies, makes this solution prohibitive. In this paper, to address the issues relating to power-aware performance of caches, we propose a caching structure that addresses the following: 1. Definition of application-specific cache partitions as an aggregation of caching units (molecules). The parameters of each molecule, namely size, associativity and line size, are chosen so that its power consumption and access time are optimal for the given technology. 2. Application-specific resizing of cache partitions with variable and adaptive associativity per cache line, way size and variable line size. 3. A replacement policy that is transparent to the partition in terms of size, heterogeneity in associativity and line size. Through simulation studies we establish the superiority of the molecular cache (a cache built as an aggregation of molecules), which offers a 29% power advantage over an equivalently performing traditional cache.

Relevance:

20.00%

Publisher:

Abstract:

Results of a theoretical study on ultrasonic attenuation and NMR relaxation in excitonic insulators are reported. The transition rates derived have anomalous temperature dependence owing to the occurrence of coherence factors analogous to the case of superconductors. It is found that these coherence factors are characteristically different for the interband and the intraband scattering processes. It is suggested that experimental observation of these temperature-dependent coherence factors may help identify the existence of an excitonic phase.

Relevance:

20.00%

Publisher:

Abstract:

A simple technique involving the use of a rotating and a stationary diffuser has been developed to vary the spatial coherence of light from a He-Ne laser. Using this technique, an experimental investigation of the dependence of the rotation sensitivity of Lau fringes on the spatial coherence of the illuminating wavefield has been carried out. It is observed that (i) the rotation sensitivity of Lau fringes varies in a well-defined manner as a function of the spatial coherence of the light used; (ii) the extremely good rotation sensitivity of Lau fringes can be used to great advantage (compared to the conventional double-slit method) in the measurement of the spatial coherence of a wavefield; (iii) Lau fringes are formed at various levels of spatial coherence, and as such it appears that the Lau effect need not be associated with an incoherent optical field.

Relevance:

20.00%

Publisher:

Abstract:

Three different types of consistency, viz., semiweak, weak, and strong, of a read-only transaction in a schedule s of a set T of transactions are defined, and these are compared with the existing notions of consistency of a read-only transaction in a schedule. We present a technique that enables a user to control the consistency of a read-only transaction in heterogeneous locking protocols. Since the weak consistency of a read-only transaction improves concurrency in heterogeneous locking protocols, users can help to improve concurrency in these protocols by supplying the consistency requirements of read-only transactions. A heterogeneous locking protocol P' derived from a locking protocol P that uses exclusive mode locks only and ensures serializability need not be deadlock-free. We present a sufficient condition that ensures the deadlock-freeness of P' when P is deadlock-free and all the read-only transactions in P' are two-phase.