904 resultados para Cache coherence
Resumo:
In this paper we present a cache coherence protocol for multistage interconnection network (MIN)-based multiprocessors with two distinct private caches: private-blocks caches (PCache) containing blocks private to a process and shared-blocks caches (SCache) containing data accessible by all processes. The architecture is extended by a coherence control bus connecting all shared-block cache controllers. Timing problems due to variable transit delays through the MIN are dealt with by introducing Transient states in the proposed cache coherence protocol. The impact of the coherence protocol on system performance is evaluated through a performance study of three phases. Assuming homogeneity of all nodes, a single-node queuing model (phase 3) is developed to analyze system performance. This model is solved for processor and coherence bus utilizations using the mean value analysis (MVA) technique with shared-blocks steady state probabilities (phase 1) and communication delays (phase 2) as input parameters. The performance of our system is compared to that of a system with an equivalent-sized unified cache and with a multiprocessor implementing a directory-based coherence protocol. System performance measures are verified through simulation.
Resumo:
在处理器从单核向多核演进的过程中,为了获得更好的性能和可扩展性,适用于多核处理器系统的Cache一致性协议变得越来越复杂。Cache一致性协议的验证一直是模型检测在工业界主要应用之一,被工业界和学术界关注。相对传统方法而言,微结构级的模型检测能够描述和验证更多的协议细节。利用NuSMV工具对Intel公司的MESIF Cache一致性协议进行模型检测在微结构层次上进行了建模,并对该协议进行模型检测,试验结果证明了此方法的有效性。
Resumo:
This paper introduces hybrid address spaces as a fundamental design methodology for implementing scalable runtime systems on many-core architectures without hardware support for cache coherence. We use hybrid address spaces for an implementation of MapReduce, a programming model for large-scale data processing, and the implementation of a remote memory access (RMA) model. Both implementations are available on the Intel SCC and are portable to similar architectures. We present the design and implementation of HyMR, a MapReduce runtime system whereby different stages and the synchronization operations between them alternate between a distributed memory address space and a shared memory address space, to improve performance and scalability. We compare HyMR to a reference implementation and we find that HyMR improves performance by a factor of 1.71× over a set of representative MapReduce benchmarks. We also compare HyMR with Phoenix++, a state-of-art implementation for systems with hardware-managed cache coherence in terms of scalability and sustained to peak data processing bandwidth, where HyMR demon- strates improvements of a factor of 3.1× and 3.2× respectively. We further evaluate our hybrid remote memory access (HyRMA) programming model and assess its performance to be superior of that of message passing.
Resumo:
This work presents the concept, design and implementation of a MP-SoC platform, named STORM (MP-SoC DirecTory-Based PlatfORM). Currently the platform is composed of the following modules: SPARC V8 processor, GPOP processor, Cache module, Memory module, Directory module and two different modles of Network-on-Chip, NoCX4 and Obese Tree. All modules were implemented using SystemC, simulated and validated, individually or in group. The modules description is presented in details. For programming the platform in C it was implemented a SPARC assembler, fully compatible with gcc s generated assembly code. For the parallel programming it was implemented a library for mutex managing, using the due assembler s support. A total of 10 simulations of increasing complexity are presented for the validation of the presented concepts. The simulations include real parallel applications, such as matrix multiplication, Mergesort, KMP, Motion Estimation and DCT 2D
Resumo:
Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs. In order for STMs to be adopted widely for performance critical software, understanding and improving the cache performance of applications running on STM becomes increasingly crucial, as the performance gap between processor and memory continues to grow. In this paper, we present the most detailed experimental evaluation to date, of the cache behavior of STM applications and quantify the impact of the different STM factors on the cache misses experienced by the applications. We find that STMs are not cache friendly, with the data cache stall cycles contributing to more than 50% of the execution cycles in a majority of the benchmarks. We find that on an average, misses occurring inside the STM account for 62% of total data cache miss latency cycles experienced by the applications and the cache performance is impacted adversely due to certain inherent characteristics of the STM itself. The above observations motivate us to propose a set of specific compiler transformations targeted at making the STMs cache friendly. We find that STM's fine grained and application unaware locking is a major contributor to its poor cache behavior. Hence we propose selective Lock Data co-location (LDC) and Redundant Lock Access Removal (RLAR) to address the lock access misses. We find that even transactions that are completely disjoint access parallel, suffer from costly coherence misses caused by the centralized global time stamp updates and hence we propose the Selective Per-Partition Time Stamp (SPTS) transformation to address this. We show that our transformations are effective in improving the cache behavior of STM applications by reducing the data cache miss latency by 20.15% to 37.14% and improving execution time by 18.32% to 33.12% in five of the 8 STAMP applications.
Resumo:
With web caching and cache-related services like CDNs and edge services playing an increasingly significant role in the modern internet, the problem of the weak consistency and coherence provisions in current web protocols is becoming increasingly significant and drawing the attention of the standards community [LCD01]. Toward this end, we present definitions of consistency and coherence for web-like environments, that is, distributed client-server information systems where the semantics of interactions with resource are more general than the read/write operations found in memory hierarchies and distributed file systems. We then present a brief review of proposed mechanisms which strengthen the consistency of caches in the web, focusing upon their conceptual contributions and their weaknesses in real-world practice. These insights motivate a new mechanism, which we call "Basis Token Consistency" or BTC; when implemented at the server, this mechanism allows any client (independent of the presence and conformity of any intermediaries) to maintain a self-consistent view of the server's state. This is accomplished by annotating responses with additional per-resource application information which allows client caches to recognize the obsolescence of currently cached entities and identify responses from other caches which are already stale in light of what has already been seen. The mechanism requires no deviation from the existing client-server communication model, and does not require servers to maintain any additional per-client state. We discuss how our mechanism could be integrated into a fragment-assembling Content Management System (CMS), and present a simulation-driven performance comparison between the BTC algorithm and the use of the Time-To-Live (TTL) heuristic.
Resumo:
We present a novel method and instrument for in vivo imaging and measurement of the human corneal dynamics during an air puff. The instrument is based on high-speed swept source optical coherence tomography (ssOCT) combined with a custom adapted air puff chamber from a non-contact tonometer, which uses an air stream to deform the cornea in a non-invasive manner. During the short period of time that the deformation takes place, the ssOCT acquires multiple A-scans in time (M-scan) at the center of the air puff, allowing observation of the dynamics of the anterior and posterior corneal surfaces as well as the anterior lens surface. The dynamics of the measurement are driven by the biomechanical properties of the human eye as well as its intraocular pressure. Thus, the analysis of the M-scan may provide useful information about the biomechanical behavior of the anterior segment during the applanation caused by the air puff. An initial set of controlled clinical experiments are shown to comprehend the performance of the instrument and its potential applicability to further understand the eye biomechanics and intraocular pressure measurements. Limitations and possibilities of the new apparatus are discussed.
Resumo:
Signal-degrading speckle is one factor that can reduce the quality of optical coherence tomography images. We demonstrate the use of a hierarchical model-based motion estimation processing scheme based on an affine-motion model to reduce speckle in optical coherence tomography imaging, by image registration and the averaging of multiple B-scans. The proposed technique is evaluated against other methods available in the literature. The results from a set of retinal images show the benefit of the proposed technique, which provides an improvement in signal-to-noise ratio of the square root of the number of averaged images, leading to clearer visual information in the averaged image. The benefits of the proposed technique are also explored in the case of ocular anterior segment imaging.
Resumo:
Computer resource allocation represents a significant challenge particularly for multiprocessor systems, which consist of shared computing resources to be allocated among co-runner processes and threads. While an efficient resource allocation would result in a highly efficient and stable overall multiprocessor system and individual thread performance, ineffective poor resource allocation causes significant performance bottlenecks even for the system with high computing resources. This thesis proposes a cache aware adaptive closed loop scheduling framework as an efficient resource allocation strategy for the highly dynamic resource management problem, which requires instant estimation of highly uncertain and unpredictable resource patterns. Many different approaches to this highly dynamic resource allocation problem have been developed but neither the dynamic nature nor the time-varying and uncertain characteristics of the resource allocation problem is well considered. These approaches facilitate either static and dynamic optimization methods or advanced scheduling algorithms such as the Proportional Fair (PFair) scheduling algorithm. Some of these approaches, which consider the dynamic nature of multiprocessor systems, apply only a basic closed loop system; hence, they fail to take the time-varying and uncertainty of the system into account. Therefore, further research into the multiprocessor resource allocation is required. Our closed loop cache aware adaptive scheduling framework takes the resource availability and the resource usage patterns into account by measuring time-varying factors such as cache miss counts, stalls and instruction counts. More specifically, the cache usage pattern of the thread is identified using QR recursive least square algorithm (RLS) and cache miss count time series statistics. For the identified cache resource dynamics, our closed loop cache aware adaptive scheduling framework enforces instruction fairness for the threads. Fairness in the context of our research project is defined as a resource allocation equity, which reduces corunner thread dependence in a shared resource environment. In this way, instruction count degradation due to shared cache resource conflicts is overcome. In this respect, our closed loop cache aware adaptive scheduling framework contributes to the research field in two major and three minor aspects. The two major contributions lead to the cache aware scheduling system. The first major contribution is the development of the execution fairness algorithm, which degrades the co-runner cache impact on the thread performance. The second contribution is the development of relevant mathematical models, such as thread execution pattern and cache access pattern models, which in fact formulate the execution fairness algorithm in terms of mathematical quantities. Following the development of the cache aware scheduling system, our adaptive self-tuning control framework is constructed to add an adaptive closed loop aspect to the cache aware scheduling system. This control framework in fact consists of two main components: the parameter estimator, and the controller design module. The first minor contribution is the development of the parameter estimators; the QR Recursive Least Square(RLS) algorithm is applied into our closed loop cache aware adaptive scheduling framework to estimate highly uncertain and time-varying cache resource patterns of threads. The second minor contribution is the designing of a controller design module; the algebraic controller design algorithm, Pole Placement, is utilized to design the relevant controller, which is able to provide desired timevarying control action. The adaptive self-tuning control framework and cache aware scheduling system in fact constitute our final framework, closed loop cache aware adaptive scheduling framework. The third minor contribution is to validate this cache aware adaptive closed loop scheduling framework efficiency in overwhelming the co-runner cache dependency. The timeseries statistical counters are developed for M-Sim Multi-Core Simulator; and the theoretical findings and mathematical formulations are applied as MATLAB m-file software codes. In this way, the overall framework is tested and experiment outcomes are analyzed. According to our experiment outcomes, it is concluded that our closed loop cache aware adaptive scheduling framework successfully drives co-runner cache dependent thread instruction count to co-runner independent instruction count with an error margin up to 25% in case cache is highly utilized. In addition, thread cache access pattern is also estimated with 75% accuracy.
Resumo:
Purpose: To determine likely errors in estimating retinal shape using partial coherence interferometric instruments when no allowance is made for optical distortion. Method: Errors were estimated using Gullstrand’s No. 1 schematic eye and variants which included a 10 D axial myopic eye, an emmetropic eye with a gradient-index lens, and a 10.9 D accommodating eye with a gradient-index lens. Performance was simulated for two commercial instruments, the IOLMaster (Carl Zeiss Meditec) and the Lenstar LS 900 (Haag-Streit AG). The incident beam was directed towards either the centre of curvature of the anterior cornea (corneal-direction method) or the centre of the entrance pupil (pupil-direction method). Simple trigonometry was used with the corneal intercept and the incident beam angle to estimate retinal contour. Conics were fitted to the estimated contours. Results: The pupil-direction method gave estimates of retinal contour that were much too flat. The cornea-direction method gave similar results for IOLMaster and Lenstar approaches. The steepness of the retinal contour was slightly overestimated, the exact effects varying with the refractive error, gradient index and accommodation. Conclusion: These theoretical results suggest that, for field angles ≤30º, partial coherence interferometric instruments are of use in estimating retinal shape by the corneal-direction method with the assumptions of a regular retinal shape and no optical distortion. It may be possible to improve on these estimates out to larger field angles by using optical modeling to correct for distortion.
Resumo:
To compare measurements of retinal thickness (RT) and choroidal thickness (ChT) obtained with an optical low coherence reflectometry (OLCR) biometer (Lenstar LS 900) with those obtained with a spectral domain optical coherence tomographer (SD OCT) (Copernicus SOCT HR) in young normal subjects.
Resumo:
Clinicians regularly face the confronting challenge of differentiating a choroidal naevus from a melanoma. Uveal naevi are a relatively common finding during routine eye examinations: a prevalence of 6.5 per cent has been reported.1 In contrast, malignant melanomata are uncommon, being found in six persons per million population, but they can have devastating implications and consequences.2 Differential diagnoses can be difficult to make with certainty; any additional information that can assist in this process is advantageous...