338 resultados para cache-oblivious
Resumo:
This paper presents the results of a pilot study examining the factors that impact most on the effective implementation of, and improvement to, Quality Mangement Sytems (QMSs) amongst Indonesian construction companies. Nine critical factors were identified from an extensive literature review, and a survey was conducted of 23 respondents from three specific groups (Quality Managers, Project Managers, and Site Engineers) undertaking work in the Indonesian infrastructure construction sector. The data has been analyzed initially using simple descriptive techniques. This study reveals that different groups within the sector have different opinions of the factors regardless of the degree of importance of each factor. However, the evaluation of construction project success and the incentive schemes for high performance staff, are the two factors that were considered very important by most of the respondents in all three groups. In terms of their assessment of tools for measuring contractor’s performance, additional QMS guidelines, techniques related to QMS practice provided by the Government, and benchmarking, a clear majority in each group regarded their usefulness as ‘of some importance’.
Resumo:
We demonstrate a modification of the algorithm of Dani et al for the online linear optimization problem in the bandit setting, which allows us to achieve an O( \sqrt{T ln T} ) regret bound in high probability against an adaptive adversary, as opposed to the in expectation result against an oblivious adversary of Dani et al. We obtain the same dependence on the dimension as that exhibited by Dani et al. The results of this paper rest firmly on those of Dani et al and the remarkable technique of Auer et al for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information vs bandit settings.
Resumo:
We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most O ∗ ( √ T) against an adaptive adversary. This improves on the previous algorithm [8] whose regret is bounded in expectation against an oblivious adversary. We obtain the same dependence on the dimension (n 3/2) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and the remarkable technique of Auer et al. [2] for obtaining high probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information vs bandit settings.
Resumo:
We address the problem of constructing randomized online algorithms for the Metrical Task Systems (MTS) problem on a metric δ against an oblivious adversary. Restricting our attention to the class of “work-based” algorithms, we provide a framework for designing algorithms that uses the technique of regularization. For the case when δ is a uniform metric, we exhibit two algorithms that arise from this framework, and we prove a bound on the competitive ratio of each. We show that the second of these algorithms is ln n + O(loglogn) competitive, which is the current state-of-the art for the uniform MTS problem.
Resumo:
X-ray microtomography (micro-CT) with micron resolution enables new ways of characterizing microstructures and opens pathways for forward calculations of multiscale rock properties. A quantitative characterization of the microstructure is the first step in this challenge. We developed a new approach to extract scale-dependent characteristics of porosity, percolation, and anisotropic permeability from 3-D microstructural models of rocks. The Hoshen-Kopelman algorithm of percolation theory is employed for a standard percolation analysis. The anisotropy of permeability is calculated by means of the star volume distribution approach. The local porosity distribution and local percolation probability are obtained by using the local porosity theory. Additionally, the local anisotropy distribution is defined and analyzed through two empirical probability density functions, the isotropy index and the elongation index. For such a high-resolution data set, the typical data sizes of the CT images are on the order of gigabytes to tens of gigabytes; thus an extremely large number of calculations are required. To resolve this large memory problem parallelization in OpenMP was used to optimally harness the shared memory infrastructure on cache coherent Non-Uniform Memory Access architecture machines such as the iVEC SGI Altix 3700Bx2 Supercomputer. We see adequate visualization of the results as an important element in this first pioneering study.
Resumo:
One-time proxy signatures are one-time signatures for which a primary signer can delegate his or her signing capability to a proxy signer. In this work we propose two one-time proxy signature schemes with different security properties. Unlike other existing one-time proxy signatures that are constructed from public key cryptography, our proposed schemes are based one-way functions without trapdoors and so they inherit the communication and computation efficiency from the traditional one-time signatures. Although from a verifier point of view, signatures generated by the proxy are indistinguishable from those created by the primary signer, a trusted authority can be equipped with an algorithm that allows the authority to settle disputes between the signers. In our constructions, we use a combination of one-time signatures, oblivious transfer protocols and certain combinatorial objects. We characterise these new combinatorial objects and present constructions for them.
Resumo:
Loads that miss in L1 or L2 caches and waiting for their data at the head of the ROB cause significant slow down in the form of commit stalls. We identify that most of these commit stalls are caused by a small set of loads, referred to as LIMCOS (Loads Incurring Majority of COmmit Stalls). We propose simple history-based classifiers that track commit stalls suffered by loads to help us identify this small set of loads. We study an application of these classifiers to prefetching. The classifiers are used to train the prefetcher to focus on the misses suffered by LIMCOS. This, referred to as focused prefetching, results in a 9.8% gain in IPC over naive GHB based delta correlation prefetcher along with a 20.3% reduction in memory traffic for a set of 17 memory-intensive SPEC2000 benchmarks. Another important impact of focused prefetching is a 61% improvement in the accuracy of prefetches. We demonstrate that the proposed classification criterion performs better than other existing criteria like criticality and delinquent loads. Also we show that the criterion of focusing on commit stalls is robust enough across cache levels and can be applied to any prefetcher without any modifications to the prefetcher.
Resumo:
Content delivery networks (CDNs) are an essential component of modern website infrastructures: edge servers located closer to users cache content, increasing robustness and capacity while decreasing latency. However, this situation becomes complicated for HTTPS content that is to be delivered using the Transport Layer Security (TLS) protocol: the edge server must be able to carry out TLS handshakes for the cached domain. Most commercial CDNs require that the domain owner give their certificate's private key to the CDN's edge server or abandon caching of HTTPS content entirely. We examine the security and performance of a recently commercialized delegation technique in which the domain owner retains possession of their private key and splits the TLS state machine geographically with the edge server using a private key proxy service. This allows the domain owner to limit the amount of trust given to the edge server while maintaining the benefits of CDN caching. On the performance front, we find that latency is slightly worse compared to the insecure approach, but still significantly better than the domain owner serving the content directly. On the security front, we enumerate the security goals for TLS handshake proxying and identify a subtle difference between the security of RSA key transport and signed-Diffie--Hellman in TLS handshake proxying; we also discuss timing side channel resistance of the key server and the effect of TLS session resumption.
Resumo:
A major concern of embedded system architects is the design for low power. We address one aspect of the problem in this paper, namely the effect of executable code compression. There are two benefits of code compression – firstly, a reduction in the memory footprint of embedded software, and secondly, potential reduction in memory bus traffic and power consumption. Since decompression has to be performed at run time it is achieved by hardware. We describe a tool called COMPASS which can evaluate a range of strategies for any given set of benchmarks and display compression ratios. Also, given an execution trace, it can compute the effect on bus toggles, and cache misses for a range of compression strategies. The tool is interactive and allows the user to vary a set of parameters, and observe their effect on performance. We describe an implementation of the tool and demonstrate its effectiveness. To the best of our knowledge this is the first tool proposed for such a purpose.
Resumo:
CD-ROMs have proliferated as a distribution media for desktop machines for a large variety of multimedia applications (targeted for a single-user environment) like encyclopedias, magazines and games. With CD-ROM capacities up to 3 GB being available in the near future, they will form an integral part of Video on Demand (VoD) servers to store full-length movies and multimedia. In the first section of this paper we look at issues related to the single- user desktop environment. Since these multimedia applications are highly interactive in nature, we take a pragmatic approach, and have made a detailed study of the multimedia application behavior in terms of the I/O request patterns generated to the CD-ROM subsystem by tracing these patterns. We discuss prefetch buffer design and seek time characteristics in the context of the analysis of these traces. We also propose an adaptive main-memory hosted cache that receives caching hints from the application to reduce the latency when the user moves from one node of the hyper graph to another. In the second section we look at the use of CD-ROM in a VoD server and discuss the problem of scheduling multiple request streams and buffer management in this scenario. We adapt the C-SCAN (Circular SCAN) algorithm to suit the CD-ROM drive characteristics and prove that it is optimal in terms of buffer size management. We provide computationally inexpensive relations by which this algorithm can be implemented. We then propose an admission control algorithm which admits new request streams without disrupting the continuity of playback of the previous request streams. The algorithm also supports operations such as fast forward and replay. Finally, we discuss the problem of optimal placement of MPEG streams on CD-ROMs in the third section.
Resumo:
We describe the design of a directory-based shared memory architecture on a hierarchical network of hypercubes. The distributed directory scheme comprises two separate hierarchical networks for handling cache requests and transfers. Further, the scheme assumes a single address space and each processing element views the entire network as contiguous memory space. The size of individual directories stored at each node of the network remains constant throughout the network. Although the size of the directory increases with the network size, the architecture is scalable. The results of the analytical studies demonstrate superior performance characteristics of our scheme compared with those of other schemes.
Resumo:
Simulation is an important means of evaluating new microarchitectures. With the invention of multi-core (CMP) platforms, simulators are becoming larger and more complex. However, with the availability of CMPs with larger caches and higher operating frequency, the wall clock time required for simulating an application has become comparatively shorter. Reducing this simulation time further is a great challenge, especially in the case of multi-threaded workload due to indeterminacy introduced due to simultaneously executing various threads. In this paper, we propose a technique for speeding multi-core simulation. The model of the processor core and cache are replaced with functional models, to achieve speedup. A timed Petri net model is used to estimate the execution time of the processor and the memory access latencies are estimated using hit/miss information obtained from the functional model of the cache. This model can be used to predict performance of data parallel applications or multiprogramming workload on CMP platform with various cache hierarchies and shared bus interconnect. The error in estimation of the execution time of an application is within 6%. The speedup achieved ranges between an average of 2x--4x over the cycle accurate simulator.
Resumo:
In this paper we propose a new method of data handling for web servers. We call this method Network Aware Buffering and Caching (NABC for short). NABC facilitates reduction of data copies in web server's data sending path, by doing three things: (1) Layout the data in main memory in a way that protocol processing can be done without data copies (2) Keep a unified cache of data in kernel and ensure safe access to it by various processes and kernel and (3) Pass only the necessary meta data between processes so that bulk data handling time spent during IPC can be reduced. We realize NABC by implementing a set of system calls and an user library. The end product of the implementation is a set of APIs specifically designed for use by the web servers. We port an in house web server called SWEET, to NABC APIs and evaluate performance using a range of workloads both simulated and real. The results show a very impressive gain of 12% to 21% in throughput for static file serving and 1.6 to 4 times gain in throughput for lightweight dynamic content serving for a server using NABC APIs over the one using UNIX APIs.
Resumo:
Road transportation, as an important requirement of modern society, is presently hindered by restrictions in emission legislations as well as the availability of petroleum fuels, and as a consequence, the fuel cost. For nearly 270 years, we burned our fossil cache and have come to within a generation of exhausting the liquid part of it. Besides, to reduce the greenhouse gases, and to obey the environmental laws of most countries, it would be necessary to replace a significant number of the petroleum-fueled internal-combustion-engine vehicles (ICEVs) with electric cars in the near future. In this article, we briefly describe the merits and demerits of various proposed electrochemical systems for electric cars, namely the storage batteries, fuel cells and electrochemical supercapacitors, and determine the power and energy requirements of a modern car. We conclude that a viable electric car could be operated with a 50 kW polymer-electrolyte fuel cell stack to provide power for cruising and climbing, coupled in parallel with a 30 kW supercapacitor and/or battery bank to deliver additional short-term burst-power during acceleration.
Resumo:
We describe a System-C based framework we are developing, to explore the impact of various architectural and microarchitectural level parameters of the on-chip interconnection network elements on its power and performance. The framework enables one to choose from a variety of architectural options like topology, routing policy, etc., as well as allows experimentation with various microarchitectural options for the individual links like length, wire width, pitch, pipelining, supply voltage and frequency. The framework also supports a flexible traffic generation and communication model. We provide preliminary results of using this framework to study the power, latency and throughput of a 4x4 multi-core processing array using mesh, torus and folded torus, for two different communication patterns of dense and sparse linear algebra. The traffic consists of both Request-Response messages (mimicing cache accesses)and One-Way messages. We find that the average latency can be reduced by increasing the pipeline depth, as it enables higher link frequencies. We also find that there exists an optimum degree of pipelining which minimizes energy-delay product.