61 results for cloud computing datacenter performance QoS


Relevance:

30.00%

Publisher:

Abstract:

Dynamic Voltage and Frequency Scaling (DVFS) is a very effective tool for designing trade-offs between energy and performance. In this paper, we use a formal Petri-net-based program performance model that directly captures both application and system properties to find energy-efficient DVFS settings for CMP systems that satisfy a given performance constraint for SPMD multithreaded programs. Experimental evaluation shows that we achieve significant energy savings while meeting the performance constraints.
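As a rough illustration of the kind of search such a model enables (not the paper's Petri net formulation), the sketch below picks the minimum-energy DVFS setting whose predicted execution time stays within a performance budget; `predict_time`, `predict_energy` and the toy constants are hypothetical stand-ins for the program performance and power models.

```python
# Minimal sketch (not the paper's Petri-net model): pick the lowest-energy
# DVFS setting whose predicted execution time meets a performance constraint.
# predict_time/predict_energy stand in for a program performance/power model.

def choose_dvfs_setting(settings, predict_time, predict_energy, time_budget):
    """settings: iterable of (voltage, frequency) pairs (hypothetical)."""
    feasible = [(predict_energy(v, f), (v, f))
                for (v, f) in settings
                if predict_time(v, f) <= time_budget]
    if not feasible:
        raise ValueError("no DVFS setting meets the performance constraint")
    return min(feasible)[1]

# Toy usage with an illustrative analytical model: time ~ work / f,
# dynamic energy ~ C * V^2 * f * time  (all constants made up).
if __name__ == "__main__":
    WORK, CAP = 2.0e9, 1.0e-9
    settings = [(0.9, 1.2e9), (1.0, 1.6e9), (1.1, 2.0e9)]
    t = lambda v, f: WORK / f
    e = lambda v, f: CAP * v * v * f * t(v, f)
    print(choose_dvfs_setting(settings, t, e, time_budget=1.5))
```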

Relevance:

30.00%

Publisher:

Abstract:

Data prefetchers identify and exploit any regularity present in the history/training stream to predict future references and prefetch them into the cache. The training information used is typically the primary misses seen at a particular cache level, which is a filtered version of the accesses seen by the cache. In this work we demonstrate that extending the training information to include secondary misses and hits, along with primary misses, helps improve the performance of prefetchers. In addition to empirical evaluation, we use the information-theoretic metric entropy to quantify the regularity present in extended histories. Entropy measurements indicate that extended histories are more regular than the default primary-miss-only training stream, and they also corroborate our empirical findings. With extended histories, further benefits can be achieved by also triggering prefetches on secondary misses. In this paper we explore the design space of extended prefetch histories and alternative prefetch trigger points for delta correlation prefetchers. We observe that different prefetch schemes benefit to different extents from extended histories and alternative trigger points, and the best-performing design point varies on a per-benchmark basis. To meet these requirements, we propose a simple adaptive scheme that identifies the best-performing design point for a benchmark-prefetcher combination at runtime. On SPEC2000 benchmarks, using all L2 accesses as the prefetcher's history improves performance, in terms of both IPC and misses reduced, over techniques that use only primary misses as history. The adaptive scheme improves the performance of the CZone prefetcher over the baseline by 4.6% on average. These performance gains are accompanied by a moderate reduction in memory traffic requirements.
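For readers unfamiliar with delta correlation prefetching, the sketch below shows a minimal delta-correlation predictor that can be trained on an extended history (all L2 accesses) rather than primary misses only; the class, history length and prefetch degree are illustrative assumptions, not the paper's CZone or adaptive design.

```python
# Minimal sketch of a delta-correlation prefetcher trained on an "extended"
# history (primary misses, secondary misses and hits) instead of primary
# misses only. Parameters and structure are made up for illustration.
from collections import deque

class DeltaCorrelationPrefetcher:
    def __init__(self, history_len=16, degree=4):
        self.history = deque(maxlen=history_len)   # recent block addresses
        self.degree = degree                        # max prefetches per trigger

    def train_and_predict(self, block_addr):
        self.history.append(block_addr)
        if len(self.history) < 4:
            return []
        addrs = list(self.history)
        deltas = [b - a for a, b in zip(addrs, addrs[1:])]
        pair = (deltas[-2], deltas[-1])             # correlation key
        # Search older history for the same delta pair and replay what followed.
        prefetches, addr = [], block_addr
        for i in range(len(deltas) - 3, 0, -1):
            if (deltas[i - 1], deltas[i]) == pair:
                for d in deltas[i + 1:i + 1 + self.degree]:
                    addr += d
                    prefetches.append(addr)
                break
        return prefetches

# Every L2 access (hit or miss) can be fed to train_and_predict(), mirroring
# the extended-history idea; a miss-only prefetcher would filter this stream.
```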

Relevance:

30.00%

Publisher:

Abstract:

The contour tree is a topological abstraction of a scalar field that captures the evolution of level set connectivity. It is an effective representation for visual exploration and analysis of scientific data. We describe a work-efficient, output-sensitive, and scalable parallel algorithm for computing the contour tree of a scalar field defined on a domain that is represented using either an unstructured mesh or a structured grid. A hybrid implementation of the algorithm using the GPU and multi-core CPU can compute the contour tree of an input containing 16 million vertices in less than ten seconds with a speedup factor of up to 13. Experiments based on an implementation in a multi-core CPU environment show near-linear speedup for large data sets.

Relevance:

30.00%

Publisher:

Abstract:

A computationally efficient approach that computes the optimal regularization parameter for the Tikhonov minimization scheme is developed for photoacoustic imaging. This approach is based on least squares QR (LSQR) decomposition, a well-known dimensionality reduction technique for large systems of equations. It is shown that the proposed framework is effective in terms of quantitative and qualitative reconstruction of the initial pressure distribution, enabled by finding an optimal regularization parameter. The computational efficiency and performance of the proposed method are shown using a numerical blood vessel phantom as a test case, where the initial pressure is exactly known for quantitative comparison. (C) 2013 Society of Photo-Optical Instrumentation Engineers (SPIE)
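As a hedged illustration of automated regularization parameter selection for Tikhonov minimization (using a dense SVD and generalized cross-validation rather than the paper's LSQR-based approach, which targets much larger systems), a minimal sketch might look like this; the problem sizes and noise level are made up.

```python
# SVD-based Tikhonov regularization with the parameter chosen by generalized
# cross-validation (GCV). Illustration of automated parameter selection only;
# not the paper's LSQR-based method.
import numpy as np

def tikhonov_gcv(A, b, lambdas):
    """Return (x_opt, lambda_opt) for min ||A x - b||^2 + lam^2 ||x||^2."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b                       # projections of the data
    best = None
    for lam in lambdas:
        f = s**2 / (s**2 + lam**2)       # Tikhonov filter factors
        resid = np.sum(((1.0 - f) * beta) ** 2)
        gcv = resid / (A.shape[0] - np.sum(f)) ** 2
        if best is None or gcv < best[0]:
            x = Vt.T @ ((f / s) * beta)  # filtered pseudo-inverse solution
            best = (gcv, lam, x)
    return best[2], best[1]

# Toy usage on a made-up ill-conditioned system with noisy data.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(80, 40)) @ np.diag(np.logspace(0, -6, 40))
    x_true = rng.normal(size=40)
    b = A @ x_true + 1e-3 * rng.normal(size=80)
    x_hat, lam = tikhonov_gcv(A, b, np.logspace(-8, 0, 50))
    print("chosen lambda:", lam)
```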

Relevance:

30.00%

Publisher:

Abstract:

The presence of software bloat in large flexible software systems can hurt energy efficiency. However, identifying and mitigating bloat is fairly effort-intensive. To enable such efforts to be directed where there is a substantial potential for energy savings, we investigate the impact of bloat on power consumption under different situations. We conduct the first systematic experimental study of the joint power-performance implications of bloat across a range of hardware and software configurations on modern server platforms. The study employs controlled experiments to expose different effects of a common type of Java runtime bloat, excess temporary objects, in the context of the SPECpower_ssj2008 workload. We introduce the notion of equi-performance power reduction to characterize the impact, in addition to peak power comparisons. The results show a wide variation in energy savings from bloat reduction across these configurations. Energy efficiency benefits at peak performance tend to be most pronounced when bloat affects a performance bottleneck and non-bloated resources have low energy proportionality. Equi-performance power savings are highest when bloated resources have a high degree of energy proportionality. We develop an analytical model that establishes a general relation between the resource pressure caused by bloat and its energy efficiency impact under different conditions of resource bottlenecks and energy proportionality. Applying the model to different "what-if" scenarios, we predict the impact of bloat reduction and corroborate these predictions with empirical observations. Our work shows that the prevalent software-only view of bloat is inadequate for assessing its power-performance impact and instead provides a full-systems approach for reasoning about its implications.
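A greatly simplified sketch of the intuition (not the paper's analytical model): with a linear power model per resource, equi-performance savings from removing bloat scale with the resource's dynamic power range, i.e. its energy proportionality. All numbers below are illustrative.

```python
# Illustrative sketch: a linear power model P(u) = P_idle + (P_peak - P_idle)*u
# per resource, where bloat inflates the utilisation u needed for the same
# useful work. Equi-performance savings from bloat reduction then scale with
# the resource's dynamic (proportional) range.

def equi_performance_savings(resources):
    """resources: list of dicts with keys p_idle, p_peak, u_bloated, u_lean."""
    total = 0.0
    for r in resources:
        dynamic_range = r["p_peak"] - r["p_idle"]      # high => proportional
        total += dynamic_range * (r["u_bloated"] - r["u_lean"])
    return total

# Toy comparison: the same 20% utilisation cut saves far more power on an
# energy-proportional resource than on one dominated by idle power.
proportional = {"p_idle": 20.0, "p_peak": 120.0, "u_bloated": 0.7, "u_lean": 0.5}
flat         = {"p_idle": 90.0, "p_peak": 120.0, "u_bloated": 0.7, "u_lean": 0.5}
print(equi_performance_savings([proportional]))   # 20.0 W
print(equi_performance_savings([flat]))           # 6.0 W
```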

Relevance:

30.00%

Publisher:

Abstract:

The twin demands of energy efficiency and higher performance on DRAM are strongly emphasized in multicore architectures. A variety of schemes have been proposed to address either the latency or the energy consumption of DRAMs. These schemes typically require non-trivial hardware changes and end up improving latency at the cost of energy, or vice versa. One specific DRAM performance problem in multicores is that interleaved accesses from different cores can degrade row-buffer locality. In this paper, based on the temporal and spatial locality characteristics of memory accesses, we propose reorganizing the existing single large row buffer in a DRAM bank into multiple sub-row buffers (MSRB). This reorganization not only improves row hit rates, and hence average memory latency, but also reduces the energy consumed by the DRAM. The first major contribution of this work is achieving such a reorganization without requiring any significant changes to the existing, widely accepted DRAM specifications. Our proposed reorganization improves weighted speedup by 35.8%, 14.5% and 21.6% in quad-, eight- and sixteen-core workloads, along with 42%, 28% and 31% reductions in DRAM energy. The proposed MSRB organization enables opportunities for the management of multiple row buffers at the memory controller level. As the memory controller is aware of the behaviour of individual cores, it allows us to implement coordinated buffer allocation schemes for different cores that take program behaviour into account. We demonstrate two such schemes, namely Fairness Oriented Allocation and Performance Oriented Allocation, which show the flexibility that memory controllers can now exploit in our MSRB organization to improve overall performance and/or fairness. Further, the MSRB organization enables additional opportunities for DRAM intra-bank parallelism and selective early precharging of the LRU row buffer to further improve memory access latencies. These two optimizations together provide an additional 5.9% performance improvement.
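To illustrate only the row-hit accounting behind the MSRB idea (timing, sub-row granularity and the controller-level allocation schemes above are omitted), here is a toy bank model with an LRU-managed set of open rows; the trace and buffer counts are made up.

```python
# Toy model of a DRAM bank keeping several open (sub-)rows in an LRU-managed
# set instead of a single open row, showing why interleaved per-core accesses
# can recover row-buffer hits.
from collections import OrderedDict

class Bank:
    def __init__(self, num_row_buffers=1):
        self.open_rows = OrderedDict()          # row id -> None, in LRU order
        self.capacity = num_row_buffers
        self.hits = self.misses = 0

    def access(self, row):
        if row in self.open_rows:
            self.hits += 1
            self.open_rows.move_to_end(row)     # refresh LRU position
        else:
            self.misses += 1
            if len(self.open_rows) >= self.capacity:
                self.open_rows.popitem(last=False)   # evict LRU (precharge)
            self.open_rows[row] = None

# Two cores with good per-core locality whose accesses interleave at the bank:
trace = [0, 7, 0, 7, 0, 7, 0, 7]
single, msrb = Bank(1), Bank(4)
for r in trace:
    single.access(r); msrb.access(r)
print(single.hits, msrb.hits)   # 0 hits vs 6 hits on this toy trace
```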

Relevance:

30.00%

Publisher:

Abstract:

We discuss the computational bottlenecks in molecular dynamics (MD) and describe the challenges in parallelizing the computation-intensive tasks. We present a hybrid algorithm using MPI (Message Passing Interface) with OpenMP threads for parallelizing a generalized MD computation scheme for systems with short range interatomic interactions. The algorithm is discussed in the context of nano-indentation of Chromium films with carbon indenters using the Embedded Atom Method potential for Cr-Cr interaction and the Morse potential for Cr-C interactions. We study the performance of our algorithm for a range of MPI-thread combinations and find the performance to depend strongly on the computational task and load sharing in the multi-core processor. The algorithm scaled poorly with MPI and our hybrid schemes were observed to outperform the pure message passing scheme, despite utilizing the same number of processors or cores in the cluster. Speed-up achieved by our algorithm compared favorably with that achieved by standard MD packages. (C) 2013 Elsevier Inc. All rights reserved.
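As a small illustration of the short-range pair interactions whose loops such MPI/OpenMP schemes parallelize, the sketch below evaluates a Morse pair potential with a cutoff in plain Python; the constants are illustrative rather than fitted Cr-C parameters, and a real MD code would use cell or neighbour lists and compiled, parallel kernels.

```python
# Minimal sketch of a Morse pair interaction with a cutoff, the kind of
# short-range term (used for Cr-C pairs in the paper) whose pair loop is what
# domain decomposition (MPI) and threading (OpenMP) both parallelise.
import numpy as np

def morse_energy_forces(pos, D=0.4, a=1.8, r0=2.2, rcut=6.0):
    """Direct O(N^2) pair loop; real MD codes use cell/neighbour lists."""
    n = len(pos)
    energy, forces = 0.0, np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            rij = pos[j] - pos[i]
            r = np.linalg.norm(rij)
            if r >= rcut:
                continue
            u = np.exp(-a * (r - r0))
            energy += D * (1.0 - u) ** 2        # V(r) = D (1 - e^{-a(r-r0)})^2
            dVdr = 2.0 * D * a * u * (1.0 - u)
            fij = -dVdr * rij / r               # force on atom j from atom i
            forces[j] += fij
            forces[i] -= fij
    return energy, forces

pos = np.array([[0.0, 0.0, 0.0], [2.2, 0.0, 0.0], [4.0, 0.0, 0.0]])
print(morse_energy_forces(pos))
```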

Relevance:

30.00%

Publisher:

Abstract:

In this study, estimates from the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (the 2A12 product) have been used to compare and contrast the characteristics of cloud liquid water and ice over the Indian land region and the surrounding ocean during the premonsoon (May) and monsoon (June-September) seasons. Based on the spatial homogeneity of rainfall, we selected five regions for our study (three over ocean, two over land). Comparison across the three ocean regions suggests that the cloud liquid water (CLW) over the orographically influenced Arabian Sea (close to the Indian west coast) behaves differently from the CLW over a trapped ocean (Bay of Bengal) or an open ocean (equatorial Indian Ocean). Specifically, the Arabian Sea region shows higher liquid water for a lower range of rainfall, whereas the Bay of Bengal and the equatorial Indian Ocean show higher liquid water for a higher range of rainfall. Apart from geographic differences, we also documented seasonal differences by comparing CLW profiles between monsoon and premonsoon periods, as well as between early and peak phases of the monsoon. We find that the CLW during the lean periods of rainfall (May or June) is higher than during the peak and late monsoon season (July-September) for raining clouds. As active and break phases are important signatures of the monsoon progression, we also analysed the differences in CLW during various phases of the monsoon, namely the active, break, active-to-break and break-to-active transition phases. We find that the cloud liquid water content during the break-to-active transition phase is significantly higher than during the active-to-break transition phase over central India. We speculate that this could be attributed to a higher amount of aerosol loading over this region during the break phase. We lend credence to this aerosol-CLW/rain association by comparing the central Indian CLW with that over southeast Asia (where the aerosol loading is significantly smaller) and find that in the latter region there are no significant differences in CLW during the different phases of the monsoon. While our hypothesis needs to be further investigated with numerical models, the results presented in this study can potentially serve as a good benchmark for evaluating the performance of cloud-resolving models over the Indian region.

Relevance:

30.00%

Publisher:

Abstract:

In this paper, we study the problem of designing a multi-hop wireless network for interconnecting sensors (hereafter called source nodes) to a Base Station (BS) by deploying a minimum number of relay nodes at a subset of given potential locations, while meeting a quality of service (QoS) objective specified as a hop count bound for paths from the sources to the BS. The hop count bound suffices to ensure a certain probability of the data being delivered to the BS within a given maximum delay under a light traffic model. We observe that the problem is NP-hard. For this problem, we propose a polynomial time approximation algorithm based on iteratively constructing shortest path trees and heuristically pruning away relay nodes until the hop count bound is violated. Results show that the algorithm performs efficiently in various randomly generated network scenarios; in over 90% of the tested scenarios, it gave solutions that were either optimal or worse than optimal by just one relay. We then use random graph techniques to obtain, under a certain stochastic setting, an upper bound on the average-case approximation ratio of a class of algorithms (including the proposed algorithm) for this problem, as a function of the number of source nodes and the hop count bound. To the best of our knowledge, this average-case analysis is the first of its kind in the relay placement literature. Since the design is based on a light traffic model, we also provide simulation results (using models for the IEEE 802.15.4 physical layer and medium access control) to assess the traffic levels up to which the QoS objectives continue to be met. (C) 2014 Elsevier B.V. All rights reserved.
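A minimal sketch in the spirit of the heuristic described above (shortest-path hop counts from the BS plus greedy relay pruning under the hop bound) is shown below; the graph, node names and pruning order are illustrative assumptions, not the paper's exact algorithm.

```python
# Build hop counts from the BS with BFS, then greedily drop relay nodes as
# long as every source still has a path within the hop bound.
from collections import deque

def hops_from_bs(adj, bs, removed):
    dist = {bs: 0}
    q = deque([bs])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist and v not in removed:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def prune_relays(adj, bs, sources, relays, hop_bound):
    removed = set()
    for r in sorted(relays):                      # any heuristic order
        trial = removed | {r}
        dist = hops_from_bs(adj, bs, trial)
        if all(s in dist and dist[s] <= hop_bound for s in sources):
            removed = trial                       # relay r is not needed
    return set(relays) - removed                  # relays to actually deploy

# Toy topology: bs-r1-s1 and bs-r2-s1 (one relay is redundant), hop bound 2.
adj = {"bs": ["r1", "r2"], "r1": ["bs", "s1"], "r2": ["bs", "s1"],
       "s1": ["r1", "r2"]}
print(prune_relays(adj, "bs", ["s1"], ["r1", "r2"], hop_bound=2))  # one relay
```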

Relevance:

30.00%

Publisher:

Abstract:

X-ray polarimeters based on the Time Projection Chamber (TPC) geometry are currently being studied and developed to make sensitive measurements of polarization in the 2-10 keV energy range. TPC soft X-ray polarimeters exploit the fact that the emission direction of the photoelectron ejected via the photoelectric effect in a gas proportional counter carries information about the polarization of the incident X-ray photon. Operating parameters such as pressure, drift field and drift gap affect the performance of a TPC polarimeter. The simulations presented here show the effect of these operating parameters on the modulation factor of the TPC polarimeter. Garfield models are used to study the photoelectron interaction in the gas and the drift of the electron cloud towards the Gas Electron Multiplier (GEM). The emission direction is reconstructed from the image and the modulation factor is computed. Our study has shown that Ne/DME (50/50) at lower pressure and drift field can be used for a TPC polarimeter with a modulation factor of 50-65%.
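As a hedged illustration of how a modulation factor can be estimated from reconstructed emission angles (detector, gas and GEM effects are not modelled, and this is not the Garfield-based pipeline), a minimal sketch:

```python
# For polarised photons the reconstructed track angles follow roughly
# N(phi) ~ 1 + mu*cos(2*(phi - phi0)); mu is then twice the amplitude of the
# cos(2*phi) Fourier component of the angle distribution.
import numpy as np

def modulation_factor(angles_rad):
    c = np.mean(np.cos(2.0 * angles_rad))
    s = np.mean(np.sin(2.0 * angles_rad))
    mu = 2.0 * np.hypot(c, s)              # modulation factor
    phi0 = 0.5 * np.arctan2(s, c)          # reconstructed polarisation angle
    return mu, phi0

# Toy check: draw angles from the modulated distribution by rejection sampling.
rng = np.random.default_rng(1)
mu_true, phi0_true = 0.6, 0.3
phi = rng.uniform(-np.pi, np.pi, 200_000)
keep = rng.uniform(0, 1 + mu_true, phi.size) < 1 + mu_true * np.cos(2 * (phi - phi0_true))
print(modulation_factor(phi[keep]))        # approximately (0.6, 0.3)
```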

Relevance:

30.00%

Publisher:

Abstract:

An abundance of spectrum access and sensing algorithms are available in the dynamic spectrum access (DSA) and cognitive radio (CR) literature. Often, however, the functionality and performance of such algorithms are validated against theoretical calculations using only simulations. Both the theoretical calculations and the simulations come with their attendant sets of assumptions. For instance, designers of dynamic spectrum access algorithms often take spectrum sensing and rendezvous mechanisms between transmitter-receiver pairs for granted. Test bed designers, on the other hand, either customize so much of their design that it becomes difficult to replicate using commercial off-the-shelf (COTS) components, or restrict themselves to simulation, emulation/hardware-in-the-loop (HIL), or pure hardware, but not all three. Implementation studies on test beds that are sophisticated enough to combine the three aforementioned aspects, yet can also be put together using COTS hardware and software packages, are rare. In this paper we describe i) the implementation of a hybrid test bed using a previously proposed hardware-agnostic system architecture, ii) the implementation of DSA on this test bed, and iii) the realistic hardware- and software-constrained performance of DSA. A snapshot energy detector (ED) and Cumulative Summation (CUSUM), a sequential change detection algorithm, are available for spectrum sensing, and a two-way handshake mechanism in a dedicated control channel facilitates transmitter-receiver rendezvous.
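For concreteness, a minimal one-sided CUSUM recursion of the kind used for sequential change detection in spectrum sensing is sketched below; the drift and threshold values are illustrative, not the test bed's settings.

```python
# One-sided CUSUM: accumulate evidence that received energy has risen above
# the noise-only mean and declare "channel busy" when the statistic crosses
# a threshold.
import random

def cusum_detect(samples, noise_mean, drift, threshold):
    """Return the index at which a change is declared, or None."""
    g = 0.0
    for k, x in enumerate(samples):
        g = max(0.0, g + (x - noise_mean - drift))   # CUSUM recursion
        if g > threshold:
            return k
    return None

# Toy usage: noise-only energy ~1.0, a transmitter appears at sample 50.
random.seed(0)
samples = [random.gauss(1.0, 0.2) for _ in range(50)] + \
          [random.gauss(1.8, 0.2) for _ in range(50)]
print(cusum_detect(samples, noise_mean=1.0, drift=0.3, threshold=3.0))
```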

Relevance:

30.00%

Publisher:

Abstract:

Wireless sensor networks have gained popularity due to their real-time applications and low-cost nature. These networks provide solutions to scenarios that are critical, complicated and sensitive, such as military fields, habitat monitoring, and disaster management. The nodes in wireless sensor networks are highly resource constrained. Routing protocols are designed to make efficient utilization of the available resources in communicating a message from source to destination. In addition to resource management, the trustworthiness of neighboring or forwarding nodes and the energy level of the nodes, which keeps the network alive for a longer duration, must be considered. This paper proposes a QoS-aware, trust-metric-based framework for wireless sensor networks. The proposed framework safeguards a wireless sensor network from intruders by considering the trustworthiness of the forwarder node at every stage of multi-hop routing. It increases network lifetime by considering the energy level of the nodes, and it prevents an adversary from tracing the route from source to destination by providing path variation. The framework is built on the NS2 simulator. Experimental results show that the framework provides energy balance through the establishment of trustworthy paths from the source to the destination. (C) 2015 The Authors. Published by Elsevier B.V.
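A hedged sketch of the general idea of trust- and energy-aware forwarding (the actual trust metric, weights and path-variation mechanism are the framework's own; everything below is a made-up stand-in) might look like this:

```python
# Next-hop choice weighing neighbour trust and residual energy; a weighted
# random pick stands in, very loosely, for route/path variation.
import random

def pick_next_hop(neighbours, w_trust=0.6, w_energy=0.4, trust_floor=0.5):
    """neighbours: list of dicts with 'id', 'trust' in [0,1], 'energy' in [0,1]."""
    trusted = [n for n in neighbours if n["trust"] >= trust_floor]
    if not trusted:
        return None                                   # no trustworthy forwarder
    scores = [w_trust * n["trust"] + w_energy * n["energy"] for n in trusted]
    # Weighted random choice instead of argmax, so consecutive packets can take
    # different forwarders.
    return random.choices(trusted, weights=scores, k=1)[0]["id"]

neighbours = [{"id": "A", "trust": 0.9, "energy": 0.4},
              {"id": "B", "trust": 0.7, "energy": 0.9},
              {"id": "C", "trust": 0.2, "energy": 1.0}]   # likely intruder
print(pick_next_hop(neighbours))
```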

Relevance:

30.00%

Publisher:

Abstract:

Scalable stream processing and continuous dataflow systems are gaining traction with the rise of big data due to the need for processing high-velocity data in near real time. Unlike batch processing systems such as MapReduce and workflows, static scheduling strategies fall short for continuous dataflows due to variations in the input data rates and the need for sustained throughput. The elastic resource provisioning of cloud infrastructure is valuable for meeting the changing resource needs of such continuous applications. However, multi-tenant cloud resources introduce yet another dimension of performance variability that impacts the application's throughput. In this paper we propose PLAStiCC, an adaptive scheduling algorithm that balances resource cost and application throughput using a prediction-based lookahead approach. It addresses variations not only in the input data rates but also in the underlying cloud infrastructure. In addition, we propose several simpler static scheduling heuristics that operate in the absence of an accurate performance prediction model. These static and adaptive heuristics are evaluated through extensive simulations using performance traces obtained from the Amazon AWS IaaS public cloud. Our results show an improvement of up to 20% in overall profit compared to a reactive adaptation algorithm.
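As a rough, illustrative sketch of a prediction-based lookahead decision (not the PLAStiCC algorithm itself), the following picks the smallest VM count whose predicted throughput covers a predicted input-rate window, trading resource cost against throughput shortfall; all rates and costs are made up.

```python
# Over a lookahead window of predicted input rates, choose the VM count that
# minimises VM cost plus a penalty for predicted throughput shortfall.

def lookahead_allocation(pred_input_rates, pred_vm_rate, max_vms,
                         cost_per_vm, penalty_per_msg):
    best_n, best_cost = 1, float("inf")
    for n in range(1, max_vms + 1):
        cost = n * cost_per_vm * len(pred_input_rates)
        for rate in pred_input_rates:                 # predicted window
            shortfall = max(0.0, rate - n * pred_vm_rate)
            cost += penalty_per_msg * shortfall       # lost/delayed throughput
        if cost < best_cost:
            best_n, best_cost = n, cost
    return best_n

# Predicted burst in the middle of the window; the per-VM rate is also a
# prediction (in the paper it varies with multi-tenant performance variability).
window = [800, 900, 2500, 2600, 1000]
print(lookahead_allocation(window, pred_vm_rate=1000, max_vms=5,
                           cost_per_vm=1.0, penalty_per_msg=0.01))
```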

Relevance:

30.00%

Publisher:

Abstract:

In the context of wireless sensor networks, we are motivated by the design of a tree network spanning a set of source nodes that generate packets, a set of additional relay nodes that only forward packets from the sources, and a data sink. We assume that the paths from the sources to the sink have bounded hop count, that the nodes use IEEE 802.15.4 CSMA/CA for medium access control, and that there are no hidden terminals. In this setting, starting with a set of simple fixed point equations, we derive explicit conditions on the packet generation rates at the sources so that the tree network approximately provides a certain quality of service (QoS), such as end-to-end delivery probability and mean delay. The structure of our conditions provides insight into the dependence of network performance on the arrival rate vector and the topological properties of the tree network. Our numerical experiments suggest that our approximations capture a significant part of the QoS-aware throughput region of a tree network, which is adequate for many sensor network applications. Furthermore, for the special case of equal arrival rates, default backoff parameters, and a range of target QoS values, we show that among all path-length-bounded trees (spanning a given set of sources and the data sink) that meet the conditions derived in the paper, a shortest path tree achieves the maximum throughput. (C) 2015 Elsevier B.V. All rights reserved.