73 results for cloud computing datacenter performance QoS
Abstract:
With the over-provisioned routing resources on FPGAs, the topology choice for an NoC implemented on an FPGA is more flexible than on an ASIC. However, it is well understood that global wire routing affects the performance of an NoC on an FPGA, because the topology is routed over a fixed routing fabric. An important question therefore arises: does the benefit of diameter reduction from using a highly connected topology outweigh the impact of global routing? To answer this question, we investigate FPGA-based packet-switched NoC implementations with different sizes and topologies, and quantitatively measure the impact of global routing on each of these networks. The results show that, with the ample routing resources of modern FPGAs, global routing is not on the critical path of the system and is therefore not a dominating factor in the performance of practical multi-hop NoC systems. © 2011 IEEE.
Abstract:
An extension of approximate computing, significance-based computing exploits applications' inherent error resiliency and offers a new structural paradigm that strategically relaxes full computational precision to provide significant energy savings with minimal performance degradation.
Abstract:
In this paper, we consider switch-and-stay combining (SSC) in two-way relay systems with two amplify-and-forward relays, one of which is activated to assist the information exchange between the two sources. The system operates in either the analog network coding (ANC) protocol, where communication is achieved only with the help of the active relay, or the time-division broadcast (TDBC) protocol, where the direct link between the two sources can be utilized to exploit more diversity gain. In both cases, we study the outage probability and bit error rate (BER) for Rayleigh fading channels. In particular, we derive closed-form lower bounds for the outage probability and the average BER, which remain tight for different fading conditions. We also present asymptotic analysis of both the outage probability and the average BER at high signal-to-noise ratio. It is shown that SSC can achieve the full diversity order in two-way relay systems for both ANC and TDBC protocols with proper switching thresholds. Copyright © 2014 John Wiley & Sons, Ltd.
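The paper derives closed-form bounds; the Monte Carlo sketch below only illustrates the SSC idea under Rayleigh fading, using the common two-hop AF end-to-end SNR approximation. All thresholds and average SNR values are hypothetical placeholders, not figures from the paper.

```python
# Monte Carlo sketch of switch-and-stay combining (SSC) between two
# amplify-and-forward relays under Rayleigh fading (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def e2e_snr(snr_avg, n):
    """Approximate end-to-end SNR of a two-hop AF link under Rayleigh fading."""
    g1 = rng.exponential(snr_avg, n)      # first hop SNR
    g2 = rng.exponential(snr_avg, n)      # second hop SNR
    return g1 * g2 / (g1 + g2 + 1.0)      # standard AF approximation

def ssc_outage(snr_avg_db=10.0, switch_th_db=3.0, outage_th_db=3.0, n=200_000):
    snr_avg = 10 ** (snr_avg_db / 10)
    t_sw = 10 ** (switch_th_db / 10)
    t_out = 10 ** (outage_th_db / 10)
    snr_a = e2e_snr(snr_avg, n)           # relay currently in use
    snr_b = e2e_snr(snr_avg, n)           # alternative relay
    # SSC rule: keep the current relay if its SNR meets the switching
    # threshold; otherwise switch to the other relay without examining it.
    chosen = np.where(snr_a >= t_sw, snr_a, snr_b)
    return float(np.mean(chosen < t_out))

if __name__ == "__main__":
    for snr_db in (5, 10, 15, 20):
        print(snr_db, "dB ->", ssc_outage(snr_avg_db=snr_db))
```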
Abstract:
To cope with the rapid growth of multimedia applications that require dynamic levels of quality of service (QoS), cross-layer (CL) design, in which multiple protocol layers are jointly combined, has been considered as a way to provide diverse QoS provisioning for mobile multimedia networks. However, there is a lack of a general mathematical framework to model such a CL scheme in wireless networks with different types of multimedia classes. In this paper, to overcome this shortcoming, we propose a novel CL design for integrated real-time/non-real-time traffic with strict preemptive priority via a finite-state Markov chain. The main strategy of the CL scheme is to design a Markov model that explicitly includes adaptive modulation and coding at the physical layer, queuing at the data link layer, and the bursty nature of multimedia traffic classes at the application layer. Utilizing this Markov model, several important performance metrics in terms of packet loss rate, delay, and throughput are examined. In addition, our proposed framework is exploited in various multimedia applications, for example, end-to-end real-time video streaming and CL optimization, which require priority-based QoS adaptation for different applications. More importantly, the CL framework reveals important guidelines for optimizing network performance.
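As a minimal sketch of the finite-state Markov machinery such a framework rests on: given a transition matrix over joint (channel mode, queue occupancy) states, one computes the stationary distribution and reads off queue-overflow (packet loss) probability. The 3-state matrix below is purely illustrative and not taken from the paper.

```python
# Stationary analysis of a small, hypothetical finite-state Markov chain.
import numpy as np

P = np.array([[0.80, 0.15, 0.05],    # e.g. (good channel, short queue)
              [0.30, 0.50, 0.20],    # (moderate channel, medium queue)
              [0.10, 0.40, 0.50]])   # (poor channel, queue nearly full)

# The stationary distribution pi solves pi P = pi with sum(pi) = 1,
# i.e. the left eigenvector of P for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
pi = pi / pi.sum()

full_states = [2]                     # states in which an arriving packet is dropped
packet_loss_rate = pi[full_states].sum()
print("stationary distribution:", pi)
print("approx. packet loss rate:", packet_loss_rate)
```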
Abstract:
We introduce a task-based programming model and runtime system that exploit the observation that not all parts of a program are equally significant for the accuracy of the end-result, in order to trade off the quality of program outputs for increased energy-efficiency. This is done in a structured and flexible way, allowing for easy exploitation of different points in the quality/energy space, without adversely affecting application performance. The runtime system can apply a number of different policies to decide whether it will execute less-significant tasks accurately or approximately.
The experimental evaluation indicates that our system can achieve an energy reduction of up to 83% compared with a fully accurate execution and up to 35% compared with an approximate version employing loop perforation. At the same time, our approach always results in graceful quality degradation.
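A toy sketch of the idea, not the paper's actual programming model or API: tasks carry a significance value, and a runtime policy (here a hypothetical fixed-ratio one) decides whether each task runs accurately or via a cheaper approximate variant such as a loop-perforated version.

```python
# Illustrative significance-aware task execution (hypothetical API).
import random

class Task:
    def __init__(self, significance, accurate_fn, approx_fn):
        self.significance = significance      # in [0, 1]; 1 = always accurate
        self.accurate_fn = accurate_fn
        self.approx_fn = approx_fn

class SignificanceRuntime:
    def __init__(self, ratio=0.7):
        # Target fraction of low-significance tasks still run accurately;
        # the paper's runtime supports several policies, this one is made up.
        self.ratio = ratio

    def run(self, task, *args):
        if task.significance >= self.ratio or random.random() < self.ratio:
            return task.accurate_fn(*args)
        return task.approx_fn(*args)          # cheaper, lower-quality result

# Example: accurate sum vs. a loop-perforated (every other element) sum.
data = list(range(1_000))
t = Task(significance=0.3,
         accurate_fn=lambda xs: sum(xs),
         approx_fn=lambda xs: 2 * sum(xs[::2]))
rt = SignificanceRuntime(ratio=0.7)
print(rt.run(t, data))
```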
Abstract:
We present a mathematically rigorous metric which relates the achievable quality of service (QoS) of a real-time analytics service to the server energy cost of offering the service. Using a new iso-QoS evaluation methodology, we scale server resources to meet QoS targets and directly rank servers in terms of their energy efficiency and, by extension, their cost of ownership. Our metric and method are platform-independent and enable fair comparison of datacenter compute servers with significant architectural diversity, including micro-servers. We deploy our metric and methodology to compare three servers running financial option pricing workloads on real-life market data. We find that server ranking is sensitive to data inputs and to the desired QoS level, and that although scale-out micro-servers can be up to two times more energy-efficient than conventional heavyweight servers for the same target QoS, they are still six times less energy-efficient than high-performance computational accelerators.
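A simplified sketch of how an iso-QoS comparison might be computed: scale each platform out until it meets the same QoS target, then rank platforms by the energy consumed at that scale. The platform names, per-node QoS fractions and power figures below are placeholders, not measurements from the paper.

```python
# Iso-QoS ranking sketch with hypothetical platform numbers.
def nodes_needed(qos_per_node, qos_target):
    """Smallest node count whose aggregate QoS fraction meets the target."""
    n = 1
    while min(1.0, n * qos_per_node) < qos_target:
        n += 1
    return n

def energy_at_iso_qos(qos_per_node, power_per_node_w, runtime_s, qos_target=1.0):
    n = nodes_needed(qos_per_node, qos_target)
    return n * power_per_node_w * runtime_s   # joules for the whole run

# Hypothetical platforms: (QoS fraction achieved by one node, node power in W).
platforms = {"microserver": (0.25, 20.0), "xeon_server": (1.0, 300.0)}
for name, (qos, power) in platforms.items():
    print(name, energy_at_iso_qos(qos, power, runtime_s=3600.0), "J")
```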
Abstract:
Do patterns in the YouTube viewing analytics of lecture capture videos point to areas of potential teaching and learning performance enhancement? The goal of this action-based research project was to capture and quantitatively analyse the viewing behaviours and patterns for a series of video lecture captures across several computing modules at Queen’s University Belfast, Northern Ireland. The research sought to establish whether a quantitative analysis of viewing behaviours, coupled with a qualitative evaluation of the material provided by the students, could be correlated to provide generalised patterns. Such patterns could then be used to understand the learning experience of students during face-to-face lectures and thereby present opportunities to reflectively enhance lecturer performance, the students’ overall learning experience and, ultimately, their level of academic attainment.
Abstract:
We present a rigorous methodology and new metrics for fair comparison of server and microserver platforms. Deploying our methodology and metrics, we compare a microserver with ARM cores against two servers with x86 cores running the same real-time financial analytics workload. We define workload-specific but platform-independent performance metrics for platform comparison, targeting both datacenter operators and end users. Our methodology establishes that a server based on the Xeon Phi co-processor delivers the highest performance and energy efficiency. However, by scaling out energy-efficient microservers, we achieve competitive or better energy efficiency than a power-equivalent server with two Sandy Bridge sockets, despite the microserver's slower cores. Using a new iso-QoS metric, we find that the ARM microserver scales enough to meet market throughput demand, that is, a 100% QoS in terms of timely option pricing, with as little as 55% of the energy consumed by the Sandy Bridge server.
Abstract:
As the complexity of computing systems grows, reliability and energy are two crucial challenges calling for holistic solutions. In this paper, we investigate the interplay among concurrency, power dissipation, energy consumption and voltage-frequency scaling for a key numerical kernel for the solution of sparse linear systems. Concretely, we leverage a task-parallel implementation of the Conjugate Gradient method, equipped with a state-of-the-art preconditioner embedded in the ILUPACK software, and target a low-power multi-core processor from ARM. In addition, we perform a theoretical analysis of the impact of a technique such as Near Threshold Voltage Computing (NTVC) from the points of view of increased hardware concurrency and error rate.
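For reference, a compact sketch of the preconditioned Conjugate Gradient kernel in question. The paper couples CG with ILUPACK's multilevel ILU preconditioner and a task-parallel runtime; here a simple Jacobi (diagonal) preconditioner and dense NumPy operations stand in so the structure of the iteration is visible.

```python
# Jacobi-preconditioned Conjugate Gradient sketch (stand-in for ILUPACK's
# multilevel ILU preconditioner used in the paper).
import numpy as np

def pcg(A, b, tol=1e-8, max_iter=1000):
    M_inv = 1.0 / np.diag(A)          # Jacobi preconditioner: M = diag(A)
    x = np.zeros_like(b)
    r = b - A @ x                     # initial residual
    z = M_inv * r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p     # new search direction
        rz = rz_new
    return x

# Small symmetric positive-definite test system.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(pcg(A, b))
```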
Performance Research and Simulation Analysis of a Bidirectional On-Board Charger for V2G Application
Abstract:
In the reinsurance market, the risks natural catastrophes pose to portfolios of properties must be quantified, so that they can be priced, and insurance offered. The analysis of such risks at a portfolio level requires a simulation of up to 800 000 trials with an average of 1000 catastrophic events per trial. This is sufficient to capture risk for a global multi-peril reinsurance portfolio covering a range of perils including earthquake, hurricane, tornado, hail, severe thunderstorm, wind storm, storm surge and riverine flooding, and wildfire. Such simulations are both computation and data intensive, making the application of high-performance computing techniques desirable.
In this paper, we explore the design and implementation of portfolio risk analysis on both multi-core and many-core computing platforms. Given a portfolio of property catastrophe insurance treaties, key risk measures, such as probable maximum loss, are computed by taking both primary and secondary uncertainties into account. Primary uncertainty is associated with whether or not an event occurs in a simulated year, while secondary uncertainty captures the uncertainty in the level of loss due to the use of simplified physical models and limitations in the available data. A combination of fast lookup structures, multi-threading and careful hand tuning of numerical operations is required to achieve good performance. Experimental results are reported for multi-core processors and for systems using NVIDIA graphics processing units and Intel Xeon Phi many-core accelerators.
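A minimal sketch of the aggregate computation described above, not the authors' implementation: simulate annual losses with primary uncertainty (whether events occur) and a crude stand-in for secondary uncertainty (loss given an event), then read probable maximum loss (PML) as a high quantile of the annual losses. Event counts and loss distributions are placeholders, not real treaty data.

```python
# Probable maximum loss (PML) from simulated annual catastrophe losses.
import numpy as np

rng = np.random.default_rng(42)

def simulate_annual_losses(n_trials=100_000, mean_events=1_000):
    events = rng.poisson(mean_events, n_trials)        # primary uncertainty
    # Secondary uncertainty: lognormal loss per event; the sum over many
    # events is approximated by a normal draw with the matching mean/variance.
    mu, sigma = 12.0, 1.5
    m = np.exp(mu + sigma**2 / 2)
    v = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
    return rng.normal(events * m, np.sqrt(events * v))

losses = simulate_annual_losses()
for period in (100, 250, 500):                          # return periods in years
    print(f"PML (1-in-{period}):", np.quantile(losses, 1 - 1 / period))
```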
Abstract:
Background: Large-scale biological jobs on high-performance computing systems require manual intervention if one or more of the computing cores on which they execute fail. This places a cost not only on the maintenance of the job, but also on the time taken to reinstate the job, and carries the risk of losing the data and execution the job accomplished before it failed. Approaches that can proactively detect computing core failures and take action to relocate a core's job onto reliable cores can make a significant step towards automating fault tolerance. Method: This paper describes an experimental investigation into the use of multi-agent approaches for fault tolerance. Two approaches are studied, the first at the job level and the second at the core level. The approaches are investigated for single-core failure scenarios that can occur in the execution of parallel reduction algorithms on computer clusters. A third approach is proposed that incorporates multi-agent technology at both the job and core levels. Experiments are pursued in the context of genome searching, a popular computational biology application. Result: The key conclusion is that the approaches proposed are feasible for automating fault tolerance in high-performance computing systems with minimal human intervention. In a typical experiment in which fault tolerance is studied, centralised and decentralised checkpointing approaches on average add 90% to the actual time for executing the job. In the same experiment, the multi-agent approaches add only 10% to the overall execution time.