61 resultados para cloud computing datacenter performance QoS
Resumo:
In this paper, we present a decentralized dynamic load scheduling/balancing algorithm called ELISA (Estimated Load Information Scheduling Algorithm) for general purpose distributed computing systems. ELISA uses estimated state information based upon periodic exchange of exact state information between neighbouring nodes to perform load scheduling. The primary objective of the algorithm is to cut down on the communication and load transfer overheads by minimizing the frequency of status exchange and by restricting the load transfer and status exchange within the buddy set of a processor. It is shown that the resulting algorithm performs almost as well as a perfect information algorithm and is superior to other load balancing schemes based on the random sharing and Ni-Hwang algorithms. A sensitivity analysis to study the effect of various design parameters on the effectiveness of load balancing is also carried out. Finally, the algorithm's performance is tested on large dimensional hypercubes in the presence of time-varying load arrival process and is shown to perform well in comparison to other algorithms. This makes ELISA a viable and implementable load balancing algorithm for use in general purpose distributed computing systems.
Resumo:
Simultaneous consideration of both performance and reliability issues is important in the choice of computer architectures for real-time aerospace applications. One of the requirements for such a fault-tolerant computer system is the characteristic of graceful degradation. A shared and replicated resources computing system represents such an architecture. In this paper, a combinatorial model is used for the evaluation of the instruction execution rate of a degradable, replicated resources computing system such as a modular multiprocessor system. Next, a method is presented to evaluate the computation reliability of such a system utilizing a reliability graph model and the instruction execution rate. Finally, this computation reliability measure, which simultaneously describes both performance and reliability, is applied as a constraint in an architecture optimization model for such computing systems. Index Terms-Architecture optimization, computation
Resumo:
This paper is aimed at reviewing the notion of Byzantine-resilient distributed computing systems, the relevant protocols and their possible applications as reported in the literature. The three agreement problems, namely, the consensus problem, the interactive consistency problem, and the generals problem have been discussed. Various agreement protocols for the Byzantine generals problem have been summarized in terms of their performance and level of fault-tolerance. The three classes of Byzantine agreement protocols discussed are the deterministic, randomized, and approximate agreement protocols. Finally, application of the Byzantine agreement protocols to clock synchronization is highlighted.
Resumo:
Loads that miss in L1 or L2 caches and waiting for their data at the head of the ROB cause significant slow down in the form of commit stalls. We identify that most of these commit stalls are caused by a small set of loads, referred to as LIMCOS (Loads Incurring Majority of COmmit Stalls). We propose simple history-based classifiers that track commit stalls suffered by loads to help us identify this small set of loads. We study an application of these classifiers to prefetching. The classifiers are used to train the prefetcher to focus on the misses suffered by LIMCOS. This, referred to as focused prefetching, results in a 9.8% gain in IPC over naive GHB based delta correlation prefetcher along with a 20.3% reduction in memory traffic for a set of 17 memory-intensive SPEC2000 benchmarks. Another important impact of focused prefetching is a 61% improvement in the accuracy of prefetches. We demonstrate that the proposed classification criterion performs better than other existing criteria like criticality and delinquent loads. Also we show that the criterion of focusing on commit stalls is robust enough across cache levels and can be applied to any prefetcher without any modifications to the prefetcher.
Resumo:
In our earlier work ([1]) we proposed WLAN Manager (or WM) a centralised controller for QoS management of infrastructure WLANs based on the IEEE 802.11 DCF standards. The WM approach is based on queueing and scheduling packets in a device that sits between all traffic flowing between the APs and the wireline LAN, requires no changes to the AP or the STAs, and can be viewed as implementing a "Split-MAC" architecture. The objectives of WM were to manage various TCP performance related issues (such as the throughput "anomaly" when STAs associate with an AP with mixed PHY rates, and upload-download unfairness induced by finite AP buffers), and also to serve as the controller for VoIP admission control and handovers, and for other QoS management measures. In this paper we report our experiences in implementing the proposals in [1]: the insights gained, new control techniques developed, and the effectiveness of the WM approach in managing TCP performance in an infrastructure WLAN. We report results from a hybrid experiment where a physical WM manages actual TCP controlled packet flows between a server and clients, with the WLAN being simulated, and also from a small physical testbed with an actual AP.
Resumo:
In this paper, we propose an extension to the I/O device architecture, as recommended in the PCI-SIG IOV specification, for virtualizing network I/O devices. The aim is to enable fine-grained controls to a virtual machine on the I/O path of a shared device. The architecture allows native access of I/O devices to virtual machines and provides device level QoS hooks for controlling VM specific device usage. For evaluating the architecture we use layered queuing network (LQN) models. We implement the architecture and evaluate it using simulation techniques, on the LQN model, to demonstrate the benefits. With the architecture, the benefit for network I/O is 60% more than what can be expected on the existing architecture. Also, the proposed architecture improves scalability in terms of the number of virtual machines intending to share the I/O device.
Resumo:
Effectiveness evaluation of aerospace fault-tolerant computing systems used in a phased-mission environment is rather tricky and difficult because of the interaction of its several degraded performance levels with the multiple objectives of the mission and the use environment. Part I uses an approach based on multiobjective phased-mission analysis to evaluate the effectiveness of a distributed avionics architecture used in a transport aircraft. Part II views the computing system as a multistate s-coherent structure. Lower bounds on the probabilities of accomplishing various levels of performance are evaluated.
Resumo:
Pricing is an effective tool to control congestion and achieve quality of service (QoS) provisioning for multiple differentiated levels of service. In this paper, we consider the problem of pricing for congestion control in the case of a network of nodes under multiple service classes. Our work draws upon [1] and [2] in various ways. We use the Tirupati pricing scheme in conjunction with the stochastic approximation based adaptive pricing methodology for queue control (proposed in [1]) for minimizing network congestion. However, unlike the methodology of [1] where pricing for entire routes is directly considered, we consider prices for individual link-service grade tuples. Further, we adapt the methodology proposed in [21 for a single-node scenario to the case of a network of nodes, for evaluating performance in terms of price, revenue rate and disutility. We obtain considerable performance improvements using our approach over that in [1]. In particular, our approach exhibits a throughput improvement in the range of 54 to 80 percent in all cases studied (over all routes) while exhibiting a lower packet delay in the range of 26 to 38 percent over the scheme in [1].
Resumo:
The IEEE 802.1le medium access control (MAC) standard provides distributed service differentiation or Quality-of- Service (QoS) by employing a priority system. In 802.1 le networks, network traffic is classified into different priorities or access categories (ACs). Nodes maintain separate queues for each AC and packets at the head-of-line (HOL) of each queue contend for channel access using AC-specific parameters. Such a mechanism allows the provision of differentiated QoS where high priority, performance sensitive traffic such as voice and video applications will enjoy less delay, greater throughput and smaller loss, compared to low priority traffic (e. g. file transfer). The standard implicitly assumes that nodes are honest and will truthfully classify incoming traffic into its appropriate AC. However, in the absence of any additional mechanism, selfish users can gain enhanced performance by selectively classifying low priority traffic as high priority, potentially destroying the QoS capability of the system.
Resumo:
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving clock speed, reducing energy consumption of the logic, and making the design simpler, it introduces extra overheads by way of inter-cluster communication. This communication happens over long global wires which leads to delay in execution and significantly high energy consumption.In this paper, we propose a new instruction scheduling algorithm that exploits scheduling slacks of instructions and communication slacks of data values together to achieve better energy-performance trade-offs for clustered architectures with heterogeneous interconnect. Our instruction scheduling algorithm achieves 35% and 40% reduction in communication energy, whereas the overall energy-delay product improves by 4.5% and 6.5% respectively for 2 cluster and 4 cluster machines with marginal increase (1.6% and 1.1%) in execution time. Our test bed uses the Trimaran compiler infrastructure.
Resumo:
The move towards IT outsourcing is the first step towards an environment where compute infrastructure is treated as a service. In utility computing this IT service has to honor Service Level Agreements (SLA) in order to meet the desired Quality of Service (QoS) guarantees. Such an environment requires reliable services in order to maximize the utilization of the resources and to decrease the Total Cost of Ownership (TCO). Such reliability cannot come at the cost of resource duplication, since it increases the TCO of the data center and hence the cost per compute unit. We, in this paper, look into aspects of projecting impact of hardware failures on the SLAs and techniques required to take proactive recovery steps in case of a predicted failure. By maintaining health vectors of all hardware and system resources, we predict the failure probability of resources based on observed hardware errors/failure events, at runtime. This inturn influences an availability aware middleware to take proactive action (even before the application is affected in case the system and the application have low recoverability). The proposed framework has been prototyped on a system running HP-UX. Our offline analysis of the prediction system on hardware error logs indicate no more than 10% false positives. This work to the best of our knowledge is the first of its kind to perform an end-to-end analysis of the impact of a hardware fault on application SLAs, in a live system.
Resumo:
An important issue in the design of a distributed computing system (DCS) is the development of a suitable protocol. This paper presents an effort to systematize the protocol design procedure for a DCS. Protocol design and development can be divided into six phases: specification of the DCS, specification of protocol requirements, protocol design, specification and validation of the designed protocol, performance evaluation, and hardware/software implementation. This paper describes techniques for the second and third phases, while the first phase has been considered by the authors in their earlier work. Matrix and set theoretic based approaches are used for specification of a DCS and for specification of the protocol requirements. These two formal specification techniques form the basis of the development of a simple and straightforward procedure for the design of the protocol. The applicability of the above design procedure has been illustrated by considering an example of a computing system encountered on board a spacecraft. A Petri-net based approach has been adopted to model the protocol. The methodology developed in this paper can be used in other DCS applications.
Resumo:
A fuzzy system is developed using a linearized performance model of the gas turbine engine for performing gas turbine fault isolation from noisy measurements. By using a priori information about measurement uncertainties and through design variable linking, the design of the fuzzy system is posed as an optimization problem with low number of design variables which can be solved using the genetic algorithm in considerably low amount of computer time. The faults modeled are module faults in five modules: fan, low pressure compressor, high pressure compressor, high pressure turbine and low pressure turbine. The measurements used are deviations in exhaust gas temperature, low rotor speed, high rotor speed and fuel flow from a base line 'good engine'. The genetic fuzzy system (GFS) allows rapid development of the rule base if the fault signatures and measurement uncertainties change which happens for different engines and airlines. In addition, the genetic fuzzy system reduces the human effort needed in the trial and error process used to design the fuzzy system and makes the development of such a system easier and faster. A radial basis function neural network (RBFNN) is also used to preprocess the measurements before fault isolation. The RBFNN shows significant noise reduction and when combined with the GFS leads to a diagnostic system that is highly robust to the presence of noise in data. Showing the advantage of using a soft computing approach for gas turbine diagnostics.
Analyzing Cache Performance Bottlenecks of STM Applications and addressing them with Compiler's help
Resumo:
Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs as an alternative to traditional lock based synchronization. However adoption of STM in mainstream software has been quite low due to its considerable overheads and its poor cache/memory performance. In this paper, we perform a detailed study of the cache behavior of STM applications and quantify the impact of different STM factors on the cache misses experienced by the applications. Based on our analysis, we propose a compiler driven Lock-Data Colocation (LDC), targeted at reducing the cache overheads on STM. We show that LDC is effective in improving the cache behavior of STM applications by reducing the dcache miss latency and improving execution time performance.