11 results for cloud computing, hypervisor, virtualizzazione, live migration, infrastructure as a service
at Indian Institute of Science - Bangalore - India
Abstract:
Many applications, such as software for processing customer records in telecom, patient records in hospitals, or a single email in a mailbox, need to access a single record in a database consisting of millions of records. A basic feature of these applications is that they access data sets that are very large but simple. Cloud computing meets the computing requirements of this new generation of applications, whose very large data sets cannot be handled efficiently using traditional computing infrastructure. In this paper, we describe the storage services provided by three well-known cloud service providers and compare their features, with a view to characterizing the storage requirements of very large data sets. We hope that the comparison will act as a catalyst for the design of future storage services for very large data sets. We also give a brief overview of other kinds of storage that have recently emerged for cloud computing.
Abstract:
Cloud computing has been made possible by the availability of virtualization technologies on commodity platforms. Measuring resource usage on virtualized servers is difficult because the performance counters used for resource accounting are not virtualized. Hence, many of the prevalent virtualization technologies, such as Xen, VMware, and KVM, use host-specific CPU usage monitoring, which is coarse-grained. In this paper, we present a performance monitoring tool for KVM-based virtual machines that measures the CPU overhead incurred by the hypervisor on behalf of the virtual machine along with the CPU usage of the virtual machine itself. The fine-grained resource usage information provided by this tool can be used in diverse situations, such as resource provisioning to support performance-related QoS requirements, identification of bottlenecks during VM placement, and resource profiling of applications in cloud environments. We demonstrate a use case of this tool by measuring the performance of web servers hosted on a KVM-based virtualized server.
Abstract:
Elasticity in cloud systems provides the flexibility to acquire and relinquish computing resources on demand. However, in current virtualized systems resource allocation is mostly static. Resources are allocated during VM instantiation, and any change in workload that leads to a significant increase or decrease in resources is handled by VM migration. Hence, cloud users tend to characterize their workloads at a coarse-grained level, which can lead to under-utilized VM resources or an underperforming application. A more flexible and adaptive resource allocation mechanism would benefit variable workloads, such as those of web servers. In this paper, we present an elastic resources framework for the IaaS cloud layer that addresses this need. The framework includes an application workload forecasting engine that predicts the expected demand at run-time; this prediction is input to the resource manager, which modulates resource allocation accordingly. Because of prediction errors, resources can be over-allocated or under-allocated relative to the actual demand made by the application. Over-allocation leads to unused resources, and under-allocation can cause underperformance. To strike a good trade-off between over-allocation and underperformance, we derive an excess cost model. In this model, excess resources allocated are captured as an over-allocation cost, and under-allocation is captured as a penalty cost for violating the application service level agreement (SLA). The confidence interval of the predicted workload is used to minimize this excess cost with minimal effect on SLA violations. An example case study for an academic institute web server workload is presented. Using the confidence interval to minimize excess cost, we achieve a significant reduction in resource allocation requirement while restricting application SLA violations to below 2-3%.
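The trade-off described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the normally distributed forecast error, the linear per-unit costs, and all function and parameter names (`allocation_from_forecast`, `excess_cost`, `over_cost`, `penalty`) are assumptions made for illustration only.

```python
from statistics import NormalDist


def allocation_from_forecast(mean, std, confidence):
    """Allocate at the upper bound of a one-sided confidence interval.

    Assumption: forecast errors are normally distributed around `mean`.
    """
    z = NormalDist().inv_cdf(confidence)
    return mean + z * std


def excess_cost(allocated, demand, over_cost, penalty):
    """Hypothetical linear excess cost for one interval.

    Unused resources are charged `over_cost` per unit; a shortfall is
    charged `penalty` per unit as a stand-in for an SLA violation cost.
    """
    if allocated >= demand:
        return (allocated - demand) * over_cost
    return (demand - allocated) * penalty


def evaluate(forecasts, demands, confidence, over_cost, penalty):
    """Return (total excess cost, fraction of intervals under-allocated)."""
    total = 0.0
    violations = 0
    for (mean, std), demand in zip(forecasts, demands):
        allocated = allocation_from_forecast(mean, std, confidence)
        total += excess_cost(allocated, demand, over_cost, penalty)
        if allocated < demand:
            violations += 1
    return total, violations / len(demands)
```

Sweeping `confidence` over a range and keeping the value with the lowest total cost, subject to an acceptable violation fraction, is one simple way to explore the trade-off under these assumptions.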
Abstract:
In this paper we present a combination of technologies that provides an Energy-on-Demand (EoD) service to enable low-cost innovation suitable for microgrid networks. The system is designed around the simple, low-cost Rural Energy Device (RED) Box which, in combination with Short Message Service (SMS) communication, serves as an elementary proxy for the smart meters typically used in urban settings. Further, customer behavior and familiarity with such devices, based on mobile phone experience, have been incorporated into the design philosophy. Customers are incentivized to interact with the system, thus providing valuable behavioral and usage data to the Utility Service Provider (USP). Data collected over time can be used by the USP for analytics, which we envision running on remote computing services, that is, cloud computing. Cloud computing allows computational resources to be shared at the virtual level across several networks. The customer-system interaction is facilitated by a third-party Telecom Service Provider (TSP). The approximate cost of the RED Box is envisaged to be under USD 10 at production scale.
Abstract:
Virtualization is one of the key enabling technologies for Cloud computing. Although it facilitates improved utilization of resources, virtualization can lead to performance degradation due to the sharing of physical resources such as CPU, memory, network interfaces, and disk controllers. Multi-tenancy can cause highly unpredictable performance for concurrent I/O applications running inside virtual machines that share local disk storage in the Cloud. Disk I/O requests in a typical Cloud setup may have varied requirements in terms of latency and throughput, as they arise from a range of heterogeneous applications with diverse performance goals. This necessitates providing differentiated performance services to different I/O applications. In this paper, we present PriDyn, a novel scheduling framework that takes I/O performance metrics of applications, such as acceptable latency, and converts them to an appropriate priority value for disk access based on the current system state. This framework aims to provide differentiated I/O service to various applications and to ensure predictable performance for critical applications in a multi-tenant Cloud environment. We demonstrate through experimental validation on real-world I/O traces that this framework achieves appreciable enhancements in I/O performance, indicating that this approach is a promising step towards enabling QoS guarantees on Cloud storage.
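As a rough illustration of turning a latency requirement into a disk priority from the current system state, the sketch below maps a deadline "slack" onto a small integer range. It is not PriDyn's algorithm: the slack heuristic, the eight priority levels (chosen here to resemble the 0 to 7 range of Linux best-effort I/O priorities), and every name in the code are hypothetical.

```python
def slack_ratio(remaining_bytes, bandwidth_bytes_per_s, time_to_deadline_s):
    """Projected finish time divided by time left until the deadline.

    A value of 1.0 or more means the request would miss its deadline at the
    currently observed bandwidth (hypothetical heuristic).
    """
    if time_to_deadline_s <= 0:
        return float("inf")  # deadline already passed
    projected_s = remaining_bytes / bandwidth_bytes_per_s
    return projected_s / time_to_deadline_s


def priority_from_slack(ratio, levels=8):
    """Map a slack ratio to a priority in 0 (highest) .. levels - 1 (lowest)."""
    if ratio >= 1.0:
        return 0  # projected to miss the deadline: most urgent
    # The more slack a request has, the lower its priority.
    return min(levels - 1, int((1.0 - ratio) * levels))
```

Recomputing the ratio periodically, as observed bandwidth changes with co-located load, would make the priority track the system state in this simplified setting.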
Abstract:
Exascale systems of the future are predicted to have a mean time between failures (MTBF) of less than one hour. Malleable applications, in which the number of processors on which the application executes can be changed during execution, can use their malleability to better tolerate high failure rates. We present AdFT, an adaptive fault tolerance framework for long-running malleable applications that maximizes application performance in the presence of failures. The AdFT framework includes cost models for evaluating the benefits of various fault tolerance actions, including checkpointing, live migration, and rescheduling, and makes runtime decisions to dynamically select the fault tolerance actions at different points of application execution. Simulations with real and synthetic failure traces show that our approach outperforms existing fault tolerance mechanisms for malleable applications, yielding up to 23% improvement in application performance, and is effective even for petascale systems and beyond.
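To give a feel for what a checkpointing cost model weighs, the sketch below uses Young's classical first-order approximation for the checkpoint interval. This is a well-known textbook formula swapped in for illustration, not the AdFT cost model, and the function names are assumptions.

```python
import math


def young_checkpoint_interval(checkpoint_cost_s, mtbf_s):
    """Young's approximation: interval = sqrt(2 * checkpoint cost * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def overhead_fraction(interval_s, checkpoint_cost_s, mtbf_s):
    """First-order overhead estimate for a given checkpoint interval.

    Sum of the time spent writing checkpoints (cost / interval) and the
    expected rework after a failure (interval / (2 * MTBF)).
    """
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)
```

With an MTBF of one hour and a 50-second checkpoint, this approximation gives a 600-second interval and roughly 17% overhead, which suggests why checkpointing alone becomes expensive at the failure rates the abstract mentions.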
Abstract:
Moore's Law has driven the semiconductor revolution enabling over four decades of scaling in frequency, size, complexity, and power. However, the limits of physics are preventing further scaling of speed, forcing a paradigm shift towards multicore computing and parallelization. In effect, the system is taking over the role that the single CPU was playing: high-speed signals running through chips but also packages and boards connect ever more complex systems. High-speed signals making their way through the entire system cause new challenges in the design of computing hardware. Inductance, phase shifts and velocity of light effects, material resonances, and wave behavior become not only prevalent but need to be calculated accurately and rapidly to enable short design cycle times. In essence, to continue scaling with Moore's Law requires the incorporation of Maxwell's equations in the design process. Incorporating Maxwell's equations into the design flow is only possible through the combined power that new algorithms, parallelization and high-speed computing provide. At the same time, incorporation of Maxwell-based models into circuit and system-level simulation presents a massive accuracy, passivity, and scalability challenge. In this tutorial, we navigate through the often confusing terminology and concepts behind field solvers, show how advances in field solvers enable integration into EDA flows, present novel methods for model generation and passivity assurance in large systems, and demonstrate the power of cloud computing in enabling the next generation of scalable Maxwell solvers and the next generation of Moore's Law scaling of systems. We intend to show the truly symbiotic growing relationship between Maxwell and Moore!
Abstract:
The move towards IT outsourcing is the first step towards an environment where compute infrastructure is treated as a service. In utility computing, this IT service has to honor Service Level Agreements (SLAs) in order to meet the desired Quality of Service (QoS) guarantees. Such an environment requires reliable services in order to maximize the utilization of resources and to decrease the Total Cost of Ownership (TCO). Such reliability cannot come at the cost of resource duplication, since duplication increases the TCO of the data center and hence the cost per compute unit. In this paper, we examine how to project the impact of hardware failures on SLAs, and the techniques required to take proactive recovery steps when a failure is predicted. By maintaining health vectors of all hardware and system resources, we predict the failure probability of resources at runtime, based on observed hardware errors and failure events. This in turn prompts an availability-aware middleware to take proactive action (even before the application is affected, in case the system and the application have low recoverability). The proposed framework has been prototyped on a system running HP-UX. Our offline analysis of the prediction system on hardware error logs indicates no more than 10% false positives. To the best of our knowledge, this work is the first of its kind to perform an end-to-end analysis of the impact of a hardware fault on application SLAs in a live system.
Abstract:
There have been several studies on the performance of TCP-controlled transfers over an infrastructure IEEE 802.11 WLAN, assuming perfect channel conditions. In this paper, we develop an analytical model for the throughput of TCP-controlled file transfers over the IEEE 802.11 DCF with different packet error probabilities for the stations, accounting for the effect of packet drops on the TCP window. Our analysis proceeds by combining two models: one is an extension of the usual TCP-over-DCF model for an infrastructure WLAN, where the throughput of a station depends on the probability that the head-of-the-line packet at the Access Point (AP) belongs to that station; the second is a model for the TCP window process for connections with different drop probabilities. Iterative calculations between these models yield the head-of-the-line probabilities, from which performance measures such as the throughputs and packet failure probabilities can be derived. We find that, due to MAC layer retransmissions, packet losses are rare even with high channel error probabilities, and the stations obtain fair throughputs even when some of them have packet error probabilities as high as 0.1 or 0.2. For some restricted settings we are also able to model tail-drop loss at the AP. Although it involves many approximations, the model captures the system behavior quite accurately when compared with simulations.
Abstract:
Scalable stream processing and continuous dataflow systems are gaining traction with the rise of big data, due to the need for processing high-velocity data in near real time. Unlike batch processing systems such as MapReduce and workflows, static scheduling strategies fall short for continuous dataflows because of variations in the input data rates and the need for sustained throughput. The elastic resource provisioning of cloud infrastructure is valuable for meeting the changing resource needs of such continuous applications. However, multi-tenant cloud resources introduce yet another dimension of performance variability that impacts the application's throughput. In this paper we propose PLAStiCC, an adaptive scheduling algorithm that balances resource cost and application throughput using a prediction-based lookahead approach. It addresses variations not only in the input data rates but also in the underlying cloud infrastructure. We also propose several simpler static scheduling heuristics that operate in the absence of an accurate performance prediction model. These static and adaptive heuristics are evaluated through extensive simulations using performance traces obtained from the Amazon AWS IaaS public cloud. Our results show an improvement of up to 20% in the overall profit as compared to the reactive adaptation algorithm.