992 resultados para Amazon EC2
Resumo:
En aquest Treball Final de Grau s'ha implementat un clúster d'alta eficiència (HTC) amb nodes físics d'una aula d'informàtica i nodes virtuals a partir d'instàncies d'Amazon EC2 (Elastic Compute Cloud). Mitjançant la creació de dos bash scripts es gestionarà la posada en marxa dels nodes del clúster, tant físics com virtuals, per tal d'optimitzar l'ús dels recursos físics, reduir el cost energètic i disposar de la flexibilitat que ofereix el sistema de màquines virtuals d'Amazon EC2.
Resumo:
Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially but others can be executed concurrently or in parallel on clusters, grids or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. For developing an application using MapReduce there is a need to install/configure/access specific frameworks such as Apache Hadoop or Elastic MapReduce in Amazon Cloud. It would be desirable to provide more flexibility in adjusting such configurations according to the application characteristics. Furthermore the composition of the multiple phases of a data analytic application requires the specification of all the phases and their orchestration. The original MapReduce model and environment lacks flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as subworkflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. In the paper we present experimental results when using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.
Resumo:
Video transcoding refers to the process of converting a digital video from one format into another format. It is a compute-intensive operation. Therefore, transcoding of a large number of simultaneous video streams requires a large amount of computing resources. Moreover, to handle di erent load conditions in a cost-e cient manner, the video transcoding service should be dynamically scalable. Infrastructure as a Service Clouds currently offer computing resources, such as virtual machines, under the pay-per-use business model. Thus the IaaS Clouds can be leveraged to provide a coste cient, dynamically scalable video transcoding service. To use computing resources e ciently in a cloud computing environment, cost-e cient virtual machine provisioning is required to avoid overutilization and under-utilization of virtual machines. This thesis presents proactive virtual machine resource allocation and de-allocation algorithms for video transcoding in cloud computing. Since users' requests for videos may change at di erent times, a check is required to see if the current computing resources are adequate for the video requests. Therefore, the work on admission control is also provided. In addition to admission control, temporal resolution reduction is used to avoid jitters in a video. Furthermore, in a cloud computing environment such as Amazon EC2, the computing resources are more expensive as compared with the storage resources. Therefore, to avoid repetition of transcoding operations, a transcoded video needs to be stored for a certain time. To store all videos for the same amount of time is also not cost-e cient because popular transcoded videos have high access rate while unpopular transcoded videos are rarely accessed. This thesis provides a cost-e cient computation and storage trade-o strategy, which stores videos in the video repository as long as it is cost-e cient to store them. This thesis also proposes video segmentation strategies for bit rate reduction and spatial resolution reduction video transcoding. The evaluation of proposed strategies is performed using a message passing interface based video transcoder, which uses a coarse-grain parallel processing approach where video is segmented at group of pictures level.
Resumo:
Es una plataforma de doble lado que tiene como objetivo fundamental acercar a los buscadores de talento de una forma eficaz y precisa a los actores. Esta propuesta empresarial quiere mejorar las condiciones laborales de estos últimos mediante la tecnología disponible y un know know digital. Así mismo, generar el primer banco consolidado de talentos del país y contribuir a la solidificación de un segmento cultural.
Resumo:
Cloud services are becoming ever more important for everyone's life. Cloud storage? Web mails? Yes, we don't need to be working in big IT companies to be surrounded by cloud services. Another thing that's growing in importance, or at least that should be considered ever more important, is the concept of privacy. The more we rely on services of which we know close to nothing about, the more we should be worried about our privacy. In this work, I will analyze a prototype software based on a peer to peer architecture for the offering of cloud services, to see if it's possible to make it completely anonymous, meaning that not only the users using it will be anonymous, but also the Peers composing it will not know the real identity of each others. To make it possible, I will make use of anonymizing networks like Tor. I will start by studying the state of art of Cloud Computing, by looking at some real example, followed by analyzing the architecture of the prototype, trying to expose the differences between its distributed nature and the somehow centralized solutions offered by the famous vendors. After that, I will get as deep as possible into the working principle of the anonymizing networks, because they are not something that can just be 'applied' mindlessly. Some de-anonymizing techniques are very subtle so things must be studied carefully. I will then implement the required changes, and test the new anonymized prototype to see how its performances differ from those of the standard one. The prototype will be run on many machines, orchestrated by a tester script that will automatically start, stop and do all the required API calls. As to where to find all these machines, I will make use of Amazon EC2 cloud services and their on-demand instances.
Resumo:
In just a few years cloud computing has become a very popular paradigm and a business success story, with storage being one of the key features. To achieve high data availability, cloud storage services rely on replication. In this context, one major challenge is data consistency. In contrast to traditional approaches that are mostly based on strong consistency, many cloud storage services opt for weaker consistency models in order to achieve better availability and performance. This comes at the cost of a high probability of stale data being read, as the replicas involved in the reads may not always have the most recent write. In this paper, we propose a novel approach, named Harmony, which adaptively tunes the consistency level at run-time according to the application requirements. The key idea behind Harmony is an intelligent estimation model of stale reads, allowing to elastically scale up or down the number of replicas involved in read operations to maintain a low (possibly zero) tolerable fraction of stale reads. As a result, Harmony can meet the desired consistency of the applications while achieving good performance. We have implemented Harmony and performed extensive evaluations with the Cassandra cloud storage on Grid?5000 testbed and on Amazon EC2. The results show that Harmony can achieve good performance without exceeding the tolerated number of stale reads. For instance, in contrast to the static eventual consistency used in Cassandra, Harmony reduces the stale data being read by almost 80% while adding only minimal latency. Meanwhile, it improves the throughput of the system by 45% while maintaining the desired consistency requirements of the applications when compared to the strong consistency model in Cassandra.
Resumo:
The inherent complexity of modern cloud infrastructures has created the need for innovative monitoring approaches, as state-of-the-art solutions used for other large-scale environments do not address specific cloud features. Although cloud monitoring is nowadays an active research field, a comprehensive study covering all its aspects has not been presented yet. This paper provides a deep insight into cloud monitoring. It proposes a unified cloud monitoring taxonomy, based on which it defines a layered cloud monitoring architecture. To illustrate it, we have implemented GMonE, a general-purpose cloud monitoring tool which covers all aspects of cloud monitoring by specifically addressing the needs of modern cloud infrastructures. Furthermore, we have evaluated the performance, scalability and overhead of GMonE with Yahoo Cloud Serving Benchmark (YCSB), by using the OpenNebula cloud middleware on the Grid’5000 experimental testbed. The results of this evaluation demonstrate the benefits of our approach, surpassing the monitoring performance and capabilities of cloud monitoring alternatives such as those present in state-of-the-art systems such as Amazon EC2 and OpenNebula.
Resumo:
Internet traffic classification is a relevant and mature research field, anyway of growing importance and with still open technical challenges, also due to the pervasive presence of Internet-connected devices into everyday life. We claim the need for innovative traffic classification solutions capable of being lightweight, of adopting a domain-based approach, of not only concentrating on application-level protocol categorization but also classifying Internet traffic by subject. To this purpose, this paper originally proposes a classification solution that leverages domain name information extracted from IPFIX summaries, DNS logs, and DHCP leases, with the possibility to be applied to any kind of traffic. Our proposed solution is based on an extension of Word2vec unsupervised learning techniques running on a specialized Apache Spark cluster. In particular, learning techniques are leveraged to generate word-embeddings from a mixed dataset composed by domain names and natural language corpuses in a lightweight way and with general applicability. The paper also reports lessons learnt from our implementation and deployment experience that demonstrates that our solution can process 5500 IPFIX summaries per second on an Apache Spark cluster with 1 slave instance in Amazon EC2 at a cost of $ 3860 year. Reported experimental results about Precision, Recall, F-Measure, Accuracy, and Cohen's Kappa show the feasibility and effectiveness of the proposal. The experiments prove that words contained in domain names do have a relation with the kind of traffic directed towards them, therefore using specifically trained word embeddings we are able to classify them in customizable categories. We also show that training word embeddings on larger natural language corpuses leads improvements in terms of precision up to 180%.
Resumo:
This work contributes to the development of search engines that self-adapt their size in response to fluctuations in workload. Deploying a search engine in an Infrastructure as a Service (IaaS) cloud facilitates allocating or deallocating computational resources to or from the engine. In this paper, we focus on the problem of regrouping the metric-space search index when the number of virtual machines used to run the search engine is modified to reflect changes in workload. We propose an algorithm for incrementally adjusting the index to fit the varying number of virtual machines. We tested its performance using a custom-build prototype search engine deployed in the Amazon EC2 cloud, while calibrating the results to compensate for the performance fluctuations of the platform. Our experiments show that, when compared with computing the index from scratch, the incremental algorithm speeds up the index computation 2–10 times while maintaining a similar search performance.
Resumo:
This research focuses on automatically adapting a search engine size in response to fluctuations in query workload. Deploying a search engine in an Infrastructure as a Service (IaaS) cloud facilitates allocating or deallocating computer resources to or from the engine. Our solution is to contribute an adaptive search engine that will repeatedly re-evaluate its load and, when appropriate, switch over to a dierent number of active processors. We focus on three aspects and break them out into three sub-problems as follows: Continually determining the Number of Processors (CNP), New Grouping Problem (NGP) and Regrouping Order Problem (ROP). CNP means that (in the light of the changes in the query workload in the search engine) there is a problem of determining the ideal number of processors p active at any given time to use in the search engine and we call this problem CNP. NGP happens when changes in the number of processors are determined and it must also be determined which groups of search data will be distributed across the processors. ROP is how to redistribute this data onto processors while keeping the engine responsive and while also minimising the switchover time and the incurred network load. We propose solutions for these sub-problems. For NGP we propose an algorithm for incrementally adjusting the index to t the varying number of virtual machines. For ROP we present an ecient method for redistributing data among processors while keeping the search engine responsive. Regarding the solution for CNP, we propose an algorithm determining the new size of the search engine by re-evaluating its load. We tested the solution performance using a custom-build prototype search engine deployed in the Amazon EC2 cloud. Our experiments show that when we compare our NGP solution with computing the index from scratch, the incremental algorithm speeds up the index computation 2{10 times while maintaining a similar search performance. The chosen redistribution method is 25% to 50% faster than other methods and reduces the network load around by 30%. For CNP we present a deterministic algorithm that shows a good ability to determine a new size of search engine. When combined, these algorithms give an adapting algorithm that is able to adjust the search engine size with a variable workload.
Resumo:
Mobile Cloud Computing promises to overcome the physical limitations of mobile devices by executing demanding mobile applications on cloud infrastructure. In practice, implementing this paradigm is difficult; network disconnection often occurs, bandwidth may be limited, and a large power draw is required from the battery, resulting in a poor user experience. This thesis presents a mobile cloud middleware solution, Context Aware Mobile Cloud Services (CAMCS), which provides cloudbased services to mobile devices, in a disconnected fashion. An integrated user experience is delivered by designing for anticipated network disconnection, and low data transfer requirements. CAMCS achieves this by means of the Cloud Personal Assistant (CPA); each user of CAMCS is assigned their own CPA, which can complete user-assigned tasks, received as descriptions from the mobile device, by using existing cloud services. Service execution is personalised to the user's situation with contextual data, and task execution results are stored with the CPA until the user can connect with his/her mobile device to obtain the results. Requirements for an integrated user experience are outlined, along with the design and implementation of CAMCS. The operation of CAMCS and CPAs with cloud-based services is presented, specifically in terms of service description, discovery, and task execution. The use of contextual awareness to personalise service discovery and service consumption to the user's situation is also presented. Resource management by CAMCS is also studied, and compared with existing solutions. Additional application models that can be provided by CAMCS are also presented. Evaluation is performed with CAMCS deployed on the Amazon EC2 cloud. The resource usage of the CAMCS Client, running on Android-based mobile devices, is also evaluated. A user study with volunteers using CAMCS on their own mobile devices is also presented. Results show that CAMCS meets the requirements outlined for an integrated user experience.
Resumo:
We present Dithen, a novel computation-as-a-service (CaaS) cloud platform specifically tailored to the parallel ex-ecution of large-scale multimedia tasks. Dithen handles the upload/download of both multimedia data and executable items, the assignment of compute units to multimedia workloads, and the reactive control of the available compute units to minimize the cloud infrastructure cost under deadline-abiding execution. Dithen combines three key properties: (i) the reactive assignment of individual multimedia tasks to available computing units according to availability and predetermined time-to-completion constraints; (ii) optimal resource estimation based on Kalman-filter estimates; (iii) the use of additive increase multiplicative decrease (AIMD) algorithms (famous for being the resource management in the transport control protocol) for the control of the number of units servicing workloads. The deployment of Dithen over Amazon EC2 spot instances is shown to be capable of processing more than 80,000 video transcoding, face detection and image processing tasks (equivalent to the processing of more than 116 GB of compressed data) for less than $1 in billing cost from EC2. Moreover, the proposed AIMD-based control mechanism, in conjunction with the Kalman estimates, is shown to provide for more than 27% reduction in EC2 spot instance cost against methods based on reactive resource estimation. Finally, Dithen is shown to offer a 38% to 500% reduction of the billing cost against the current state-of-the-art in CaaS platforms on Amazon EC2 (Amazon Lambda and Amazon Autoscale). A baseline version of Dithen is currently available at dithen.com.
Resumo:
En la actualidad, el uso del Cloud Computing se está incrementando y existen muchos proveedores que ofrecen servicios que hacen uso de esta tecnología. Uno de ellos es Amazon Web Services, que a través de su servicio Amazon EC2, nos ofrece diferentes tipos de instancias que podemos utilizar según nuestras necesidades. El modelo de negocio de AWS se basa en el pago por uso, es decir, solo realizamos el pago por el tiempo que se utilicen las instancias. En este trabajo se implementa en Amazon EC2, una aplicación cuyo objetivo es extraer de diferentes fuentes de información, los datos de las ventas realizadas por las editoriales y librerías de España. Estos datos son procesados, cargados en una base de datos y con ellos se generan reportes estadísticos, que ayudarán a los clientes a tomar mejores decisiones. Debido a que la aplicación procesa una gran cantidad de datos, se propone el desarrollo y validación de un modelo, que nos permita obtener una ejecución óptima en Amazon EC2. En este modelo se tienen en cuenta el tiempo de ejecución, el coste por uso y una métrica de coste/rendimiento. Adicionalmente, se utilizará la tecnología de contenedores Docker para llevar a cabo un caso específico del despliegue de la aplicación.
Resumo:
In the Amazon Region, there is a virtual absence of severe malaria and few fatal cases of naturally occurring Plasmodium falciparum infections; this presents an intriguing and underexplored area of research. In addition to the rapid access of infected persons to effective treatment, one cause of this phenomenon might be the recognition of cytoadherent variant proteins on the infected red blood cell (IRBC) surface, including the var gene encoded P. falciparum erythrocyte membrane protein 1. In order to establish a link between cytoadherence, IRBC surface antibody recognition and the presence or absence of malaria symptoms, we phenotype-selected four Amazonian P. falciparum isolates and the laboratory strain 3D7 for their cytoadherence to CD36 and ICAM1 expressed on CHO cells. We then mapped the dominantly expressed var transcripts and tested whether antibodies from symptomatic or asymptomatic infections showed a differential recognition of the IRBC surface. As controls, the 3D7 lineages expressing severe disease-associated phenotypes were used. We showed that there was no profound difference between the frequency and intensity of antibody recognition of the IRBC-exposed P. falciparum proteins in symptomatic vs. asymptomatic infections. The 3D7 lineages, which expressed severe malaria-associated phenotypes, were strongly recognised by most, but not all plasmas, meaning that the recognition of these phenotypes is frequent in asymptomatic carriers, but is not necessarily a prerequisite to staying free of symptoms.
Resumo:
Abstract Introduction: Hypertension (HTN) is a preventable cause of cardiovascular morbidity and mortality. To compare the prevalence, awareness, treatment, and control of HTN among urban and riverside populations in Porto Velho, Amazon region. We conducted a cross-sectional study between July and December 2013 based on a household survey of individuals aged 35-80 years. Interviews by using a standardized questionnaire, and blood pressure (BP), weight, height, and waist circumference measurements were performed. HTN was defined when individuals reported having the disease, received antihypertensive medications, or had a systolic BP ≥ 140 mm Hg or diastolic BP ≥ 90 mm Hg. Awareness was based on self-reports and the use of antihypertensive medications. Control was defined as a BP ≤ 140/90 mm Hg. Among the 1410 participants, 750 (53.19%) had HTN and 473 (63.06%) had diagnosis awareness, of whom 404 (85.41%) received pharmacological treatment but with low control rate. The prevalence and treatment rates were higher in the urban areas (55.48% vs. 48.87% [p = 0.02] and 61.25% vs. 52.30% [p < 0.01], respectively). HTN awareness was higher in the riverside area (61.05% vs. 67.36% ; p < 0.01), but the control rates showed no statistically significant difference (22.11% vs. 23.43% ; p = 0.69). HTN prevalence was higher in the urban population than in the riverside population. Of the hypertensive individuals in both areas, <25% had controlled HTN. Comprehensive public health measures are needed to improve the prevention and treatment of systemic arterial HTN and prevent other cardiovascular diseases.