892 resultados para Parallel computing, Virtual machine, Composition, Determinism, Abstraction
Resumo:
A parallel computing environment to support optimization of large-scale engineering systems is designed and implemented on Windows-based personal computer networks, using the master-worker model and the Parallel Virtual Machine (PVM). It is involved in decomposition of a large engineering system into a number of smaller subsystems optimized in parallel on worker nodes and coordination of subsystem optimization results on the master node. The environment consists of six functional modules, i.e. the master control, the optimization model generator, the optimizer, the data manager, the monitor, and the post processor. Object-oriented design of these modules is presented. The environment supports steps from the generation of optimization models to the solution and the visualization on networks of computers. User-friendly graphical interfaces make it easy to define the problem, and monitor and steer the optimization process. It has been verified by an example of a large space truss optimization. (C) 2004 Elsevier Ltd. All rights reserved.
Resumo:
Object-oriented programming languages presently are the dominant paradigm of application development (e. g., Java,. NET). Lately, increasingly more Java applications have long (or very long) execution times and manipulate large amounts of data/information, gaining relevance in fields related with e-Science (with Grid and Cloud computing). Significant examples include Chemistry, Computational Biology and Bio-informatics, with many available Java-based APIs (e. g., Neobio). Often, when the execution of such an application is terminated abruptly because of a failure (regardless of the cause being a hardware of software fault, lack of available resources, etc.), all of its work already performed is simply lost, and when the application is later re-initiated, it has to restart all its work from scratch, wasting resources and time, while also being prone to another failure and may delay its completion with no deadline guarantees. Our proposed solution to address these issues is through incorporating mechanisms for checkpointing and migration in a JVM. These make applications more robust and flexible by being able to move to other nodes, without any intervention from the programmer. This article provides a solution to Java applications with long execution times, by extending a JVM (Jikes research virtual machine) with such mechanisms. Copyright (C) 2011 John Wiley & Sons, Ltd.
Resumo:
One of the main challenges in Software Engineering is to cope with the transition from an industry based on software as a product to software as a service. The field of Software Engineering should provide the necessary methods and tools to develop and deploy new cost-efficient and scalable digital services. In this thesis, we focus on deployment platforms to ensure cost-efficient scalability of multi-tier web applications and on-demand video transcoding service for different types of load conditions. Infrastructure as a Service (IaaS) clouds provide Virtual Machines (VMs) under the pay-per-use business model. Dynamically provisioning VMs on demand allows service providers to cope with fluctuations on the number of service users. However, VM provisioning must be done carefully, because over-provisioning results in an increased operational cost, while underprovisioning leads to a subpar service. Therefore, our main focus in this thesis is on cost-efficient VM provisioning for multi-tier web applications and on-demand video transcoding. Moreover, to prevent provisioned VMs from becoming overloaded, we augment VM provisioning with an admission control mechanism. Similarly, to ensure efficient use of provisioned VMs, web applications on the under-utilized VMs are consolidated periodically. Thus, the main problem that we address is cost-efficient VM provisioning augmented with server consolidation and admission control on the provisioned VMs. We seek solutions for two types of applications: multi-tier web applications that follow the request-response paradigm and on-demand video transcoding that is based on video streams with soft realtime constraints. Our first contribution is a cost-efficient VM provisioning approach for multi-tier web applications. The proposed approach comprises two subapproaches: a reactive VM provisioning approach called ARVUE and a hybrid reactive-proactive VM provisioning approach called Cost-efficient Resource Allocation for Multiple web applications with Proactive scaling. Our second contribution is a prediction-based VM provisioning approach for on-demand video transcoding in the cloud. Moreover, to prevent virtualized servers from becoming overloaded, the proposed VM provisioning approaches are augmented with admission control approaches. Therefore, our third contribution is a session-based admission control approach for multi-tier web applications called adaptive Admission Control for Virtualized Application Servers. Similarly, the fourth contribution in this thesis is a stream-based admission control and scheduling approach for on-demand video transcoding called Stream-Based Admission Control and Scheduling. Our fifth contribution is a computation and storage trade-o strategy for cost-efficient video transcoding in cloud computing. Finally, the sixth and the last contribution is a web application consolidation approach, which uses Ant Colony System to minimize the under-utilization of the virtualized application servers.
Resumo:
The increasing use of model-driven software development has renewed emphasis on using domain-specific models during application development. More specifically, there has been emphasis on using domain-specific modeling languages (DSMLs) to capture user-specified requirements when creating applications. The current approach to realizing these applications is to translate DSML models into source code using several model-to-model and model-to-code transformations. This approach is still dependent on the underlying source code representation and only raises the level of abstraction during development. Experience has shown that developers will many times be required to manually modify the generated source code, which can be error-prone and time consuming. ^ An alternative to the aforementioned approach involves using an interpreted domain-specific modeling language (i-DSML) whose models can be directly executed using a Domain Specific Virtual Machine (DSVM). Direct execution of i-DSML models require a semantically rich platform that reduces the gap between the application models and the underlying services required to realize the application. One layer in this platform is the domain-specific middleware that is responsible for the management and delivery of services in the specific domain. ^ In this dissertation, we investigated the problem of designing the domain-specific middleware of the DSVM to facilitate the bifurcation of the semantics of the domain and the model of execution (MoE) while supporting runtime adaptation and validation. We approached our investigation by seeking solutions to the following sub-problems: (1) How can the domain-specific knowledge (DSK) semantics be separated from the MoE for a given domain? (2) How do we define a generic model of execution (GMoE) of the middleware so that it is adaptable and realizes DSK operations to support delivery of services? (3) How do we validate the realization of DSK operations at runtime? ^ Our research into the domain-specific middleware was done using an i-DSML for the user-centric communication domain, Communication Modeling Language (CML), and for microgrid energy management domain, Microgrid Modeling Language (MGridML). We have successfully developed a methodology to separate the DSK and GMoE of the middleware of a DSVM that supports specialization for a given domain, and is able to perform adaptation and validation at runtime. ^
Resumo:
Virtual machines (VMs) are powerful platforms for building agile datacenters and emerging cloud systems. However, resource management for a VM-based system is still a challenging task. First, the complexity of application workloads as well as the interference among competing workloads makes it difficult to understand their VMs’ resource demands for meeting their Quality of Service (QoS) targets; Second, the dynamics in the applications and system makes it also difficult to maintain the desired QoS target while the environment changes; Third, the transparency of virtualization presents a hurdle for guest-layer application and host-layer VM scheduler to cooperate and improve application QoS and system efficiency. This dissertation proposes to address the above challenges through fuzzy modeling and control theory based VM resource management. First, a fuzzy-logic-based nonlinear modeling approach is proposed to accurately capture a VM’s complex demands of multiple types of resources automatically online based on the observed workload and resource usages. Second, to enable fast adaption for resource management, the fuzzy modeling approach is integrated with a predictive-control-based controller to form a new Fuzzy Modeling Predictive Control (FMPC) approach which can quickly track the applications’ QoS targets and optimize the resource allocations under dynamic changes in the system. Finally, to address the limitations of black-box-based resource management solutions, a cross-layer optimization approach is proposed to enable cooperation between a VM’s host and guest layers and further improve the application QoS and resource usage efficiency. The above proposed approaches are prototyped and evaluated on a Xen-based virtualized system and evaluated with representative benchmarks including TPC-H, RUBiS, and TerraFly. The results demonstrate that the fuzzy-modeling-based approach improves the accuracy in resource prediction by up to 31.4% compared to conventional regression approaches. The FMPC approach substantially outperforms the traditional linear-model-based predictive control approach in meeting application QoS targets for an oversubscribed system. It is able to manage dynamic VM resource allocations and migrations for over 100 concurrent VMs across multiple hosts with good efficiency. Finally, the cross-layer optimization approach further improves the performance of a virtualized application by up to 40% when the resources are contended by dynamic workloads.
Resumo:
Single processor architectures are unable to provide the required performance of high performance embedded systems. Parallel processing based on general-purpose processors can achieve these performances with a considerable increase of required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations interconnected with a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and number of interfacing ports. The architecture was tested in a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than that achieved with a parallel generalpurpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.
Resumo:
Breast cancer is the most common cancer among women, being a major public health problem. Worldwide, X-ray mammography is the current gold-standard for medical imaging of breast cancer. However, it has associated some well-known limitations. The false-negative rates, up to 66% in symptomatic women, and the false-positive rates, up to 60%, are a continued source of concern and debate. These drawbacks prompt the development of other imaging techniques for breast cancer detection, in which Digital Breast Tomosynthesis (DBT) is included. DBT is a 3D radiographic technique that reduces the obscuring effect of tissue overlap and appears to address both issues of false-negative and false-positive rates. The 3D images in DBT are only achieved through image reconstruction methods. These methods play an important role in a clinical setting since there is a need to implement a reconstruction process that is both accurate and fast. This dissertation deals with the optimization of iterative algorithms, with parallel computing through an implementation on Graphics Processing Units (GPUs) to make the 3D reconstruction faster using Compute Unified Device Architecture (CUDA). Iterative algorithms have shown to produce the highest quality DBT images, but since they are computationally intensive, their clinical use is currently rejected. These algorithms have the potential to reduce patient dose in DBT scans. A method of integrating CUDA in Interactive Data Language (IDL) is proposed in order to accelerate the DBT image reconstructions. This method has never been attempted before for DBT. In this work the system matrix calculation, the most computationally expensive part of iterative algorithms, is accelerated. A speedup of 1.6 is achieved proving the fact that GPUs can accelerate the IDL implementation.
Resumo:
Variations in different types of genomes have been found to be responsible for a large degree of physical diversity such as appearance and susceptibility to disease. Identification of genomic variations is difficult and can be facilitated through computational analysis of DNA sequences. Newly available technologies are able to sequence billions of DNA base pairs relatively quickly. These sequences can be used to identify variations within their specific genome but must be mapped to a reference sequence first. In order to align these sequences to a reference sequence, we require mapping algorithms that make use of approximate string matching and string indexing methods. To date, few mapping algorithms have been tailored to handle the massive amounts of output generated by newly available sequencing technologies. In otrder to handle this large amount of data, we modified the popular mapping software BWA to run in parallel using OpenMPI. Parallel BWA matches the efficiency of multithreaded BWA functions while providing efficient parallelism for BWA functions that do not currently support multithreading. Parallel BWA shows significant wall time speedup in comparison to multithreaded BWA on high-performance computing clusters, and will thus facilitate the analysis of genome sequencing data.
Resumo:
L’observation de l’exécution d’applications JavaScript est habituellement réalisée en instrumentant une machine virtuelle (MV) industrielle ou en effectuant une traduction source-à-source ad hoc et complexe. Ce mémoire présente une alternative basée sur la superposition de machines virtuelles. Notre approche consiste à faire une traduction source-à-source d’un programme pendant son exécution pour exposer ses opérations de bas niveau au travers d’un modèle objet flexible. Ces opérations de bas niveau peuvent ensuite être redéfinies pendant l’exécution pour pouvoir en faire l’observation. Pour limiter la pénalité en performance introduite, notre approche exploite les opérations rapides originales de la MV sous-jacente, lorsque cela est possible, et applique les techniques de compilation à-la-volée dans la MV superposée. Notre implémentation, Photon, est en moyenne 19% plus rapide qu’un interprète moderne, et entre 19× et 56× plus lente en moyenne que les compilateurs à-la-volée utilisés dans les navigateurs web populaires. Ce mémoire montre donc que la superposition de machines virtuelles est une technique alternative compétitive à la modification d’un interprète moderne pour JavaScript lorsqu’appliqué à l’observation à l’exécution des opérations sur les objets et des appels de fonction.
Resumo:
In this thesis, I designed and implemented a virtual machine (VM) for a monomorphic variant of Athena, a type-omega denotational proof language (DPL). This machine attempts to maintain the minimum state required to evaluate Athena phrases. This thesis also includes the design and implementation of a compiler for monomorphic Athena that compiles to the VM. Finally, it includes details on my implementation of a read-eval-print loop that glues together the VM core and the compiler to provide a full, user-accessible interface to monomorphic Athena. The Athena VM provides the same basis for DPLs that the SECD machine does for pure, functional programming and the Warren Abstract Machine does for Prolog.
Resumo:
The work reported in this paper is motivated by the fact that there is a need to apply autonomic computing concepts to parallel computing systems. Advancing on prior work based on intelligent cores [36], a swarm-array computing approach, this paper focuses on ‘Intelligent agents’ another swarm-array computing approach in which the task to be executed on a parallel computing core is considered as a swarm of autonomous agents. A task is carried to a computing core by carrier agents and is seamlessly transferred between cores in the event of a predicted failure, thereby achieving self-ware objectives of autonomic computing. The feasibility of the proposed swarm-array computing approach is validated on a multi-agent simulator.
Resumo:
We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL’s Synergistic Processing Elements than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.
Resumo:
One problem with using component-based software development approach is that once software modules are reused over generations of products, they form legacy structures that can be challenging to understand, making validating these systems difficult. Therefore, tools and methodologies that enable engineers to see interactions of these software modules will enhance their ability to make these software systems more dependable. To address this need, we propose SimSight, a framework to capture dynamic call graphs in Simics, a widely adopted commercial full-system simulator. Simics is a software system that simulates complete computer systems. Thus, it performs nearly identical tasks to a real system but at a much lower speed while providing greater execution observability. We have implemented SimSight to generate dynamic call graphs of statically and dynamically linked functions in x86/Linux environment. A case study illustrates how we can use SimSight to identify sources of software errors. We then evaluate its performance using 12 integer programs from SPEC CPU2006 benchmark suite.