776 resultados para Heterogeneous computing


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Graphics Processing Unit (GPU) is present in almost every modern day personal computer. Despite its specific purpose design, they have been increasingly used for general computations with very good results. Hence, there is a growing effort from the community to seamlessly integrate this kind of devices in everyday computing. However, to fully exploit the potential of a system comprising GPUs and CPUs, these devices should be presented to the programmer as a single platform. The efficient combination of the power of CPU and GPU devices is highly dependent on each device’s characteristics, resulting in platform specific applications that cannot be ported to different systems. Also, the most efficient work balance among devices is highly dependable on the computations to be performed and respective data sizes. In this work, we propose a solution for heterogeneous environments based on the abstraction level provided by algorithmic skeletons. Our goal is to take full advantage of the power of all CPU and GPU devices present in a system, without the need for different kernel implementations nor explicit work-distribution.To that end, we extended Marrow, an algorithmic skeleton framework for multi-GPUs, to support CPU computations and efficiently balance the work-load between devices. Our approach is based on an offline training execution that identifies the ideal work balance and platform configurations for a given application and input data size. The evaluation of this work shows that the combination of CPU and GPU devices can significantly boost the performance of our benchmarks in the tested environments, when compared to GPU-only executions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Includes bibliographical references.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This thesis deals with heterogeneous architectures in standard workstations. Heterogeneous architectures represent an appealing alternative to traditional supercomputers because they are based on commodity components fabricated in large quantities. Hence their price-performance ratio is unparalleled in the world of high performance computing (HPC). In particular, different aspects related to the performance and consumption of heterogeneous architectures have been explored. The thesis initially focuses on an efficient implementation of a parallel application, where the execution time is dominated by an high number of floating point instructions. Then the thesis touches the central problem of efficient management of power peaks in heterogeneous computing systems. Finally it discusses a memory-bounded problem, where the execution time is dominated by the memory latency. Specifically, the following main contributions have been carried out: A novel framework for the design and analysis of solar field for Central Receiver Systems (CRS) has been developed. The implementation based on desktop workstation equipped with multiple Graphics Processing Units (GPUs) is motivated by the need to have an accurate and fast simulation environment for studying mirror imperfection and non-planar geometries. Secondly, a power-aware scheduling algorithm on heterogeneous CPU-GPU architectures, based on an efficient distribution of the computing workload to the resources, has been realized. The scheduler manages the resources of several computing nodes with a view to reducing the peak power. The two main contributions of this work follow: the approach reduces the supply cost due to high peak power whilst having negligible impact on the parallelism of computational nodes. from another point of view the developed model allows designer to increase the number of cores without increasing the capacity of the power supply unit. Finally, an implementation for efficient graph exploration on reconfigurable architectures is presented. The purpose is to accelerate graph exploration, reducing the number of random memory accesses.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The scarcity and diversity of resources among the devices of heterogeneous computing environments may affect their ability to perform services with specific Quality of Service constraints, particularly in dynamic distributed environments where the characteristics of the computational load cannot always be predicted in advance. Our work addresses this problem by allowing resource constrained devices to cooperate with more powerful neighbour nodes, opportunistically taking advantage of global distributed resources and processing power. Rather than assuming that the dynamic configuration of this cooperative service executes until it computes its optimal output, the paper proposes an anytime approach that has the ability to tradeoff deliberation time for the quality of the solution. Extensive simulations demonstrate that the proposed anytime algorithms are able to quickly find a good initial solution and effectively optimise the rate at which the quality of the current solution improves at each iteration, with an overhead that can be considered negligible.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Heterogeneous computing systems have become common in modern processor architectures. These systems, such as those released by AMD, Intel, and Nvidia, include both CPU and GPU cores on a single die available with reduced communication overhead compared to their discrete predecessors. Currently, discrete CPU/GPU systems are limited, requiring larger, regular, highly-parallel workloads to overcome the communication costs of the system. Without the traditional communication delay assumed between GPUs and CPUs, we believe non-traditional workloads could be targeted for GPU execution. Specifically, this thesis focuses on the execution model of nested parallel workloads on heterogeneous systems. We have designed a simulation flow which utilizes widely used CPU and GPU simulators to model heterogeneous computing architectures. We then applied this simulator to non-traditional GPU workloads using different execution models. We also have proposed a new execution model for nested parallelism allowing users to exploit these heterogeneous systems to reduce execution time.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Nei prossimi anni è atteso un aggiornamento sostanziale di LHC, che prevede di aumentare la luminosità integrata di un fattore 10 rispetto a quella attuale. Tale parametro è proporzionale al numero di collisioni per unità di tempo. Per questo, le risorse computazionali necessarie a tutti i livelli della ricostruzione cresceranno notevolmente. Dunque, la collaborazione CMS ha cominciato già da alcuni anni ad esplorare le possibilità offerte dal calcolo eterogeneo, ovvero la pratica di distribuire la computazione tra CPU e altri acceleratori dedicati, come ad esempio schede grafiche (GPU). Una delle difficoltà di questo approccio è la necessità di scrivere, validare e mantenere codice diverso per ogni dispositivo su cui dovrà essere eseguito. Questa tesi presenta la possibilità di usare SYCL per tradurre codice per la ricostruzione di eventi in modo che sia eseguibile ed efficiente su diversi dispositivi senza modifiche sostanziali. SYCL è un livello di astrazione per il calcolo eterogeneo, che rispetta lo standard ISO C++. Questo studio si concentra sul porting di un algoritmo di clustering dei depositi di energia calorimetrici, CLUE, usando oneAPI, l'implementazione SYCL supportata da Intel. Inizialmente, è stato tradotto l'algoritmo nella sua versione standalone, principalmente per prendere familiarità con SYCL e per la comodità di confronto delle performance con le versioni già esistenti. In questo caso, le prestazioni sono molto simili a quelle di codice CUDA nativo, a parità di hardware. Per validare la fisica, l'algoritmo è stato integrato all'interno di una versione ridotta del framework usato da CMS per la ricostruzione. I risultati fisici sono identici alle altre implementazioni mentre, dal punto di vista delle prestazioni computazionali, in alcuni casi, SYCL produce codice più veloce di altri livelli di astrazione adottati da CMS, presentandosi dunque come una possibilità interessante per il futuro del calcolo eterogeneo nella fisica delle alte energie.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Cloud computing has been one of the most important topics in Information Technology which aims to assure scalable and reliable on-demand services over the Internet. The expansion of the application scope of cloud services would require cooperation between clouds from different providers that have heterogeneous functionalities. This collaboration between different cloud vendors can provide better Quality of Services (QoS) at the lower price. However, current cloud systems have been developed without concerns of seamless cloud interconnection, and actually they do not support intercloud interoperability to enable collaboration between cloud service providers. Hence, the PhD work is motivated to address interoperability issue between cloud providers as a challenging research objective. This thesis proposes a new framework which supports inter-cloud interoperability in a heterogeneous computing resource cloud environment with the goal of dispatching the workload to the most effective clouds available at runtime. Analysing different methodologies that have been applied to resolve various problem scenarios related to interoperability lead us to exploit Model Driven Architecture (MDA) and Service Oriented Architecture (SOA) methods as appropriate approaches for our inter-cloud framework. Moreover, since distributing the operations in a cloud-based environment is a nondeterministic polynomial time (NP-complete) problem, a Genetic Algorithm (GA) based job scheduler proposed as a part of interoperability framework, offering workload migration with the best performance at the least cost. A new Agent Based Simulation (ABS) approach is proposed to model the inter-cloud environment with three types of agents: Cloud Subscriber agent, Cloud Provider agent, and Job agent. The ABS model is proposed to evaluate the proposed framework.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The web services (WS) technology provides a comprehensive solution for representing, discovering, and invoking services in a wide variety of environments, including Service Oriented Architectures (SOA) and grid computing systems. At the core of WS technology lie a number of XML-based standards, such as the Simple Object Access Protocol (SOAP), that have successfully ensured WS extensibility, transparency, and interoperability. Nonetheless, there is an increasing demand to enhance WS performance, which is severely impaired by XML's verbosity. SOAP communications produce considerable network traffic, making them unfit for distributed, loosely coupled, and heterogeneous computing environments such as the open Internet. Also, they introduce higher latency and processing delays than other technologies, like Java RMI and CORBA. WS research has recently focused on SOAP performance enhancement. Many approaches build on the observation that SOAP message exchange usually involves highly similar messages (those created by the same implementation usually have the same structure, and those sent from a server to multiple clients tend to show similarities in structure and content). Similarity evaluation and differential encoding have thus emerged as SOAP performance enhancement techniques. The main idea is to identify the common parts of SOAP messages, to be processed only once, avoiding a large amount of overhead. Other approaches investigate nontraditional processor architectures, including micro-and macrolevel parallel processing solutions, so as to further increase the processing rates of SOAP/XML software toolkits. This survey paper provides a concise, yet comprehensive review of the research efforts aimed at SOAP performance enhancement. A unified view of the problem is provided, covering almost every phase of SOAP processing, ranging over message parsing, serialization, deserialization, compression, multicasting, security evaluation, and data/instruction-level processing.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The increasing volume of data describing humandisease processes and the growing complexity of understanding, managing, and sharing such data presents a huge challenge for clinicians and medical researchers. This paper presents the@neurIST system, which provides an infrastructure for biomedical research while aiding clinical care, by bringing together heterogeneous data and complex processing and computing services. Although @neurIST targets the investigation and treatment of cerebral aneurysms, the system’s architecture is generic enough that it could be adapted to the treatment of other diseases.Innovations in @neurIST include confining the patient data pertaining to aneurysms inside a single environment that offers cliniciansthe tools to analyze and interpret patient data and make use of knowledge-based guidance in planning their treatment. Medicalresearchers gain access to a critical mass of aneurysm related data due to the system’s ability to federate distributed informationsources. A semantically mediated grid infrastructure ensures that both clinicians and researchers are able to seamlessly access andwork on data that is distributed across multiple sites in a secure way in addition to providing computing resources on demand forperforming computationally intensive simulations for treatment planning and research.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

MSC Dissertation in Computer Engineering

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Consider the problem of assigning implicit-deadline sporadic tasks on a heterogeneous multiprocessor platform comprising a constant number (denoted by t) of distinct types of processors—such a platform is referred to as a t-type platform. We present two algorithms, LPGIM and LPGNM, each providing the following guarantee. For a given t-type platform and a task set, if there exists a task assignment such that tasks can be scheduled to meet their deadlines by allowing them to migrate only between processors of the same type (intra-migrative), then: (i) LPGIM succeeds in finding such an assignment where the same restriction on task migration applies (intra-migrative) but given a platform in which only one processor of each type is 1 + α × t-1/t times faster and (ii) LPGNM succeeds in finding a task assignment where tasks are not allowed to migrate between processors (non-migrative) but given a platform in which every processor is 1 + α times faster. The parameter α is a property of the task set; it is the maximum of all the task utilizations that are no greater than one. To the best of our knowledge, for t-type heterogeneous multiprocessors: (i) for the problem of intra-migrative task assignment, no previous algorithm exists with a proven bound and hence our algorithm, LPGIM, is the first of its kind and (ii) for the problem of non-migrative task assignment, our algorithm, LPGNM, has superior performance compared to state-of-the-art.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Consider scheduling of real-time tasks on a multiprocessor where migration is forbidden. Specifically, consider the problem of determining a task-to-processor assignment for a given collection of implicit-deadline sporadic tasks upon a multiprocessor platform in which there are two distinct types of processors. For this problem, we propose a new algorithm, LPC (task assignment based on solving a Linear Program with Cutting planes). The algorithm offers the following guarantee: for a given task set and a platform, if there exists a feasible task-to-processor assignment, then LPC succeeds in finding such a feasible task-to-processor assignment as well but on a platform in which each processor is 1.5 × faster and has three additional processors. For systems with a large number of processors, LPC has a better approximation ratio than state-of-the-art algorithms. To the best of our knowledge, this is the first work that develops a provably good real-time task assignment algorithm using cutting planes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Dissertação apresentada na Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa para a obtenção do Grau de Mestre em Engenharia Informática

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Heterogeneous multicore platforms are becoming an interesting alternative for embedded computing systems with limited power supply as they can execute specific tasks in an efficient manner. Nonetheless, one of the main challenges of such platforms consists of optimising the energy consumption in the presence of temporal constraints. This paper addresses the problem of task-to-core allocation onto heterogeneous multicore platforms such that the overall energy consumption of the system is minimised. To this end, we propose a two-phase approach that considers both dynamic and leakage energy consumption: (i) the first phase allocates tasks to the cores such that the dynamic energy consumption is reduced; (ii) the second phase refines the allocation performed in the first phase in order to achieve better sleep states by trading off the dynamic energy consumption with the reduction in leakage energy consumption. This hybrid approach considers core frequency set-points, tasks energy consumption and sleep states of the cores to reduce the energy consumption of the system. Major value has been placed on a realistic power model which increases the practical relevance of the proposed approach. Finally, extensive simulations have been carried out to demonstrate the effectiveness of the proposed algorithm. In the best-case, savings up to 18% of energy are reached over the first fit algorithm, which has shown, in previous works, to perform better than other bin-packing heuristics for the target heterogeneous multicore platform.