965 resultados para Heterogeneous multiprocessor systems
Resumo:
Consider the problem of scheduling a set of implicit-deadline sporadic tasks to meet all deadlines on a two-type heterogeneous multiprocessor platform where a task may request at most one of |R| shared resources. There are m1 processors of type-1 and m2 processors of type-2. Tasks may migrate only when requesting or releasing resources. We present a new algorithm, FF-3C-vpr, which offers a guarantee that if a task set is schedulable to meet deadlines by an optimal task assignment scheme that only allows tasks to migrate when requesting or releasing a resource, then FF-3Cvpr also meets deadlines if given processors 4+6*ceil(|R|/min(m1,m2)) times as fast. As far as we know, it is the first result for resource sharing on heterogeneous platforms with provable performance.
Resumo:
Os sistemas de tempo real modernos geram, cada vez mais, cargas computacionais pesadas e dinâmicas, começando-se a tornar pouco expectável que sejam implementados em sistemas uniprocessador. Na verdade, a mudança de sistemas com um único processador para sistemas multi- processador pode ser vista, tanto no domínio geral, como no de sistemas embebidos, como uma forma eficiente, em termos energéticos, de melhorar a performance das aplicações. Simultaneamente, a proliferação das plataformas multi-processador transformaram a programação paralela num tópico de elevado interesse, levando o paralelismo dinâmico a ganhar rapidamente popularidade como um modelo de programação. A ideia, por detrás deste modelo, é encorajar os programadores a exporem todas as oportunidades de paralelismo através da simples indicação de potenciais regiões paralelas dentro das aplicações. Todas estas anotações são encaradas pelo sistema unicamente como sugestões, podendo estas serem ignoradas e substituídas, por construtores sequenciais equivalentes, pela própria linguagem. Assim, o modo como a computação é na realidade subdividida, e mapeada nos vários processadores, é da responsabilidade do compilador e do sistema computacional subjacente. Ao retirar este fardo do programador, a complexidade da programação é consideravelmente reduzida, o que normalmente se traduz num aumento de produtividade. Todavia, se o mecanismo de escalonamento subjacente não for simples e rápido, de modo a manter o overhead geral em níveis reduzidos, os benefícios da geração de um paralelismo com uma granularidade tão fina serão meramente hipotéticos. Nesta perspetiva de escalonamento, os algoritmos que empregam uma política de workstealing são cada vez mais populares, com uma eficiência comprovada em termos de tempo, espaço e necessidades de comunicação. Contudo, estes algoritmos não contemplam restrições temporais, nem outra qualquer forma de atribuição de prioridades às tarefas, o que impossibilita que sejam diretamente aplicados a sistemas de tempo real. Além disso, são tradicionalmente implementados no runtime da linguagem, criando assim um sistema de escalonamento com dois níveis, onde a previsibilidade, essencial a um sistema de tempo real, não pode ser assegurada. Nesta tese, é descrita a forma como a abordagem de work-stealing pode ser resenhada para cumprir os requisitos de tempo real, mantendo, ao mesmo tempo, os seus princípios fundamentais que tão bons resultados têm demonstrado. Muito resumidamente, a única fila de gestão de processos convencional (deque) é substituída por uma fila de deques, ordenada de forma crescente por prioridade das tarefas. De seguida, aplicamos por cima o conhecido algoritmo de escalonamento dinâmico G-EDF, misturamos as regras de ambos, e assim nasce a nossa proposta: o algoritmo de escalonamento RTWS. Tirando partido da modularidade oferecida pelo escalonador do Linux, o RTWS é adicionado como uma nova classe de escalonamento, de forma a avaliar na prática se o algoritmo proposto é viável, ou seja, se garante a eficiência e escalonabilidade desejadas. Modificar o núcleo do Linux é uma tarefa complicada, devido à complexidade das suas funções internas e às fortes interdependências entre os vários subsistemas. Não obstante, um dos objetivos desta tese era ter a certeza que o RTWS é mais do que um conceito interessante. Assim, uma parte significativa deste documento é dedicada à discussão sobre a implementação do RTWS e à exposição de situações problemáticas, muitas delas não consideradas em teoria, como é o caso do desfasamento entre vários mecanismo de sincronização. Os resultados experimentais mostram que o RTWS, em comparação com outro trabalho prático de escalonamento dinâmico de tarefas com restrições temporais, reduz significativamente o overhead de escalonamento através de um controlo de migrações, e mudanças de contexto, eficiente e escalável (pelo menos até 8 CPUs), ao mesmo tempo que alcança um bom balanceamento dinâmico da carga do sistema, até mesmo de uma forma não custosa. Contudo, durante a avaliação realizada foi detetada uma falha na implementação do RTWS, pela forma como facilmente desiste de roubar trabalho, o que origina períodos de inatividade, no CPU em questão, quando a utilização geral do sistema é baixa. Embora o trabalho realizado se tenha focado em manter o custo de escalonamento baixo e em alcançar boa localidade dos dados, a escalonabilidade do sistema nunca foi negligenciada. Na verdade, o algoritmo de escalonamento proposto provou ser bastante robusto, não falhando qualquer meta temporal nas experiências realizadas. Portanto, podemos afirmar que alguma inversão de prioridades, causada pela sub-política de roubo BAS, não compromete os objetivos de escalonabilidade, e até ajuda a reduzir a contenção nas estruturas de dados. Mesmo assim, o RTWS também suporta uma sub-política de roubo determinística: PAS. A avaliação experimental, porém, não ajudou a ter uma noção clara do impacto de uma e de outra. No entanto, de uma maneira geral, podemos concluir que o RTWS é uma solução promissora para um escalonamento eficiente de tarefas paralelas com restrições temporais.
Resumo:
Consider the problem of scheduling a set of implicit-deadline sporadic tasks to meet all deadlines on a heterogeneous multiprocessor platform. We use an algorithm proposed in [1] (we refer to it as LP-EE) from state-of-the-art for assigning tasks to heterogeneous multiprocessor platform and (re-)prove its performance guarantee but for a stronger adversary.We conjecture that if a task set can be scheduled to meet deadlines on a heterogeneous multiprocessor platform by an optimal task assignment scheme that allows task migrations then LP-EE meets deadlines as well with no migrations if given processors twice as fast. We illustrate this with an example.
Resumo:
Consider the problem of non-migratively scheduling a set of implicit-deadline sporadic tasks to meet all deadlines on a two-type heterogeneous multiprocessor platform. We ask the following question: Does there exist a phase transition behavior for the two-type heterogeneous multiprocessor scheduling problem? We also provide some initial observations via simulations performed on randomly generated task sets.
Resumo:
Consider the problem of scheduling a set of implicit-deadline sporadic tasks to meet all deadlines on a heterogeneous multiprocessor platform. We consider a restricted case where the maximum utilization of any task on any processor in the system is no greater than one. We use an algorithm proposed in [1] (we refer to it as LP-EE) from state-of-the-art for assigning tasks to heterogeneous multiprocessor platform and (re-)prove its performance guarantee for this restricted case but for a stronger adversary. We show that if a task set can be scheduled to meet deadlines on a heterogeneous multiprocessor platform by an optimal task assignment scheme that allows task migrations then LP-EE meets deadlines as well with no migrations if given processors twice as fast.
Resumo:
Consider the problem of scheduling a set of implicit-deadline sporadic tasks to meet all deadlines on a two-type heterogeneous multiprocessor platform. Each processor is either of type-1 or type-2 with each task having different execution time on each processor type. Jobs can migrate between processors of same type (referred to as intra-type migration) but cannot migrate between processors of different types. We present a new scheduling algorithm namely, LP-Relax(THR) which offers a guarantee that if a task set can be scheduled to meet deadlines by an optimal task assignment scheme that allows intra-type migration then LP-Relax(THR) meets deadlines as well with intra-type migration if given processors 1/THR as fast (referred to as speed competitive ratio) where THR <= 2/3.
Resumo:
In this paper we discuss challenges and design principles of an implementation of slot-based tasksplitting algorithms into the Linux 2.6.34 version. We show that this kernel version is provided with the required features for implementing such scheduling algorithms. We show that the real behavior of the scheduling algorithm is very close to the theoretical. We run and discuss experiments on 4-core and 24-core machines.
Resumo:
Developing an efficient server-based real-time scheduling solution that supports dynamic task-level parallelism is now relevant to even the desktop and embedded domains and no longer only to the high performance computing market niche. This paper proposes a novel approach that combines the constantbandwidth server abstraction with a work-stealing load balancing scheme which, while ensuring isolation among tasks, enables a task to be executed on more than one processor at a given time instant.
Resumo:
Comparison-based diagnosis is an effective approach to system-level fault diagnosis. Under the Maeng-Malek comparison model (NM* model), Sengupta and Dahbura proposed an O(N-5) diagnosis algorithm for general diagnosable systems with N nodes. Thanks to lower diameter and better graph embedding capability as compared with a hypercube of the same size, the crossed cube has been a promising candidate for interconnection networks. In this paper, we propose a fault diagnosis algorithm tailored for crossed cube connected multicomputer systems under the MM* model. By introducing appropriate data structures, this algorithm runs in O(Nlog(2)(2) N) time, which is linear in the size of the input. As a result, this algorithm is significantly superior to the Sengupta-Dahbura's algorithm when applied to crossed cube systems. (C) 2004 Elsevier B.V. All rights reserved.
Resumo:
This thesis explores the capabilities of heterogeneous multi-core systems, based on multiple Graphics Processing Units (GPUs) in a standard desktop framework. Multi-GPU accelerated desk side computers are an appealing alternative to other high performance computing (HPC) systems: being composed of commodity hardware components fabricated in large quantities, their price-performance ratio is unparalleled in the world of high performance computing. Essentially bringing “supercomputing to the masses”, this opens up new possibilities for application fields where investing in HPC resources had been considered unfeasible before. One of these is the field of bioelectrical imaging, a class of medical imaging technologies that occupy a low-cost niche next to million-dollar systems like functional Magnetic Resonance Imaging (fMRI). In the scope of this work, several computational challenges encountered in bioelectrical imaging are tackled with this new kind of computing resource, striving to help these methods approach their true potential. Specifically, the following main contributions were made: Firstly, a novel dual-GPU implementation of parallel triangular matrix inversion (TMI) is presented, addressing an crucial kernel in computation of multi-mesh head models of encephalographic (EEG) source localization. This includes not only a highly efficient implementation of the routine itself achieving excellent speedups versus an optimized CPU implementation, but also a novel GPU-friendly compressed storage scheme for triangular matrices. Secondly, a scalable multi-GPU solver for non-hermitian linear systems was implemented. It is integrated into a simulation environment for electrical impedance tomography (EIT) that requires frequent solution of complex systems with millions of unknowns, a task that this solution can perform within seconds. In terms of computational throughput, it outperforms not only an highly optimized multi-CPU reference, but related GPU-based work as well. Finally, a GPU-accelerated graphical EEG real-time source localization software was implemented. Thanks to acceleration, it can meet real-time requirements in unpreceeded anatomical detail running more complex localization algorithms. Additionally, a novel implementation to extract anatomical priors from static Magnetic Resonance (MR) scansions has been included.
Resumo:
The new generation of multicore processors opens new perspectives for the design of embedded systems. Multiprocessing, however, poses new challenges to the scheduling of real-time applications, in which the ever-increasing computational demands are constantly flanked by the need of meeting critical time constraints. Many research works have contributed to this field introducing new advanced scheduling algorithms. However, despite many of these works have solidly demonstrated their effectiveness, the actual support for multiprocessor real-time scheduling offered by current operating systems is still very limited. This dissertation deals with implementative aspects of real-time schedulers in modern embedded multiprocessor systems. The first contribution is represented by an open-source scheduling framework, which is capable of realizing complex multiprocessor scheduling policies, such as G-EDF, on conventional operating systems exploiting only their native scheduler from user-space. A set of experimental evaluations compare the proposed solution to other research projects that pursue the same goals by means of kernel modifications, highlighting comparable scheduling performances. The principles that underpin the operation of the framework, originally designed for symmetric multiprocessors, have been further extended first to asymmetric ones, which are subjected to major restrictions such as the lack of support for task migrations, and later to re-programmable hardware architectures (FPGAs). In the latter case, this work introduces a scheduling accelerator, which offloads most of the scheduling operations to the hardware and exhibits extremely low scheduling jitter. The realization of a portable scheduling framework presented many interesting software challenges. One of these has been represented by timekeeping. In this regard, a further contribution is represented by a novel data structure, called addressable binary heap (ABH). Such ABH, which is conceptually a pointer-based implementation of a binary heap, shows very interesting average and worst-case performances when addressing the problem of tick-less timekeeping of high-resolution timers.
Resumo:
MultiProcessor Systems-on-Chip (MPSoC) are the core of nowadays and next generation computing platforms. Their relevance in the global market continuously increase, occupying an important role both in everydaylife products (e.g. smartphones, tablets, laptops, cars) and in strategical market sectors as aviation, defense, robotics, medicine. Despite of the incredible performance improvements in the recent years processors manufacturers have had to deal with issues, commonly called “Walls”, that have hindered the processors development. After the famous “Power Wall”, that limited the maximum frequency of a single core and marked the birth of the modern multiprocessors system-on-chip, the “Thermal Wall” and the “Utilization Wall” are the actual key limiter for performance improvements. The former concerns the damaging effects of the high temperature on the chip caused by the large power densities dissipation, whereas the second refers to the impossibility of fully exploiting the computing power of the processor due to the limitations on power and temperature budgets. In this thesis we faced these challenges by developing efficient and reliable solutions able to maximize performance while limiting the maximum temperature below a fixed critical threshold and saving energy. This has been possible by exploiting the Model Predictive Controller (MPC) paradigm that solves an optimization problem subject to constraints in order to find the optimal control decisions for the future interval. A fully-distributedMPC-based thermal controller with a far lower complexity respect to a centralized one has been developed. The control feasibility and interesting properties for the simplification of the control design has been proved by studying a partial differential equation thermal model. Finally, the controller has been efficiently included in more complex control schemes able to minimize energy consumption and deal with mixed-criticalities tasks
Resumo:
Includes bibliographical references.
Resumo:
"UILU-ENG 77 1720."
Resumo:
"NSF-OCA-MCS76-81686-000030."