135 resultados para Multiprocessor


Relevância:

10.00% 10.00%

Publicador:

Resumo:

The multiprocessor scheduling scheme NPS-F for sporadic tasks has a high utilisation bound and an overall number of preemptions bounded at design time. NPS-F binpacks tasks offline to as many servers as needed. At runtime, the scheduler ensures that each server is mapped to at most one of the m processors, at any instant. When scheduled, servers use EDF to select which of their tasks to run. Yet, unlike the overall number of preemptions, the migrations per se are not tightly bounded. Moreover, we cannot know a priori which task a server will be currently executing at the instant when it migrates. This uncertainty complicates the estimation of cache-related preemption and migration costs (CPMD), potentially resulting in their overestimation. Therefore, to simplify the CPMD estimation, we propose an amended bin-packing scheme for NPS-F allowing us (i) to identify at design time, which task migrates at which instant and (ii) bound a priori the number of migrating tasks, while preserving the utilisation bound of NPS-F.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Nowadays, many real-time operating systems discretize the time relying on a system time unit. To take this behavior into account, real-time scheduling algorithms must adopt a discrete-time model in which both timing requirements of tasks and their time allocations have to be integer multiples of the system time unit. That is, tasks cannot be executed for less than one time unit, which implies that they always have to achieve a minimum amount of work before they can be preempted. Assuming such a discrete-time model, the authors of Zhu et al. (Proceedings of the 24th IEEE international real-time systems symposium (RTSS 2003), 2003, J Parallel Distrib Comput 71(10):1411–1425, 2011) proposed an efficient “boundary fair” algorithm (named BF) and proved its optimality for the scheduling of periodic tasks while achieving full system utilization. However, BF cannot handle sporadic tasks due to their inherent irregular and unpredictable job release patterns. In this paper, we propose an optimal boundary-fair scheduling algorithm for sporadic tasks (named BF TeX ), which follows the same principle as BF by making scheduling decisions only at the job arrival times and (expected) task deadlines. This new algorithm was implemented in Linux and we show through experiments conducted upon a multicore machine that BF TeX outperforms the state-of-the-art discrete-time optimal scheduler (PD TeX ), benefiting from much less scheduling overheads. Furthermore, it appears from these experimental results that BF TeX is barely dependent on the length of the system time unit while PD TeX —the only other existing solution for the scheduling of sporadic tasks in discrete-time systems—sees its number of preemptions, migrations and the time spent to take scheduling decisions increasing linearly when improving the time resolution of the system.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Consider scheduling of real-time tasks on a multiprocessor where migration is forbidden. Specifically, consider the problem of determining a task-to-processor assignment for a given collection of implicit-deadline sporadic tasks upon a multiprocessor platform in which there are two distinct types of processors. For this problem, we propose a new algorithm, LPC (task assignment based on solving a Linear Program with Cutting planes). The algorithm offers the following guarantee: for a given task set and a platform, if there exists a feasible task-to-processor assignment, then LPC succeeds in finding such a feasible task-to-processor assignment as well but on a platform in which each processor is 1.5 × faster and has three additional processors. For systems with a large number of processors, LPC has a better approximation ratio than state-of-the-art algorithms. To the best of our knowledge, this is the first work that develops a provably good real-time task assignment algorithm using cutting planes.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Consider the problem of scheduling a task set τ of implicit-deadline sporadic tasks to meet all deadlines on a t-type heterogeneous multiprocessor platform where tasks may access multiple shared resources. The multiprocessor platform has m k processors of type-k, where k∈{1,2,…,t}. The execution time of a task depends on the type of processor on which it executes. The set of shared resources is denoted by R. For each task τ i , there is a resource set R i ⊆R such that for each job of τ i , during one phase of its execution, the job requests to hold the resource set R i exclusively with the interpretation that (i) the job makes a single request to hold all the resources in the resource set R i and (ii) at all times, when a job of τ i holds R i , no other job holds any resource in R i . Each job of task τ i may request the resource set R i at most once during its execution. A job is allowed to migrate when it requests a resource set and when it releases the resource set but a job is not allowed to migrate at other times. Our goal is to design a scheduling algorithm for this problem and prove its performance. We propose an algorithm, LP-EE-vpr, which offers the guarantee that if an implicit-deadline sporadic task set is schedulable on a t-type heterogeneous multiprocessor platform by an optimal scheduling algorithm that allows a job to migrate only when it requests or releases a resource set, then our algorithm also meets the deadlines with the same restriction on job migration, if given processors 4×(1+MAXP×⌈|P|×MAXPmin{m1,m2,…,mt}⌉) times as fast. (Here MAXP and |P| are computed based on the resource sets that tasks request.) For the special case that each task requests at most one resource, the bound of LP-EE-vpr collapses to 4×(1+⌈|R|min{m1,m2,…,mt}⌉). To the best of our knowledge, LP-EE-vpr is the first algorithm with proven performance guarantee for real-time scheduling of sporadic tasks with resource sharing on t-type heterogeneous multiprocessors.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

6th Real-Time Scheduling Open Problems Seminar (RTSOPS 2015), Lund, Sweden.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

5th Brazilian Symposium on Computing Systems Engineering, SBESC 2015 (SBESC 2015). 3 to 6, Nov, 2015. Foz do Iguaçu, Brasil.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Presented at Work in Progress Session, IEEE Real-Time Systems Symposium (RTSS 2015). 1 to 3, Dec, 2015. San Antonio, U.S.A..

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Nos dias de hoje, os sistemas de tempo real crescem em importância e complexidade. Mediante a passagem do ambiente uniprocessador para multiprocessador, o trabalho realizado no primeiro não é completamente aplicável no segundo, dado que o nível de complexidade difere, principalmente devido à existência de múltiplos processadores no sistema. Cedo percebeu-se, que a complexidade do problema não cresce linearmente com a adição destes. Na verdade, esta complexidade apresenta-se como uma barreira ao avanço científico nesta área que, para já, se mantém desconhecida, e isto testemunha-se, essencialmente no caso de escalonamento de tarefas. A passagem para este novo ambiente, quer se trate de sistemas de tempo real ou não, promete gerar a oportunidade de realizar trabalho que no primeiro caso nunca seria possível, criando assim, novas garantias de desempenho, menos gastos monetários e menores consumos de energia. Este último fator, apresentou-se desde cedo, como, talvez, a maior barreira de desenvolvimento de novos processadores na área uniprocessador, dado que, à medida que novos eram lançados para o mercado, ao mesmo tempo que ofereciam maior performance, foram levando ao conhecimento de um limite de geração de calor que obrigou ao surgimento da área multiprocessador. No futuro, espera-se que o número de processadores num determinado chip venha a aumentar, e como é óbvio, novas técnicas de exploração das suas inerentes vantagens têm de ser desenvolvidas, e a área relacionada com os algoritmos de escalonamento não é exceção. Ao longo dos anos, diferentes categorias de algoritmos multiprocessador para dar resposta a este problema têm vindo a ser desenvolvidos, destacando-se principalmente estes: globais, particionados e semi-particionados. A perspectiva global, supõe a existência de uma fila global que é acessível por todos os processadores disponíveis. Este fato torna disponível a migração de tarefas, isto é, é possível parar a execução de uma tarefa e resumir a sua execução num processador distinto. Num dado instante, num grupo de tarefas, m, as tarefas de maior prioridade são selecionadas para execução. Este tipo promete limites de utilização altos, a custo elevado de preempções/migrações de tarefas. Em contraste, os algoritmos particionados, colocam as tarefas em partições, e estas, são atribuídas a um dos processadores disponíveis, isto é, para cada processador, é atribuída uma partição. Por essa razão, a migração de tarefas não é possível, acabando por fazer com que o limite de utilização não seja tão alto quando comparado com o caso anterior, mas o número de preempções de tarefas decresce significativamente. O esquema semi-particionado, é uma resposta de caráter hibrido entre os casos anteriores, pois existem tarefas que são particionadas, para serem executadas exclusivamente por um grupo de processadores, e outras que são atribuídas a apenas um processador. Com isto, resulta uma solução que é capaz de distribuir o trabalho a ser realizado de uma forma mais eficiente e balanceada. Infelizmente, para todos estes casos, existe uma discrepância entre a teoria e a prática, pois acaba-se por se assumir conceitos que não são aplicáveis na vida real. Para dar resposta a este problema, é necessário implementar estes algoritmos de escalonamento em sistemas operativos reais e averiguar a sua aplicabilidade, para caso isso não aconteça, as alterações necessárias sejam feitas, quer a nível teórico quer a nível prá

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper shows how a high level matrix programming language may be used to perform Monte Carlo simulation, bootstrapping, estimation by maximum likelihood and GMM, and kernel regression in parallel on symmetric multiprocessor computers or clusters of workstations. The implementation of parallelization is done in a way such that an investigator may use the programs without any knowledge of parallel programming. A bootable CD that allows rapid creation of a cluster for parallel computing is introduced. Examples show that parallelization can lead to important reductions in computational time. Detailed discussion of how the Monte Carlo problem was parallelized is included as an example for learning to write parallel programs for Octave.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Este trabajo analiza el rendimiento de cuatro nodos de cómputo multiprocesador de memoria compartida para resolver el problema N-body. Se paraleliza el algoritmo serie, y se codifica usando el lenguaje C extendido con OpenMP. El resultado son dos variantes que obedecen a dos criterios de optimización diferentes: minimizar los requisitos de memoria y minimizar el volumen de cómputo. Posteriormente, se realiza un proceso de análisis de las prestaciones del programa sobre los nodos de cómputo. Se modela el rendimiento de las variantes secuenciales y paralelas de la aplicación, y de los nodos de cómputo; se instrumentan y ejecutan los programas para obtener resultados en forma de varias métricas; finalmente se muestran e interpretan los resultados, proporcionando claves que explican ineficiencias y cuellos de botella en el rendimiento y posibles líneas de mejora. La experiencia de este estudio concreto ha permitido esbozar una incipiente metodología de análisis de rendimiento, identificación de problemas y sintonización de algoritmos a nodos de cómputo multiprocesador de memoria compartida.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

L’aparició d’un nou paradigma per al disseny de sistemes multiprocessador, les NoC; requereixen una manera d’adaptar els IP cores ja existents i permetre la seva connexió en xarxa. Aquest projecte presenta un disseny d’una interfície que aconsegueix adaptar un IP core existent, el LEON3; del protocol del bus AMBA al protocol de la xarxa. D’aquesta manera i basant-nos en idees d’interfícies discutides en l’estat de l’art, aconseguim desacoblar el processador del disseny i topologia de la xarxa.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Este trabajo analiza el rendimiento del algoritmo de alineamiento de secuencias conocido como Needleman-Wunsch, sobre 3 sistemas de cómputo multiprocesador diferentes. Se analiza y se codifica el algoritmo serie usando el lenguaje de programación C y se plantean una serie de optimizaciones con la finalidad de minimizar el volumen y el tiempo de cómputo. Posteriormente, se realiza un análisis de las prestaciones del programa sobre los diferentes sistemas de cómputo. En la segunda parte del trabajo, se paraleliza el algoritmo serie y se codifica ayudándonos de OpenMP. El resultado son dos variantes del programa que difieren en la relación entre la cantidad de cómputo y la de comunicación. En la primera variante, la comunicación entre procesadores es poco frecuente y se realiza tras largos periodos de ejecución (granularidad gruesa). En cambio, en la segunda variante las tareas individuales son relativamente pequeñas en término de tiempo de ejecución y la comunicación entre los procesadores es frecuente (granularidad fina). Ambas variantes se ejecutan y analizan en arquitecturas multicore que explotan el paralelismo a nivel de thread. Los resultados obtenidos muestran la importancia de entender y saber analizar el efecto del multicore y multithreading en el rendimiento.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The past few decades have seen a considerable increase in the number of parallel and distributed systems. With the development of more complex applications, the need for more powerful systems has emerged and various parallel and distributed environments have been designed and implemented. Each of the environments, including hardware and software, has unique strengths and weaknesses. There is no single parallel environment that can be identified as the best environment for all applications with respect to hardware and software properties. The main goal of this thesis is to provide a novel way of performing data-parallel computation in parallel and distributed environments by utilizing the best characteristics of difference aspects of parallel computing. For the purpose of this thesis, three aspects of parallel computing were identified and studied. First, three parallel environments (shared memory, distributed memory, and a network of workstations) are evaluated to quantify theirsuitability for different parallel applications. Due to the parallel and distributed nature of the environments, networks connecting the processors in these environments were investigated with respect to their performance characteristics. Second, scheduling algorithms are studied in order to make them more efficient and effective. A concept of application-specific information scheduling is introduced. The application- specific information is data about the workload extractedfrom an application, which is provided to a scheduling algorithm. Three scheduling algorithms are enhanced to utilize the application-specific information to further refine their scheduling properties. A more accurate description of the workload is especially important in cases where the workunits are heterogeneous and the parallel environment is heterogeneous and/or non-dedicated. The results obtained show that the additional information regarding the workload has a positive impact on the performance of applications. Third, a programming paradigm for networks of symmetric multiprocessor (SMP) workstations is introduced. The MPIT programming paradigm incorporates the Message Passing Interface (MPI) with threads to provide a methodology to write parallel applications that efficiently utilize the available resources and minimize the overhead. The MPIT allows for communication and computation to overlap by deploying a dedicated thread for communication. Furthermore, the programming paradigm implements an application-specific scheduling algorithm. The scheduling algorithm is executed by the communication thread. Thus, the scheduling does not affect the execution of the parallel application. Performance results achieved from the MPIT show that considerable improvements over conventional MPI applications are achieved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Universal Converter (UNICON) –projektin osana suunniteltiin sähkömoottorikäyttöjen ohjaukseen ja mittaukseen soveltuva digitaaliseen signaaliprosessoriin (DSP) pohjautuva sulautettu järjestelmä. Riittävän laskentatehon varmistamiseksi päädyttiin käyttämään moniprosessorijärjestelmää. Prosessorijärjestelmässä käytettävää DSP-piiriä valittaessa valintaperusteina olivat piirien tarjoama prosessointiteho ja moniprosessorituki. Analog Devices:n SHARC-sarjan DSP-piirit täyttivät parhaiten asetetut vaatimukset: Ne tarjoavat tehokkaan käskykannan lisäksi suuren sisäisen muistin ja sisäänrakennetun moniprosessorituen. Järjestelmän mittalaiteluonteisuudesta johtuen keskeinen suunnitteluparametri oli luoda nopeat tiedonsiirtoyhteydet mittausantureilta DSP-järjestelmään. Tämä toteutettiin käyttäen ohjelmointavia FPGA-logiikkapiirejä digitaalimuotoisen mittausdatan vastaanotossa ja esikäsittelyssä. Tiedonsiirtoyhteys PC-tietokoneelle toteutettiin käyttäen erityistä liityntäkorttia DSP-järjestelmän ja PC-tietokoneen välillä. Liityntäkortin päätehtävänä on puskuroida siirrettävä data. Järjestelyllä estetään PC-tietokoneen vaikutus DSP-järjestelmän toimintaan, jotta kyetään takaamaan järjestelmän reaaliaikainen toiminta kaikissa olosuhteissa.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

As technology geometries have shrunk to the deep submicron regime, the communication delay and power consumption of global interconnections in high performance Multi- Processor Systems-on-Chip (MPSoCs) are becoming a major bottleneck. The Network-on- Chip (NoC) architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication issues such as performance limitations of long interconnects and integration of large number of Processing Elements (PEs) on a chip. The choice of routing protocol and NoC structure can have a significant impact on performance and power consumption in on-chip networks. In addition, building a high performance, area and energy efficient on-chip network for multicore architectures requires a novel on-chip router allowing a larger network to be integrated on a single die with reduced power consumption. On top of that, network interfaces are employed to decouple computation resources from communication resources, to provide the synchronization between them, and to achieve backward compatibility with existing IP cores. Three adaptive routing algorithms are presented as a part of this thesis. The first presented routing protocol is a congestion-aware adaptive routing algorithm for 2D mesh NoCs which does not support multicast (one-to-many) traffic while the other two protocols are adaptive routing models supporting both unicast (one-to-one) and multicast traffic. A streamlined on-chip router architecture is also presented for avoiding congested areas in 2D mesh NoCs via employing efficient input and output selection. The output selection utilizes an adaptive routing algorithm based on the congestion condition of neighboring routers while the input selection allows packets to be serviced from each input port according to its congestion level. Moreover, in order to increase memory parallelism and bring compatibility with existing IP cores in network-based multiprocessor architectures, adaptive network interface architectures are presented to use multiple SDRAMs which can be accessed simultaneously. In addition, a smart memory controller is integrated in the adaptive network interface to improve the memory utilization and reduce both memory and network latencies. Three Dimensional Integrated Circuits (3D ICs) have been emerging as a viable candidate to achieve better performance and package density as compared to traditional 2D ICs. In addition, combining the benefits of 3D IC and NoC schemes provides a significant performance gain for 3D architectures. In recent years, inter-layer communication across multiple stacked layers (vertical channel) has attracted a lot of interest. In this thesis, a novel adaptive pipeline bus structure is proposed for inter-layer communication to improve the performance by reducing the delay and complexity of traditional bus arbitration. In addition, two mesh-based topologies for 3D architectures are also introduced to mitigate the inter-layer footprint and power dissipation on each layer with a small performance penalty.