824 resultados para parallel scheduling


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We address the problem of scheduling a multiclass $M/M/m$ queue with Bernoulli feedback on $m$ parallel servers to minimize time-average linear holding costs. We analyze the performance of a heuristic priority-index rule, which extends Klimov's optimal solution to the single-server case: servers select preemptively customers with larger Klimov indices. We present closed-form suboptimality bounds (approximate optimality) for Klimov's rule, which imply that its suboptimality gap is uniformly bounded above with respect to (i) external arrival rates, as long as they stay within system capacity;and (ii) the number of servers. It follows that its relativesuboptimality gap vanishes in a heavy-traffic limit, as external arrival rates approach system capacity (heavy-traffic optimality). We obtain simpler expressions for the special no-feedback case, where the heuristic reduces to the classical $c \mu$ rule. Our analysis is based on comparing the expected cost of Klimov's ruleto the value of a strong linear programming (LP) relaxation of the system's region of achievable performance of mean queue lengths. In order to obtain this relaxation, we derive and exploit a new set ofwork decomposition laws for the parallel-server system. We further report on the results of a computational study on the quality of the $c \mu$ rule for parallel scheduling.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Dynamic parallel scheduling using work-stealing has gained popularity in academia and industry for its good performance, ease of implementation and theoretical bounds on space and time. Cores treat their own double-ended queues (deques) as a stack, pushing and popping threads from the bottom, but treat the deque of another randomly selected busy core as a queue, stealing threads only from the top, whenever they are idle. However, this standard approach cannot be directly applied to real-time systems, where the importance of parallelising tasks is increasing due to the limitations of multiprocessor scheduling theory regarding parallelism. Using one deque per core is obviously a source of priority inversion since high priority tasks may eventually be enqueued after lower priority tasks, possibly leading to deadline misses as in this case the lower priority tasks are the candidates when a stealing operation occurs. Our proposal is to replace the single non-priority deque of work-stealing with ordered per-processor priority deques of ready threads. The scheduling algorithm starts with a single deque per-core, but unlike traditional work-stealing, the total number of deques in the system may now exceed the number of processors. Instead of stealing randomly, cores steal from the highest priority deque.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This work demonstrates how partial evaluation can be put to practical use in the domain of high-performance numerical computation. I have developed a technique for performing partial evaluation by using placeholders to propagate intermediate results. For an important class of numerical programs, a compiler based on this technique improves performance by an order of magnitude over conventional compilation techniques. I show that by eliminating inherently sequential data-structure references, partial evaluation exposes the low-level parallelism inherent in a computation. I have implemented several parallel scheduling and analysis programs that study the tradeoffs involved in the design of an architecture that can effectively utilize this parallelism. I present these results using the 9- body gravitational attraction problem as an example.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Pós-graduação em Engenharia Mecânica - FEG

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The COSMIC-2 mission is a follow-on mission of the Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) with an upgraded payload for improved radio occultation (RO) applications. The objective of this paper is to develop a near-real-time (NRT) orbit determination system, called NRT National Chiao Tung University (NCTU) system, to support COSMIC-2 in atmospheric applications and verify the orbit product of COSMIC. The system is capable of automatic determinations of the NRT GPS clocks and LEO orbit and clock. To assess the NRT (NCTU) system, we use eight days of COSMIC data (March 24-31, 2011), which contain a total of 331 GPS observation sessions and 12 393 RO observable files. The parallel scheduling for independent GPS and LEO estimations and automatic time matching improves the computational efficiency by 64% compared to the sequential scheduling. Orbit difference analyses suggest a 10-cm accuracy for the COSMIC orbits from the NRT (NCTU) system, and it is consistent as the NRT University Corporation for Atmospheric Research (URCA) system. The mean velocity accuracy from the NRT orbits of COSMIC is 0.168 mm/s, corresponding to an error of about 0.051 μrad in the bending angle. The rms differences in the NRT COSMIC clock and in GPS clocks between the NRT (NCTU) and the postprocessing products are 3.742 and 1.427 ns. The GPS clocks determined from a partial ground GPS network [from NRT (NCTU)] and a full one [from NRT (UCAR)] result in mean rms frequency stabilities of 6.1E-12 and 2.7E-12, respectively, corresponding to range fluctuations of 5.5 and 2.4 cm and bending angle errors of 3.75 and 1.66 μrad .

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper proposes a global multiprocessor scheduling algorithm for the Linux kernel that combines the global EDF scheduler with a priority-aware work-stealing load balancing scheme, enabling parallel real-time tasks to be executed on more than one processor at a given time instant. We state that some priority inversion may actually be acceptable, provided it helps reduce contention, communication, synchronisation and coordination between parallel threads, while still guaranteeing the expected system’s predictability. Experimental results demonstrate the low scheduling overhead of the proposed approach comparatively to an existing real-time deadline-oriented scheduling class for the Linux kernel.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

High-level parallel languages offer a simple way for application programmers to specify parallelism in a form that easily scales with problem size, leaving the scheduling of the tasks onto processors to be performed at runtime. Therefore, if the underlying system cannot efficiently execute those applications on the available cores, the benefits will be lost. In this paper, we consider how to schedule highly heterogenous parallel applications that require real-time performance guarantees on multicore processors. The paper proposes a novel scheduling approach that combines the global Earliest Deadline First (EDF) scheduler with a priority-aware work-stealing load balancing scheme, which enables parallel realtime tasks to be executed on more than one processor at a given time instant. Experimental results demonstrate the better scalability and lower scheduling overhead of the proposed approach comparatively to an existing real-time deadline-oriented scheduling class for the Linux kernel.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Multicore platforms have transformed parallelism into a main concern. Parallel programming models are being put forward to provide a better approach for application programmers to expose the opportunities for parallelism by pointing out potentially parallel regions within tasks, leaving the actual and dynamic scheduling of these regions onto processors to be performed at runtime, exploiting the maximum amount of parallelism. It is in this context that this paper proposes a scheduling approach that combines the constant-bandwidth server abstraction with a priority-aware work-stealing load balancing scheme which, while ensuring isolation among tasks, enables parallel tasks to be executed on more than one processor at a given time instant.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper presents a methodology for multi-objective day-ahead energy resource scheduling for smart grids considering intensive use of distributed generation and Vehicle- To-Grid (V2G). The main focus is the application of weighted Pareto to a multi-objective parallel particle swarm approach aiming to solve the dual-objective V2G scheduling: minimizing total operation costs and maximizing V2G income. A realistic mathematical formulation, considering the network constraints and V2G charging and discharging efficiencies is presented and parallel computing is applied to the Pareto weights. AC power flow calculation is included in the metaheuristics approach to allow taking into account the network constraints. A case study with a 33-bus distribution network and 1800 V2G resources is used to illustrate the performance of the proposed method.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Paper/Poster presented in Work in Progress Session, 28th GI/ITG International Conference on Architecture of Computing Systems (ARCS 2015). 24 to 26, Mar, 2015. Porto, Portugal.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Poster presented in Work in Progress Session, 28th GI/ITG International Conference on Architecture of Computing Systems (ARCS 2015). 24 to 26, Mar, 2015. Porto, Portugal.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Face à estagnação da tecnologia uniprocessador registada na passada década, aos principais fabricantes de microprocessadores encontraram na tecnologia multi-core a resposta `as crescentes necessidades de processamento do mercado. Durante anos, os desenvolvedores de software viram as suas aplicações acompanhar os ganhos de performance conferidos por cada nova geração de processadores sequenciais, mas `a medida que a capacidade de processamento escala em função do número de processadores, a computação sequencial tem de ser decomposta em várias partes concorrentes que possam executar em paralelo, para que possam utilizar as unidades de processamento adicionais e completar mais rapidamente. A programação paralela implica um paradigma completamente distinto da programação sequencial. Ao contrário dos computadores sequenciais tipificados no modelo de Von Neumann, a heterogeneidade de arquiteturas paralelas requer modelos de programação paralela que abstraiam os programadores dos detalhes da arquitectura e simplifiquem o desenvolvimento de aplicações concorrentes. Os modelos de programação paralela mais populares incitam os programadores a identificar instruções concorrentes na sua lógica de programação, e a especificá-las sob a forma de tarefas que possam ser atribuídas a processadores distintos para executarem em simultâneo. Estas tarefas são tipicamente lançadas durante a execução, e atribuídas aos processadores pelo motor de execução subjacente. Como os requisitos de processamento costumam ser variáveis, e não são conhecidos a priori, o mapeamento de tarefas para processadores tem de ser determinado dinamicamente, em resposta a alterações imprevisíveis dos requisitos de execução. `A medida que o volume da computação cresce, torna-se cada vez menos viável garantir as suas restrições temporais em plataformas uniprocessador. Enquanto os sistemas de tempo real se começam a adaptar ao paradigma de computação paralela, há uma crescente aposta em integrar execuções de tempo real com aplicações interativas no mesmo hardware, num mundo em que a tecnologia se torna cada vez mais pequena, leve, ubíqua, e portável. Esta integração requer soluções de escalonamento que simultaneamente garantam os requisitos temporais das tarefas de tempo real e mantenham um nível aceitável de QoS para as restantes execuções. Para tal, torna-se imperativo que as aplicações de tempo real paralelizem, de forma a minimizar os seus tempos de resposta e maximizar a utilização dos recursos de processamento. Isto introduz uma nova dimensão ao problema do escalonamento, que tem de responder de forma correcta a novos requisitos de execução imprevisíveis e rapidamente conjeturar o mapeamento de tarefas que melhor beneficie os critérios de performance do sistema. A técnica de escalonamento baseado em servidores permite reservar uma fração da capacidade de processamento para a execução de tarefas de tempo real, e assegurar que os efeitos de latência na sua execução não afectam as reservas estipuladas para outras execuções. No caso de tarefas escalonadas pelo tempo de execução máximo, ou tarefas com tempos de execução variáveis, torna-se provável que a largura de banda estipulada não seja consumida por completo. Para melhorar a utilização do sistema, os algoritmos de partilha de largura de banda (capacity-sharing) doam a capacidade não utilizada para a execução de outras tarefas, mantendo as garantias de isolamento entre servidores. Com eficiência comprovada em termos de espaço, tempo, e comunicação, o mecanismo de work-stealing tem vindo a ganhar popularidade como metodologia para o escalonamento de tarefas com paralelismo dinâmico e irregular. O algoritmo p-CSWS combina escalonamento baseado em servidores com capacity-sharing e work-stealing para cobrir as necessidades de escalonamento dos sistemas abertos de tempo real. Enquanto o escalonamento em servidores permite partilhar os recursos de processamento sem interferências a nível dos atrasos, uma nova política de work-stealing que opera sobre o mecanismo de capacity-sharing aplica uma exploração de paralelismo que melhora os tempos de resposta das aplicações e melhora a utilização do sistema. Esta tese propõe uma implementação do algoritmo p-CSWS para o Linux. Em concordância com a estrutura modular do escalonador do Linux, ´e definida uma nova classe de escalonamento que visa avaliar a aplicabilidade da heurística p-CSWS em circunstâncias reais. Ultrapassados os obstáculos intrínsecos `a programação da kernel do Linux, os extensos testes experimentais provam que o p-CSWS ´e mais do que um conceito teórico atrativo, e que a exploração heurística de paralelismo proposta pelo algoritmo beneficia os tempos de resposta das aplicações de tempo real, bem como a performance e eficiência da plataforma multiprocessador.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Scheduling tasks to efficiently use the available processor resources is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long latency operation occurs, thus increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference effects among the multiple contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. This thesis proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread either statically or dynamically and is used by the thread scheduler to decide which threads to load in the contexts, and to decide which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects, and shows how thread prioritization can both maintain high processor utilization, and limit increases in critical path runtime caused by multithreading. The model also shows that in order to be effective in bandwidth limited applications, thread prioritization must be extended to prioritize memory requests. We show how simple hardware can prioritize the running of threads in the multiple contexts, and the issuing of requests to both the local memory and the network. Simulation experiments show how thread prioritization is used in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the number of processor cycles wasted in spinning and devoting more cycles to critical threads. Thread prioritization can be used in combination with other techniques to improve cache performance and minimize cache interference between different working sets in the cache. For applications that are critical path limited, thread prioritization can improve performance by allowing processor resources to be devoted preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can be used to implement a wide range of scheduling policies.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We present an optimal methodology for synchronized scheduling of production assembly with air transportation to achieve accurate delivery with minimized cost in consumer electronics supply chain (CESC). This problem was motivated by a major PC manufacturer in consumer electronics industry, where it is required to schedule the delivery requirements to meet the customer needs in different parts of South East Asia. The overall problem is decomposed into two sub-problems which consist of an air transportation allocation problem and an assembly scheduling problem. The air transportation allocation problem is formulated as a Linear Programming Problem with earliness tardiness penalties for job orders. For the assembly scheduling problem, it is basically required to sequence the job orders on the assembly stations to minimize their waiting times before they are shipped by flights to their destinations. Hence the second sub-problem is modelled as a scheduling problem with earliness penalties. The earliness penalties are assumed to be independent of the job orders.