824 results for parallel scheduling
Abstract:
Recent embedded processor architectures containing multiple heterogeneous cores and non-coherent caches have renewed attention to the use of Software Transactional Memory (STM) as a building block for developing parallel applications. STM promises to ease concurrent and parallel software development, but relies on the possibility of aborting conflicting transactions to maintain data consistency, which in turn affects the execution time of tasks carrying transactions. As a result, the timing behaviour of the task set may not be predictable, so it is crucial to limit the execution time overheads resulting from aborts. In this paper we formalise a FIFO-based algorithm to order the sequence of commits of concurrent transactions. Then, we propose and evaluate two non-preemptive and one SRP-based fully-preemptive scheduling strategies, in order to avoid transaction starvation.
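As an illustration of FIFO-ordered commits in an STM runtime (a minimal sketch under our own assumptions, not the algorithm formalised in the paper; the class and method names are hypothetical), the following Python fragment grants commit permission strictly in the order in which transactions reach their commit point:

```python
import itertools
import threading
from collections import deque

class FifoCommitManager:
    """Illustrative only: commits are granted strictly in arrival order."""
    def __init__(self):
        self._cv = threading.Condition()
        self._queue = deque()          # transaction ids awaiting commit
        self._ids = itertools.count()

    def register(self):
        """Called when a transaction reaches its commit point."""
        with self._cv:
            tx_id = next(self._ids)
            self._queue.append(tx_id)
            return tx_id

    def commit(self, tx_id, apply_writes):
        """Block until tx_id is at the head of the FIFO, then commit."""
        with self._cv:
            while self._queue[0] != tx_id:
                self._cv.wait()        # an earlier transaction must commit first
            apply_writes()             # write-back happens in FIFO order
            self._queue.popleft()
            self._cv.notify_all()
```

Each worker thread would call register() at its commit point and then commit(); the non-preemptive and SRP-based strategies discussed in the abstract would be layered on top of such an ordering mechanism.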
Abstract:
The recent technological advancements and market trends are causing an interesting phenomenon towards the convergence of High-Performance Computing (HPC) and Embedded Computing (EC) domains. On one side, new kinds of HPC applications are being required by markets needing huge amounts of information to be processed within a bounded amount of time. On the other side, EC systems are increasingly concerned with providing higher performance in real-time, challenging the performance capabilities of current architectures. The advent of next-generation many-core embedded platforms has the chance of intercepting this converging need for predictable high-performance, allowing HPC and EC applications to be executed on efficient and powerful heterogeneous architectures integrating general-purpose processors with many-core computing fabrics. To this end, it is of paramount importance to develop new techniques for exploiting the massively parallel computation capabilities of such platforms in a predictable way. P-SOCRATES will tackle this important challenge by merging leading research groups from the HPC and EC communities. The time-criticality and parallelisation challenges common to both areas will be addressed by proposing an integrated framework for executing workload-intensive applications with real-time requirements on top of next-generation commercial-off-the-shelf (COTS) platforms based on many-core accelerated architectures. The project will investigate new HPC techniques that fulfil real-time requirements. The main sources of indeterminism will be identified, proposing efficient mapping and scheduling algorithms, along with the associated timing and schedulability analysis, to guarantee the real-time and performance requirements of the applications.
Abstract:
This dissertation presents a study of job shop scheduling problems. Production scheduling problems aim to find the best sequence for processing a list of jobs, the start and completion time of each job, and the assignment of machines to jobs. Among these are problems with parallel machines, job shop problems and flow shop problems. The most common performance measures are the makespan (completion time of all jobs), the total flow time, total tardiness, maximum lateness, and the number of jobs completed after their due date, among others. In a job shop problem, jobs consist of a set of operations that must be executed on a predetermined machine, following a given sequence with predefined processing times. These environments allow different job sequencing scenarios. Normally, interruptions in job processing (preemption) are not allowed, and it may also be necessary to consider sequence-dependent setup times or to assign different weights (priorities) according to the importance of the job or the customer. The aim is to study the existing mathematical models for several variants of job shop scheduling problems and to compare the results of the various production performance measures. This work helps to demonstrate the importance that good production scheduling can have for production efficiency and its consequent financial impact.
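As a minimal worked example of the performance measures listed above (single machine, no setup times; the function name is ours, not the dissertation's), the sketch below computes makespan, total flow time, total tardiness, maximum lateness and the number of tardy jobs for a given processing order:

```python
def schedule_metrics(jobs):
    """jobs: list of (processing_time, due_date) pairs in the chosen
    processing order on a single machine. Illustrative only."""
    t = 0
    total_flow = total_tardiness = num_tardy = 0
    max_lateness = float("-inf")
    for p, d in jobs:
        t += p                                  # completion time of this job
        total_flow += t
        lateness = t - d                        # may be negative (early job)
        max_lateness = max(max_lateness, lateness)
        total_tardiness += max(0, lateness)     # tardiness: positive lateness
        num_tardy += 1 if lateness > 0 else 0
    return {"makespan": t, "total_flow_time": total_flow,
            "total_tardiness": total_tardiness,
            "max_lateness": max_lateness, "num_tardy": num_tardy}

# Three jobs processed in the given order on one machine:
# makespan 10, total flow 18, total tardiness 1, max lateness 1, 1 tardy job.
print(schedule_metrics([(3, 4), (2, 4), (5, 12)]))
```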
Abstract:
Combinatorial Optimization Problems occur in a wide variety of contexts and are generally NP-hard. At a corporate level, solving these problems is of great importance, since they contribute to the optimization of operational costs. In this thesis we propose to solve the Public Transport Bus Assignment problem considering a heterogeneous fleet and line exchanges, a variant of the Multi-Depot Vehicle Scheduling Problem in which additional constraints are enforced to model a real-life scenario. The number of constraints involved and the large number of variables make it impracticable to solve to optimality using complete search techniques. Therefore, we explore metaheuristics, which sacrifice optimality to produce solutions in feasible time. More concretely, we focus on the development of algorithms based on a sophisticated metaheuristic, Ant Colony Optimization (ACO), which relies on a stochastic learning mechanism. For complex problems with a considerable number of constraints, sophisticated metaheuristics may fail to produce quality solutions in a reasonable amount of time. Thus, we developed parallel shared-memory (SM) synchronous ACO algorithms; however, synchronism gives rise to the straggler problem. We therefore proposed three SM asynchronous algorithms that break the original algorithm's semantics and differ in the degree of concurrency allowed while manipulating the learned information. Our results show that our sequential ACO algorithms produced better solutions than a Restarts metaheuristic, that the ACO algorithms were able to learn, and that better solutions were achieved by increasing the amount of cooperation (number of search agents). Regarding parallel algorithms, our asynchronous ACO algorithms outperformed the synchronous ones in terms of speedup and solution quality, achieving speedups of 17.6x. The cooperation scheme imposed by asynchronism also achieved a better learning rate than the original one.
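For readers unfamiliar with ACO, the following toy sketch (our own simplification for a generic assignment problem, not the thesis's bus-assignment algorithm; all names are hypothetical) shows the basic loop of probabilistic solution construction, pheromone evaporation and deposit:

```python
import random

def aco_assign(cost, n_ants=20, n_iters=100, rho=0.1, alpha=1.0, beta=2.0):
    """Toy ACO: assign each of n tasks to one of m resources, minimising
    the summed cost[task][resource]. Illustrative only."""
    n, m = len(cost), len(cost[0])
    tau = [[1.0] * m for _ in range(n)]          # pheromone trails
    best, best_cost = None, float("inf")
    for _ in range(n_iters):
        for _ant in range(n_ants):
            sol = []
            for i in range(n):
                # choice probability ~ pheromone^alpha * heuristic^beta
                w = [tau[i][j] ** alpha * (1.0 / (1e-9 + cost[i][j])) ** beta
                     for j in range(m)]
                sol.append(random.choices(range(m), weights=w)[0])
            c = sum(cost[i][sol[i]] for i in range(n))
            if c < best_cost:
                best, best_cost = sol, c
        for i in range(n):                       # evaporation...
            for j in range(m):
                tau[i][j] *= (1.0 - rho)
        for i in range(n):                       # ...then deposit on the best
            tau[i][best[i]] += 1.0 / best_cost
    return best, best_cost
```

The shared-memory synchronous and asynchronous variants described in the abstract would parallelise the ant loop and differ in how concurrently the pheromone matrix tau may be read and updated.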
Abstract:
This study looks at how increased memory utilisation affects throughput and energy consumption in scientific computing, especially in high-energy physics. Our aim is to minimise the energy consumed by a set of jobs without increasing the processing time. Earlier tests indicated that, especially in data analysis, throughput can increase by over 100% and energy consumption decrease by 50% when multiple jobs are processed in parallel per CPU core. Since jobs are heterogeneous, it is not possible to find an optimum value for the number of parallel jobs. A better solution is based on memory utilisation, but finding an optimum memory threshold is not straightforward. Therefore, a fuzzy logic-based algorithm was developed that can dynamically adapt the memory threshold based on the overall load. In this way, it is possible to keep memory consumption stable under different workloads while achieving significantly higher throughput and energy efficiency than with a traditional fixed number of jobs or a fixed memory threshold.
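A minimal sketch of the adaptive-threshold idea follows (a plain rule-based controller with hypothetical names and limits; the actual work blends load indicators with fuzzy membership functions rather than fixed bands):

```python
def adapt_threshold(threshold, mem_used_frac, low=0.6, high=0.85, step=0.05):
    """Toy controller: nudge the memory threshold down when overall memory
    pressure is high and up when it is low. Illustrative only."""
    if mem_used_frac > high:
        return max(0.1, threshold - step)     # back off: too much pressure
    if mem_used_frac < low:
        return min(0.95, threshold + step)    # room to start more jobs
    return threshold                          # comfortable band: keep as is

def can_start_job(mem_used_frac, threshold):
    """A new job is launched only while utilisation stays below the threshold."""
    return mem_used_frac < threshold
```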
Abstract:
The main objective of the study is to assess whether four software alternatives are adequate tools for production scheduling and which of the tools suits the commissioning company. The secondary objectives are to describe the current and target state of production scheduling through process modelling, to identify the user needs for the tool, and to define prioritised selection criteria for the tool. The theoretical part of the study examines the logic and challenges of production scheduling. The selection of scheduling software is examined in parallel with process modelling. The scheduling software alternatives and the methods for identifying user needs are reviewed. The empirical part clarifies the relationship of the study to the commissioning company's strategy. User needs are identified through interviews and analysed with a QFD matrix. The commissioning company's current and target production scheduling processes are modelled so that the suitability of the software packages as tools supporting the scheduling process can be assessed. The results of the study are prioritised selection criteria for the scheduling tool, i.e. the most important functional features derived from the user needs, an assessment of the system vendors, and recommendations for further actions and further research.
Abstract:
The past few decades have seen a considerable increase in the number of parallel and distributed systems. With the development of more complex applications, the need for more powerful systems has emerged and various parallel and distributed environments have been designed and implemented. Each of these environments, including hardware and software, has unique strengths and weaknesses. There is no single parallel environment that can be identified as the best environment for all applications with respect to hardware and software properties. The main goal of this thesis is to provide a novel way of performing data-parallel computation in parallel and distributed environments by utilizing the best characteristics of different aspects of parallel computing. For the purpose of this thesis, three aspects of parallel computing were identified and studied. First, three parallel environments (shared memory, distributed memory, and a network of workstations) are evaluated to quantify their suitability for different parallel applications. Due to the parallel and distributed nature of the environments, the networks connecting the processors in these environments were investigated with respect to their performance characteristics. Second, scheduling algorithms are studied in order to make them more efficient and effective. A concept of application-specific information scheduling is introduced. The application-specific information is data about the workload extracted from an application, which is provided to a scheduling algorithm. Three scheduling algorithms are enhanced to utilize the application-specific information to further refine their scheduling properties. A more accurate description of the workload is especially important in cases where the work units are heterogeneous and the parallel environment is heterogeneous and/or non-dedicated. The results obtained show that the additional information regarding the workload has a positive impact on the performance of applications. Third, a programming paradigm for networks of symmetric multiprocessor (SMP) workstations is introduced. The MPIT programming paradigm incorporates the Message Passing Interface (MPI) with threads to provide a methodology for writing parallel applications that efficiently utilize the available resources and minimize the overhead. MPIT allows communication and computation to overlap by deploying a dedicated thread for communication. Furthermore, the programming paradigm implements an application-specific scheduling algorithm. The scheduling algorithm is executed by the communication thread; thus, the scheduling does not affect the execution of the parallel application. Performance results obtained with MPIT show considerable improvements over conventional MPI applications.
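The overlap idea behind a dedicated communication thread can be sketched schematically as follows (a hedged illustration using a plain Python thread and queue as a stand-in for the message-passing layer; this is not the MPIT API, and send_over_network is a placeholder):

```python
import threading, queue

def send_over_network(msg):
    pass                              # placeholder for the real transport (e.g. an MPI send)

def communication_worker(outbox):
    """Dedicated communication thread: drains the outbox and performs the
    (potentially blocking) sends, so computation never stalls on them."""
    while True:
        msg = outbox.get()
        if msg is None:               # sentinel: no more work
            break
        send_over_network(msg)

outbox = queue.Queue()
comm = threading.Thread(target=communication_worker, args=(outbox,))
comm.start()

for chunk in range(8):                # the compute loop keeps running while
    partial = chunk * chunk           # the communication thread ships results
    outbox.put(partial)

outbox.put(None)                      # shut the communication thread down
comm.join()
```

In MPIT the dedicated thread would issue the actual MPI calls and also run the application-specific scheduling algorithm, keeping both off the compute path.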
Abstract:
In this work, we present an integral scheduling system for non-dedicated clusters, termed CISNE-P, which ensures the performance required by the local applications while simultaneously allocating cluster resources to parallel jobs. Our approach solves the problem efficiently by using a social contract technique. This kind of technique is based on reserving computational resources, preserving a predetermined response time for local users. CISNE-P is a middleware which includes both a previously developed space-sharing job scheduler and a dynamic coscheduling system, a time-sharing scheduling component. The experimentation performed on a Linux cluster shows that these two scheduler components are complementary and that good coordination improves global performance significantly. We also compare two different CISNE-P implementations: one developed inside the kernel, and the other entirely implemented in user space.
Abstract:
Formal methods provide a means of reasoning about computer programs in order to prove correctness criteria. One subtype of formal methods is based on the weakest precondition predicate transformer semantics and uses guarded commands as the basic modelling construct. Examples of such formalisms are Action Systems and Event-B. Guarded commands can intuitively be understood as actions that may be triggered when an associated guard condition holds. Guarded commands whose guards hold are nondeterministically chosen for execution, but no further control flow is present by default. Such a modelling approach is convenient for proving correctness, and the Refinement Calculus allows for a stepwise development method. It also has a parallel interpretation facilitating development of concurrent software, and it is suitable for describing event-driven scenarios. However, for many application areas, the execution paradigm traditionally used comprises more explicit control flow, which constitutes an obstacle for using the above mentioned formal methods. In this thesis, we study how guarded command based modelling approaches can be conveniently and efficiently scheduled in different scenarios. We first focus on the modelling of trust for transactions in a social networking setting. Due to the event-based nature of the scenario, the use of guarded commands turns out to be relatively straightforward. We continue by studying modelling of concurrent software, with particular focus on compute-intensive scenarios. We go from theoretical considerations to the feasibility of implementation by evaluating the performance and scalability of executing a case study model in parallel using automatic scheduling performed by a dedicated scheduler. Finally, we propose a more explicit and non-centralised approach in which the flow of each task is controlled by a schedule of its own. The schedules are expressed in a dedicated scheduling language, and patterns assist the developer in proving correctness of the scheduled model with respect to the original one.
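The execution model of guarded commands can be sketched with a small interpreter (a generic illustration, not Event-B or Action Systems tooling; the dictionary-based encoding of actions is our own):

```python
import random

def run_action_system(state, actions, max_steps=1000):
    """Repeatedly pick, at random, one enabled action (its guard holds) and
    apply its effect; stop when no guard holds or after max_steps steps."""
    for _ in range(max_steps):
        enabled = [a for a in actions if a["guard"](state)]
        if not enabled:
            return state                      # termination: all guards false
        action = random.choice(enabled)       # nondeterministic choice
        state = action["effect"](state)
    return state

# Toy model: move units one at a time from x to y while x is positive.
actions = [
    {"guard": lambda s: s["x"] > 0,
     "effect": lambda s: {"x": s["x"] - 1, "y": s["y"] + 1}},
]
print(run_action_system({"x": 3, "y": 0}, actions))   # {'x': 0, 'y': 3}
```

The scheduling approaches studied in the thesis replace this default nondeterministic choice with explicit, provably correct control flow.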
Abstract:
With the shift towards many-core computer architectures, dataflow programming has been proposed as one potential solution for producing software that scales to a varying number of processor cores. Programming for parallel architectures is considered difficult, as the current popular programming languages are inherently sequential and introducing parallelism is typically up to the programmer. Dataflow, however, is inherently parallel, describing an application as a directed graph, where nodes represent calculations and edges represent a data dependency in the form of a queue. These queues are the only allowed communication between the nodes, making the dependencies between the nodes explicit and thereby also the parallelism. Once a node has sufficient inputs available, it can, independently of any other node, perform calculations, consume inputs, and produce outputs. Dataflow models have existed for several decades and have become popular for describing signal processing applications, as the graph representation is a very natural representation within this field. Digital filters are typically described with boxes and arrows, also in textbooks. Dataflow is also becoming more interesting in other domains, and in principle, any application working on an information stream fits the dataflow paradigm. Such applications are, among others, network protocols, cryptography, and multimedia applications. As an example, the MPEG group standardized a dataflow language called RVC-CAL to be used within reconfigurable video coding. Describing a video coder as a dataflow network instead of with conventional programming languages makes the coder more readable, as it describes how the video data flows through the different coding tools. While dataflow provides an intuitive representation for many applications, it also introduces some new problems that need to be solved in order for dataflow to be more widely used. The explicit parallelism of a dataflow program is descriptive and enables an improved utilization of available processing units; however, the independent nodes also imply that some kind of scheduling is required. The need for efficient scheduling becomes even more evident when the number of nodes is larger than the number of processing units and several nodes are running concurrently on one processor core. There exist several dataflow models of computation, with different trade-offs between expressiveness and analyzability. These vary from rather restricted but statically schedulable, with minimal scheduling overhead, to dynamic, where each firing requires a firing rule to be evaluated. The model used in this work, namely RVC-CAL, is a very expressive language, and in the general case it requires dynamic scheduling; however, the strong encapsulation of dataflow nodes enables analysis, and the scheduling overhead can be reduced by using quasi-static, or piecewise static, scheduling techniques. The scheduling problem is concerned with finding the few scheduling decisions that must be made at run-time, while most decisions are pre-calculated. The result is then a set of static schedules, as small as possible, that is dynamically scheduled. To identify these dynamic decisions and to find the concrete schedules, this thesis shows how quasi-static scheduling can be represented as a model checking problem. This involves identifying the relevant information needed to generate a minimal but complete model to be used for model checking.
The model must describe everything that may affect the scheduling of the application while omitting everything else in order to avoid state space explosion. This kind of simplification is necessary to make the state space analysis feasible. For the model checker to find the actual schedules, a set of scheduling strategies is defined which are able to produce quasi-static schedulers for a wide range of applications. The results of this work show that actor composition with quasi-static scheduling can be used to transform dataflow programs to fit many different computer architectures with different types and numbers of cores. This, in turn, enables dataflow to provide a more platform-independent representation, as one application can be fitted to a specific processor architecture without changing the actual program representation. Instead, the program representation is optimized by the development tools, in the context of design space exploration, to fit the target platform. This work focuses on representing the dataflow scheduling problem as a model checking problem and is implemented as part of a compiler infrastructure. The thesis also presents experimental results as evidence of the usefulness of the approach.
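A toy rendering of the firing-rule idea (generic Python, not RVC-CAL; the actor and queue names are ours) shows why scheduling is needed: some runtime component must decide which fireable node runs next:

```python
from collections import deque

class Actor:
    """Toy dataflow node: fires when every input queue holds enough tokens."""
    def __init__(self, name, inputs, outputs, needed, fire):
        self.name, self.inputs, self.outputs = name, inputs, outputs
        self.needed, self.fire = needed, fire   # tokens required / firing function

    def can_fire(self):
        return all(len(q) >= self.needed for q in self.inputs)

    def step(self):
        tokens = [q.popleft() for q in self.inputs for _ in range(self.needed)]
        for q in self.outputs:
            q.append(self.fire(tokens))

# Two-node pipeline: scale then accumulate, connected by explicit FIFOs.
source, a_to_b, sink = deque([1, 2, 3, 4]), deque(), deque()
scale = Actor("scale", [source], [a_to_b], 1, lambda t: 2 * t[0])
accum = Actor("accum", [a_to_b], [sink], 1, lambda t: t[0])

actors = [scale, accum]
while any(a.can_fire() for a in actors):       # naive dynamic scheduler
    for a in actors:
        if a.can_fire():
            a.step()
print(list(sink))                               # [2, 4, 6, 8]
```

A quasi-static scheduler, as discussed above, would replace the naive while-loop with precomputed firing sequences and only a few residual run-time tests.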
Abstract:
In 2006 the Route load balancing algorithm was proposed and compared to other techniques aiming at optimizing process allocation in grid environments. This algorithm schedules tasks of parallel applications considering computer neighborhoods (where the distance is defined by the network latency). Route presents good results for large environments, although there are cases where the neighbors have neither enough computational capacity nor a communication system capable of serving the application. In those situations Route migrates tasks until they stabilize in a grid area with enough resources. This migration may take a long time, which reduces the overall performance. In order to reduce this stabilization time, this paper proposes RouteGA (Route with Genetic Algorithm support), which considers historical information on parallel application behavior as well as the computer capacities and load to optimize the scheduling. This information is extracted by using monitors and summarized in a knowledge base used to quantify the occupation of tasks. Afterwards, this information is used to parameterize a genetic algorithm responsible for optimizing the task allocation. Results confirm that RouteGA outperforms the load balancing carried out by the original Route, which had previously outperformed other scheduling algorithms from the literature.
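A generic genetic-algorithm sketch for task allocation follows (our own toy, minimising the maximum per-machine load; it is not RouteGA, whose fitness would additionally draw on the monitored historical behaviour and computer capacities):

```python
import random

def ga_allocate(load, n_machines, pop=30, gens=200, pmut=0.1):
    """Toy GA: gene i maps task i to a machine; fitness is the makespan
    (maximum summed load per machine), which we minimise."""
    n = len(load)
    def fitness(ch):
        per = [0.0] * n_machines
        for t, m in enumerate(ch):
            per[m] += load[t]
        return max(per)
    population = [[random.randrange(n_machines) for _ in range(n)]
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness)
        parents = population[:pop // 2]             # truncation selection
        children = []
        while len(children) < pop - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)
            child = a[:cut] + b[cut:]               # one-point crossover
            if random.random() < pmut:
                child[random.randrange(n)] = random.randrange(n_machines)
            children.append(child)
        population = parents + children
    best = min(population, key=fitness)
    return best, fitness(best)

print(ga_allocate([4, 3, 3, 2, 2, 2], n_machines=2))
```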
Abstract:
The problem of scheduling a parallel program, represented by a weighted directed acyclic graph (DAG), onto a set of homogeneous processors so as to minimize the completion time of the program has been extensively studied as an academic optimization problem that arises when optimizing the execution time of a parallel algorithm on a parallel computer. In this paper, we propose an application of Ant Colony Optimization (ACO) to a multiprocessor scheduling problem (MPSP). In the MPSP, no preemption is allowed and each operation demands a setup time on the machines. The problem seeks to compose a schedule that minimizes the total completion time. We therefore rely on heuristics to find solutions, since exact solution methods are not feasible for most problems of this kind. In this novel heuristic search approach to multiprocessor scheduling based on the ACO algorithm, a collection of agents cooperates to effectively explore the search space. A computational experiment is conducted on a suite of benchmark applications. By comparing the results obtained by our algorithm with those of a previous heuristic algorithm, it is evident that the ACO algorithm exhibits competitive performance with a small error ratio.
Abstract:
The multiprocessor task graph scheduling problem has been extensively studied as an academic optimization problem which arises when optimizing the execution time of a parallel algorithm on a parallel computer. The problem is known to be one of the NP-hard problems. There are many good approaches, based on many optimization algorithms, to finding the optimum solution for this problem with less computational time. One of them is the branch and bound algorithm. In this paper, we propose a branch and bound algorithm for the multiprocessor scheduling problem. We investigate the algorithm by comparing two different lower bounds with respect to their computational costs and the size of the pruned tree. Several experiments are made with a small set of problems and the results are compared in different sections.
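A much-simplified branch and bound sketch is given below (independent tasks only, no precedence constraints, with a single averaging lower bound; our own illustration of the pruning idea, not the paper's algorithm):

```python
def bb_schedule(times, n_procs):
    """Toy branch and bound: assign independent tasks to processors so the
    makespan is minimal; the lower bound is max(longest finish so far,
    average remaining load), and branches above the incumbent are pruned."""
    times = sorted(times, reverse=True)         # big tasks first tightens bounds
    best = [sum(times)]                          # incumbent makespan
    loads = [0.0] * n_procs

    def search(i, remaining):
        if i == len(times):
            best[0] = min(best[0], max(loads))
            return
        bound = max(max(loads), (sum(loads) + remaining) / n_procs)
        if bound >= best[0]:
            return                               # prune: cannot beat incumbent
        for p in range(n_procs):
            loads[p] += times[i]
            search(i + 1, remaining - times[i])
            loads[p] -= times[i]

    search(0, sum(times))
    return best[0]

print(bb_schedule([3, 2, 2, 5, 4], 2))           # optimal makespan: 8
```

Tighter lower bounds prune more of the tree at a higher per-node cost, which is exactly the trade-off the paper's comparison of two lower bounds examines.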
Abstract:
In order to achieve high performance, we need an efficient scheduling of a parallel program onto the processors of a multiprocessor system that minimizes the entire execution time. This multiprocessor scheduling problem can be stated as finding a schedule for a general task graph to be executed on a multiprocessor system so that the schedule length is minimized [10]. This scheduling problem is known to be NP-hard. In multiprocessor task scheduling, we have a number of CPUs on which a number of tasks are to be scheduled so that the program's execution time is minimized. According to [10], the task scheduling problem is a key factor for a parallel multiprocessor system to gain better performance. A task can be partitioned into a group of subtasks and represented as a DAG (Directed Acyclic Graph), so the problem can be stated as finding a schedule for a DAG to be executed on a parallel multiprocessor system so that the schedule length is minimized. This helps to reduce processing time and increase processor utilization. The aim of this thesis work is to check and compare the results obtained by the Bee Colony algorithm with the best known results already generated in the multiprocessor task scheduling domain.
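A simple list-scheduling baseline of the kind such heuristics are compared against (our own sketch, ignoring communication costs; it is not the Bee Colony algorithm) computes the schedule length of a DAG on a fixed number of processors:

```python
def list_schedule(durations, deps, priority, n_procs):
    """Toy list scheduler for a DAG: repeatedly pick the highest-priority ready
    task and place it on the processor that can start it earliest.
    deps[t] lists the predecessors of task t; returns the schedule length."""
    finish = {}
    proc_free = [0.0] * n_procs
    remaining = set(durations)
    while remaining:
        ready = [t for t in remaining if all(p in finish for p in deps[t])]
        task = max(ready, key=priority)
        est = max([finish[p] for p in deps[task]], default=0.0)
        proc = min(range(n_procs), key=lambda q: max(proc_free[q], est))
        start = max(proc_free[proc], est)
        finish[task] = start + durations[task]
        proc_free[proc] = finish[task]
        remaining.remove(task)
    return max(finish.values())

# Diamond-shaped DAG: A feeds B and C, both feed D.
durations = {"A": 2, "B": 3, "C": 2, "D": 1}
deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(list_schedule(durations, deps, lambda t: durations[t], 2))   # 6
```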
Abstract:
This chapter studies a two-level production planning problem where, on each level, a lot sizing and scheduling problem with parallel machines, capacity constraints and sequence-dependent setup costs and times must be solved. The problem can be found in soft drink companies where the production process involves two interdependent levels with decisions concerning raw material storage and soft drink bottling. Models and solution approaches proposed so far are surveyed and conceptually compared. Two different approaches have been selected to perform a series of computational comparisons: an evolutionary technique comprising a genetic algorithm and its memetic version, and a decomposition and relaxation approach. © 2008 Springer-Verlag Berlin Heidelberg.