997 resultados para Multiprocessor systems
Resumo:
In order to achieve the high performance, we need to have an efficient scheduling of a parallelprogram onto the processors in multiprocessor systems that minimizes the entire executiontime. This problem of multiprocessor scheduling can be stated as finding a schedule for ageneral task graph to be executed on a multiprocessor system so that the schedule length can be minimize [10]. This scheduling problem is known to be NP- Hard.In multi processor task scheduling, we have a number of CPU’s on which a number of tasksare to be scheduled that the program’s execution time is minimized. According to [10], thetasks scheduling problem is a key factor for a parallel multiprocessor system to gain betterperformance. A task can be partitioned into a group of subtasks and represented as a DAG(Directed Acyclic Graph), so the problem can be stated as finding a schedule for a DAG to beexecuted in a parallel multiprocessor system so that the schedule can be minimized. Thishelps to reduce processing time and increase processor utilization. The aim of this thesis workis to check and compare the results obtained by Bee Colony algorithm with already generatedbest known results in multi processor task scheduling domain.
Resumo:
Hard real- time multiprocessor scheduling has seen, in recent years, the flourishing of semi-partitioned scheduling algorithms. This category of scheduling schemes combines elements of partitioned and global scheduling for the purposes of achieving efficient utilization of the system’s processing resources with strong schedulability guarantees and with low dispatching overheads. The sub-class of slot-based “task-splitting” scheduling algorithms, in particular, offers very good trade-offs between schedulability guarantees (in the form of high utilization bounds) and the number of preemptions/migrations involved. However, so far there did not exist unified scheduling theory for such algorithms; each one was formulated in its own accompanying analysis. This article changes this fragmented landscape by formulating a more unified schedulability theory covering the two state-of-the-art slot-based semi-partitioned algorithms, S-EKG and NPS-F (both fixed job-priority based). This new theory is based on exact schedulability tests, thus also overcoming many sources of pessimism in existing analysis. In turn, since schedulability testing guides the task assignment under the schemes in consideration, we also formulate an improved task assignment procedure. As the other main contribution of this article, and as a response to the fact that many unrealistic assumptions, present in the original theory, tend to undermine the theoretical potential of such scheduling schemes, we identified and modelled into the new analysis all overheads incurred by the algorithms in consideration. The outcome is a new overhead-aware schedulability analysis that permits increased efficiency and reliability. The merits of this new theory are evaluated by an extensive set of experiments.
Resumo:
Nos dias de hoje, os sistemas de tempo real crescem em importância e complexidade. Mediante a passagem do ambiente uniprocessador para multiprocessador, o trabalho realizado no primeiro não é completamente aplicável no segundo, dado que o nível de complexidade difere, principalmente devido à existência de múltiplos processadores no sistema. Cedo percebeu-se, que a complexidade do problema não cresce linearmente com a adição destes. Na verdade, esta complexidade apresenta-se como uma barreira ao avanço científico nesta área que, para já, se mantém desconhecida, e isto testemunha-se, essencialmente no caso de escalonamento de tarefas. A passagem para este novo ambiente, quer se trate de sistemas de tempo real ou não, promete gerar a oportunidade de realizar trabalho que no primeiro caso nunca seria possível, criando assim, novas garantias de desempenho, menos gastos monetários e menores consumos de energia. Este último fator, apresentou-se desde cedo, como, talvez, a maior barreira de desenvolvimento de novos processadores na área uniprocessador, dado que, à medida que novos eram lançados para o mercado, ao mesmo tempo que ofereciam maior performance, foram levando ao conhecimento de um limite de geração de calor que obrigou ao surgimento da área multiprocessador. No futuro, espera-se que o número de processadores num determinado chip venha a aumentar, e como é óbvio, novas técnicas de exploração das suas inerentes vantagens têm de ser desenvolvidas, e a área relacionada com os algoritmos de escalonamento não é exceção. Ao longo dos anos, diferentes categorias de algoritmos multiprocessador para dar resposta a este problema têm vindo a ser desenvolvidos, destacando-se principalmente estes: globais, particionados e semi-particionados. A perspectiva global, supõe a existência de uma fila global que é acessível por todos os processadores disponíveis. Este fato torna disponível a migração de tarefas, isto é, é possível parar a execução de uma tarefa e resumir a sua execução num processador distinto. Num dado instante, num grupo de tarefas, m, as tarefas de maior prioridade são selecionadas para execução. Este tipo promete limites de utilização altos, a custo elevado de preempções/migrações de tarefas. Em contraste, os algoritmos particionados, colocam as tarefas em partições, e estas, são atribuídas a um dos processadores disponíveis, isto é, para cada processador, é atribuída uma partição. Por essa razão, a migração de tarefas não é possível, acabando por fazer com que o limite de utilização não seja tão alto quando comparado com o caso anterior, mas o número de preempções de tarefas decresce significativamente. O esquema semi-particionado, é uma resposta de caráter hibrido entre os casos anteriores, pois existem tarefas que são particionadas, para serem executadas exclusivamente por um grupo de processadores, e outras que são atribuídas a apenas um processador. Com isto, resulta uma solução que é capaz de distribuir o trabalho a ser realizado de uma forma mais eficiente e balanceada. Infelizmente, para todos estes casos, existe uma discrepância entre a teoria e a prática, pois acaba-se por se assumir conceitos que não são aplicáveis na vida real. Para dar resposta a este problema, é necessário implementar estes algoritmos de escalonamento em sistemas operativos reais e averiguar a sua aplicabilidade, para caso isso não aconteça, as alterações necessárias sejam feitas, quer a nível teórico quer a nível prá
Resumo:
L’aparició d’un nou paradigma per al disseny de sistemes multiprocessador, les NoC; requereixen una manera d’adaptar els IP cores ja existents i permetre la seva connexió en xarxa. Aquest projecte presenta un disseny d’una interfície que aconsegueix adaptar un IP core existent, el LEON3; del protocol del bus AMBA al protocol de la xarxa. D’aquesta manera i basant-nos en idees d’interfícies discutides en l’estat de l’art, aconseguim desacoblar el processador del disseny i topologia de la xarxa.
Resumo:
Rapid ongoing evolution of multiprocessors will lead to systems with hundreds of processing cores integrated in a single chip. An emerging challenge is the implementation of reliable and efficient interconnection between these cores as well as other components in the systems. Network-on-Chip is an interconnection approach which is intended to solve the performance bottleneck caused by traditional, poorly scalable communication structures such as buses. However, a large on-chip network involves issues related to congestion problems and system control, for instance. Additionally, faults can cause problems in multiprocessor systems. These faults can be transient faults, permanent manufacturing faults, or they can appear due to aging. To solve the emerging traffic management, controllability issues and to maintain system operation regardless of faults a monitoring system is needed. The monitoring system should be dynamically applicable to various purposes and it should fully cover the system under observation. In a large multiprocessor the distances between components can be relatively long. Therefore, the system should be designed so that the amount of energy-inefficient long-distance communication is minimized. This thesis presents a dynamically clustered distributed monitoring structure. The monitoring is distributed so that no centralized control is required for basic tasks such as traffic management and task mapping. To enable extensive analysis of different Network-on-Chip architectures, an in-house SystemC based simulation environment was implemented. It allows transaction level analysis without time consuming circuit level implementations during early design phases of novel architectures and features. The presented analysis shows that the dynamically clustered monitoring structure can be efficiently utilized for traffic management in faulty and congested Network-on-Chip-based multiprocessor systems. The monitoring structure can be also successfully applied for task mapping purposes. Furthermore, the analysis shows that the presented in-house simulation environment is flexible and practical tool for extensive Network-on-Chip architecture analysis.
Resumo:
The capabilities and thus, design complexity of VLSI-based embedded systems have increased tremendously in recent years, riding the wave of Moore’s law. The time-to-market requirements are also shrinking, imposing challenges to the designers, which in turn, seek to adopt new design methods to increase their productivity. As an answer to these new pressures, modern day systems have moved towards on-chip multiprocessing technologies. New architectures have emerged in on-chip multiprocessing in order to utilize the tremendous advances of fabrication technology. Platform-based design is a possible solution in addressing these challenges. The principle behind the approach is to separate the functionality of an application from the organization and communication architecture of hardware platform at several levels of abstraction. The existing design methodologies pertaining to platform-based design approach don’t provide full automation at every level of the design processes, and sometimes, the co-design of platform-based systems lead to sub-optimal systems. In addition, the design productivity gap in multiprocessor systems remain a key challenge due to existing design methodologies. This thesis addresses the aforementioned challenges and discusses the creation of a development framework for a platform-based system design, in the context of the SegBus platform - a distributed communication architecture. This research aims to provide automated procedures for platform design and application mapping. Structural verification support is also featured thus ensuring correct-by-design platforms. The solution is based on a model-based process. Both the platform and the application are modeled using the Unified Modeling Language. This thesis develops a Domain Specific Language to support platform modeling based on a corresponding UML profile. Object Constraint Language constraints are used to support structurally correct platform construction. An emulator is thus introduced to allow as much as possible accurate performance estimation of the solution, at high abstraction levels. VHDL code is automatically generated, in the form of “snippets” to be employed in the arbiter modules of the platform, as required by the application. The resulting framework is applied in building an actual design solution for an MP3 stereo audio decoder application.
Resumo:
Inferring population admixture from genetic data and quantifying it is a difficult but crucial task in evolutionary and conservation biology. Unfortunately state-of-the-art probabilistic approaches are computationally demanding. Effectively exploiting the computational power of modern multiprocessor systems can thus have a positive impact to Monte Carlo-based simulation of admixture modeling. A novel parallel approach is briefly described and promising results on its message passing interface (MPI)-based C implementation are reported.
Resumo:
This paper presents a paralleled Two-Pass Hexagonal (TPA) algorithm constituted by Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS) for motion estimation. In the TPA., Motion Vectors (MV) are generated from the first-pass LHMEA and are used as predictors for second-pass HEXBS motion estimation, which only searches a small number of Macroblocks (MBs). We introduced hashtable into video processing and completed parallel implementation. We propose and evaluate parallel implementations of the LHMEA of TPA on clusters of workstations for real time video compression. It discusses how parallel video coding on load balanced multiprocessor systems can help, especially on motion estimation. The effect of load balancing for improved performance is discussed. The performance or the algorithm is evaluated by using standard video sequences and the results are compared to current algorithms.
Resumo:
This paper presents a paralleled Two-Pass Hexagonal (TPA) algorithm constituted by Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS) for motion estimation. In the TPA, Motion Vectors (MV) are generated from the first-pass LHMEA and are used as predictors for second-pass HEXBS motion estimation, which only searches a small number of Macroblocks (MBs). We introduced hashtable into video processing and completed parallel implementation. We propose and evaluate parallel implementations of the LHMEA of TPA on clusters of workstations for real time video compression. It discusses how parallel video coding on load balanced multiprocessor systems can help, especially on motion estimation. The effect of load balancing for improved performance is discussed. The performance of the algorithm is evaluated by using standard video sequences and the results are compared to current algorithms.
Resumo:
This paper presents a paralleled Two-Pass Hexagonal (TPA) algorithm constituted by Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS) for motion estimation. In the TPA, Motion Vectors (MV) are generated from the first-pass LHMEA and are used as predictors for second-pass HEXBS motion estimation, which only searches a small number of Macroblocks (MBs). We introduced hashtable into video processing and completed parallel implementation. We propose and evaluate parallel implementations of the LHMEA of TPA on clusters of workstations for real time video compression. It discusses how parallel video coding on load balanced multiprocessor systems can help, especially on motion estimation. The effect of load balancing for improved performance is discussed. The performance of the algorithm is evaluated by using standard video sequences and the results are compared to current algorithms.
Resumo:
Software transaction memory (STM) systems have been used as an approach to improve performance, by allowing the concurrent execution of atomic blocks. However, under high-contention workloads, STM-based systems can considerably degrade performance, as transaction conflict rate increases. Contention management policies have been used as a way to select which transaction to abort when a conflict occurs. In general, contention managers are not capable of avoiding conflicts, as they can only select which transaction to abort and the moment it should restart. Since contention managers act only after a conflict is detected, it becomes harder to effectively increase transaction throughput. More proactive approaches have emerged, aiming at predicting when a transaction is likely to abort, postponing its execution. Nevertheless, most of the proposed proactive techniques are limited, as they do not replace the doomed transaction by another or, when they do, they rely on the operating system for that, having little or no control on which transaction to run. This article proposes LUTS, a lightweight user-level transaction scheduler. Unlike other techniques, LUTS provides the means for selecting another transaction to run in parallel, thus improving system throughput. We discuss LUTS design and propose a dynamic conflict-avoidance heuristic built around its scheduling capabilities. Experimental results, conducted with the STAMP and STMBench7 benchmark suites, running on TinySTM and SwissTM, show how our conflict-avoidance heuristic can effectively improve STM performance on high contention applications. © 2012 Springer Science+Business Media, LLC.
Resumo:
An investigation is carried out into the design of a small local computer network for eventual implementation on the University of Aston campus. Microprocessors are investigated as a possible choice for use as a node controller for reasons of cost and reliability. Since the network will be local, high speed lines of megabit order are proposed. After an introduction to several well known networks, various aspects of networks are discussed including packet switching, functions of a node and host-node protocol. Chapter three develops the network philosophy with an introduction to microprocessors. Various organisations of microprocessors into multicomputer and multiprocessor systems are discussed, together with methods of achieving reliabls computing. Chapter four presents the simulation model and its implentation as a computer program. The major modelling effort is to study the behaviour of messages queueing for access to the network and the message delay experienced on the network. Use is made of spectral analysis to determine the sampling frequency while Sxponentially Weighted Noving Averages are used for data smoothing.
Resumo:
This research was partially supported by the Serbian Ministry of Science and Ecology under project 144007. The authors are grateful to Ivana Ljubić for help in testing and to Vladimir Filipović for useful suggestions and comments.
Resumo:
In this paper a variable neighborhood search (VNS) approach for the task assignment problem (TAP) is considered. An appropriate neighborhood scheme along with a shaking operator and local search procedure are constructed specifically for this problem. The computational results are presented for the instances from the literature, and compared to optimal solutions obtained by the CPLEX solver and heuristic solutions generated by the genetic algorithm. It can be seen that the proposed VNS approach reaches all optimal solutions in a quite short amount of computational time.
Resumo:
The difficulties encountered in implementing large scale CM codes on multiprocessor systems are now fairly well understood. Despite the claims of shared memory architecture manufacturers to provide effective parallelizing compilers, these have not proved to be adequate for large or complex programs. Significant programmer effort is usually required to achieve reasonable parallel efficiencies on significant numbers of processors. The paradigm of Single Program Multi Data (SPMD) domain decomposition with message passing, where each processor runs the same code on a subdomain of the problem, communicating through exchange of messages, has for some time been demonstrated to provide the required level of efficiency, scalability, and portability across both shared and distributed memory systems, without the need to re-author the code into a new language or even to support differing message passing implementations. Extension of the methods into three dimensions has been enabled through the engineering of PHYSICA, a framework for supporting 3D, unstructured mesh and continuum mechanics modeling. In PHYSICA, six inspectors are used. Part of the challenge for automation of parallelization is being able to prove the equivalence of inspectors so that they can be merged into as few as possible.