998 resultados para Multiprocessor systems


Relevância:

60.00% 60.00%

Publicador:

Resumo:

The difficulties encountered in implementing large scale CM codes on multiprocessor systems are now fairly well understood. Despite the claims of shared memory architecture manufacturers to provide effective parallelizing compilers, these have not proved to be adequate for large or complex programs. Significant programmer effort is usually required to achieve reasonable parallel efficiencies on significant numbers of processors. The paradigm of Single Program Multi Data (SPMD) domain decomposition with message passing, where each processor runs the same code on a subdomain of the problem, communicating through exchange of messages, has for some time been demonstrated to provide the required level of efficiency, scalability, and portability across both shared and distributed memory systems, without the need to re-author the code into a new language or even to support differing message passing implementations. Extension of the methods into three dimensions has been enabled through the engineering of PHYSICA, a framework for supporting 3D, unstructured mesh and continuum mechanics modeling. In PHYSICA, six inspectors are used. Part of the challenge for automation of parallelization is being able to prove the equivalence of inspectors so that they can be merged into as few as possible.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

"UIUCDCS-R-75-724"

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Simultaneous consideration of both performance and reliability issues is important in the choice of computer architectures for real-time aerospace applications. One of the requirements for such a fault-tolerant computer system is the characteristic of graceful degradation. A shared and replicated resources computing system represents such an architecture. In this paper, a combinatorial model is used for the evaluation of the instruction execution rate of a degradable, replicated resources computing system such as a modular multiprocessor system. Next, a method is presented to evaluate the computation reliability of such a system utilizing a reliability graph model and the instruction execution rate. Finally, this computation reliability measure, which simultaneously describes both performance and reliability, is applied as a constraint in an architecture optimization model for such computing systems. Index Terms-Architecture optimization, computation

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The fault-tolerant multiprocessor (ftmp) is a bus-based multiprocessor architecture with real-time and fault- tolerance features and is used in critical aerospace applications. A preliminary performance evaluation is of crucial importance in the design of such systems. In this paper, we review stochastic Petri nets (spn) and developspn-based performance models forftmp. These performance models enable efficient computation of important performance measures such as processing power, bus contention, bus utilization, and waiting times.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In modern wireline and wireless communication systems, Viterbi decoder is one of the most compute intensive and essential elements. Each standard requires a different configuration of Viterbi decoder. Hence there is a need to design a flexible reconfigurable Viterbi decoder to support different configurations on a single platform. In this paper we present a reconfigurable Viterbi decoder which can be reconfigured for standards such as WCDMA, CDMA2000, IEEE 802.11, DAB, DVB, and GSM. Different parameters like code rate, constraint length, polynomials and truncation length can be configured to map any of the above mentioned standards. Our design provides higher throughput and scalable power consumption in various configuration of the reconfigurable Viterbi decoder. The power and throughput can also be optimized for different standards.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes the architecture of a multiprocessor system which we call the Broadcast Cube System (BCS) for solving important computation intensive problems such as systems of linear algebraic equations and Partial Differential Equations (PDEs), and highlights its features. Further, this paper presents an analytical performance study of the BCS, and it describes the main details of the design and implementation of the simulator for the BCS.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The emergence of programmable logic devices as processing platforms for digital signal processing applications poses challenges concerning rapid implementation and high level optimization of algorithms on these platforms. This paper describes Abhainn, a rapid implementation methodology and toolsuite for translating an algorithmic expression of the system to a working implementation on a heterogeneous multiprocessor/field programmable gate array platform, or a standalone system on programmable chip solution. Two particular focuses for Abhainn are the automated but configurable realisation of inter-processor communuication fabrics, and the establishment of novel dedicated hardware component design methodologies allowing algorithm level transformation for system optimization. This paper outlines the approaches employed in both these particular instances.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Scalability and efficiency of on-chip communication of emerging Multiprocessor System-on-Chip (MPSoC) are critical design considerations. Conventional bus based interconnection schemes no longer fit for MPSoC with a large number of cores. Networks-on-Chip (NoC) is widely accepted as the next generation interconnection scheme for large scale MPSoC. The increase of MPSoC complexity requires fast and accurate system-level modeling techniques for rapid modeling and veri-fication of emerging MPSoCs. However, the existing modeling methods are limited in delivering the essentials of timing accuracy and simulation speed. This paper proposes a novel system-level Networks-on-Chip (NoC) modeling method, which is based on SystemC and TLM2.0 and capable of delivering timing accuracy close to cycle accurate modeling techniques at a significantly lower simulation cost. Experimental results are presented to demonstrate the proposed method. ©2010 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

LLF (Least Laxity First) scheduling, which assigns a higher priority to a task with a smaller laxity, has been known as an optimal preemptive scheduling algorithm on a single processor platform. However, little work has been made to illuminate its characteristics upon multiprocessor platforms. In this paper, we identify the dynamics of laxity from the system’s viewpoint and translate the dynamics into LLF multiprocessor schedulability analysis. More specifically, we first characterize laxity properties under LLF scheduling, focusing on laxity dynamics associated with a deadline miss. These laxity dynamics describe a lower bound, which leads to the deadline miss, on the number of tasks of certain laxity values at certain time instants. This lower bound is significant because it represents invariants for highly dynamic system parameters (laxity values). Since the laxity of a task is dependent of the amount of interference of higher-priority tasks, we can then derive a set of conditions to check whether a given task system can go into the laxity dynamics towards a deadline miss. This way, to the author’s best knowledge, we propose the first LLF multiprocessor schedulability test based on its own laxity properties. We also develop an improved schedulability test that exploits slack values. We mathematically prove that the proposed LLF tests dominate the state-of-the-art EDZL tests. We also present simulation results to evaluate schedulability performance of both the original and improved LLF tests in a quantitative manner.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

It is widely assumed that scheduling real-time tasks becomes more difficult as their deadlines get shorter. With deadlines shorter, however, tasks potentially compete less with each other for processors, and this could produce more contention-free slots at which the number of competing tasks is smaller than or equal to the number of available processors. This paper presents a policy (called CF policy) that utilizes such contention-free slots effectively. This policy can be employed by any work-conserving, preemptive scheduling algorithm, and we show that any algorithm extended with this policy dominates the original algorithm in terms of schedulability. We also present improved schedulability tests for algorithms that employ this policy, based on the observation that interference from tasks is reduced when their executions are postponed to contention-free slots. Finally, using the properties of the CF policy, we derive a counter-intuitive claim that shortening of task deadlines can help improve schedulability of task systems. We present heuristics that effectively reduce task deadlines for better scheduability without performing any exhaustive search.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Consider the problem of scheduling a set of sporadic tasks on a multiprocessor system to meet deadlines using a task-splitting scheduling algorithm. Task-splitting (also called semi-partitioning) scheduling algorithms assign most tasks to just one processor but a few tasks are assigned to two or more processors, and they are dispatched in a way that ensures that a task never executes on two or more processors simultaneously. A particular type of task-splitting algorithms, called slot-based task-splitting dispatching, is of particular interest because of its ability to schedule tasks with high processor utilizations. Unfortunately, no slot-based task-splitting algorithm has been implemented in a real operating system so far. In this paper we discuss and propose some modifications to the slot-based task-splitting algorithm driven by implementation concerns, and we report the first implementation of this family of algorithms in a real operating system running Linux kernel version 2.6.34. We have also conducted an extensive range of experiments on a 4-core multicore desktop PC running task-sets with utilizations of up to 88%. The results show that the behavior of our implementation is in line with the theoretical framework behind it.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Scheduling of constrained deadline sporadic task systems on multiprocessor platforms is an area which has received much attention in the recent past. It is widely believed that finding an optimal scheduler is hard, and therefore most studies have focused on developing algorithms with good processor utilization bounds. These algorithms can be broadly classified into two categories: partitioned scheduling in which tasks are statically assigned to individual processors, and global scheduling in which each task is allowed to execute on any processor in the platform. In this paper we consider a third, more general, approach called cluster-based scheduling. In this approach each task is statically assigned to a processor cluster, tasks in each cluster are globally scheduled among themselves, and clusters in turn are scheduled on the multiprocessor platform. We develop techniques to support such cluster-based scheduling algorithms, and also consider properties that minimize total processor utilization of individual clusters. In the last part of this paper, we develop new virtual cluster-based scheduling algorithms. For implicit deadline sporadic task systems, we develop an optimal scheduling algorithm that is neither Pfair nor ERfair. We also show that the processor utilization bound of us-edf{m/(2m−1)} can be improved by using virtual clustering. Since neither partitioned nor global strategies dominate over the other, cluster-based scheduling is a natural direction for research towards achieving improved processor utilization bounds.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a 12(1 + 3R/(4m)) competitive algorithm for scheduling implicit-deadline sporadic tasks on a platform comprising m processors, where a task may request one of R shared resources.