Biblioteca Digital

58 resultados para On-Chip Multiprocessor (OCM)

Application mapping in NoC-Based many-cores

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many-core platforms based on Network-on-Chip (NoC [Benini and De Micheli 2002]) present an emerging technology in the real-time embedded domain. Although the idea to group the applications previously executed on separated single-core devices, and accommodate them on an individual many-core chip offers various options for power savings, cost reductions and contributes to the overall system flexibility, its implementation is a non-trivial task. In this paper we address the issue of application mapping onto a NoCbased many-core platform when considering fundamentals and trends of current many-core operating systems, specifically, we elaborate on a limited migrative application model encompassing a message-passing paradigm as a communication primitive. As the main contribution, we formulate the problem of real-time application mapping, and propose a three-stage process to efficiently solve it. Through analysis it is assured that derived solutions guarantee the fulfilment of posed time constraints regarding worst-case communication latencies, and at the same time provide an environment to perform load balancing for e.g. thermal, energy, fault tolerance or performance reasons.We also propose several constraints regarding the topological structure of the application mapping, as well as the inter- and intra-application communication patterns, which efficiently solve the issues of pessimism and/or intractability when performing the analysis.

Practical aspects of slot-based task- splitting dispatching in its schedulability analysis

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Consider the problem of scheduling a set of sporadic tasks on a multiprocessor system to meet deadlines using a tasksplitting scheduling algorithm. Task-splitting (also called semipartitioning) scheduling algorithms assign most tasks to just one processor but a few tasks are assigned to two or more processors, and they are dispatched in a way that ensures that a task never executes on two or more processors simultaneously. A certain type of task-splitting algorithms, called slot-based task-splitting, is of particular interest because of its ability to schedule tasks at high processor utilizations. We present a new schedulability analysis for slot-based task-splitting scheduling algorithms that takes the overhead into account and also a new task assignment algorithm.

A tighter analysis of the worst-case endto- end communication delay in massive multicores

Relevância:

100.00% 100.00%

Publicador:

Resumo:

"Many-core” systems based on the Network-on- Chip (NoC) architecture have brought into the fore-front various opportunities and challenges for the deployment of real-time systems. Such real-time systems need timing guarantees to be fulfilled. Therefore, calculating upper-bounds on the end-to-end communication delay between system components is of primary interest. In this work, we identify the limitations of an existing approach proposed by [1] and propose different techniques to overcome these limitations.

An embedded 1149.4 extension to support mixed-signal debugging

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Debugging electronic circuits is traditionally done with bench equipment directly connected to the circuit under debug. In the digital domain, the difficulties associated with the direct physical access to circuit nodes led to the inclusion of resources providing support to that activity, first at the printed circuit level, and then at the integrated circuit level. The experience acquired with those solutions led to the emergence of dedicated infrastructures for debugging cores at the system-on-chip level. However, all these developments had a small impact in the analog and mixed-signal domain, where debugging still depends, to a large extent, on direct physical access to circuit nodes. As a consequence, when analog and mixed-signal circuits are integrated as cores inside a system-on-chip, the difficulties associated with debugging increase, which cause the time-to-market and the prototype verification costs to also increase. The present work considers the IEEE1149.4 infrastructure as a means to support the debugging of mixed-signal circuits, namely to access the circuit nodes and also an embedded debug mechanism named mixed-signal condition detector, necessary for watch-/breakpoints and real-time analysis operations. One of the main advantages associated with the proposed solution is the seamless migration to the system-on-chip level, as the access is done through electronic means, thus easing debugging operations at different hierarchical levels.

Real time fault injection using a modified debugging infrastructure

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dependability is a critical factor in computer systems, requiring high quality validation & verification procedures in the development stage. At the same time, digital devices are getting smaller and access to their internal signals and registers is increasingly complex, requiring innovative debugging methodologies. To address this issue, most recent microprocessors include an on-chip debug (OCD) infrastructure to facilitate common debugging operations. This paper proposes an enhanced OCD infrastructure with the objective of supporting the verification of fault-tolerant mechanisms through fault injection campaigns. This upgraded on-chip debug and fault injection (OCD-FI) infrastructure provides an efficient fault injection mechanism with improved capabilities and dynamic behavior. Preliminary results show that this solution provides flexibility in terms of fault triggering and allows high speed real-time fault injection in memory elements

Real time fault injection using enhanced OCD -- a performance analysis

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fault injection is frequently used for the verification and validation of dependable systems. When targeting real time microprocessor based systems the process becomes significantly more complex. This paper proposes two complementary solutions to improve real time fault injection campaign execution, both in terms of performance and capabilities. The methodology is based on the use of the on-chip debug mechanisms present in modern electronic devices. The main objective is the injection of faults in microprocessor memory elements with minimum delay and intrusiveness. Different configurations were implemented and compared in terms of performance gain and logic overhead.

A modified debugging infrastructure to assist real time fault injection campaigns

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fault injection is frequently used for the verification and validation of the fault tolerant features of microprocessors. This paper proposes the modification of a common on-chip debugging (OCD) infrastructure to add fault injection capabilities and improve performance. The proposed solution imposes a very low logic overhead and provides a flexible and efficient mechanism for the execution of fault injection campaigns, being applicable to different target system architectures.

Provably Good Task Assignment for Two-Type Heterogeneous Multiprocessors Using Cutting Planes

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Consider scheduling of real-time tasks on a multiprocessor where migration is forbidden. Specifically, consider the problem of determining a task-to-processor assignment for a given collection of implicit-deadline sporadic tasks upon a multiprocessor platform in which there are two distinct types of processors. For this problem, we propose a new algorithm, LPC (task assignment based on solving a Linear Program with Cutting planes). The algorithm offers the following guarantee: for a given task set and a platform, if there exists a feasible task-to-processor assignment, then LPC succeeds in finding such a feasible task-to-processor assignment as well but on a platform in which each processor is 1.5 × faster and has three additional processors. For systems with a large number of processors, LPC has a better approximation ratio than state-of-the-art algorithms. To the best of our knowledge, this is the first work that develops a provably good real-time task assignment algorithm using cutting planes.

NoC contention analysis using a branch-and-prune algorithm

Relevância:

100.00% 100.00%

Publicador:

Resumo:

“Many-core” systems based on a Network-on-Chip (NoC) architecture offer various opportunities in terms of performance and computing capabilities, but at the same time they pose many challenges for the deployment of real-time systems, which must fulfill specific timing requirements at runtime. It is therefore essential to identify, at design time, the parameters that have an impact on the execution time of the tasks deployed on these systems and the upper bounds on the other key parameters. The focus of this work is to determine an upper bound on the traversal time of a packet when it is transmitted over the NoC infrastructure. Towards this aim, we first identify and explore some limitations in the existing recursive-calculus-based approaches to compute the Worst-Case Traversal Time (WCTT) of a packet. Then, we extend the existing model by integrating the characteristics of the tasks that generate the packets. For this extended model, we propose an algorithm called “Branch and Prune” (BP). Our proposed method provides tighter and safe estimates than the existing recursive-calculus-based approaches. Finally, we introduce a more general approach, namely “Branch, Prune and Collapse” (BPC) which offers a configurable parameter that provides a flexible trade-off between the computational complexity and the tightness of the computed estimate. The recursive-calculus methods and BP present two special cases of BPC when a trade-off parameter is 1 or ∞, respectively. Through simulations, we analyze this trade-off, reason about the implications of certain choices, and also provide some case studies to observe the impact of task parameters on the WCTT estimates.

Laxity dynamics and LLF schedulability analysis on multiprocessor platforms

Relevância:

40.00% 40.00%

Publicador:

Resumo:

LLF (Least Laxity First) scheduling, which assigns a higher priority to a task with a smaller laxity, has been known as an optimal preemptive scheduling algorithm on a single processor platform. However, little work has been made to illuminate its characteristics upon multiprocessor platforms. In this paper, we identify the dynamics of laxity from the system’s viewpoint and translate the dynamics into LLF multiprocessor schedulability analysis. More specifically, we first characterize laxity properties under LLF scheduling, focusing on laxity dynamics associated with a deadline miss. These laxity dynamics describe a lower bound, which leads to the deadline miss, on the number of tasks of certain laxity values at certain time instants. This lower bound is significant because it represents invariants for highly dynamic system parameters (laxity values). Since the laxity of a task is dependent of the amount of interference of higher-priority tasks, we can then derive a set of conditions to check whether a given task system can go into the laxity dynamics towards a deadline miss. This way, to the author’s best knowledge, we propose the first LLF multiprocessor schedulability test based on its own laxity properties. We also develop an improved schedulability test that exploits slack values. We mathematically prove that the proposed LLF tests dominate the state-of-the-art EDZL tests. We also present simulation results to evaluate schedulability performance of both the original and improved LLF tests in a quantitative manner.

Makespan computation for GPU threads running on a single streaming multiprocessor

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Graphics processors were originally developed for rendering graphics but have recently evolved towards being an architecture for general-purpose computations. They are also expected to become important parts of embedded systems hardware -- not just for graphics. However, this necessitates the development of appropriate timing analysis techniques which would be required because techniques developed for CPU scheduling are not applicable. The reason is that we are not interested in how long it takes for any given GPU thread to complete, but rather how long it takes for all of them to complete. We therefore develop a simple method for finding an upper bound on the makespan of a group of GPU threads executing the same program and competing for the resources of a single streaming multiprocessor (whose architecture is based on NVIDIA Fermi, with some simplifying assunptions). We then build upon this method to formulate the derivation of the exact worst-case makespan (and corresponding schedule) as an optimization problem. Addressing the issue of tractability, we also present a technique for efficiently computing a safe estimate of the worstcase makespan with minimal pessimism, which may be used when finding an exact value would take too long.

Preemption-light multiprocessor scheduling of sporadic tasks with high utilisation bound

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Known algorithms capable of scheduling implicit-deadline sporadic tasks over identical processors at up to 100% utilisation invariably involve numerous preemptions and migrations. To the challenge of devising a scheduling scheme with as few preemptions and migrations as possible, for a given guaranteed utilisation bound, we respond with the algorithm NPS-F. It is configurable with a parameter, trading off guaranteed schedulable utilisation (up to 100%) vs preemptions. For any possible configuration, NPS-F introduces fewer preemptions than any other known algorithm matching its utilisation bound. A clustered variant of the algorithm, for systems made of multicore chips, eliminates (costly) off-chip task migrations, by dividing processors into disjoint clusters, formed by cores on the same chip (with the cluster size being a parameter). Clusters are independently scheduled (each, using non-clustered NPS-F). The utilisation bound is only moderately affected. We also formulate an important extension (applicable to both clustered and non-clustered NPS-F) which optimises the supply of processing time to executing tasks and makes it more granular. This reduces processing capacity requirements for schedulability without increasing preemptions.

Provably good scheduling of sporadic tasks with resource sharing on a two-type heterogeneous multiprocessor platform

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Consider the problem of scheduling a set of implicit-deadline sporadic tasks to meet all deadlines on a two-type heterogeneous multiprocessor platform where a task may request at most one of |R| shared resources. There are m1 processors of type-1 and m2 processors of type-2. Tasks may migrate only when requesting or releasing resources. We present a new algorithm, FF-3C-vpr, which offers a guarantee that if a task set is schedulable to meet deadlines by an optimal task assignment scheme that only allows tasks to migrate when requesting or releasing a resource, then FF-3Cvpr also meets deadlines if given processors 4+6*ceil(|R|/min(m1,m2)) times as fast. As far as we know, it is the first result for resource sharing on heterogeneous platforms with provable performance.

Intra-type migrative scheduling of implicit-deadline sporadic tasks on two- type heterogeneous multiprocessor

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Consider the problem of scheduling a set of implicit-deadline sporadic tasks to meet all deadlines on a two-type heterogeneous multiprocessor platform. Each processor is either of type-1 or type-2 with each task having different execution time on each processor type. Jobs can migrate between processors of same type (referred to as intra-type migration) but cannot migrate between processors of different types. We present a new scheduling algorithm namely, LP-Relax(THR) which offers a guarantee that if a task set can be scheduled to meet deadlines by an optimal task assignment scheme that allows intra-type migration then LP-Relax(THR) meets deadlines as well with intra-type migration if given processors 1/THR as fast (referred to as speed competitive ratio) where THR <= 2/3.

Partitioned scheduling of multimode systems on multiprocessor platforms: when to do the mode transition?

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Systems composed of distinct operational modes are a common necessity for embedded applications with strict timing requirements. With the emergence of multi-core platforms protocols to handle these systems are required in order to provide this basic functionality.In this work a description on the problems of creating an effective mode-transition protocol are presented and it is proven that in some cases previous single-core protocols can not be extended to handle the mode-transition in multi-core.

«
1
2
3
4
»