22 resultados para contention
em Instituto Politécnico do Porto, Portugal
Resumo:
The use of multicores is becoming widespread inthe field of embedded systems, many of which have real-time requirements. Hence, ensuring that real-time applications meet their timing constraints is a pre-requisite before deploying them on these systems. This necessitates the consideration of the impact of the contention due to shared lowlevel hardware resources like the front-side bus (FSB) on the Worst-CaseExecution Time (WCET) of the tasks. Towards this aim, this paper proposes a method to determine an upper bound on the number of bus requests that tasks executing on a core can generate in a given time interval. We show that our method yields tighter upper bounds in comparison with the state of-the-art. We then apply our method to compute the extra contention delay incurred by tasks, when they are co-scheduled on different cores and access the shared main memory, using a shared bus, access to which is granted using a round-robin arbitration (RR) protocol.
Resumo:
The usage of COTS-based multicores is becoming widespread in the field of embedded systems. Providing realtime guarantees at design-time is a pre-requisite to deploy real-time systems on these multicores. This necessitates the consideration of the impact of the contention due to shared low-level hardware resources on the Worst-Case Execution Time (WCET) of the tasks. As a step towards this aim, this paper first identifies the different factors that make the WCET analysis a challenging problem in a typical COTS-based multicore system. Then, we propose and prove, a mathematically correct method to determine tight upper bounds on the WCET of the tasks, when they are co-scheduled on different cores.
Resumo:
The current industry trend is towards using Commercially available Off-The-Shelf (COTS) based multicores for developing real time embedded systems, as opposed to the usage of custom-made hardware. In typical implementation of such COTS-based multicores, multiple cores access the main memory via a shared bus. This often leads to contention on this shared channel, which results in an increase of the response time of the tasks. Analyzing this increased response time, considering the contention on the shared bus, is challenging on COTS-based systems mainly because bus arbitration protocols are often undocumented and the exact instants at which the shared bus is accessed by tasks are not explicitly controlled by the operating system scheduler; they are instead a result of cache misses. This paper makes three contributions towards analyzing tasks scheduled on COTS-based multicores. Firstly, we describe a method to model the memory access patterns of a task. Secondly, we apply this model to analyze the worst case response time for a set of tasks. Although the required parameters to obtain the request profile can be obtained by static analysis, we provide an alternative method to experimentally obtain them by using performance monitoring counters (PMCs). We also compare our work against an existing approach and show that our approach outperforms it by providing tighter upper-bound on the number of bus requests generated by a task.
Resumo:
It is widely assumed that scheduling real-time tasks becomes more difficult as their deadlines get shorter. With deadlines shorter, however, tasks potentially compete less with each other for processors, and this could produce more contention-free slots at which the number of competing tasks is smaller than or equal to the number of available processors. This paper presents a policy (called CF policy) that utilizes such contention-free slots effectively. This policy can be employed by any work-conserving, preemptive scheduling algorithm, and we show that any algorithm extended with this policy dominates the original algorithm in terms of schedulability. We also present improved schedulability tests for algorithms that employ this policy, based on the observation that interference from tasks is reduced when their executions are postponed to contention-free slots. Finally, using the properties of the CF policy, we derive a counter-intuitive claim that shortening of task deadlines can help improve schedulability of task systems. We present heuristics that effectively reduce task deadlines for better scheduability without performing any exhaustive search.
Resumo:
Contention on the memory bus in COTS based multicore systems is becoming a major determining factor of the execution time of a task. Analyzing this extra execution time is non-trivial because (i) bus arbitration protocols in such systems are often undocumented and (ii) the times when the memory bus is requested to be used are not explicitly controlled by the operating system scheduler; they are instead a result of cache misses. We present a method for finding an upper bound on the extra execution time of a task due to contention on the memory bus in COTS based multicore systems. This method makes no assumptions on the bus arbitration protocol (other than assuming that it is work-conserving).
Resumo:
The foreseen evolution of chip architectures to higher number of, heterogeneous, cores, with non-uniform memory and non-coherent caches, brings renewed attention to the use of Software Transactional Memory (STM) as an alternative to lock-based synchronisation. However, STM relies on the possibility of aborting conflicting transactions to maintain data consistency, which impacts on the responsiveness and timing guarantees required by real-time systems. In these systems, contention delays must be (efficiently) limited so that the response times of tasks executing transactions are upperbounded and task sets can be feasibly scheduled. In this paper we defend the role of the transaction contention manager to reduce the number of transaction retries and to help the real-time scheduler assuring schedulability. For such purpose, the contention management policy should be aware of on-line scheduling information.
Resumo:
Many-core systems based on a Network-on-Chip (NoC) architecture offer various opportunities in terms of performance and computing capabilities, but at the same time they pose many challenges for the deployment of real-time systems, which must fulfill specific timing requirements at runtime. It is therefore essential to identify, at design time, the parameters that have an impact on the execution time of the tasks deployed on these systems and the upper bounds on the other key parameters. The focus of this work is to determine an upper bound on the traversal time of a packet when it is transmitted over the NoC infrastructure. Towards this aim, we first identify and explore some limitations in the existing recursive-calculus-based approaches to compute the Worst-Case Traversal Time (WCTT) of a packet. Then, we extend the existing model by integrating the characteristics of the tasks that generate the packets. For this extended model, we propose an algorithm called Branch and Prune (BP). Our proposed method provides tighter and safe estimates than the existing recursive-calculus-based approaches. Finally, we introduce a more general approach, namely Branch, Prune and Collapse (BPC) which offers a configurable parameter that provides a flexible trade-off between the computational complexity and the tightness of the computed estimate. The recursive-calculus methods and BP present two special cases of BPC when a trade-off parameter is 1 or , respectively. Through simulations, we analyze this trade-off, reason about the implications of certain choices, and also provide some case studies to observe the impact of task parameters on the WCTT estimates.
Resumo:
The last decade has witnessed a major shift towards the deployment of embedded applications on multi-core platforms. However, real-time applications have not been able to fully benefit from this transition, as the computational gains offered by multi-cores are often offset by performance degradation due to shared resources, such as main memory. To efficiently use multi-core platforms for real-time systems, it is hence essential to tightly bound the interference when accessing shared resources. Although there has been much recent work in this area, a remaining key problem is to address the diversity of memory arbiters in the analysis to make it applicable to a wide range of systems. This work handles diverse arbiters by proposing a general framework to compute the maximum interference caused by the shared memory bus and its impact on the execution time of the tasks running on the cores, considering different bus arbiters. Our novel approach clearly demarcates the arbiter-dependent and independent stages in the analysis of these upper bounds. The arbiter-dependent phase takes the arbiter and the task memory-traffic pattern as inputs and produces a model of the availability of the bus to a given task. Then, based on the availability of the bus, the arbiter-independent phase determines the worst-case request-release scenario that maximizes the interference experienced by the tasks due to the contention for the bus. We show that the framework addresses the diversity problem by applying it to a memory bus shared by a fixed-priority arbiter, a time-division multiplexing (TDM) arbiter, and an unspecified work-conserving arbiter using applications from the MediaBench test suite. We also experimentally evaluate the quality of the analysis by comparison with a state-of-the-art TDM analysis approach and consistently showing a considerable reduction in maximum interference.
Resumo:
IEEE International Conference on Communications (IEEE ICC 2015). 8 to 12, Jun, 2015, IEEE ICC 2015 - Communications QoS, Reliability and Modeling, London, United Kingdom.
Resumo:
Os sistemas de tempo real modernos geram, cada vez mais, cargas computacionais pesadas e dinmicas, comeando-se a tornar pouco expectvel que sejam implementados em sistemas uniprocessador. Na verdade, a mudana de sistemas com um nico processador para sistemas multi- processador pode ser vista, tanto no domnio geral, como no de sistemas embebidos, como uma forma eficiente, em termos energticos, de melhorar a performance das aplicaes. Simultaneamente, a proliferao das plataformas multi-processador transformaram a programao paralela num tpico de elevado interesse, levando o paralelismo dinmico a ganhar rapidamente popularidade como um modelo de programao. A ideia, por detrs deste modelo, encorajar os programadores a exporem todas as oportunidades de paralelismo atravs da simples indicao de potenciais regies paralelas dentro das aplicaes. Todas estas anotaes so encaradas pelo sistema unicamente como sugestes, podendo estas serem ignoradas e substitudas, por construtores sequenciais equivalentes, pela prpria linguagem. Assim, o modo como a computao na realidade subdividida, e mapeada nos vrios processadores, da responsabilidade do compilador e do sistema computacional subjacente. Ao retirar este fardo do programador, a complexidade da programao consideravelmente reduzida, o que normalmente se traduz num aumento de produtividade. Todavia, se o mecanismo de escalonamento subjacente no for simples e rpido, de modo a manter o overhead geral em nveis reduzidos, os benefcios da gerao de um paralelismo com uma granularidade to fina sero meramente hipotticos. Nesta perspetiva de escalonamento, os algoritmos que empregam uma poltica de workstealing so cada vez mais populares, com uma eficincia comprovada em termos de tempo, espao e necessidades de comunicao. Contudo, estes algoritmos no contemplam restries temporais, nem outra qualquer forma de atribuio de prioridades s tarefas, o que impossibilita que sejam diretamente aplicados a sistemas de tempo real. Alm disso, so tradicionalmente implementados no runtime da linguagem, criando assim um sistema de escalonamento com dois nveis, onde a previsibilidade, essencial a um sistema de tempo real, no pode ser assegurada. Nesta tese, descrita a forma como a abordagem de work-stealing pode ser resenhada para cumprir os requisitos de tempo real, mantendo, ao mesmo tempo, os seus princpios fundamentais que to bons resultados tm demonstrado. Muito resumidamente, a nica fila de gesto de processos convencional (deque) substituda por uma fila de deques, ordenada de forma crescente por prioridade das tarefas. De seguida, aplicamos por cima o conhecido algoritmo de escalonamento dinmico G-EDF, misturamos as regras de ambos, e assim nasce a nossa proposta: o algoritmo de escalonamento RTWS. Tirando partido da modularidade oferecida pelo escalonador do Linux, o RTWS adicionado como uma nova classe de escalonamento, de forma a avaliar na prtica se o algoritmo proposto vivel, ou seja, se garante a eficincia e escalonabilidade desejadas. Modificar o ncleo do Linux uma tarefa complicada, devido complexidade das suas funes internas e s fortes interdependncias entre os vrios subsistemas. No obstante, um dos objetivos desta tese era ter a certeza que o RTWS mais do que um conceito interessante. Assim, uma parte significativa deste documento dedicada discusso sobre a implementao do RTWS e exposio de situaes problemticas, muitas delas no consideradas em teoria, como o caso do desfasamento entre vrios mecanismo de sincronizao. Os resultados experimentais mostram que o RTWS, em comparao com outro trabalho prtico de escalonamento dinmico de tarefas com restries temporais, reduz significativamente o overhead de escalonamento atravs de um controlo de migraes, e mudanas de contexto, eficiente e escalvel (pelo menos at 8 CPUs), ao mesmo tempo que alcana um bom balanceamento dinmico da carga do sistema, at mesmo de uma forma no custosa. Contudo, durante a avaliao realizada foi detetada uma falha na implementao do RTWS, pela forma como facilmente desiste de roubar trabalho, o que origina perodos de inatividade, no CPU em questo, quando a utilizao geral do sistema baixa. Embora o trabalho realizado se tenha focado em manter o custo de escalonamento baixo e em alcanar boa localidade dos dados, a escalonabilidade do sistema nunca foi negligenciada. Na verdade, o algoritmo de escalonamento proposto provou ser bastante robusto, no falhando qualquer meta temporal nas experincias realizadas. Portanto, podemos afirmar que alguma inverso de prioridades, causada pela sub-poltica de roubo BAS, no compromete os objetivos de escalonabilidade, e at ajuda a reduzir a conteno nas estruturas de dados. Mesmo assim, o RTWS tambm suporta uma sub-poltica de roubo determinstica: PAS. A avaliao experimental, porm, no ajudou a ter uma noo clara do impacto de uma e de outra. No entanto, de uma maneira geral, podemos concluir que o RTWS uma soluo promissora para um escalonamento eficiente de tarefas paralelas com restries temporais.
Resumo:
A recent trend in distributed computer-controlled systems (DCCS) is to interconnect the distributed computing elements by means of multi-point broadcast networks. Since the network medium is shared between a number of network nodes, access contention exists and must be solved by a medium access control (MAC) protocol. Usually, DCCS impose real-time constraints. In essence, by real-time constraints we mean that traffic must be sent and received within a bounded interval, otherwise a timing fault is said to occur. This motivates the use of communication networks with a MAC protocol that guarantees bounded access and response times to message requests. PROFIBUS is a communication network in which the MAC protocol is based on a simplified version of the timed-token protocol. In this paper we address the cycle time properties of the PROFIBUS MAC protocol, since the knowledge of these properties is of paramount importance for guaranteeing the real-time behaviour of a distributed computer-controlled system which is supported by this type of network.
Resumo:
It is generally challenging to determine end-to-end delays of applications for maximizing the aggregate system utility subject to timing constraints. Many practical approaches suggest the use of intermediate deadline of tasks in order to control and upper-bound their end-to-end delays. This paper proposes a unified framework for different time-sensitive, global optimization problems, and solves them in a distributed manner using Lagrangian duality. The framework uses global viewpoints to assign intermediate deadlines, taking resource contention among tasks into consideration. For soft real-time tasks, the proposed framework effectively addresses the deadline assignment problem while maximizing the aggregate quality of service. For hard real-time tasks, we show that existing heuristic solutions to the deadline assignment problem can be incorporated into the proposed framework, enriching their mathematical interpretation.
Resumo:
This paper proposes a global multiprocessor scheduling algorithm for the Linux kernel that combines the global EDF scheduler with a priority-aware work-stealing load balancing scheme, enabling parallel real-time tasks to be executed on more than one processor at a given time instant. We state that some priority inversion may actually be acceptable, provided it helps reduce contention, communication, synchronisation and coordination between parallel threads, while still guaranteeing the expected systems predictability. Experimental results demonstrate the low scheduling overhead of the proposed approach comparatively to an existing real-time deadline-oriented scheduling class for the Linux kernel.
Resumo:
The recent trends of chip architectures with higher number of heterogeneous cores, and non-uniform memory/non-coherent caches, brings renewed attention to the use of Software Transactional Memory (STM) as a fundamental building block for developing parallel applications. Nevertheless, although STM promises to ease concurrent and parallel software development, it relies on the possibility of aborting conflicting transactions to maintain data consistency, which impacts on the responsiveness and timing guarantees required by embedded real-time systems. In these systems, contention delays must be (efficiently) limited so that the response times of tasks executing transactions are upper-bounded and task sets can be feasibly scheduled. In this paper we assess the use of STM in the development of embedded real-time software, defending that the amount of contention can be reduced if read-only transactions access recent consistent data snapshots, progressing in a wait-free manner. We show how the required number of versions of a shared object can be calculated for a set of tasks. We also outline an algorithm to manage conflicts between update transactions that prevents starvation.
Resumo:
The IEEE 802.15.4 standard provides appealing features to simultaneously support real-time and non realtime traffic, but it is only capable of supporting real-time communications from at most seven devices. Additionally, it cannot guarantee delay bounds lower than the superframe duration. Motivated by this problem, in this paper we propose an Explicit Guaranteed time slot Sharing and Allocation scheme (EGSA) for beacon-enabled IEEE 802.15.4 networks. This scheme is capable of providing tighter delay bounds for real-time communications by splitting the Contention Free access Period (CFP) into smaller mini time slots and by means of a new guaranteed bandwidth allocation scheme for a set of devices with periodic messages. At the same the novel bandwidth allocation scheme can maximize the duration of the CFP for non real-time communications. Performance analysis results show that the EGSA scheme works efficiently and outperforms competitor schemes both in terms of guaranteed delay and bandwidth utilization.