978 resultados para Intra-task parallelism
Resumo:
Os sistemas de tempo real modernos geram, cada vez mais, cargas computacionais pesadas e dinâmicas, começando-se a tornar pouco expectável que sejam implementados em sistemas uniprocessador. Na verdade, a mudança de sistemas com um único processador para sistemas multi- processador pode ser vista, tanto no domínio geral, como no de sistemas embebidos, como uma forma eficiente, em termos energéticos, de melhorar a performance das aplicações. Simultaneamente, a proliferação das plataformas multi-processador transformaram a programação paralela num tópico de elevado interesse, levando o paralelismo dinâmico a ganhar rapidamente popularidade como um modelo de programação. A ideia, por detrás deste modelo, é encorajar os programadores a exporem todas as oportunidades de paralelismo através da simples indicação de potenciais regiões paralelas dentro das aplicações. Todas estas anotações são encaradas pelo sistema unicamente como sugestões, podendo estas serem ignoradas e substituídas, por construtores sequenciais equivalentes, pela própria linguagem. Assim, o modo como a computação é na realidade subdividida, e mapeada nos vários processadores, é da responsabilidade do compilador e do sistema computacional subjacente. Ao retirar este fardo do programador, a complexidade da programação é consideravelmente reduzida, o que normalmente se traduz num aumento de produtividade. Todavia, se o mecanismo de escalonamento subjacente não for simples e rápido, de modo a manter o overhead geral em níveis reduzidos, os benefícios da geração de um paralelismo com uma granularidade tão fina serão meramente hipotéticos. Nesta perspetiva de escalonamento, os algoritmos que empregam uma política de workstealing são cada vez mais populares, com uma eficiência comprovada em termos de tempo, espaço e necessidades de comunicação. Contudo, estes algoritmos não contemplam restrições temporais, nem outra qualquer forma de atribuição de prioridades às tarefas, o que impossibilita que sejam diretamente aplicados a sistemas de tempo real. Além disso, são tradicionalmente implementados no runtime da linguagem, criando assim um sistema de escalonamento com dois níveis, onde a previsibilidade, essencial a um sistema de tempo real, não pode ser assegurada. Nesta tese, é descrita a forma como a abordagem de work-stealing pode ser resenhada para cumprir os requisitos de tempo real, mantendo, ao mesmo tempo, os seus princípios fundamentais que tão bons resultados têm demonstrado. Muito resumidamente, a única fila de gestão de processos convencional (deque) é substituída por uma fila de deques, ordenada de forma crescente por prioridade das tarefas. De seguida, aplicamos por cima o conhecido algoritmo de escalonamento dinâmico G-EDF, misturamos as regras de ambos, e assim nasce a nossa proposta: o algoritmo de escalonamento RTWS. Tirando partido da modularidade oferecida pelo escalonador do Linux, o RTWS é adicionado como uma nova classe de escalonamento, de forma a avaliar na prática se o algoritmo proposto é viável, ou seja, se garante a eficiência e escalonabilidade desejadas. Modificar o núcleo do Linux é uma tarefa complicada, devido à complexidade das suas funções internas e às fortes interdependências entre os vários subsistemas. Não obstante, um dos objetivos desta tese era ter a certeza que o RTWS é mais do que um conceito interessante. Assim, uma parte significativa deste documento é dedicada à discussão sobre a implementação do RTWS e à exposição de situações problemáticas, muitas delas não consideradas em teoria, como é o caso do desfasamento entre vários mecanismo de sincronização. Os resultados experimentais mostram que o RTWS, em comparação com outro trabalho prático de escalonamento dinâmico de tarefas com restrições temporais, reduz significativamente o overhead de escalonamento através de um controlo de migrações, e mudanças de contexto, eficiente e escalável (pelo menos até 8 CPUs), ao mesmo tempo que alcança um bom balanceamento dinâmico da carga do sistema, até mesmo de uma forma não custosa. Contudo, durante a avaliação realizada foi detetada uma falha na implementação do RTWS, pela forma como facilmente desiste de roubar trabalho, o que origina períodos de inatividade, no CPU em questão, quando a utilização geral do sistema é baixa. Embora o trabalho realizado se tenha focado em manter o custo de escalonamento baixo e em alcançar boa localidade dos dados, a escalonabilidade do sistema nunca foi negligenciada. Na verdade, o algoritmo de escalonamento proposto provou ser bastante robusto, não falhando qualquer meta temporal nas experiências realizadas. Portanto, podemos afirmar que alguma inversão de prioridades, causada pela sub-política de roubo BAS, não compromete os objetivos de escalonabilidade, e até ajuda a reduzir a contenção nas estruturas de dados. Mesmo assim, o RTWS também suporta uma sub-política de roubo determinística: PAS. A avaliação experimental, porém, não ajudou a ter uma noção clara do impacto de uma e de outra. No entanto, de uma maneira geral, podemos concluir que o RTWS é uma solução promissora para um escalonamento eficiente de tarefas paralelas com restrições temporais.
Resumo:
A large part of power dissipation in a system is generated by I/O devices. Increasingly these devices provide power saving mechanisms, inter alia to enhance battery life. While I/O device scheduling has been studied in the past for realtime systems, the use of energy resources by these scheduling algorithms may be improved. These approaches are crafted considering a very large overhead of device transitions. Technology enhancements have allowed the hardware vendors to reduce the device transition overhead and energy consumption. We propose an intra-task device scheduling algorithm for real time systems that allows to shut-down devices while ensuring system schedulability. Our results show an energy gain of up to 90% when compared to the techniques proposed in the state-of-the-art.
Resumo:
An ever increasing need for extra functionality in a single embedded system demands for extra Input/Output (I/O) devices, which are usually connected externally and are expensive in terms of energy consumption. To reduce their energy consumption, these devices are equipped with power saving mechanisms. While I/O device scheduling for real-time (RT) systems with such power saving features has been studied in the past, the use of energy resources by these scheduling algorithms may be improved. Technology enhancements in the semiconductor industry have allowed the hardware vendors to reduce the device transition and energy overheads. The decrease in overhead of sleep transitions has opened new opportunities to further reduce the device energy consumption. In this research effort, we propose an intra-task device scheduling algorithm for real-time systems that wakes up a device on demand and reduces its active time while ensuring system schedulability. This intra-task device scheduling algorithm is extended for devices with multiple sleep states to further minimise the overall device energy consumption of the system. The proposed algorithms have less complexity when compared to the conservative inter-task device scheduling algorithms. The system model used relaxes some of the assumptions commonly made in the state-of-the-art that restrict their practical relevance. Apart from the aforementioned advantages, the proposed algorithms are shown to demonstrate the substantial energy savings.
Resumo:
This work presents synopsis of efficient strategies used in power managements for achieving the most economical power and energy consumption in multicore systems, FPGA and NoC Platforms. In this work, a practical approach was taken, in an effort to validate the significance of the proposed Adaptive Power Management Algorithm (APMA), proposed for system developed, for this thesis project. This system comprise arithmetic and logic unit, up and down counters, adder, state machine and multiplexer. The essence of carrying this project firstly, is to develop a system that will be used for this power management project. Secondly, to perform area and power synopsis of the system on these various scalable technology platforms, UMC 90nm nanotechnology 1.2v, UMC 90nm nanotechnology 1.32v and UMC 0.18 μmNanotechnology 1.80v, in order to examine the difference in area and power consumption of the system on the platforms. Thirdly, to explore various strategies that can be used to reducing system’s power consumption and to propose an adaptive power management algorithm that can be used to reduce the power consumption of the system. The strategies introduced in this work comprise Dynamic Voltage Frequency Scaling (DVFS) and task parallelism. After the system development, it was run on FPGA board, basically NoC Platforms and on these various technology platforms UMC 90nm nanotechnology1.2v, UMC 90nm nanotechnology 1.32v and UMC180 nm nanotechnology 1.80v, the system synthesis was successfully accomplished, the simulated result analysis shows that the system meets all functional requirements, the power consumption and the area utilization were recorded and analyzed in chapter 7 of this work. This work extensively reviewed various strategies for managing power consumption which were quantitative research works by many researchers and companies, it's a mixture of study analysis and experimented lab works, it condensed and presents the whole basic concepts of power management strategy from quality technical papers.
Resumo:
Face à estagnação da tecnologia uniprocessador registada na passada década, aos principais fabricantes de microprocessadores encontraram na tecnologia multi-core a resposta `as crescentes necessidades de processamento do mercado. Durante anos, os desenvolvedores de software viram as suas aplicações acompanhar os ganhos de performance conferidos por cada nova geração de processadores sequenciais, mas `a medida que a capacidade de processamento escala em função do número de processadores, a computação sequencial tem de ser decomposta em várias partes concorrentes que possam executar em paralelo, para que possam utilizar as unidades de processamento adicionais e completar mais rapidamente. A programação paralela implica um paradigma completamente distinto da programação sequencial. Ao contrário dos computadores sequenciais tipificados no modelo de Von Neumann, a heterogeneidade de arquiteturas paralelas requer modelos de programação paralela que abstraiam os programadores dos detalhes da arquitectura e simplifiquem o desenvolvimento de aplicações concorrentes. Os modelos de programação paralela mais populares incitam os programadores a identificar instruções concorrentes na sua lógica de programação, e a especificá-las sob a forma de tarefas que possam ser atribuídas a processadores distintos para executarem em simultâneo. Estas tarefas são tipicamente lançadas durante a execução, e atribuídas aos processadores pelo motor de execução subjacente. Como os requisitos de processamento costumam ser variáveis, e não são conhecidos a priori, o mapeamento de tarefas para processadores tem de ser determinado dinamicamente, em resposta a alterações imprevisíveis dos requisitos de execução. `A medida que o volume da computação cresce, torna-se cada vez menos viável garantir as suas restrições temporais em plataformas uniprocessador. Enquanto os sistemas de tempo real se começam a adaptar ao paradigma de computação paralela, há uma crescente aposta em integrar execuções de tempo real com aplicações interativas no mesmo hardware, num mundo em que a tecnologia se torna cada vez mais pequena, leve, ubíqua, e portável. Esta integração requer soluções de escalonamento que simultaneamente garantam os requisitos temporais das tarefas de tempo real e mantenham um nível aceitável de QoS para as restantes execuções. Para tal, torna-se imperativo que as aplicações de tempo real paralelizem, de forma a minimizar os seus tempos de resposta e maximizar a utilização dos recursos de processamento. Isto introduz uma nova dimensão ao problema do escalonamento, que tem de responder de forma correcta a novos requisitos de execução imprevisíveis e rapidamente conjeturar o mapeamento de tarefas que melhor beneficie os critérios de performance do sistema. A técnica de escalonamento baseado em servidores permite reservar uma fração da capacidade de processamento para a execução de tarefas de tempo real, e assegurar que os efeitos de latência na sua execução não afectam as reservas estipuladas para outras execuções. No caso de tarefas escalonadas pelo tempo de execução máximo, ou tarefas com tempos de execução variáveis, torna-se provável que a largura de banda estipulada não seja consumida por completo. Para melhorar a utilização do sistema, os algoritmos de partilha de largura de banda (capacity-sharing) doam a capacidade não utilizada para a execução de outras tarefas, mantendo as garantias de isolamento entre servidores. Com eficiência comprovada em termos de espaço, tempo, e comunicação, o mecanismo de work-stealing tem vindo a ganhar popularidade como metodologia para o escalonamento de tarefas com paralelismo dinâmico e irregular. O algoritmo p-CSWS combina escalonamento baseado em servidores com capacity-sharing e work-stealing para cobrir as necessidades de escalonamento dos sistemas abertos de tempo real. Enquanto o escalonamento em servidores permite partilhar os recursos de processamento sem interferências a nível dos atrasos, uma nova política de work-stealing que opera sobre o mecanismo de capacity-sharing aplica uma exploração de paralelismo que melhora os tempos de resposta das aplicações e melhora a utilização do sistema. Esta tese propõe uma implementação do algoritmo p-CSWS para o Linux. Em concordância com a estrutura modular do escalonador do Linux, ´e definida uma nova classe de escalonamento que visa avaliar a aplicabilidade da heurística p-CSWS em circunstâncias reais. Ultrapassados os obstáculos intrínsecos `a programação da kernel do Linux, os extensos testes experimentais provam que o p-CSWS ´e mais do que um conceito teórico atrativo, e que a exploração heurística de paralelismo proposta pelo algoritmo beneficia os tempos de resposta das aplicações de tempo real, bem como a performance e eficiência da plataforma multiprocessador.
Resumo:
A large part of power dissipation in a system is generated by I/O devices. Increasingly these devices provide power saving mechanisms to inter alia enhance battery life. While I/O device scheduling has been studied in the past for realtime systems, the use of energy resources by these scheduling algorithms may be improved. These approaches are crafted considering a huge overhead of device transition. The technology enhancement has allowed the hardware vendors to reduce the device transition overhead and energy consumption. We propose an intra-task device scheduling algorithm for real time systems that allows to shut-down devices while ensuring the system schedulability. Our results show an energy gain of up to 90% in the best case when compared to the state-of-the-art.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-03
Resumo:
Healthy young adults demonstrate a group-level, systematic preference for stimuli presented in the left side of space relative to the right (‘pseudoneglect’) (Bowers & Heilman, 1980). This results in an overestimation of features such as size, brightness, numerosity and spatial frequency in the left hemispace, probably as a result of right cerebral hemisphere dominance for visuospatial attention. This spatial attention asymmetry is reduced in the healthy older population, and can be shifted entirely into right hemispace under certain conditions. Although this rightward shift has been consistently documented in behavioural experiments, there is very little neuroimaging evidence to explain this effect at a neuroanatomical level. In this thesis, I used behavioural methodology and electroencephalography (EEG) to map spatial attention asymmetries in young and older adults. I then use transcranial direct current stimulation (tDCS) to modulate these spatial biases, with the aim of assessing age-related differences in response to tDCS. In the first of three experiments presented in this thesis, I report in Chapter Two that five different spatial attention tasks provide consistent intra-task measures of spatial bias in young adults across two testing days. There were, however, no inter-task correlations between the five tasks, indicating that pseudoneglect is at least partially driven by task-dependent patterns of neural activity. In Chapter Three, anodal tDCS was applied separately to the left (P5) and right (P6) posterior parietal cortex (PPC) in young and older adults, with an aim to improve the detection of stimuli appearing in the contralateral visual field. There were no age differences in response to tDCS, but there were significant differences depending on baseline performance. Relative to a sham tDCS protocol, tDCS applied to the right PPC resulted in maintained visual detection across both visual fields in adults who were good at the task at baseline. In contrast, left PPC tDCS resulted in reduced detection sensitivity across both visual fields in poor performers. Finally, in Chapter Four, I report a right-hemisphere lateralisation of EEG activity in young adults that was present for long (but not short) landmark task lines. In contrast, older adults demonstrated no lateralised activity for either line length, thus providing novel evidence of an age-related reduction of hemispheric asymmetry in older adults. The results of this thesis provide evidence of a highly complex set of factors that underlie spatial attention asymmetries in healthy young and older adults.
Resumo:
1. The co-ordination between respiratory and postural functions of the diaphragm was investigated during repetitive upper Limb movement. It was hypothesised that diaphragm activity would occur either tonically or phasically in association with the forces from each movement and that this activity would combine with phasic respiratory activity. 2. Movements of the upper limb and ribcage were measured while standing subjects performed repetitive upper limb movements 'as fast as possible'. Electromyographic (EMG) recordings of the costal diaphragm were made using intramuscular electrodes in four subjects. Surface electrodes were placed over the deltoid and erector spinae muscles. 3. In contrast to standing at rest, diaphragm activity was present throughout expiration at 78 +/- 17% (mean +/- S.D.) of its peak inspiratory magnitude during repeated upper limb movement. 4. Bursts of deltoid and erector spinae EMG activity occurred at the Limb movement frequency (similar to 2.9 Hz). Although the majority of diaphragm EMG power was at the respiratory frequency (similar to 0.4 Hz), a peak was also present at the movement frequency. This finding was corroborated by averaged EMG activity triggered from upper limb movement. In addition, diaphragm EMG activity was coherent with ribcage motion at the respiratory frequency and with upper limb movement at the movement frequency. 5. The diaphragm response was similar when movement was performed while sitting. In addition, when subjects moved with increasing frequency the peak upper limb acceleration correlated with diaphragm EMG amplitude. These findings support the argument that diaphragm contraction is related to trunk control. 6. The results indicate that activity of human phrenic motoneurones is organised such that it contributes to both posture and respiration during a task which repetitively challenges trunk posture.
Resumo:
Dynamic parallel scheduling using work-stealing has gained popularity in academia and industry for its good performance, ease of implementation and theoretical bounds on space and time. Cores treat their own double-ended queues (deques) as a stack, pushing and popping threads from the bottom, but treat the deque of another randomly selected busy core as a queue, stealing threads only from the top, whenever they are idle. However, this standard approach cannot be directly applied to real-time systems, where the importance of parallelising tasks is increasing due to the limitations of multiprocessor scheduling theory regarding parallelism. Using one deque per core is obviously a source of priority inversion since high priority tasks may eventually be enqueued after lower priority tasks, possibly leading to deadline misses as in this case the lower priority tasks are the candidates when a stealing operation occurs. Our proposal is to replace the single non-priority deque of work-stealing with ordered per-processor priority deques of ready threads. The scheduling algorithm starts with a single deque per-core, but unlike traditional work-stealing, the total number of deques in the system may now exceed the number of processors. Instead of stealing randomly, cores steal from the highest priority deque.
Resumo:
Consider the problem of scheduling a set of implicit-deadline sporadic tasks to meet all deadlines on a two-type heterogeneous multiprocessor platform. Each processor is either of type-1 or type-2 with each task having different execution time on each processor type. Jobs can migrate between processors of same type (referred to as intra-type migration) but cannot migrate between processors of different types. We present a new scheduling algorithm namely, LP-Relax(THR) which offers a guarantee that if a task set can be scheduled to meet deadlines by an optimal task assignment scheme that allows intra-type migration then LP-Relax(THR) meets deadlines as well with intra-type migration if given processors 1/THR as fast (referred to as speed competitive ratio) where THR <= 2/3.
Resumo:
Developing an efficient server-based real-time scheduling solution that supports dynamic task-level parallelism is now relevant to even the desktop and embedded domains and no longer only to the high performance computing market niche. This paper proposes a novel approach that combines the constantbandwidth server abstraction with a work-stealing load balancing scheme which, while ensuring isolation among tasks, enables a task to be executed on more than one processor at a given time instant.
Resumo:
Consider the problem of assigning implicit-deadline sporadic tasks on a heterogeneous multiprocessor platform comprising two different types of processors—such a platform is referred to as two-type platform. We present two low degree polynomial time-complexity algorithms, SA and SA-P, each providing the following guarantee. For a given two-type platform and a task set, if there exists a task assignment such that tasks can be scheduled to meet deadlines by allowing them to migrate only between processors of the same type (intra-migrative), then (i) using SA, it is guaranteed to find such an assignment where the same restriction on task migration applies but given a platform in which processors are 1+α/2 times faster and (ii) SA-P succeeds in finding a task assignment where tasks are not allowed to migrate between processors (non-migrative) but given a platform in which processors are 1+α times faster. The parameter 0<α≤1 is a property of the task set; it is the maximum of all the task utilizations that are no greater than 1. We evaluate average-case performance of both the algorithms by generating task sets randomly and measuring how much faster processors the algorithms need (which is upper bounded by 1+α/2 for SA and 1+α for SA-P) in order to output a feasible task assignment (intra-migrative for SA and non-migrative for SA-P). In our evaluations, for the vast majority of task sets, these algorithms require significantly smaller processor speedup than indicated by their theoretical bounds. Finally, we consider a special case where no task utilization in the given task set can exceed one and for this case, we (re-)prove the performance guarantees of SA and SA-P. We show, for both of the algorithms, that changing the adversary from intra-migrative to a more powerful one, namely fully-migrative, in which tasks can migrate between processors of any type, does not deteriorate the performance guarantees. For this special case, we compare the average-case performance of SA-P and a state-of-the-art algorithm by generating task sets randomly. In our evaluations, SA-P outperforms the state-of-the-art by requiring much smaller processor speedup and by running orders of magnitude faster.
Resumo:
Consider the problem of assigning implicit-deadline sporadic tasks on a heterogeneous multiprocessor platform comprising a constant number (denoted by t) of distinct types of processors—such a platform is referred to as a t-type platform. We present two algorithms, LPGIM and LPGNM, each providing the following guarantee. For a given t-type platform and a task set, if there exists a task assignment such that tasks can be scheduled to meet their deadlines by allowing them to migrate only between processors of the same type (intra-migrative), then: (i) LPGIM succeeds in finding such an assignment where the same restriction on task migration applies (intra-migrative) but given a platform in which only one processor of each type is 1 + α × t-1/t times faster and (ii) LPGNM succeeds in finding a task assignment where tasks are not allowed to migrate between processors (non-migrative) but given a platform in which every processor is 1 + α times faster. The parameter α is a property of the task set; it is the maximum of all the task utilizations that are no greater than one. To the best of our knowledge, for t-type heterogeneous multiprocessors: (i) for the problem of intra-migrative task assignment, no previous algorithm exists with a proven bound and hence our algorithm, LPGIM, is the first of its kind and (ii) for the problem of non-migrative task assignment, our algorithm, LPGNM, has superior performance compared to state-of-the-art.
Resumo:
Intra-amygdala infusion of the non-N-methyl-D-aspartate (NMDA) receptor antagonist 6-cyano-7-nitroquinoxaline-2,3-dione (CNQX) prior to testing impairs inhibitory avoidance retention test performance. Increased training attenuates the impairing effects of amygdala lesions and intra-amygdala infusions of CNQX. The objective of the present study was to determine the effects of additional training on the impairing effects of intra-amygdala CNQX on expression of the inhibitory avoidance task. Adult female Wistar rats bilaterally implanted with cannulae into the border between the central and the basolateral nuclei of the amygdala were submitted to a single session or to three training sessions (0.2 mA, 24-h interval between sessions) in a step-down inhibitory avoidance task. A retention test session was held 48 h after the last training. Ten minutes prior to the retention test session, the animals received a 0.5-µl infusion of CNQX (0.5 µg) or its vehicle (25% dimethylsulfoxide in saline). The CNQX infusion impaired, but did not block, retention test performance in animals submitted to a single training session. Additional training prevented the impairing effect of CNQX. The results suggest that amygdaloid non-NMDA receptors may not be critical for memory expression in animals given increased training.