936 resultados para Thread safe parallel run-time


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents some fundamental properties of independent and-parallelism and extends its applicability by enlarging the class of goals eligible for parallel execution. A simple model of (independent) and-parallel execution is proposed and issues of correctness and efficiency discussed in the light of this model. Two conditions, "strict" and "non-strict" independence, are defined and then proved sufficient to ensure correctness and efñciency of parallel execution: if goals which meet these conditions are executed in parallel the solutions obtained are the same as those produced by standard sequential execution. Also, in absence of failure, the parallel proof procedure does not genérate any additional work (with respect to standard SLD-resolution) while the actual execution time is reduced. Finally, in case of failure of any of the goals no slow down will occur. For strict independence the results are shown to hold independently of whether the parallel goals execute in the same environment or in sepárate environments. In addition, a formal basis is given for the automatic compile-time generation of independent and-parallelism: compile-time conditions to efficiently check goal independence at run-time are proposed and proved sufficient. Also, rules are given for constructing simpler conditions if information regarding the binding context of the goals to be executed in parallel is available to the compiler.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present a technique to estimate accurate speedups for parallel logic programs with relative independence from characteristics of a given implementation or underlying parallel hardware. The proposed technique is based on gathering accurate data describing one execution at run-time, which is fed to a simulator. Alternative schedulings are then simulated and estimates computed for the corresponding speedups. A tool implementing the aforementioned techniques is presented, and its predictions are compared to the performance of real systems, showing good correlation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There has been significant interest in parallel execution models for logic programs which exploit Independent And-Parallelism (IAP). In these models, it is necessary to determine which goals are independent and therefore eligible for parallel execution and which goals have to wait for which others during execution. Although this can be done at run-time, it can imply a very heavy overhead. In this paper, we present three algorithms for automatic compiletime parallelization of logic programs using IAP. This is done by converting a clause into a graph-based computational form and then transforming this graph into linear expressions based on &-Prolog, a language for IAP. We also present an algorithm which, given a clause, determines if there is any loss of parallelism due to linearization, for the case in which only unconditional parallelism is desired. Finally, the performance of these annotation algorithms is discussed for some benchmark programs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Although the sequential execution speed of logic programs has been greatly improved by the concepts introduced in the Warren Abstract Machine (WAM), parallel execution represents the only way to increase this speed beyond the natural limits of sequential systems. However, most proposed parallel logic programming execution models lack the performance optimizations and storage efficiency of sequential systems. This paper presents a parallel abstract machine which is an extension of the WAM and is thus capable of supporting ANDParallelism without giving up the optimizations present in sequential implementations. A suitable instruction set, which can be used as a target by a variety of logic programming languages, is also included. Special instructions are provided to support a generalized version of "Restricted AND-Parallelism" (RAP), a technique which reduces the overhead traditionally associated with the run-time management of variable binding conflicts to a series of simple run-time checks, which select one out of a series of compiled execution graphs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Goal-level Independent and-parallelism (IAP) is exploited by scheduling for simultaneous execution two or more goals which will not interfere with each other at run time. This can be done safely even if such goals can produce multiple answers. The most successful IAP implementations to date have used recomputation of answers and sequentially ordered backtracking. While in principle simplifying the implementation, recomputation can be very inefficient if the granularity of the parallel goals is large enough and they produce several answers, while sequentially ordered backtracking limits parallelism. And, despite the expected simplification, the implementation of the classic schemes has proved to involve complex engineering, with the consequent difficulty for system maintenance and expansion, and still frequently run into the well-known trapped goal and garbage slot problems. This work presents ideas about an alternative parallel backtracking model for IAP and a simulation studio. The model features parallel out-of-order backtracking and relies on answer memoization to reuse and combine answers. Whenever a parallel goal backtracks, its siblings also perform backtracking, but after storing the bindings generated by previous answers. The bindings are then reinstalled when combining answers. In order not to unnecessarily penalize forward execution, non-speculative and-parallel goals which have not been executed yet take precedence over sibling goals which could be backtracked over. Using a simulator, we show that this approach can bring significant performance advantages over classical approaches.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This research is motivated by a practical application observed at a printed circuit board (PCB) manufacturing facility. After assembly, the PCBs (or jobs) are tested in environmental stress screening (ESS) chambers (or batch processing machines) to detect early failures. Several PCBs can be simultaneously tested as long as the total size of all the PCBs in the batch does not violate the chamber capacity. PCBs from different production lines arrive dynamically to a queue in front of a set of identical ESS chambers, where they are grouped into batches for testing. Each line delivers PCBs that vary in size and require different testing (or processing) times. Once a batch is formed, its processing time is the longest processing time among the PCBs in the batch, and its ready time is given by the PCB arriving last to the batch. ESS chambers are expensive and a bottleneck. Consequently, its makespan has to be minimized. ^ A mixed-integer formulation is proposed for the problem under study and compared to a formulation recently published. The proposed formulation is better in terms of the number of decision variables, linear constraints and run time. A procedure to compute the lower bound is proposed. For sparse problems (i.e. when job ready times are dispersed widely), the lower bounds are close to optimum. ^ The problem under study is NP-hard. Consequently, five heuristics, two metaheuristics (i.e. simulated annealing (SA) and greedy randomized adaptive search procedure (GRASP)), and a decomposition approach (i.e. column generation) are proposed—especially to solve problem instances which require prohibitively long run times when a commercial solver is used. Extensive experimental study was conducted to evaluate the different solution approaches based on the solution quality and run time. ^ The decomposition approach improved the lower bounds (or linear relaxation solution) of the mixed-integer formulation. At least one of the proposed heuristic outperforms the Modified Delay heuristic from the literature. For sparse problems, almost all the heuristics report a solution close to optimum. GRASP outperforms SA at a higher computational cost. The proposed approaches are viable to implement as the run time is very short. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As the complexity of parallel applications increase, the performance limitations resulting from computational load imbalance become dominant. Mapping the problem space to the processors in a parallel machine in a manner that balances the workload of each processors will typically reduce the run-time. In many cases the computation time required for a given calculation cannot be predetermined even at run-time and so static partition of the problem returns poor performance. For problems in which the computational load across the discretisation is dynamic and inhomogeneous, for example multi-physics problems involving fluid and solid mechanics with phase changes, the workload for a static subdomain will change over the course of a computation and cannot be estimated beforehand. For such applications the mapping of loads to process is required to change dynamically, at run-time in order to maintain reasonable efficiency. The issue of dynamic load balancing are examined in the context of PHYSICA, a three dimensional unstructured mesh multi-physics continuum mechanics computational modelling code.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A poster of this paper will be presented at the 25th International Conference on Parallel Architecture and Compilation Technology (PACT ’16), September 11-15, 2016, Haifa, Israel.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nanorap is a new nanotechnological formulation for topical anesthesia composed of lidocaine (2.5%) and prilocaine (2.5%). The present study evaluated the pharmacokinetics (PK) of Nanorap. For the determination of lidocaine and prilocaine in human plasma a new method using high-performance liquid-chromatography coupled to tandem mass spectrometry was developed. Nanorap pharmacodynamic (PD) and its physical proprieties were also evaluated. Nanorap was administered by topical application of 2g to healthy volunteers and blood samples were collected for the PK analysis. The drugs were extracted from plasma by liquid-liquid extraction with ether/hexane (80/20, v/v). The chromatography separation was performed on a Genesis C18 analytical column 4 µm (100 x 2.1 mm i.d.) with a mobile phase of methanol/acetonitrile/water (40/30/30, for lidocaine, and 50/30/20, for prilocaine, v/v/v) + 2 mM of ammonium acetate and ropivacaine as internal standard. The drugs were quantified using a mass spectrometer with an electrospray source in the ESI positive mode (ES+) configured for multiple reaction monitoring. The PD of Nanorap was evaluated with the use of a visual analogue scale. Nanorap was characterized by cryofracture. The chromatography run time was 5.5 min for lidocaine and 3.3 min for prilocaine and the lower limit of quantification was 0.05 ng/mL for both drugs. Mean Cmax was 6.62 and 1.72 ng/mL for lidocaine and prilocaine, respectively. Median Tmax was 6.5 hours for both drugs. Nanocapsules had a mean size of 88nm and mean drug association of 92.5% and 89% for lidocaine and prilocaine, respectively. The PD study showed that Nanorap has a sufficient analgesic effect (>30% reduction in pain) after 10 minutes of application. A new simple, selective and sensitive method for determination of lidocaine and prilocaine in human plasma was developed. Nanorap generated safe plasma levels of the drugs and satisfactory analgesic effect.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Os sistemas de tempo real modernos geram, cada vez mais, cargas computacionais pesadas e dinâmicas, começando-se a tornar pouco expectável que sejam implementados em sistemas uniprocessador. Na verdade, a mudança de sistemas com um único processador para sistemas multi- processador pode ser vista, tanto no domínio geral, como no de sistemas embebidos, como uma forma eficiente, em termos energéticos, de melhorar a performance das aplicações. Simultaneamente, a proliferação das plataformas multi-processador transformaram a programação paralela num tópico de elevado interesse, levando o paralelismo dinâmico a ganhar rapidamente popularidade como um modelo de programação. A ideia, por detrás deste modelo, é encorajar os programadores a exporem todas as oportunidades de paralelismo através da simples indicação de potenciais regiões paralelas dentro das aplicações. Todas estas anotações são encaradas pelo sistema unicamente como sugestões, podendo estas serem ignoradas e substituídas, por construtores sequenciais equivalentes, pela própria linguagem. Assim, o modo como a computação é na realidade subdividida, e mapeada nos vários processadores, é da responsabilidade do compilador e do sistema computacional subjacente. Ao retirar este fardo do programador, a complexidade da programação é consideravelmente reduzida, o que normalmente se traduz num aumento de produtividade. Todavia, se o mecanismo de escalonamento subjacente não for simples e rápido, de modo a manter o overhead geral em níveis reduzidos, os benefícios da geração de um paralelismo com uma granularidade tão fina serão meramente hipotéticos. Nesta perspetiva de escalonamento, os algoritmos que empregam uma política de workstealing são cada vez mais populares, com uma eficiência comprovada em termos de tempo, espaço e necessidades de comunicação. Contudo, estes algoritmos não contemplam restrições temporais, nem outra qualquer forma de atribuição de prioridades às tarefas, o que impossibilita que sejam diretamente aplicados a sistemas de tempo real. Além disso, são tradicionalmente implementados no runtime da linguagem, criando assim um sistema de escalonamento com dois níveis, onde a previsibilidade, essencial a um sistema de tempo real, não pode ser assegurada. Nesta tese, é descrita a forma como a abordagem de work-stealing pode ser resenhada para cumprir os requisitos de tempo real, mantendo, ao mesmo tempo, os seus princípios fundamentais que tão bons resultados têm demonstrado. Muito resumidamente, a única fila de gestão de processos convencional (deque) é substituída por uma fila de deques, ordenada de forma crescente por prioridade das tarefas. De seguida, aplicamos por cima o conhecido algoritmo de escalonamento dinâmico G-EDF, misturamos as regras de ambos, e assim nasce a nossa proposta: o algoritmo de escalonamento RTWS. Tirando partido da modularidade oferecida pelo escalonador do Linux, o RTWS é adicionado como uma nova classe de escalonamento, de forma a avaliar na prática se o algoritmo proposto é viável, ou seja, se garante a eficiência e escalonabilidade desejadas. Modificar o núcleo do Linux é uma tarefa complicada, devido à complexidade das suas funções internas e às fortes interdependências entre os vários subsistemas. Não obstante, um dos objetivos desta tese era ter a certeza que o RTWS é mais do que um conceito interessante. Assim, uma parte significativa deste documento é dedicada à discussão sobre a implementação do RTWS e à exposição de situações problemáticas, muitas delas não consideradas em teoria, como é o caso do desfasamento entre vários mecanismo de sincronização. Os resultados experimentais mostram que o RTWS, em comparação com outro trabalho prático de escalonamento dinâmico de tarefas com restrições temporais, reduz significativamente o overhead de escalonamento através de um controlo de migrações, e mudanças de contexto, eficiente e escalável (pelo menos até 8 CPUs), ao mesmo tempo que alcança um bom balanceamento dinâmico da carga do sistema, até mesmo de uma forma não custosa. Contudo, durante a avaliação realizada foi detetada uma falha na implementação do RTWS, pela forma como facilmente desiste de roubar trabalho, o que origina períodos de inatividade, no CPU em questão, quando a utilização geral do sistema é baixa. Embora o trabalho realizado se tenha focado em manter o custo de escalonamento baixo e em alcançar boa localidade dos dados, a escalonabilidade do sistema nunca foi negligenciada. Na verdade, o algoritmo de escalonamento proposto provou ser bastante robusto, não falhando qualquer meta temporal nas experiências realizadas. Portanto, podemos afirmar que alguma inversão de prioridades, causada pela sub-política de roubo BAS, não compromete os objetivos de escalonabilidade, e até ajuda a reduzir a contenção nas estruturas de dados. Mesmo assim, o RTWS também suporta uma sub-política de roubo determinística: PAS. A avaliação experimental, porém, não ajudou a ter uma noção clara do impacto de uma e de outra. No entanto, de uma maneira geral, podemos concluir que o RTWS é uma solução promissora para um escalonamento eficiente de tarefas paralelas com restrições temporais.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fieldbus communication networks aim to interconnect sensors, actuators and controllers within process control applications. Therefore, they constitute the foundation upon which real-time distributed computer-controlled systems can be implemented. P-NET is a fieldbus communication standard, which uses a virtual token-passing medium-access-control mechanism. In this paper pre-run-time schedulability conditions for supporting real-time traffic with P-NET networks are established. Essentially, formulae to evaluate the upper bound of the end-to-end communication delay in P-NET messages are provided. Using this upper bound, a feasibility test is then provided to check the timing requirements for accessing remote process variables. This paper also shows how P-NET network segmentation can significantly reduce the end-to-end communication delays for messages with stringent timing requirements.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Real-time systems demand guaranteed and predictable run-time behaviour in order to ensure that no task has missed its deadline. Over the years we are witnessing an ever increasing demand for functionality enhancements in the embedded real-time systems. Along with the functionalities, the design itself grows more complex. Posed constraints, such as energy consumption, time, and space bounds, also require attention and proper handling. Additionally, efficient scheduling algorithms, as proven through analyses and simulations, often impose requirements that have significant run-time cost, specially in the context of multi-core systems. In order to further investigate the behaviour of such systems to quantify and compare these overheads involved, we have developed the SPARTS, a simulator of a generic embedded real- time device. The tasks in the simulator are described by externally visible parameters (e.g. minimum inter-arrival, sporadicity, WCET, BCET, etc.), rather than the code of the tasks. While our current implementation is primarily focused on our immediate needs in the area of power-aware scheduling, it is designed to be extensible to accommodate different task properties, scheduling algorithms and/or hardware models for the application in wide variety of simulations. The source code of the SPARTS is available for download at [1].

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Monitoring systems have traditionally been developed with rigid objectives and functionalities, and tied to specific languages, libraries and run-time environments. There is a need for more flexible monitoring systems which can be easily adapted to distinct requirements. On-line monitoring has been considered as increasingly important for observation and control of a distributed application. In this paper we discuss monitoring interfaces and architectures which support more extensible monitoring and control services. We describe our work on the development of a distributed monitoring infrastructure, and illustrate how it eases the implementation of a complex distributed debugging architecture. We also discuss several issues concerning support for tool interoperability and illustrate how the cooperation among multiple concurrent tools can ease the task of distributed debugging.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thesis presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the subject of Electrical and Computer Engineering

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Informática