977 resultados para asymmetric parallel-kinematics mechanisms
Resumo:
In this paper we address the real-time capabilities of P-NET, which is a multi-master fieldbus standard based on a virtual token passing scheme. We show how P-NET’s medium access control (MAC) protocol is able to guarantee a bounded access time to message requests. We then propose a model for implementing fixed prioritybased dispatching mechanisms at each master’s application level. In this way, we diminish the impact of the first-come-first-served (FCFS) policy that P-NET uses at the data link layer. The proposed model rises several issues well known within the real-time systems community: message release jitter; pre-run-time schedulability analysis in non pre-emptive contexts; non-independence of tasks at the application level. We identify these issues in the proposed model and show how results available for priority-based task dispatching can be adapted to encompass priority-based message dispatching in P-NET networks.
Resumo:
In the past few years Tabling has emerged as a powerful logic programming model. The integration of concurrent features into the implementation of Tabling systems is demanded by need to use recently developed tabling applications within distributed systems, where a process has to respond concurrently to several requests. The support for sharing of tables among the concurrent threads of a Tabling process is a desirable feature, to allow one of Tabling’s virtues, the re-use of computations by other threads and to allow efficient usage of available memory. However, the incremental completion of tables which are evaluated concurrently is not a trivial problem. In this dissertation we describe the integration of concurrency mechanisms, by the way of multi-threading, in a state of the art Tabling and Prolog system, XSB. We begin by reviewing the main concepts for a formal description of tabled computations, called SLG resolution and for the implementation of Tabling under the SLG-WAM, the abstract machine supported by XSB. We describe the different scheduling strategies provided by XSB and introduce some new properties of local scheduling, a scheduling strategy for SLG resolution. We proceed to describe our implementation work by describing the process of integrating multi-threading in a Prolog system supporting Tabling, without addressing the problem of shared tables. We describe the trade-offs and implementation decisions involved. We then describe an optimistic algorithm for the concurrent sharing of completed tables, Shared Completed Tables, which allows the sharing of tables without incurring in deadlocks, under local scheduling. This method relies on the execution properties of local scheduling and includes full support for negation. We provide a theoretical framework and discuss the implementation’s correctness and complexity. After that, we describe amethod for the sharing of tables among threads that allows parallelism in the computation of inter-dependent subgoals, which we name Concurrent Completion. We informally argue for the correctness of Concurrent Completion. We give detailed performance measurements of the multi-threaded XSB systems over a variety of machines and operating systems, for both the Shared Completed Tables and the Concurrent Completion implementations. We focus our measurements inthe overhead over the sequential engine and the scalability of the system. We finish with a comparison of XSB with other multi-threaded Prolog systems and we compare our approach to concurrent tabling with parallel and distributed methods for the evaluation of tabling. Finally, we identify future research directions.
Resumo:
This paper proposes a global multiprocessor scheduling algorithm for the Linux kernel that combines the global EDF scheduler with a priority-aware work-stealing load balancing scheme, enabling parallel real-time tasks to be executed on more than one processor at a given time instant. We state that some priority inversion may actually be acceptable, provided it helps reduce contention, communication, synchronisation and coordination between parallel threads, while still guaranteeing the expected system’s predictability. Experimental results demonstrate the low scheduling overhead of the proposed approach comparatively to an existing real-time deadline-oriented scheduling class for the Linux kernel.
Resumo:
Dynamic parallel scheduling using work-stealing has gained popularity in academia and industry for its good performance, ease of implementation and theoretical bounds on space and time. Cores treat their own double-ended queues (deques) as a stack, pushing and popping threads from the bottom, but treat the deque of another randomly selected busy core as a queue, stealing threads only from the top, whenever they are idle. However, this standard approach cannot be directly applied to real-time systems, where the importance of parallelising tasks is increasing due to the limitations of multiprocessor scheduling theory regarding parallelism. Using one deque per core is obviously a source of priority inversion since high priority tasks may eventually be enqueued after lower priority tasks, possibly leading to deadline misses as in this case the lower priority tasks are the candidates when a stealing operation occurs. Our proposal is to replace the single non-priority deque of work-stealing with ordered per-processor priority deques of ready threads. The scheduling algorithm starts with a single deque per-core, but unlike traditional work-stealing, the total number of deques in the system may now exceed the number of processors. Instead of stealing randomly, cores steal from the highest priority deque.
Resumo:
Real-time embedded applications require to process large amounts of data within small time windows. Parallelize and distribute workloads adaptively is suitable solution for computational demanding applications. The purpose of the Parallel Real-Time Framework for distributed adaptive embedded systems is to guarantee local and distributed processing of real-time applications. This work identifies some promising research directions for parallel/distributed real-time embedded applications.
Resumo:
Embedded real-time applications increasingly present high computation requirements, which need to be completed within specific deadlines, but that present highly variable patterns, depending on the set of data available in a determined instant. The current trend to provide parallel processing in the embedded domain allows providing higher processing power; however, it does not address the variability in the processing pattern. Dimensioning each device for its worst-case scenario implies lower average utilization, and increased available, but unusable, processing in the overall system. A solution for this problem is to extend the parallel execution of the applications, allowing networked nodes to distribute the workload, on peak situations, to neighbour nodes. In this context, this report proposes a framework to develop parallel and distributed real-time embedded applications, transparently using OpenMP and Message Passing Interface (MPI), within a programming model based on OpenMP. The technical report also devises an integrated timing model, which enables the structured reasoning on the timing behaviour of these hybrid architectures.
Resumo:
High-level parallel languages offer a simple way for application programmers to specify parallelism in a form that easily scales with problem size, leaving the scheduling of the tasks onto processors to be performed at runtime. Therefore, if the underlying system cannot efficiently execute those applications on the available cores, the benefits will be lost. In this paper, we consider how to schedule highly heterogenous parallel applications that require real-time performance guarantees on multicore processors. The paper proposes a novel scheduling approach that combines the global Earliest Deadline First (EDF) scheduler with a priority-aware work-stealing load balancing scheme, which enables parallel realtime tasks to be executed on more than one processor at a given time instant. Experimental results demonstrate the better scalability and lower scheduling overhead of the proposed approach comparatively to an existing real-time deadline-oriented scheduling class for the Linux kernel.
Resumo:
Multicore platforms have transformed parallelism into a main concern. Parallel programming models are being put forward to provide a better approach for application programmers to expose the opportunities for parallelism by pointing out potentially parallel regions within tasks, leaving the actual and dynamic scheduling of these regions onto processors to be performed at runtime, exploiting the maximum amount of parallelism. It is in this context that this paper proposes a scheduling approach that combines the constant-bandwidth server abstraction with a priority-aware work-stealing load balancing scheme which, while ensuring isolation among tasks, enables parallel tasks to be executed on more than one processor at a given time instant.
Resumo:
The recent trends of chip architectures with higher number of heterogeneous cores, and non-uniform memory/non-coherent caches, brings renewed attention to the use of Software Transactional Memory (STM) as a fundamental building block for developing parallel applications. Nevertheless, although STM promises to ease concurrent and parallel software development, it relies on the possibility of aborting conflicting transactions to maintain data consistency, which impacts on the responsiveness and timing guarantees required by embedded real-time systems. In these systems, contention delays must be (efficiently) limited so that the response times of tasks executing transactions are upper-bounded and task sets can be feasibly scheduled. In this paper we assess the use of STM in the development of embedded real-time software, defending that the amount of contention can be reduced if read-only transactions access recent consistent data snapshots, progressing in a wait-free manner. We show how the required number of versions of a shared object can be calculated for a set of tasks. We also outline an algorithm to manage conflicts between update transactions that prevents starvation.
Resumo:
Over the last three decades, computer architects have been able to achieve an increase in performance for single processors by, e.g., increasing clock speed, introducing cache memories and using instruction level parallelism. However, because of power consumption and heat dissipation constraints, this trend is going to cease. In recent times, hardware engineers have instead moved to new chip architectures with multiple processor cores on a single chip. With multi-core processors, applications can complete more total work than with one core alone. To take advantage of multi-core processors, parallel programming models are proposed as promising solutions for more effectively using multi-core processors. This paper discusses some of the existent models and frameworks for parallel programming, leading to outline a draft parallel programming model for Ada.
Resumo:
Applications with soft real-time requirements can benefit from code mobility mechanisms, as long as those mechanisms support the timing and Quality of Service requirements of applications. In this paper, a generic model for code mobility mechanisms is presented. The proposed model gives system designers the necessary tools to perform a statistical timing analysis on the execution of the mobility mechanisms that can be used to determine the impact of code mobility in distributed real-time applications.
Resumo:
Superoxide reductase is a 14 kDa metalloprotein containing a catalytic nonhaem iron centre [Fe(His)4Cys]. It is involved in defence mechanisms against oxygen toxicity, scavenging superoxide radicals from the cell. The oxidized form of Treponema pallidum superoxide reductase was crystallized in the presence of polyethylene glycol and magnesium chloride. Two crystal forms were obtained depending on the oxidizing agents used after purification: crystals grown in the presence of K3Fe(CN)6 belonged to space group P21 (unit-cell parameters a = 60.3, b = 59.9, c = 64.8 A ° , = 106.9 ) and diffracted beyond 1.60 A ° resolution, while crystals grown in the presence of Na2IrCl6 belonged to space group C2 (a = 119.4, b = 60.1, c = 65.6 A ° , = 104.9 ) and diffracted beyond 1.55 A ° . A highly redundant X-ray diffraction data set from the C2 crystal form collected on a copper rotating-anode generator ( = 1.542 A ° ) clearly defined the positions of the four Fe atoms present in the asymmetric unit by SAD methods. A MAD experiment at the iron absorption edge confirmed the positions of the previously determined iron sites and provided better phases for model building and refinement. Molecular replacement using the P21 data set was successful using a preliminary trace as a search model. A similar arrangement of the four protein molecules could be observed.
Resumo:
The recently standardized IEEE 802.15.4/Zigbee protocol stack offers great potentials for ubiquitous and pervasive computing, namely for Wireless Sensor Networks (WSNs). However, there are still some open and ambiguous issues that turn its practical use a challenging task. One of those issues is how to build a synchronized multi-hop cluster-tree network, which is quite suitable for QoS support in WSNs. In fact, the current IEEE 802.15.4/Zigbee specifications restrict the synchronization in the beacon-enabled mode (by the generation of periodic beacon frames) to star-based networks, while it supports multi-hop networking using the peer-to-peer mesh topology, but with no synchronization. Even though both specifications mention the possible use of cluster-tree topologies, which combine multi-hop and synchronization features, the description on how to effectively construct such a network topology is missing. This paper tackles this problem, unveils the ambiguities regarding the use of the cluster-tree topology and proposes two collision-free beacon frame scheduling schemes. We strongly believe that the results provided in this paper trigger a significant step towards the practical and efficient use of IEEE 802.15.4/Zigbee cluster-tree networks.
Resumo:
This letter presents a new parallel method for hyperspectral unmixing composed by the efficient combination of two popular methods: vertex component analysis (VCA) and sparse unmixing by variable splitting and augmented Lagrangian (SUNSAL). First, VCA extracts the endmember signatures, and then, SUNSAL is used to estimate the abundance fractions. Both techniques are highly parallelizable, which significantly reduces the computing time. A design for the commodity graphics processing units of the two methods is presented and evaluated. Experimental results obtained for simulated and real hyperspectral data sets reveal speedups up to 100 times, which grants real-time response required by many remotely sensed hyperspectral applications.
Resumo:
Dissertação apresentada na Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa para obtenção do grau de Mestre em Engenharia Electrotécnica e de Computadores