960 resultados para Multi-threaded parallel tasks
Resumo:
Femtosecond laser microfabrication has emerged over the last decade as a 3D flexible technology in photonics. Numerical simulations provide an important insight into spatial and temporal beam and pulse shaping during the course of extremely intricate nonlinear propagation (see e.g. [1,2]). Electromagnetics of such propagation is typically described in the form of the generalized Non-Linear Schrdinger Equation (NLSE) coupled with Drude model for plasma [3]. In this paper we consider a multi-threaded parallel numerical solution for a specific model which describes femtosecond laser pulse propagation in transparent media [4, 5]. However our approach can be extended to similar models. The numerical code is implemented in NVIDIA Graphics Processing Unit (GPU) which provides an effitient hardware platform for multi-threded computing. We compare the performance of the described below parallel code implementated for GPU using CUDA programming interface [3] with a serial CPU version used in our previous papers [4,5]. © 2011 IEEE.
Resumo:
We describe a parallel multi-threaded approach for high performance modelling of wide class of phenomena in ultrafast nonlinear optics. Specific implementation has been performed using the highly parallel capabilities of a programmable graphics processor. © 2011 SPIE.
Resumo:
In this paper, we propose the Distributed using Optimal Priority Assignment (DOPA) heuristic that finds a feasible partitioning and priority assignment for distributed applications based on the linear transactional model. DOPA partitions the tasks and messages in the distributed system, and makes use of the Optimal Priority Assignment (OPA) algorithm known as Audsley’s algorithm, to find the priorities for that partition. The experimental results show how the use of the OPA algorithm increases in average the number of schedulable tasks and messages in a distributed system when compared to the use of Deadline Monotonic (DM) usually favoured in other works. Afterwards, we extend these results to the assignment of Parallel/Distributed applications and present a second heuristic named Parallel-DOPA (P-DOPA). In that case, we show how the partitioning process can be simplified by using the Distributed Stretch Transformation (DST), a parallel transaction transformation algorithm introduced in [1].
Resumo:
Poster presented in Work in Progress Session, 28th GI/ITG International Conference on Architecture of Computing Systems (ARCS 2015). 24 to 26, Mar, 2015. Porto, Portugal.
Resumo:
Diplomityö tarkastelee säikeistettyä ohjelmointia rinnakkaisohjelmoinnin ylemmällä hierarkiatasolla tarkastellen erityisesti hypersäikeistysteknologiaa. Työssä tarkastellaan hypersäikeistyksen hyviä ja huonoja puolia sekä sen vaikutuksia rinnakkaisalgoritmeihin. Työn tavoitteena oli ymmärtää Intel Pentium 4 prosessorin hypersäikeistyksen toteutus ja mahdollistaa sen hyödyntäminen, missä se tuo suorituskyvyllistä etua. Työssä kerättiin ja analysoitiin suorituskykytietoa ajamalla suuri joukko suorituskykytestejä eri olosuhteissa (muistin käsittely, kääntäjän asetukset, ympäristömuuttujat...). Työssä tarkasteltiin kahdentyyppisiä algoritmeja: matriisioperaatioita ja lajittelua. Näissä sovelluksissa on säännöllinen muistinkäyttökuvio, mikä on kaksiteräinen miekka. Se on etu aritmeettis-loogisissa prosessoinnissa, mutta toisaalta huonontaa muistin suorituskykyä. Syynä siihen on nykyaikaisten prosessorien erittäin hyvä raaka suorituskyky säännöllistä dataa käsiteltäessä, mutta muistiarkkitehtuuria rajoittaa välimuistien koko ja useat puskurit. Kun ongelman koko ylittää tietyn rajan, todellinen suorituskyky voi pudota murto-osaan huippusuorituskyvystä.
Resumo:
In this paper we describe a distributed object oriented logic programming language in which an object is a collection of threads deductively accessing and updating a shared logic program. The key features of the language, such as static and dynamic object methods and multiple inheritance, are illustrated through a series of small examples. We show how we can implement object servers, allowing remote spawning of objects, which we can use as staging posts for mobile agents. We give as an example an information gathering mobile agent that can be queried about the information it has so far gathered whilst it is gathering new information. Finally we define a class of co-operative reasoning agents that can do resource bounded inference for full first order predicate logic, handling multiple queries and information updates concurrently. We believe that the combination of the concurrent OO and the LP programming paradigms produces a powerful tool for quickly implementing rational multi-agent applications on the internet.
Resumo:
This paper proposes a global multiprocessor scheduling algorithm for the Linux kernel that combines the global EDF scheduler with a priority-aware work-stealing load balancing scheme, enabling parallel real-time tasks to be executed on more than one processor at a given time instant. We state that some priority inversion may actually be acceptable, provided it helps reduce contention, communication, synchronisation and coordination between parallel threads, while still guaranteeing the expected system’s predictability. Experimental results demonstrate the low scheduling overhead of the proposed approach comparatively to an existing real-time deadline-oriented scheduling class for the Linux kernel.
Resumo:
This paper presents a methodology for multi-objective day-ahead energy resource scheduling for smart grids considering intensive use of distributed generation and Vehicle- To-Grid (V2G). The main focus is the application of weighted Pareto to a multi-objective parallel particle swarm approach aiming to solve the dual-objective V2G scheduling: minimizing total operation costs and maximizing V2G income. A realistic mathematical formulation, considering the network constraints and V2G charging and discharging efficiencies is presented and parallel computing is applied to the Pareto weights. AC power flow calculation is included in the metaheuristics approach to allow taking into account the network constraints. A case study with a 33-bus distribution network and 1800 V2G resources is used to illustrate the performance of the proposed method.
Resumo:
Paper/Poster presented in Work in Progress Session, 28th GI/ITG International Conference on Architecture of Computing Systems (ARCS 2015). 24 to 26, Mar, 2015. Porto, Portugal.
Resumo:
In this tutorial paper we summarise the key features of the multi-threaded Qu-Prolog language for implementing multi-threaded communicating agent applications. Internal threads of an agent communicate using the shared dynamic database used as a generalisation of Linda tuple store. Threads in different agents, perhaps on different hosts, communicate using either a thread-to-thread store and forward communication system, or by a publish and subscribe mechanism in which messages are routed to their destinations based on content test subscriptions. We illustrate the features using an auction house application. This is fully distributed with multiple auctioneers and bidders which participate in simultaneous auctions. The application makes essential use of the three forms of inter-thread communication of Qu-Prolog. The agent bidding behaviour is specified graphically as a finite state automaton and its implementation is essentially the execution of its state transition function. The paper assumes familiarity with Prolog and the basic concepts of multi-agent systems.
Resumo:
The focus of this study is development of parallelised version of severely sequential and iterative numerical algorithms based on multi-threaded parallel platform such as a graphics processing unit. This requires design and development of a platform-specific numerical solution that can benefit from the parallel capabilities of the chosen platform. Graphics processing unit was chosen as a parallel platform for design and development of a numerical solution for a specific physical model in non-linear optics. This problem appears in describing ultra-short pulse propagation in bulk transparent media that has recently been subject to several theoretical and numerical studies. The mathematical model describing this phenomenon is a challenging and complex problem and its numerical modeling limited on current modern workstations. Numerical modeling of this problem requires a parallelisation of an essentially serial algorithms and elimination of numerical bottlenecks. The main challenge to overcome is parallelisation of the globally non-local mathematical model. This thesis presents a numerical solution for elimination of numerical bottleneck associated with the non-local nature of the mathematical model. The accuracy and performance of the parallel code is identified by back-to-back testing with a similar serial version.
Resumo:
The increasing demand for cheaper-faster-better services anytime and anywhere has made radio network optimisation much more complex than ever before. In order to dynamically optimise the serving network, Dynamic Network Optimisation (DNO), is proposed as the ultimate solution and future trend. The realization of DNO, however, has been hindered by a significant bottleneck of the optimisation speed as the network complexity grows. This paper presents a multi-threaded parallel solution to accelerate complicated proprietary network optimisation algorithms, under a rigid condition of numerical consistency. ariesoACP product from Arieso Ltd serves as the platform for parallelisation. This parallel solution has been benchmarked and results exhibit a high scalability and a run-time reduction by 11% to 42% based on the technology, subscriber density and blocking rate of a given network in comparison with the original version. Further, it is highly essential that the parallel version produces equivalent optimisation quality in terms of identical optimisation outputs.
Resumo:
Distributed real-time systems such as automotive applications are becoming larger and more complex, thus, requiring the use of more powerful hardware and software architectures. Furthermore, those distributed applications commonly have stringent real-time constraints. This implies that such applications would gain in flexibility if they were parallelized and distributed over the system. In this paper, we consider the problem of allocating fixed-priority fork-join Parallel/Distributed real-time tasks onto distributed multi-core nodes connected through a Flexible Time Triggered Switched Ethernet network. We analyze the system requirements and present a set of formulations based on a constraint programming approach. Constraint programming allows us to express the relations between variables in the form of constraints. Our approach is guaranteed to find a feasible solution, if one exists, in contrast to other approaches based on heuristics. Furthermore, approaches based on constraint programming have shown to obtain solutions for these type of formulations in reasonable time.
Resumo:
The 30th ACM/SIGAPP Symposium On Applied Computing (SAC 2015). 13 to 17, Apr, 2015, Embedded Systems. Salamanca, Spain.