1000 resultados para multi-threading
Resumo:
This paper presents the multi-threading and internet message communication capabilities of Qu-Prolog. Message addresses are symbolic and the communications package provides high-level support that completely hides details of IP addresses and port numbers as well as the underlying TCP/IP transport layer. The combination of the multi-threads and the high level inter-thread message communications provide simple, powerful support for implementing internet distributed intelligent applications.
Resumo:
Los procesadores multi-core y el multi-threading por hardware permiten aumentar el rendimiento de las aplicaciones. Por un lado, los procesadores multi-core combinan 2 o más procesadores en un mismo chip. Por otro lado, el multi-threading por hardware es una técnica que incrementa la utilización de los recursos del procesador. Este trabajo presenta un análisis de rendimiento de los resultados obtenidos en dos aplicaciones, multiplicación de matrices densas y transformada rápida de Fourier. Ambas aplicaciones se han ejecutado en arquitecturas multi-core que explotan el paralelismo a nivel de thread pero con un modelo de multi-threading diferente. Los resultados obtenidos muestran la importancia de entender y saber analizar el efecto del multi-core y multi-threading en el rendimiento.
Resumo:
Diplomityö tarkastelee säikeistettyä ohjelmointia rinnakkaisohjelmoinnin ylemmällä hierarkiatasolla tarkastellen erityisesti hypersäikeistysteknologiaa. Työssä tarkastellaan hypersäikeistyksen hyviä ja huonoja puolia sekä sen vaikutuksia rinnakkaisalgoritmeihin. Työn tavoitteena oli ymmärtää Intel Pentium 4 prosessorin hypersäikeistyksen toteutus ja mahdollistaa sen hyödyntäminen, missä se tuo suorituskyvyllistä etua. Työssä kerättiin ja analysoitiin suorituskykytietoa ajamalla suuri joukko suorituskykytestejä eri olosuhteissa (muistin käsittely, kääntäjän asetukset, ympäristömuuttujat...). Työssä tarkasteltiin kahdentyyppisiä algoritmeja: matriisioperaatioita ja lajittelua. Näissä sovelluksissa on säännöllinen muistinkäyttökuvio, mikä on kaksiteräinen miekka. Se on etu aritmeettis-loogisissa prosessoinnissa, mutta toisaalta huonontaa muistin suorituskykyä. Syynä siihen on nykyaikaisten prosessorien erittäin hyvä raaka suorituskyky säännöllistä dataa käsiteltäessä, mutta muistiarkkitehtuuria rajoittaa välimuistien koko ja useat puskurit. Kun ongelman koko ylittää tietyn rajan, todellinen suorituskyky voi pudota murto-osaan huippusuorituskyvystä.
Resumo:
We present in this paper several contributions on the collision detection optimization centered on hardware performance. We focus on the broad phase which is the first step of the collision detection process and propose three new ways of parallelization of the well-known Sweep and Prune algorithm. We first developed a multi-core model takes into account the number of available cores. Multi-core architecture enables us to distribute geometric computations with use of multi-threading. Critical writing section and threads idling have been minimized by introducing new data structures for each thread. Programming with directives, like OpenMP, appears to be a good compromise for code portability. We then proposed a new GPU-based algorithm also based on the "Sweep and Prune" that has been adapted to multi-GPU architectures. Our technique is based on a spatial subdivision method used to distribute computations among GPUs. Results show that significant speed-up can be obtained by passing from 1 to 4 GPUs in a large-scale environment.
Resumo:
This paper presents the unique collection of additional features of Qu-Prolog, a variant of the Al programming language Prolog, and illustrates how they can be used for implementing DAI applications. By this we mean applications comprising communicating information servers, expert systems, or agents, with sophisticated reasoning capabilities and internal concurrency. Such an application exploits the key features of Qu-Prolog: support for the programming of sound non-clausal inference systems, multi-threading, and high level inter-thread message communication between Qu-Prolog query threads anywhere on the internet. The inter-thread communication uses email style symbolic names for threads, allowing easy construction of distributed applications using public names for threads. How threads react to received messages is specified by a disjunction of reaction rules which the thread periodically executes. A communications API allows smooth integration of components written in C, which to Qu-Prolog, look like remote query threads.
Resumo:
In the past few years Tabling has emerged as a powerful logic programming model. The integration of concurrent features into the implementation of Tabling systems is demanded by need to use recently developed tabling applications within distributed systems, where a process has to respond concurrently to several requests. The support for sharing of tables among the concurrent threads of a Tabling process is a desirable feature, to allow one of Tabling’s virtues, the re-use of computations by other threads and to allow efficient usage of available memory. However, the incremental completion of tables which are evaluated concurrently is not a trivial problem. In this dissertation we describe the integration of concurrency mechanisms, by the way of multi-threading, in a state of the art Tabling and Prolog system, XSB. We begin by reviewing the main concepts for a formal description of tabled computations, called SLG resolution and for the implementation of Tabling under the SLG-WAM, the abstract machine supported by XSB. We describe the different scheduling strategies provided by XSB and introduce some new properties of local scheduling, a scheduling strategy for SLG resolution. We proceed to describe our implementation work by describing the process of integrating multi-threading in a Prolog system supporting Tabling, without addressing the problem of shared tables. We describe the trade-offs and implementation decisions involved. We then describe an optimistic algorithm for the concurrent sharing of completed tables, Shared Completed Tables, which allows the sharing of tables without incurring in deadlocks, under local scheduling. This method relies on the execution properties of local scheduling and includes full support for negation. We provide a theoretical framework and discuss the implementation’s correctness and complexity. After that, we describe amethod for the sharing of tables among threads that allows parallelism in the computation of inter-dependent subgoals, which we name Concurrent Completion. We informally argue for the correctness of Concurrent Completion. We give detailed performance measurements of the multi-threaded XSB systems over a variety of machines and operating systems, for both the Shared Completed Tables and the Concurrent Completion implementations. We focus our measurements inthe overhead over the sequential engine and the scalability of the system. We finish with a comparison of XSB with other multi-threaded Prolog systems and we compare our approach to concurrent tabling with parallel and distributed methods for the evaluation of tabling. Finally, we identify future research directions.
Resumo:
Remote sensing spatial, spectral, and temporal resolutions of images, acquired over a reasonably sized image extent, result in imagery that can be processed to represent land cover over large areas with an amount of spatial detail that is very attractive for monitoring, management, and scienti c activities. With Moore's Law alive and well, more and more parallelism is introduced into all computing platforms, at all levels of integration and programming to achieve higher performance and energy e ciency. Being the geometric calibration process one of the most time consuming processes when using remote sensing images, the aim of this work is to accelerate this process by taking advantage of new computing architectures and technologies, specially focusing in exploiting computation over shared memory multi-threading hardware. A parallel implementation of the most time consuming process in the remote sensing geometric correction has been implemented using OpenMP directives. This work compares the performance of the original serial binary versus the parallelized implementation, using several multi-threaded modern CPU architectures, discussing about the approach to nd the optimum hardware for a cost-e ective execution.
Resumo:
A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two algorithms, one serial and one parallel, to maintain series-parallel (SP) relationships "on the fly" for fork-join multithreaded programs. The serial SP-order algorithm runs in O(1) amortized time per operation. In contrast, the previously best algorithm requires a time per operation that is proportional to Tarjan’s functional inverse of Ackermann’s function. SP-order employs an order-maintenance data structure that allows us to implement a more efficient "English-Hebrew" labeling scheme than was used in earlier race detectors, which immediately yields an improved determinacy-race detector. In particular, any fork-join program running in T₁ time on a single processor can be checked on the fly for determinacy races in O(T₁) time. Corresponding improved bounds can also be obtained for more sophisticated data-race detectors, for example, those that use locks. By combining SP-order with Feng and Leiserson’s serial SP-bags algorithm, we obtain a parallel SP-maintenance algorithm, called SP-hybrid. Suppose that a fork-join program has n threads, T₁ work, and a critical-path length of T[subscript â]. When executed on P processors, we prove that SP-hybrid runs in O((T₁/P + PT[subscript â]) lg n) expected time. To understand this bound, consider that the original program obtains linear speed-up over a 1-processor execution when P = O(T₁/T[subscript â]). In contrast, SP-hybrid obtains linear speed-up when P = O(√T₁/T[subscript â]), but the work is increased by a factor of O(lg n).
Resumo:
In the 1990s the Message Passing Interface Forum defined MPI bindings for Fortran, C, and C++. With the success of MPI these relatively conservative languages have continued to dominate in the parallel computing community. There are compelling arguments in favour of more modern languages like Java. These include portability, better runtime error checking, modularity, and multi-threading. But these arguments have not converted many HPC programmers, perhaps due to the scarcity of full-scale scientific Java codes, and the lack of evidence for performance competitive with C or Fortran. This paper tries to redress this situation by porting two scientific applications to Java. Both of these applications are parallelized using our thread-safe Java messaging system—MPJ Express. The first application is the Gadget-2 code, which is a massively parallel structure formation code for cosmological simulations. The second application uses the finite-domain time-difference method for simulations in the area of computational electromagnetics. We evaluate and compare the performance of the Java and C versions of these two scientific applications, and demonstrate that the Java codes can achieve performance comparable with legacy applications written in conventional HPC languages. Copyright © 2009 John Wiley & Sons, Ltd.
Resumo:
The past decade has witnessed explosive growth of mobile subscribers and services. With the purpose of providing better-swifter-cheaper services, radio network optimisation plays a crucial role but faces enormous challenges. The concept of Dynamic Network Optimisation (DNO), therefore, has been introduced to optimally and continuously adjust network configurations, in response to changes in network conditions and traffic. However, the realization of DNO has been seriously hindered by the bottleneck of optimisation speed performance. An advanced distributed parallel solution is presented in this paper, as to bridge the gap by accelerating the sophisticated proprietary network optimisation algorithm, while maintaining the optimisation quality and numerical consistency. The ariesoACP product from Arieso Ltd serves as the main platform for acceleration. This solution has been prototyped, implemented and tested. Real-project based results exhibit a high scalability and substantial acceleration at an average speed-up of 2.5, 4.9 and 6.1 on a distributed 5-core, 9-core and 16-core system, respectively. This significantly outperforms other parallel solutions such as multi-threading. Furthermore, augmented optimisation outcome, alongside high correctness and self-consistency, have also been fulfilled. Overall, this is a breakthrough towards the realization of DNO.
Resumo:
Aiming to ensure greater reliability and consistency of data stored in the database, the data cleaning stage is set early in the process of Knowledge Discovery in Databases (KDD) and is responsible for eliminating problems and adjust the data for the later stages, especially for the stage of data mining. Such problems occur in the instance level and schema, namely, missing values, null values, duplicate tuples, values outside the domain, among others. Several algorithms were developed to perform the cleaning step in databases, some of them were developed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents as original contribution an optimization of algorithm for the detection of duplicate tuples in databases through phonetic based on multithreading without the need for trained data, as well as an independent environment of language to be supported for this. © 2011 IEEE.
Resumo:
Il Web nel corso della sua esistenza ha subito un mutamento dovuto in parte dalle richieste del mercato, ma soprattutto dall’evoluzione e la nascita costante delle numerose tecnologie coinvolte in esso. Si è passati da un’iniziale semplice diffusione di contenuti statici, ad una successiva collezione di siti web, dapprima con limitate presenze di dinamicità e interattività (a causa dei limiti tecnologici), ma successivamente poi evoluti alle attuali applicazioni web moderne che hanno colmato il gap con le applicazioni desktop, sia a livello tecnologico, che a livello di diffusione effettiva sul mercato. Tali applicazioni web moderne possono presentare un grado di complessità paragonabile in tutto e per tutto ai sistemi software desktop tradizionali; le tecnologie web hanno subito nel tempo un evoluzione legata ai cambiamenti del web stesso e tra le tecnologie più diffuse troviamo JavaScript, un linguaggio di scripting nato per dare dinamicità ai siti web che si ritrova tutt’ora ad essere utilizzato come linguaggio di programmazione di applicazioni altamente strutturate. Nel corso degli anni la comunità di sviluppo che ruota intorno a JavaScript ha prodotto numerose librerie al supporto del linguaggio dotando così gli sviluppatori di un linguaggio completo in grado di far realizzare applicazioni web avanzate. Le recenti evoluzioni dei motori javascript presenti nei browser hanno inoltre incrementato le prestazioni del linguaggio consacrandone la sua leadership nei confronti dei linguaggi concorrenti. Negli ultimi anni a causa della crescita della complessità delle applicazioni web, javascript è stato messo molto in discussione in quanto come linguaggio non offre le classiche astrazioni consolidate nel tempo per la programmazione altamente strutturata; per questo motivo sono nati linguaggi orientati alla programmazione ad oggetti per il web che si pongono come obiettivo la risoluzione di questo problema: tra questi si trovano linguaggi che hanno l’ambizione di soppiantare JavaScript come ad esempio Dart creato da Google, oppure altri che invece sfruttano JavaScript come linguaggio base al quale aggiungono le caratteristiche mancanti e, mediante il processo di compilazione, producono codice JavaScript puro compatibile con i motori JavaScript presenti nei browser. JavaScript storicamente fu introdotto come linguaggio sia per la programmazione client-side, che per la controparte server-side, ma per vari motivi (la forte concorrenza, basse performance, etc.) ebbe successo solo come linguaggio per la programmazione client; le recenti evoluzioni del linguaggio lo hanno però riportato in auge anche per la programmazione server-side, soprattutto per i miglioramenti delle performance, ma anche per la sua naturale predisposizione per la programmazione event-driven, paradigma alternativo al multi-threading per la programmazione concorrente. Un’applicazione web di elevata complessità al giorno d’oggi può quindi essere interamente sviluppata utilizzando il linguaggio JavaScript, acquisendone sia i suoi vantaggi che gli svantaggi; le nuove tecnologie introdotte ambiscono quindi a diventare la soluzione per i problemi presenti in JavaScript e di conseguenza si propongono come potenziali nuovi linguaggi completi per la programmazione web del futuro, anticipando anche le prossime evoluzioni delle tecnologie già esistenti preannunciate dagli enti standard della programmazione web, il W3C ed ECMAScript. In questa tesi saranno affrontate le tematiche appena introdotte confrontando tra loro le tecnologie in gioco con lo scopo di ottenere un’ampia panoramica delle soluzioni che uno sviluppatore web dovrà prendere in considerazione per realizzare un sistema di importanti dimensioni; in particolare sarà approfondito il linguaggio TypeScript proposto da Microsoft, il quale è nato in successione a Dart apparentemente con lo stesso scopo, ma grazie alla compatibilità con JavaScript e soprattutto con il vasto mondo di librerie legate ad esso nate in questi ultimi anni, si presenta nel mercato come tecnologia facile da apprendere per tutti gli sviluppatori che già da tempo hanno sviluppato abilità nella programmazione JavaScript.
Resumo:
An overview is given of the lessons learned from the introduction of multi-threading using OpenMP in tmLQCD. In particular, programming style, performance measurements, cache misses, scaling, thread distribution for hybrid codes, race conditions, the overlapping of communication and computation and the measurement and reduction of certain overheads are discussed. Performance measurements and sampling profiles are given for different implementations of the hopping matrix computational kernel.
Resumo:
We present a framework for the analysis of the decoding delay in multiview video coding (MVC). We show that in real-time applications, an accurate estimation of the decoding delay is essential to achieve a minimum communication latency. As opposed to single-view codecs, the complexity of the multiview prediction structure and the parallel decoding of several views requires a systematic analysis of this decoding delay, which we solve using graph theory and a model of the decoder hardware architecture. Our framework assumes a decoder implementation in general purpose multi-core processors with multi-threading capabilities. For this hardware model, we show that frame processing times depend on the computational load of the decoder and we provide an iterative algorithm to compute jointly frame processing times and decoding delay. Finally, we show that decoding delay analysis can be applied to design decoders with the objective of minimizing the communication latency of the MVC system.
Resumo:
In this paper, we present a formal model of Java concurrency using the Object-Z specification language. This model captures the Java thread synchronization concepts of locking, blocking, waiting and notification. In the model, we take a viewpoints approach, first capturing the role of the objects and threads, and then taking a system view where we capture the way the objects and threads cooperate and communicate. As a simple illustration of how the model can, in general be applied, we use Object-Z inheritance to integrate the model with the classical producer-consumer system to create a specification directly incorporating the Java concurrency constructs.