992 resultados para Software Transactional Memory (STM)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Software transactional memory (STM) has been proposed as a promising programming paradigm for shared memory multi-threaded programs as an alternative to conventional lock based synchronization primitives. Typical STM implementations employ a conflict detection scheme, which works with uniform access granularity, tracking shared data accesses either at word/cache line or at object level. It is well known that a single fixed access tracking granularity cannot meet the conflicting goals of reducing false conflicts without impacting concurrency adversely. A fine grained granularity while improving concurrency can have an adverse impact on performance due to lock aliasing, lock validation overheads, and additional cache pressure. On the other hand, a coarse grained granularity can impact performance due to reduced concurrency. Thus, in general, a fixed or uniform granularity access tracking (UGAT) scheme is application-unaware and rarely matches the access patterns of individual application or parts of an application, leading to sub-optimal performance for different parts of the application(s). In order to mitigate the disadvantages associated with UGAT scheme, we propose a Variable Granularity Access Tracking (VGAT) scheme in this paper. We propose a compiler based approach wherein the compiler uses inter-procedural whole program static analysis to select the access tracking granularity for different shared data structures of the application based on the application's data access pattern. We describe our prototype VGAT scheme, using TL2 as our STM implementation. Our experimental results reveal that VGAT-STM scheme can improve the application performance of STAMP benchmarks from 1.87% to up to 21.2%.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The recent trends of chip architectures with higher number of heterogeneous cores, and non-uniform memory/non-coherent caches, brings renewed attention to the use of Software Transactional Memory (STM) as a fundamental building block for developing parallel applications. Nevertheless, although STM promises to ease concurrent and parallel software development, it relies on the possibility of aborting conflicting transactions to maintain data consistency, which impacts on the responsiveness and timing guarantees required by embedded real-time systems. In these systems, contention delays must be (efficiently) limited so that the response times of tasks executing transactions are upper-bounded and task sets can be feasibly scheduled. In this paper we assess the use of STM in the development of embedded real-time software, defending that the amount of contention can be reduced if read-only transactions access recent consistent data snapshots, progressing in a wait-free manner. We show how the required number of versions of a shared object can be calculated for a set of tasks. We also outline an algorithm to manage conflicts between update transactions that prevents starvation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Transactional memory (TM) is a new synchronization mechanism devised to simplify parallel programming, thereby helping programmers to unleash the power of current multicore processors. Although software implementations of TM (STM) have been extensively analyzed in terms of runtime performance, little attention has been paid to an equally important constraint faced by nearly all computer systems: energy consumption. In this work we conduct a comprehensive study of energy and runtime tradeoff sin software transactional memory systems. We characterize the behavior of three state-of-the-art lock-based STM algorithms, along with three different conflict resolution schemes. As a result of this characterization, we propose a DVFS-based technique that can be integrated into the resolution policies so as to improve the energy-delay product (EDP). Experimental results show that our DVFS-enhanced policies are indeed beneficial for applications with high contention levels. Improvements of up to 59% in EDP can be observed in this scenario, with an average EDP reduction of 16% across the STAMP workloads. © 2012 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Poster for IRP project 'Side-effects in Software Transactional Memory: Extending Deuce with TwilightSTM'

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hardware vendors make an important effort creating low-power CPUs that keep battery duration and durability above acceptable levels. In order to achieve this goal and provide good performance-energy for a wide variety of applications, ARM designed the big.LITTLE architecture. This heterogeneous multi-core architecture features two different types of cores: big cores oriented to performance and little cores, slower and aimed to save energy consumption. As all the cores have access to the same memory, multi-threaded applications must resort to some mutual exclusion mechanism to coordinate the access to shared data by the concurrent threads. Transactional Memory (TM) represents an optimistic approach for shared-memory synchronization. To take full advantage of the features offered by software TM, but also benefit from the characteristics of the heterogeneous big.LITTLE architectures, our focus is to propose TM solutions that take into account the power/performance requirements of the application and what it is offered by the architecture. In order to understand the current state-of-the-art and obtain useful information for future power-aware software TM solutions, we have performed an analysis of a popular TM library running on top of an ARM big.LITTLE processor. Experiments show, in general, better scalability for the LITTLE cores for most of the applications except for one, which requires the computing performance that the big cores offer.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Software transactional memory has the potential to greatly simplify development of concurrent software, by supporting safe composition of concurrent shared-state abstractions. However, STM semantics are defined in terms of low-level reads and writes on individual memory locations, so implementations are unable to take advantage of the properties of user-defined abstractions. Consequently, the performance of transactions over some structures can be disappointing. ----- ----- We present Modular Transactional Memory, our framework which allows programmers to extend STM with concurrency control algorithms tailored to the data structures they use in concurrent programs. We describe our implementation in Concurrent Haskell, and two example structures: a finite map which allows concurrent transactions to operate on disjoint sets of keys, and a non-deterministic channel which supports concurrent sources and sinks. ----- ----- Our approach is based on previous work by others on boosted and open-nested transactions, with one significant development: transactions are given types which denote the concurrency control algorithms they employ. Typed transactions offer a higher level of assurance for programmers reusing transactional code, and allow more flexible abstract concurrency control.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs as an alternative to traditional lock based synchronization. However adoption of STM in mainstream software has been quite low due to its considerable overheads and its poor cache/memory performance. In this paper, we perform a detailed study of the cache behavior of STM applications and quantify the impact of different STM factors on the cache misses experienced by the applications. Based on our analysis, we propose a compiler driven Lock-Data Colocation (LDC), targeted at reducing the cache overheads on STM. We show that LDC is effective in improving the cache behavior of STM applications by reducing the dcache miss latency and improving execution time performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Software transactional memory(STM) is a promising programming paradigm for shared memory multithreaded programs. While STM offers the promise of being less error-prone and more programmer friendly compared to traditional lock-based synchronization, it also needs to be competitive in performance in order for it to be adopted in mainstream software. A major source of performance overheads in STM is transactional aborts. Conflict resolution and aborting a transaction typically happens at the transaction level which has the advantage that it is automatic and application agnostic. However it has a substantial disadvantage in that STM declares the entire transaction as conflicting and hence aborts it and re-executes it fully, instead of partially re-executing only those part(s) of the transaction, which have been affected due to the conflict. This "Re-execute Everything" approach has a significant adverse impact on STM performance. In order to mitigate the abort overheads, we propose a compiler aided Selective Reconciliation STM (SR-STM) scheme, wherein certain transactional conflicts can be reconciled by performing partial re-execution of the transaction. Ours is a selective hybrid approach which uses compiler analysis to identify those data accesses which are legal and profitable candidates for reconciliation and applies partial re-execution only to these candidates selectively while other conflicting data accesses are handled by the default STM approach of abort and full re-execution. We describe the compiler analysis and code transformations required for supporting selective reconciliation. We find that SR-STM is effective in reducing the transactional abort overheads by improving the performance for a set of five STAMP benchmarks by 12.58% on an average and up to 22.34%.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs. In order for STMs to be adopted widely for performance critical software, understanding and improving the cache performance of applications running on STM becomes increasingly crucial, as the performance gap between processor and memory continues to grow. In this paper, we present the most detailed experimental evaluation to date, of the cache behavior of STM applications and quantify the impact of the different STM factors on the cache misses experienced by the applications. We find that STMs are not cache friendly, with the data cache stall cycles contributing to more than 50% of the execution cycles in a majority of the benchmarks. We find that on an average, misses occurring inside the STM account for 62% of total data cache miss latency cycles experienced by the applications and the cache performance is impacted adversely due to certain inherent characteristics of the STM itself. The above observations motivate us to propose a set of specific compiler transformations targeted at making the STMs cache friendly. We find that STM's fine grained and application unaware locking is a major contributor to its poor cache behavior. Hence we propose selective Lock Data co-location (LDC) and Redundant Lock Access Removal (RLAR) to address the lock access misses. We find that even transactions that are completely disjoint access parallel, suffer from costly coherence misses caused by the centralized global time stamp updates and hence we propose the Selective Per-Partition Time Stamp (SPTS) transformation to address this. We show that our transformations are effective in improving the cache behavior of STM applications by reducing the data cache miss latency by 20.15% to 37.14% and improving execution time by 18.32% to 33.12% in five of the 8 STAMP applications.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Software Transactional Memory (STM) systems have poor performance under high contention scenarios. Since many transactions compete for the same data, most of them are aborted, wasting processor runtime. Contention management policies are typically used to avoid that, but they are passive approaches as they wait for an abort to happen so they can take action. More proactive approaches have emerged, trying to predict when a transaction is likely to abort so its execution can be delayed. Such techniques are limited, as they do not replace the doomed transaction by another or, when they do, they rely on the operating system for that, having little or no control on which transaction should run. In this paper we propose LUTS, a Lightweight User-Level Transaction Scheduler, which is based on an execution context record mechanism. Unlike other techniques, LUTS provides the means for selecting another transaction to run in parallel, thus improving system throughput. Moreover, it avoids most of the issues caused by pseudo parallelism, as it only launches as many system-level threads as the number of available processor cores. We discuss LUTS design and present three conflict-avoidance heuristics built around LUTS scheduling capabilities. Experimental results, conducted with STMBench7 and STAMP benchmark suites, show LUTS efficiency when running high contention applications and how conflict-avoidance heuristics can improve STM performance even more. In fact, our transaction scheduling techniques are capable of improving program performance even in overloaded scenarios. © 2011 Springer-Verlag.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La memoria transaccional (TM) constituye un paradigma de concurrencia optimista en arquitecturas multinúcleo que puede ser de utilidad en la explotación de paralelismo en aplicaciones irregulares, en las que la información sobre las dependencias de datos no está disponible hasta la ejecución. Este trabajo presenta y discute cómo aprovechar las características de un sistema STM (software transactio- nal memory) en patrones de computación que involucren operaciones de reducción, ligadas frecuentemente a aplicaciones irregulares. Con el fin de comparar el uso de enfoques STM en esta clase de patrones con otras soluciones más clásicas, se ha implementa do como prueba de concepto un sistema STM, que denominaremos ReduxSTM, que combina dos ideas: una consolidación (commit) ordenada de las transacciones, que asegura una equivalencia con la ejecución secuencial del código; y una extensión del mecanismo de privatización subyacente al sistema STM que contempla las operaciones de reducción.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Current industry proposals for Hardware Transactional Memory (HTM) focus on best-effort solutions (BE-HTM) where hardware limits are imposed on transactions. These designs may show a significant performance degradation due to high contention scenarios and different hardware and operating system limitations that abort transactions, e.g. cache overflows, hardware and software exceptions, etc. To deal with these events and to ensure forward progress, BE-HTM systems usually provide a software fallback path to execute a lock-based version of the code. In this paper, we propose a hardware implementation of an irrevocability mechanism as an alternative to the software fallback path to gain insight into the hardware improvements that could enhance the execution of such a fallback. Our mechanism anticipates the abort that causes the transaction serialization, and stalls other transactions in the system so that transactional work loss is mini- mized. In addition, we evaluate the main software fallback path approaches and propose the use of ticket locks that hold precise information of the number of transactions waiting to enter the fallback. Thus, the separation of transactional and fallback execution can be achieved in a precise manner. The evaluation is carried out using the Simics/GEMS simulator and the complete range of STAMP transactional suite benchmarks. We obtain significant performance benefits of around twice the speedup and an abort reduction of 50% over the software fallback path for a number of benchmarks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Software transaction memory (STM) systems have been used as an approach to improve performance, by allowing the concurrent execution of atomic blocks. However, under high-contention workloads, STM-based systems can considerably degrade performance, as transaction conflict rate increases. Contention management policies have been used as a way to select which transaction to abort when a conflict occurs. In general, contention managers are not capable of avoiding conflicts, as they can only select which transaction to abort and the moment it should restart. Since contention managers act only after a conflict is detected, it becomes harder to effectively increase transaction throughput. More proactive approaches have emerged, aiming at predicting when a transaction is likely to abort, postponing its execution. Nevertheless, most of the proposed proactive techniques are limited, as they do not replace the doomed transaction by another or, when they do, they rely on the operating system for that, having little or no control on which transaction to run. This article proposes LUTS, a lightweight user-level transaction scheduler. Unlike other techniques, LUTS provides the means for selecting another transaction to run in parallel, thus improving system throughput. We discuss LUTS design and propose a dynamic conflict-avoidance heuristic built around its scheduling capabilities. Experimental results, conducted with the STAMP and STMBench7 benchmark suites, running on TinySTM and SwissTM, show how our conflict-avoidance heuristic can effectively improve STM performance on high contention applications. © 2012 Springer Science+Business Media, LLC.

Relevância:

100.00% 100.00%

Publicador: