3 results for “thread” in Glasgow Theses Service
Abstract:
Processors with large numbers of cores are becoming commonplace. To utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods for managing the created tasks. Moreover, in a general-purpose system, the applications residing in the system compete for shared resources. Thread and task scheduling in such a multiprogrammed, multithreaded environment is a significant challenge. In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program produces an Intermediate Representation (IR) containing information about the tasks, their dependencies, and the initial mapping. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important for GPRM, as it depends heavily on tasks rather than threads. We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, simply by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations.
We use OpenMP, the most popular model for shared-memory parallel programming, as the main competitor to GPRM in solving three well-known problems on both platforms: LU factorisation of sparse matrices, image convolution, and linked list processing. We focus on proposing solutions that best fit GPRM’s model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for LU factorisation yields a notable performance improvement for sparse matrices with large numbers of small blocks. Using the image convolution benchmark, we investigate the overhead of GPRM’s task creation and distribution for very short computations, and show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP when convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for linked list processing and performs better than the OpenMP implementations on the Xeon Phi. The results are very promising, as they confirm that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.
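The idea of combining smaller tasks into larger ones can be illustrated with a minimal Python sketch of a row-wise 2D convolution. It assumes a list-of-lists image and a small odd-sized kernel. The names `convolve_rows`, `convolve_2d_chunked` and `num_tasks` are hypothetical, and this is not GPRM’s API or the thesis’s implementation. Each task covers a contiguous block of rows rather than a single row or pixel, so per-task creation and scheduling overhead is paid at most `num_tasks` times.

```python
from concurrent.futures import ThreadPoolExecutor


def convolve_rows(image, kernel, out, row_start, row_end):
    """Apply `kernel` to rows [row_start, row_end) of `image`, writing into `out`.

    Only the "valid" interior is computed; border pixels stay 0.0. The kernel
    is not flipped, which matches true convolution for symmetric kernels.
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = kh // 2, kw // 2
    width = len(image[0])
    for r in range(row_start, row_end):
        for c in range(ow, width - ow):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += kernel[i][j] * image[r + i - oh][c + j - ow]
            out[r][c] = acc


def convolve_2d_chunked(image, kernel, num_tasks):
    """Split the interior rows into at most `num_tasks` coarse tasks.

    Each task handles a contiguous block of rows, so the overhead of creating
    and scheduling a task is amortised over many pixels.
    """
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    height, width = len(image), len(image[0])
    oh = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    first, last = oh, height - oh          # interior rows [first, last)
    n = max(0, last - first)
    chunk = max(1, -(-n // num_tasks))     # ceiling division, at least 1 row
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        futures = [
            pool.submit(convolve_rows, image, kernel, out,
                        first + s, first + min(s + chunk, n))
            for s in range(0, n, chunk)
        ]
        for f in futures:
            f.result()  # re-raise any exception raised inside a task
    return out
```

For example, `convolve_2d_chunked(image, kernel, num_tasks=8)` creates at most eight tasks regardless of image size. Under CPython’s global interpreter lock these threads do not speed up the arithmetic. The sketch only shows how task granularity is controlled.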
Abstract:
Cache-coherent non-uniform memory access (ccNUMA) architecture is a standard design pattern for contemporary multicore processors, and future generations of architectures are likely to be NUMA. NUMA architectures create new challenges for managed runtime systems. Memory-intensive applications use the system’s distributed memory banks to allocate data, and the automatic memory manager collects the garbage left in these memory banks. The garbage collector may need to access remote memory banks, which entails access latency overhead and can saturate the bandwidth of the interconnect between memory banks. This dissertation makes five significant contributions to garbage collection on NUMA systems, with a case study implementation in the HotSpot Java Virtual Machine. It empirically studies data locality for a Stop-The-World garbage collector when tracing connected objects in NUMA heaps. First, it identifies a rich locality that exists naturally in connected objects consisting of a root object and its reachable set (‘rooted sub-graphs’). Second, this dissertation leverages the locality characteristic of rooted sub-graphs to develop a new NUMA-aware garbage collection mechanism: a garbage collector thread processes a local root and its reachable set, which is likely to contain a large number of objects on the same NUMA node. Third, a garbage collector thread steals references from sibling threads that run on the same NUMA node to improve data locality. This research evaluates the new NUMA-aware garbage collector using seven benchmarks from the established real-world DaCapo benchmark suite. The evaluation also includes the widely used SPECjbb benchmark, a Neo4J graph database Java benchmark, and an artificial benchmark. On a multi-hop NUMA architecture, the NUMA-aware garbage collector shows an average performance improvement of 15%.
Furthermore, this performance gain is shown to result from improved NUMA memory access on a ccNUMA system. Fourth, the existing HotSpot JVM adaptive policy for configuring the number of garbage collection threads is shown to be suboptimal for current NUMA machines. The policy relies on outdated assumptions and generates a constant thread count, yet the HotSpot JVM still uses it in the production version. This research shows that the optimal number of garbage collection threads is application-specific, and that configuring this optimal number yields better collection throughput than the default policy. Fifth, this dissertation designs and implements a runtime technique that uses heuristics derived from dynamic collection behavior to calculate an optimal number of garbage collector threads for each collection cycle. The results show an average improvement of 21% in garbage collection performance for the DaCapo benchmarks.
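The third contribution, in which a collector thread steals references from sibling threads on the same NUMA node, can be illustrated with a minimal single-threaded Python sketch. The class `GCWorker`, the functions `choose_victim_order` and `steal`, and the `local_only` flag are hypothetical. Falling back to remote victims when no local sibling has work is an assumption that the abstract does not state. A real collector such as HotSpot’s would use concurrent work-stealing deques rather than a plain `deque`.

```python
from collections import deque


class GCWorker:
    """A hypothetical GC worker thread bound to one NUMA node."""

    def __init__(self, worker_id, numa_node):
        self.worker_id = worker_id
        self.numa_node = numa_node
        # Pending object references to trace; the owner pushes and pops at the right end.
        self.queue = deque()


def choose_victim_order(thief, workers):
    """Order candidate victims: same-node siblings first, then remote workers."""
    others = [w for w in workers if w is not thief]
    local = [w for w in others if w.numa_node == thief.numa_node]
    remote = [w for w in others if w.numa_node != thief.numa_node]
    return local + remote


def steal(thief, workers, local_only=False):
    """Steal one reference, preferring siblings on the thief's NUMA node.

    Assumption (not from the source): when local_only is False and no local
    sibling has work, the thief falls back to workers on remote nodes.
    """
    for victim in choose_victim_order(thief, workers):
        if local_only and victim.numa_node != thief.numa_node:
            break  # only remote victims remain in the ordered list
        if victim.queue:
            return victim.queue.popleft()  # take from the end the owner does not use
    return None
```

Ordering victims by node keeps most stolen references, and the objects reachable from them, on the thief’s local memory bank. Remote stealing is used here only as a last resort to avoid idle threads.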
Abstract:
This thesis compares contemporary anglophone and francophone rewritings of traditional fairy tales for adults. Examining material dating from the 1990s to the present, including novels, novellas, short stories, comics, and televisual and filmic adaptations, this thesis argues that while the revisions studied share similar themes and have comparable aims, their methods for inducing wonder (where wonder is defined as the effect produced by the text rather than simply its magical contents) are diametrically opposed, and it is this opposition that characterises the difference between the two types of rewriting. Although all of these works engage with the hybridity of the fairy-tale genre, the anglophone works studied tend to question traditional narratives while retaining the fantasy setting, whereas the francophone works debunk the tales not only in terms of content but also in terms of aesthetics. Through theoretical, historical, and cultural contextualisation, along with close readings of the texts, this thesis aims to demonstrate the existence of this francophone/anglophone divide and to explain how and why the authors in each tradition tend to adopt such different views while rewriting similar material. This division is the guiding thread of the thesis and also serves as a springboard for exploring other concepts such as genre hybridity, reader-response, and feminism. The thesis is divided into two parts. The first three chapters form an in-depth literature review: chapters one and two examine the historical and contemporary cultural field in which these works were created, and chapter three examines theories of fantasy and genre hybridity. The second part of the thesis consists of textual studies and comparisons of francophone and anglophone material, and is built on three different approaches.
The first (chapter four) looks at selected texts in relation to questions of form, studying the process of world building and world creation enacted when authors combine and rewrite several fairy tales in a single narrative world. The second (chapter five) is a thematic approach that investigates the interactions between femininity, the monstrous, and the wondrous in contemporary tales of animal brides. Finally, chapter six compares rewritings of ‘Bluebeard’, focusing on the representation of the forbidden room and its contents: Bluebeard’s cabinet of wonder is a space he holds sacred, where he sublimates his wives’ corpses, and it is the catalyst of wonder, terror, and awe. Together, the three contextual chapters and the three text-based studies trace the tangible existence of the postulated division between francophone and anglophone texts, but also the similarities between the two cultural fields and their roles in the renewal of the fairy-tale genre.