2 resultados para implementations
em Glasgow Theses Service
Resumo:
Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general-purpose system, applications residing in the system compete for the shared resources. Thread and task scheduling in such a multiprogrammed multithreaded environment is a significant challenge. In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program results in an Intermediate Representation (IR) containing useful information about tasks, their dependencies, as well as the initial mapping information. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important in the case of GPRM, as it is highly dependent on tasks, rather than threads. We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, only by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations. We use OpenMP, as the most popular model for shared-memory parallel programming as the main GPRM competitor for solving three well-known problems on both platforms: LU factorisation of Sparse Matrices, Image Convolution, and Linked List Processing. We focus on proposing solutions that best fit into the GPRM’s model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for the LU Factorisation results in notable performance improvement for sparse matrices with large numbers of small blocks. We investigate the overhead of GPRM’s task creation and distribution for very short computations using the Image Convolution benchmark. We show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP for convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for Linked List processing and performs better than OpenMP implementations on the Xeon Phi. The results are very promising, as they verify that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.
Resumo:
The next generation of vehicles will be equipped with automated Accident Warning Systems (AWSs) capable of warning neighbouring vehicles about hazards that might lead to accidents. The key enabling technology for these systems is the Vehicular Ad-hoc Networks (VANET) but the dynamics of such networks make the crucial timely delivery of warning messages challenging. While most previously attempted implementations have used broadcast-based data dissemination schemes, these do not cope well as data traffic load or network density increases. This problem of sending warning messages in a timely manner is addressed by employing a network coding technique in this thesis. The proposed NETwork COded DissEmination (NETCODE) is a VANET-based AWS responsible for generating and sending warnings to the vehicles on the road. NETCODE offers an XOR-based data dissemination scheme that sends multiple warning in a single transmission and therefore, reduces the total number of transmissions required to send the same number of warnings that broadcast schemes send. Hence, it reduces contention and collisions in the network improving the delivery time of the warnings. The first part of this research (Chapters 3 and 4) asserts that in order to build a warning system, it is needful to ascertain the system requirements, information to be exchanged, and protocols best suited for communication between vehicles. Therefore, a study of these factors along with a review of existing proposals identifying their strength and weakness is carried out. Then an analysis of existing broadcast-based warning is conducted which concludes that although this is the most straightforward scheme, loading can result an effective collapse, resulting in unacceptably long transmission delays. The second part of this research (Chapter 5) proposes the NETCODE design, including the main contribution of this thesis, a pair of encoding and decoding algorithms that makes the use of an XOR-based technique to reduce transmission overheads and thus allows warnings to get delivered in time. The final part of this research (Chapters 6--8) evaluates the performance of the proposed scheme as to how it reduces the number of transmissions in the network in response to growing data traffic load and network density and investigates its capacity to detect potential accidents. The evaluations use a custom-built simulator to model real-world scenarios such as city areas, junctions, roundabouts, motorways and so on. The study shows that the reduction in the number of transmissions helps reduce competition in the network significantly and this allows vehicles to deliver warning messages more rapidly to their neighbours. It also examines the relative performance of NETCODE when handling both sudden event-driven and longer-term periodic messages in diverse scenarios under stress caused by increasing numbers of vehicles and transmissions per vehicle. This work confirms the thesis' primary contention that XOR-based network coding provides a potential solution on which a more efficient AWS data dissemination scheme can be built.