2 resultados para NPB (NAS parallel benchmarks)
em Glasgow Theses Service
Resumo:
Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general-purpose system, applications residing in the system compete for the shared resources. Thread and task scheduling in such a multiprogrammed multithreaded environment is a significant challenge. In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program results in an Intermediate Representation (IR) containing useful information about tasks, their dependencies, as well as the initial mapping information. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important in the case of GPRM, as it is highly dependent on tasks, rather than threads. We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, only by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations. We use OpenMP, as the most popular model for shared-memory parallel programming as the main GPRM competitor for solving three well-known problems on both platforms: LU factorisation of Sparse Matrices, Image Convolution, and Linked List Processing. We focus on proposing solutions that best fit into the GPRM’s model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for the LU Factorisation results in notable performance improvement for sparse matrices with large numbers of small blocks. We investigate the overhead of GPRM’s task creation and distribution for very short computations using the Image Convolution benchmark. We show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP for convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for Linked List processing and performs better than OpenMP implementations on the Xeon Phi. The results are very promising, as they verify that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.
Resumo:
Vertebrate genomes are organised into a variety of nuclear environments and chromatin states that have profound effects on the regulation of gene transcription. This variation presents a major challenge to the expression of transgenes for experimental research, genetic therapies and the production of biopharmaceuticals. The majority of transgenes succumb to transcriptional silencing by their chromosomal environment when they are randomly integrated into the genome, a phenomenon known as chromosomal position effect (CPE). It is not always feasible to target transgene integration to transcriptionally permissive “safe harbour” loci that favour transgene expression, so there remains an unmet need to identify gene regulatory elements that can be added to transgenes which protect them against CPE. Dominant regulatory elements (DREs) with chromatin barrier (or boundary) activity have been shown to protect transgenes from CPE. The HS4 element from the chicken beta-globin locus and the A2UCOE element from a human housekeeping gene locus have been shown to function as DRE barriers in a wide variety of cell types and species. Despite rapid advances in the profiling of transcription factor binding, chromatin states and chromosomal looping interactions, progress towards functionally validating the many candidate barrier elements in vertebrates has been very slow. This is largely due to the lack of a tractable and efficient assay for chromatin barrier activity. In this study, I have developed the RGBarrier assay system to test the chromatin barrier activity of candidate DREs at pre-defined isogenic loci in human cells. The RGBarrier assay consists in a Flp-based RMCE reaction for the integration of an expression construct, carrying candidate DREs, in a pre-characterised chromosomal location. The RGBarrier system involves the tracking of red, green and blue fluorescent proteins by flow cytometry to monitor on-target versus off-target integration and transgene expression. The analysis of the reporter (GFP) expression for several weeks gives a measure of the protective ability of each candidate elements from chromosomal silencing. This assay can be scaled up to test tens of new putative barrier elements in the same chromosomal context in parallel. The defined chromosomal contexts of the RGBarrier assays will allow for detailed mechanistic studies of chromosomal silencing and DRE barrier element action. Understanding these mechanisms will be of paramount importance for the design of specific solutions for overcoming chromosomal silencing in specific transgenic applications.