49 resultados para Parallelism
Resumo:
A number of high-performance VLSI architectures for real-time image coding applications are described. In particular, attention is focused on circuits for computing the 2-D DCT (discrete cosine transform) and for 2-D vector quantization. The former circuits are based on Winograd algorithms and comprise a number of bit-level systolic arrays with a bit-serial, word-parallel input. The latter circuits exhibit a similar data organization and consist of a number of inner product array circuits. Both circuits are highly regular and allow extremely high data rates to be achieved through extensive use of parallelism.
Resumo:
The study of parallel evolution facilitates the discovery of common rules of diversification. Here, we examine the repeated evolution of thick lips in Midas cichlid fishes (the Amphilophus citrinellus species complex) - from two Great Lakes and two crater lakes in Nicaragua - to assess whether similar changes in ecology, phenotypic trophic traits and gene expression accompany parallel trait evolution. Using next-generation sequencing technology, we characterize transcriptome-wide differential gene expression in the lips of wild-caught sympatric thick- and thin-lipped cichlids from all four instances of repeated thick-lip evolution. Six genes (apolipoprotein D, myelin-associated glycoprotein precursor, four-and-a-half LIM domain protein 2, calpain-9, GTPase IMAP family member 8-like and one hypothetical protein) are significantly underexpressed in the thick-lipped morph across all four lakes. However, other aspects of lips' gene expression in sympatric morphs differ in a lake-specific pattern, including the magnitude of differentially expressed genes (97-510). Generally, fewer genes are differentially expressed among morphs in the younger crater lakes than in those from the older Great Lakes. Body shape, lower pharyngeal jaw size and shape, and stable isotopes (dC and dN) differ between all sympatric morphs, with the greatest differentiation in the Great Lake Nicaragua. Some ecological traits evolve in parallel (those related to foraging ecology; e.g. lip size, body and head shape) but others, somewhat surprisingly, do not (those related to diet and food processing; e.g. jaw size and shape, stable isotopes). Taken together, this case of parallelism among thick- and thin-lipped cichlids shows a mosaic pattern of parallel and nonparallel evolution. © 2012 Blackwell Publishing Ltd.
Resumo:
With the emergence of multicore and manycore processors, engineers must design and develop software in drastically new ways to benefit from the computational power of all cores. However, developing parallel software is much harder than sequential software because parallelism can't be abstracted away easily. Authors Hans Vandierendonck and Tom Mens provide an overview of technologies and tools to support developers in this complex and error-prone task. © 2012 IEEE.
Resumo:
We describe an approach aimed at addressing the issue of joint exploitation of control (stream) and data parallelism in a skeleton based parallel programming environment, based on annotations and refactoring. Annotations drive efficient implementation of a parallel computation. Refactoring is used to transform the associated skeleton tree into a more efficient, functionally equivalent skeleton tree. In most cases, cost models are used to drive the refactoring process. We show how sample use case applications/kernels may be optimized and discuss preliminary experiments with FastFlow assessing the theoretical results. © 2013 Springer-Verlag.
Resumo:
Recent trends towards increasingly parallel computers mean that there needs to be a seismic shift in programming practice. The time is rapidly approaching when most programming will be for parallel systems. However, most programming techniques in use today are geared towards sequential, or occasionally small-scale parallel, programming. While refactoring has so far mainly been applied to sequential programs, it is our contention that refactoring can play a key role in significantly improving the programmability of parallel systems, by allowing the programmer to apply a set of well-defined transformations in order to parallelise their programs. In this paper, we describe a new language-independent refactoring approach that helps introduce and tune parallelism through high-level design patterns targeting a set of well-specified parallel skeletons. We believe this new refactoring process is the key to allowing programmers to truly start thinking in parallel. © 2012 ACM.
Resumo:
We describe a lightweight prototype framework (LIBERO) designed for experimentation with behavioural skeletons-components implementing a well-known parallelism exploitation pattern and a rule-based autonomic manager taking care of some non-functional feature related to pattern computation. LIBERO supports multiple autonomic managers within the same behavioural skeleton, each taking care of a different non-functional concern. We introduce LIBERO-built on plain Java and JBoss-and discuss how multiple managers may be coordinated to achieve a common goal using a two-phase coordination protocol developed in earlier work. We present experimental results that demonstrate how the prototype may be used to investigate autonomic management of multiple, independent concerns. © 2011 Springer-Verlag Berlin Heidelberg.
Resumo:
Recent trends in computing systems, such as multi-core processors and cloud computing, expose tens to thousands of processors to the software. Software developers must respond by introducing parallelism in their software. To obtain highest performance, it is not only necessary to identify parallelism, but also to reason about synchronization between threads and the communication of data from one thread to another. This entry gives an overview on some of the most common abstractions that are used in parallel programming, namely explicit vs. implicit expression of parallelism and shared and distributed memory. Several parallel programming models are reviewed and categorized by means of these abstractions. The pros and cons of parallel programming models from the perspective of performance and programmability are discussed.
Resumo:
This chapter analyses Marvell’s linguistic ingenuity as exemplified by his Latin poetic corpus. Here, it is argued, a pseudo Lucretian sensitivity to the parallelism between the structure of Latin words and the structure of the world co-exists with a linguistic methodology that is essentially Marinesque. Close examination of the Latin poems as a whole assesses the nature and significance of etymological play, paronomasia, puns on juxtaposed Latin words, on place names, and on personal names. It is suggested that such devices demonstrate ways in which the neo-Latin poetic text can serve both as a linguistic microcosm of the literary contexts in which they are employed, and as a re-invention of the artifice, extravagant conceits, and baroque wit of Marinism. The result is a neo-Latin ‘echoing song’ that is both intra- and intertextual. Through bilingual punning and phonological wit Marvell plays with a classical language only to demonstrate its transformative potential. The chapter concludes by offering a new reading of Hortus in relation to the garden sections of Marino’s L’Adone, in which an extravagantly luscious setting confounds the senses and is mirrored linguistically by word-clusters and labyrinthine punning.
Resumo:
This paper describes the ParaPhrase project, a new 3-year targeted research project funded under EU Framework 7 Objective 3.4 (Computer Systems), starting in October 2011. ParaPhrase aims to follow a new approach to introducing parallelism using advanced refactoring techniques coupled with high-level parallel design patterns. The refactoring approach will use these design patterns to restructure programs defined as networks of software components into other forms that are more suited to parallel execution. The programmer will be aided by high-level cost information that will be integrated into the refactoring tools. The implementation of these patterns will then use a well-understood algorithmic skeleton approach to achieve good parallelism. A key ParaPhrase design goal is that parallel components are intended to match heterogeneous architectures, defined in terms of CPU/GPU combinations, for example. In order to achieve this, the ParaPhrase approach will map components at link time to the available hardware, and will then re-map them during program execution, taking account of multiple applications, changes in hardware resource availability, the desire to reduce communication costs etc. In this way, we aim to develop a new approach to programming that will be able to produce software that can adapt to dynamic changes in the system environment. Moreover, by using a strong component basis for parallelism, we can achieve potentially significant gains in terms of reducing sharing at a high level of abstraction, and so in reducing or even eliminating the costs that are usually associated with cache management, locking, and synchronisation. © 2013 Springer-Verlag Berlin Heidelberg.
Resumo:
Processor architectures has taken a turn towards many-core processors, which integrate multiple processing cores on a single chip to increase overall performance, and there are no signs that this trend will stop in the near future. Many-core processors are harder to program than multi-core and single-core processors due to the need of writing parallel or concurrent programs with high degrees of parallelism. Moreover, many-cores have to operate in a mode of strong scaling because of memory bandwidth constraints. In strong scaling increasingly finer-grain parallelism must be extracted in order to keep all processing cores busy.
Task dataflow programming models have a high potential to simplify parallel program- ming because they alleviate the programmer from identifying precisely all inter-task de- pendences when writing programs. Instead, the task dataflow runtime system detects and enforces inter-task dependences during execution based on the description of memory each task accesses. The runtime constructs a task dataflow graph that captures all tasks and their dependences. Tasks are scheduled to execute in parallel taking into account dependences specified in the task graph.
Several papers report important overheads for task dataflow systems, which severely limits the scalability and usability of such systems. In this paper we study efficient schemes to manage task graphs and analyze their scalability. We assume a programming model that supports input, output and in/out annotations on task arguments, as well as commutative in/out and reductions. We analyze the structure of task graphs and identify versions and generations as key concepts for efficient management of task graphs. Then, we present three schemes to manage task graphs building on graph representations, hypergraphs and lists. We also consider a fourth edge-less scheme that synchronizes tasks using integers. Analysis using micro-benchmarks shows that the graph representation is not always scalable and that the edge-less scheme introduces least overhead in nearly all situations.
Resumo:
The paper presents IPPro which is a high performance, scalable soft-core processor targeted for image processing applications. It has been based on the Xilinx DSP48E1 architecture using the ZYNQ Field Programmable Gate Array and is a scalar 16-bit RISC processor that operates at 526MHz, giving 526MIPS of performance. Each IPPro core uses 1 DSP48, 1 Block RAM and 330 Kintex-7 slice-registers, thus making the processor as compact as possible whilst maintaining flexibility and programmability. A key aspect of the approach is in reducing the application design time and implementation effort by using multiple IPPro processors in a SIMD mode. For different applications, this allows us to exploit different levels of parallelism and mapping for the specified processing architecture with the supported instruction set. In this context, a Traffic Sign Recognition (TSR) algorithm has been prototyped on a Zedboard with the colour and morphology operations accelerated using multiple IPPros. Simulation and experimental results demonstrate that the processing platform is able to achieve a speedup of 15 to 33 times for colour filtering and morphology operations respectively, with a reduced design effort and time.
Resumo:
This paper presents a new programming methodology for introducing and tuning parallelism in Erlang programs, using source-level code refactoring from sequential source programs to parallel programs written using our skeleton library, Skel. High-level cost models allow us to predict with reasonable accuracy the parallel performance of the refactored program, enabling programmers to make informed decisions about which refactorings to apply. Using our approach, we demonstrate easily obtainable, significant and scalable speedups of up to 21 on a 24-core machine over the sequential code.
Resumo:
We propose a methodology for optimizing the execution of data parallel (sub-)tasks on CPU and GPU cores of the same heterogeneous architecture. The methodology is based on two main components: i) an analytical performance model for scheduling tasks among CPU and GPU cores, such that the global execution time of the overall data parallel pattern is optimized; and ii) an autonomic module which uses the analytical performance model to implement the data parallel computations in a completely autonomic way, requiring no programmer intervention to optimize the computation across CPU and GPU cores. The analytical performance model uses a small set of simple parameters to devise a partitioning-between CPU and GPU cores-of the tasks derived from structured data parallel patterns/algorithmic skeletons. The model takes into account both hardware related and application dependent parameters. It computes the percentage of tasks to be executed on CPU and GPU cores such that both kinds of cores are exploited and performance figures are optimized. The autonomic module, implemented in FastFlow, executes a generic map (reduce) data parallel pattern scheduling part of the tasks to the GPU and part to CPU cores so as to achieve optimal execution time. Experimental results on state-of-the-art CPU/GPU architectures are shown that assess both performance model properties and autonomic module effectiveness. © 2013 IEEE.
Resumo:
With security and surveillance, there is an increasing need to be able to process image data efficiently and effectively either at source or in a large data networks. Whilst Field Programmable Gate Arrays have been seen as a key technology for enabling this, they typically use high level and/or hardware description language synthesis approaches; this provides a major disadvantage in terms of the time needed to design or program them and to verify correct operation; it considerably reduces the programmability capability of any technique based on this technology. The work here proposes a different approach of using optimised soft-core processors which can be programmed in software. In particular, the paper proposes a design tool chain for programming such processors that uses the CAL Actor Language as a starting point for describing an image processing algorithm and targets its implementation to these custom designed, soft-core processors on FPGA. The main purpose is to exploit the task and data parallelism in order to achieve the same parallelism as a previous HDL implementation but avoiding the design time, verification and debugging steps associated with such approaches.
Resumo:
Question: How parallel is adaptive evolution when it occurs from different genetic backgrounds?
Background: Divergent evolutionary lineages of several post-glacial fish species including the threespine stickleback are found together in Ireland.
Goals: To investigate the morphological diversity of stickleback populations in Ireland and assess whether morphology evolved in parallel between evolutionary lineages.
Methods: We sampled stickleback from lake, river, and coastal habitats across Ireland. Microsatellite and mitochondrial DNA data revealed evolutionary history. Geometric morphometrics and linear trait measurements characterized morphology. We used a multivariate approach to quantify parallel and non-parallel divergence within and between lineages.
Results: Repeated evolution of similar morphologies in similar habitats occurred across Ireland, concordant with patterns observed elsewhere in the stickleback distribution. A strong pattern of habitat-specific morphology existed even among divergent lineages. Furthermore, a strong signal of shared morphological divergence occurred along a marine–freshwater axis. Evidently, deterministic natural selection played a more important role in driving freshwater adaptation than independent evolutionary history.