13 results for fastflow
Abstract:
FastFlow is a structured parallel programming framework targeting shared-memory multi-core architectures. In this paper we introduce a FastFlow extension aimed at also supporting networks of multi-core workstations. The extension supports the execution of FastFlow programs by coordinating, in a structured way, the fine-grained parallel activities running on the individual workstations. We discuss the design and implementation of this extension and present preliminary experimental results validating it on state-of-the-art networked multi-core nodes. © 2013 Springer-Verlag.
Abstract:
FastFlow is a programming framework specifically targeting cache-coherent shared-memory multi-cores. It is implemented as a stack of C++ template libraries built on top of lock-free (and memory-fence-free) synchronization mechanisms. Its philosophy is to combine programmability with performance. In this paper we present a new FastFlow programming methodology aimed at supporting the parallelization of existing sequential code by offloading it onto a dynamically created software accelerator. The new methodology has been validated using a set of simple micro-benchmarks and some real applications. © 2011 Springer-Verlag.
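The offloading idea above can be sketched independently of FastFlow's actual C++ API: a "software accelerator" is created dynamically as a pool of worker threads fed through a task queue, and a formerly sequential loop offloads its independent iterations onto it. The sketch below is a hypothetical Python rendering under that assumption; the `Accelerator` class and all names are ours, not FastFlow's.

```python
# Conceptual sketch of accelerator offloading (NOT FastFlow's API):
# worker threads drain a task queue; the main thread offloads loop
# iterations and later collects the results in order.
import threading, queue

class Accelerator:
    def __init__(self, n_workers=4):
        self.tasks = queue.Queue()
        self.results = queue.Queue()
        self.workers = [threading.Thread(target=self._worker, daemon=True)
                        for _ in range(n_workers)]
        for w in self.workers:
            w.start()

    def _worker(self):
        while True:
            item = self.tasks.get()
            if item is None:              # poison pill: shut this worker down
                break
            idx, fn, arg = item
            self.results.put((idx, fn(arg)))

    def offload(self, idx, fn, arg):      # non-blocking submission
        self.tasks.put((idx, fn, arg))

    def join(self, n_expected):           # wait for all offloaded tasks
        out = [None] * n_expected
        for _ in range(n_expected):
            idx, val = self.results.get()
            out[idx] = val
        for _ in self.workers:            # stop the accelerator
            self.tasks.put(None)
        return out

def offload_map(fn, xs):
    """The former sequential loop, rewritten to offload each iteration."""
    acc = Accelerator()
    for i, x in enumerate(xs):
        acc.offload(i, fn, x)
    return acc.join(len(xs))
```

The key point the abstract makes is that the sequential code's structure is preserved: only the loop body moves, via `offload`, onto the accelerator.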
Abstract:
Protein-based analytical biodevices, such as sensors and biochips, provide an effective platform for analytical techniques, and the homogeneity of protein immobilization is a major indicator of their quality. In this work, two model proteins were used to study the homogeneity of protein immobilization. Fusion proteins of the form Protein-Linker-Cysteine were constructed by genetic manipulation. In this design, the free thiol group provided by the cysteine forms Au-S bonds on gold surfaces and -S-S- bonds on thiol-modified glass slides, achieving homogeneous, oriented protein immobilization, while the linker reduces the effect of the genetic modification on protein folding. The expression vector pPIC-GOxm (GOx-Linker-Cysteine) was constructed and transformed into the yeast Pichia pastoris by protoplast transformation, and the fusion protein was purified on a Q Sepharose™ Fast Flow anion-exchange column. Kinetic analysis showed that GOxm has Km and kcat values similar to those of wild-type glucose oxidase. Electrochemical experiments showed that the GOxm sensor has a high response current and good interchangeability, with a relative error of 9.48%, compared with 19.98% for GOxw (wild-type GOx) and 17.54% for a conventional sensor. Atomic force microscopy (AFM) images showed that the fusion protein GOxm forms a hexagonal-lattice-like self-assembled monolayer on the hcp sites of the gold surface, whereas wild-type GOxw adsorbs nonspecifically on gold, forming multilayers that lead to intermolecular aggregation. GOxm and GOxw protein chips were fabricated using -S-S- bonding and nonspecific adsorption, respectively. After enzymatic color development, chip homogeneity was evaluated from the optical signal: GOxm formed a homogeneous, oriented layer via -S-S- bonds, with a coefficient of variation below 6% over 10 replicates, whereas GOxw did not form a homogeneous layer, its inter-spot coefficient of variation ranging widely from 40% to 80%. The expression vectors pET-BLC and pET-BL were also constructed and transformed into Escherichia coli AD494, and AFM was used to study the immobilization on gold of the protein with its phospholipids and after phospholipid extraction. The AFM images showed that the fusion protein BLC forms a uniform layer on gold via Au-S bonds, whereas the wild-type protein does not; immobilization on gold is affected by the surface topology and by the phospholipids. These results indicate that this immobilization strategy can improve the homogeneity, and hence the quality, of analytical biodevices.
Abstract:
The glucose oxidase (GOD) gene of Aspergillus niger was recombined into the E. coli-yeast shuttle plasmid pPIC9 and transformed into the methylotrophic yeast Pichia pastoris GS115, yielding a high-producing engineered yeast strain. Under the control of the yeast α-factor signal and the AOX1 promoter and termination signals, A. niger GOD was expressed at high levels in the methylotrophic yeast and secreted extracellularly; after 3-4 d of methanol induction, GOD activity in the fermentation broth reached 30-40 u/mL. SDS-PAGE confirmed that the GOD content of the culture supernatant was significantly higher than that of other contaminating proteins, accounting for about 60%-70% of total extracellular protein, and a single purification step on a Q Sepharose™ Fast Flow ion-exchange column was sufficient to reach electrophoretic purity. The specific activity of the recombinant yeast GOD reached 426.63 u/mg protein, 1.6 times that of commercial A. niger GOD. Kinetic analysis gave Km and kcat values of 38.25 mmol/L and 3492.66 s⁻¹, respectively, indicating a higher catalytic efficiency than that of commercial A. niger GOD. The high activity of the recombinant yeast GOD can effectively extend the linear detection range of glucose sensors.
Abstract:
We describe an approach, based on annotations and refactoring, aimed at addressing the joint exploitation of control (stream) and data parallelism in a skeleton-based parallel programming environment. Annotations drive the efficient implementation of a parallel computation. Refactoring is used to transform the associated skeleton tree into a more efficient, functionally equivalent skeleton tree. In most cases, cost models are used to drive the refactoring process. We show how sample use-case applications and kernels may be optimized, and discuss preliminary experiments with FastFlow assessing the theoretical results. © 2013 Springer-Verlag.
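Cost-model-driven refactoring of this kind can be illustrated with a toy example: two functionally equivalent skeleton trees for the same two-stage computation, with the rewrite chosen by comparing modelled service times. The model and numbers below are purely illustrative, not the paper's; the function names are ours.

```python
# Toy cost-model-driven refactoring: choose between pipe(farm, farm)
# and farm(pipe) for a two-stage computation with per-item stage
# latencies t1, t2 and a fixed worker budget.

def service_time_pipe_of_farms(t1, t2, w1, w2):
    # pipe(farm(s1, w1), farm(s2, w2)): throughput is bounded by the
    # slower of the two farmed stages.
    return max(t1 / w1, t2 / w2)

def service_time_farm_of_pipe(t1, t2, w):
    # farm(pipe(s1, s2), w): each worker runs both stages in sequence.
    return (t1 + t2) / w

def refactor(t1, t2, workers):
    """Pick the cheaper of the two equivalent trees for a worker budget."""
    half = workers // 2  # naive even split for the pipe-of-farms variant
    a = service_time_pipe_of_farms(t1, t2, half, workers - half)
    b = service_time_farm_of_pipe(t1, t2, workers)
    return ("farm(pipe)", b) if b <= a else ("pipe(farm)", a)
```

With this idealized model and an even worker split, farm-of-pipes never loses; real cost models add per-skeleton overheads (scheduling, state, locality) that can tip the choice the other way, which is exactly why the refactoring is model-driven rather than fixed.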
Abstract:
The use of efficient synchronization mechanisms is crucial for implementing fine-grained parallel programs on modern shared-cache multi-core architectures. In this paper we study this problem by considering Single-Producer/Single-Consumer (SPSC) coordination using unbounded queues. A novel unbounded SPSC algorithm is presented that reduces raw synchronization latency and speeds up Producer-Consumer coordination. The algorithm has been extensively tested on a shared-cache multi-core platform, and a sketch of its proof of correctness is presented. The proposed queues have been used as basic building blocks to implement the FastFlow parallel framework, which has been demonstrated to offer very good performance for fine-grained parallel applications. © 2012 Springer-Verlag.
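The structural idea behind unbounded SPSC queues of this family, chaining bounded buffers so that neither end ever blocks on the other, can be sketched as follows. This is a simplified single-threaded illustration of the data layout, not the paper's exact algorithm: the real implementation is in C++ and relies on atomics and careful memory-fence placement for producer/consumer concurrency.

```python
# Illustrative layout of an unbounded SPSC queue built from a linked
# chain of bounded buffers: the producer appends a fresh buffer when
# the current one fills; the consumer follows the chain.

BUF_SIZE = 4  # deliberately tiny, to exercise buffer chaining

class _Buffer:
    def __init__(self):
        self.slots = [None] * BUF_SIZE
        self.head = 0          # next slot the consumer reads
        self.tail = 0          # next slot the producer writes
        self.next = None       # following buffer in the chain

class UnboundedSPSC:
    def __init__(self):
        self._wbuf = self._rbuf = _Buffer()

    def push(self, item):              # producer side only
        b = self._wbuf
        if b.tail == BUF_SIZE:         # current buffer full: chain a new one
            b.next = _Buffer()
            b = self._wbuf = b.next
        b.slots[b.tail] = item
        b.tail += 1                    # publish the slot after the write

    def pop(self):                     # consumer side only
        b = self._rbuf
        if b.head == b.tail:
            if b.head == BUF_SIZE and b.next is not None:
                b = self._rbuf = b.next    # advance to the next buffer
            if b.head == b.tail:
                return None            # queue currently empty
        item = b.slots[b.head]
        b.head += 1
        return item
```

Because each index is written by exactly one side, the producer and consumer only ever touch their own end of the chain, which is what lets the C++ version avoid locks.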
Abstract:
Structured parallel programming is recognised as a viable and effective means of tackling parallel programming problems. Recently, a set of simple and powerful parallel building blocks (RISC-pb2l) has been proposed to support the modelling and implementation of parallel frameworks. In this work we demonstrate how that same parallel building block set may be used to model both general-purpose parallel programming abstractions, not usually listed in classical skeleton sets, and more specialized domain-specific parallel patterns. We show how an implementation of RISC-pb2l can be realised via the FastFlow framework and present experimental evidence of the feasibility and efficiency of the approach.
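The building-block idea, assembling both classical skeletons and ad-hoc patterns from a handful of tiny composable pieces, can be rendered hypothetically as follows. The combinator names below (`seq`, `comp`, `spread`, `gather`, `farm`) are ours for illustration, not the RISC-pb2l vocabulary, and the sketch is sequential where the real blocks are parallel.

```python
# Tiny composable "building blocks" from which a classical skeleton
# (a farm) is assembled, in the spirit of a building-block set.

def seq(f):                         # wrap a plain function into a stream block
    return lambda stream: [f(x) for x in stream]

def comp(*blocks):                  # serial composition of blocks
    def run(stream):
        for b in blocks:
            stream = b(stream)
        return stream
    return run

def spread(n):                      # 1-to-n round-robin scatter
    return lambda stream: [stream[i::n] for i in range(n)]

def gather(parts):                  # n-to-1 gather, restoring input order
    out = [None] * sum(len(p) for p in parts)
    for i, p in enumerate(parts):
        out[i::len(parts)] = p
    return out

def farm(f, n):                     # a classical skeleton built from blocks
    return comp(spread(n),
                lambda parts: [seq(f)(p) for p in parts],  # n "workers"
                gather)
```

The same four primitives could equally be wired into a pipeline or a more specialized domain-specific pattern, which is the composability point the abstract makes.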
Abstract:
We propose a methodology for optimizing the execution of data-parallel (sub-)tasks on the CPU and GPU cores of the same heterogeneous architecture. The methodology is based on two main components: i) an analytical performance model for scheduling tasks among CPU and GPU cores such that the global execution time of the overall data-parallel pattern is optimized; and ii) an autonomic module which uses the analytical performance model to implement the data-parallel computations in a completely autonomic way, requiring no programmer intervention to optimize the computation across CPU and GPU cores. The analytical performance model uses a small set of simple parameters to devise a partitioning, between CPU and GPU cores, of the tasks derived from structured data-parallel patterns/algorithmic skeletons. The model takes into account both hardware-related and application-dependent parameters, and computes the percentage of tasks to be executed on CPU and GPU cores such that both kinds of cores are exploited and performance figures are optimized. The autonomic module, implemented in FastFlow, executes a generic map (and reduce) data-parallel pattern, scheduling part of the tasks to the GPU and part to the CPU cores so as to achieve optimal execution time. Experimental results on state-of-the-art CPU/GPU architectures are presented that assess both the performance model's properties and the autonomic module's effectiveness. © 2013 IEEE.
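A minimal instance of this kind of analytical model (parameter names and the specific formula are ours, not the paper's) balances a batch of identical tasks so that the CPU cores and the GPU finish at the same time: the GPU's share equals its fraction of the aggregate throughput.

```python
# Hypothetical CPU/GPU partitioning model: send the GPU a share of the
# tasks proportional to its throughput, so both sides finish together.

def gpu_fraction(t_cpu, t_gpu, n_cpu_cores=1):
    """t_cpu: time per task on one CPU core; t_gpu: time per task on the GPU."""
    r_cpu = n_cpu_cores / t_cpu      # aggregate CPU throughput (tasks/s)
    r_gpu = 1.0 / t_gpu              # GPU throughput (tasks/s)
    return r_gpu / (r_cpu + r_gpu)   # share of tasks to route to the GPU

def schedule(n_tasks, t_cpu, t_gpu, n_cpu_cores=1):
    f = gpu_fraction(t_cpu, t_gpu, n_cpu_cores)
    n_gpu = round(n_tasks * f)
    return n_tasks - n_gpu, n_gpu    # (tasks on CPU cores, tasks on GPU)
```

For example, with 4 CPU cores at 8 s/task and a GPU at 1 s/task, two thirds of the tasks go to the GPU: 900 tasks split 300/600, and both sides take 600 s. A real model of this kind additionally accounts for data-transfer and kernel-launch overheads.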
Abstract:
We introduce a new parallel pattern derived from a specific application domain and show how it turns out to have applications beyond its domain of origin. The pool evolution pattern models the parallel evolution of a population subject to mutations, evolving in such a way that a given fitness function is optimized. The pattern has been demonstrated to be suitable for capturing and modeling the parallel patterns underpinning various evolutionary algorithms, as well as other parallel patterns typical of symbolic computation. In this paper we introduce the pattern, discuss its implementation on modern multi-/many-core architectures, and finally present experimental results obtained with FastFlow and Erlang implementations to assess its feasibility and scalability.
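A sequential rendering of the pattern's shape (evolve candidates, then filter the pool by fitness, repeated for a number of generations) looks roughly like the sketch below. The evolve and filter phases are what the parallel implementations distribute across cores; the fitness function and mutation used here are toy examples of ours, not from the paper.

```python
# Minimal sequential sketch of the pool evolution pattern.
import random

def pool_evolution(population, fitness, evolve, generations, seed=0):
    rng = random.Random(seed)
    pool = list(population)
    for _ in range(generations):
        candidates = [evolve(x, rng) for x in pool]   # evolve phase
        merged = pool + candidates                    # keeping the old pool
        merged.sort(key=fitness, reverse=True)        # filter phase:
        pool = merged[:len(pool)]                     # retain the fittest
    return pool

# Toy instance: maximise -(x - 3)^2 by random perturbation.
best = pool_evolution(
    population=[0.0] * 8,
    fitness=lambda x: -(x - 3.0) ** 2,
    evolve=lambda x, rng: x + rng.uniform(-1.0, 1.0),
    generations=200,
)
```

In a parallel implementation, both the candidate evaluation loop and the fitness-based selection are data-parallel over the pool, which is what makes the pattern a good fit for multi-/many-core execution.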
Abstract:
In this paper we advocate the Loop-of-stencil-reduce pattern as a way to simplify the parallel programming of heterogeneous platforms (multi-core + GPUs). Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce and, crucially, their usage in a loop. It transparently targets (by using OpenCL) combinations of CPU cores and GPUs, and it makes it possible to simplify the deployment of a single stencil computation kernel on different GPUs. The paper discusses the implementation of Loop-of-stencil-reduce within the FastFlow parallel framework, considering a simple iterative data-parallel application (Game of Life) as a running example and a highly effective parallel filter for visual data restoration to assess performance. Thanks to the high-level design of Loop-of-stencil-reduce, it was possible to run the filter seamlessly on a multi-core machine, on multiple GPUs, and on both.
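The shape of the pattern can be shown with Game of Life, the running example the abstract mentions: each iteration applies a stencil over the whole grid, then a reduction over the result decides whether to keep looping. The sketch below is a sequential Python illustration of that structure (with toroidal boundaries, a choice of ours), not the FastFlow/OpenCL implementation.

```python
# Sequential sketch of Loop-of-stencil-reduce on Game of Life:
# stencil phase = one Life step; reduce phase = "did anything change?".

def life_step(grid):
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # 8-neighbour stencil, wrapping at the edges (toroidal grid)
            n = sum(grid[(r + dr) % rows][(c + dc) % cols]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            nxt[r][c] = 1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0
    return nxt

def loop_stencil_reduce(grid, max_iters=100):
    for _ in range(max_iters):
        nxt = life_step(grid)                             # stencil phase
        changed = any(a != b for a, b in zip(grid, nxt))  # reduce phase
        grid = nxt
        if not changed:          # the reduce result drives the loop
            break
    return grid
```

The pattern's value is that only `life_step` is application-specific; the loop/reduce scaffolding is what the framework provides and maps onto CPU cores and GPUs.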
Abstract:
We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce and, crucially, their usage in a loop, in both data-parallel and streaming applications, or a combination of the two. The pattern makes it possible to deploy a single stencil computation kernel on different GPUs. We discuss the implementation of Loop-of-stencil-reduce in FastFlow, a framework for implementing applications based on parallel patterns. Experiments are presented that illustrate the use of Loop-of-stencil-reduce in developing data-parallel kernels running on heterogeneous systems.