863 results for parallel architectures
Abstract:
The development and characterization of biomolecule sensor formats based on the optical technique of Surface Plasmon Resonance (SPR) spectroscopy and on electrochemical methods was investigated. The study can be divided into two parts of different scope. In the first part, novel detection schemes for labeled targets were developed on the basis of investigations in Surface-Plasmon Field-Enhanced Spectroscopy (SPFS). The first is an SPR fluorescence imaging format, Surface-Plasmon Field-Enhanced Fluorescence Microscopy (SPFM). Patterned self-assembled monolayers (SAMs) were prepared and used to direct the spatial distribution of biomolecules immobilized on surfaces. Here the patterned monolayers serve as molecular templates that secure different biomolecules to known locations on a surface. The binding processes of labeled target biomolecules from solution to the sensor surface were recorded visually and kinetically by the fluorescence microscope, with fluorescence excited by the evanescent field of propagating surface plasmon polaritons. The second format, which also originates from the SPFS technique, Surface-Plasmon Field-Enhanced Fluorescence Spectrometry (SPFSm), concerns the coupling of fluorometry to a standard SPR setup. A spectrograph mounted in place of the photomultiplier or microscope provides the fluorescence spectrum as well as the fluorescence intensity. This study also demonstrated, for the first time, the analytical combination of surface-plasmon-enhanced fluorescence detection with analytes tagged with semiconducting nanocrystals (quantum dots, QDs). Electrochemically addressable fabrication of DNA biosensor arrays in an aqueous environment was also developed. An electrochemical method was introduced for the directed in-situ assembly of various specific oligonucleotide catcher probes onto different sensing elements of a multi-electrode array in the aqueous environment of a flow cell. Surface plasmon microscopy (SPM) is utilized for the on-line recording of the various functionalization steps. Hybridization reactions between targets from solution and the different surface-bound complementary probes are monitored by surface-plasmon field-enhanced fluorescence microscopy (SPFM), using targets that are labeled either with organic dyes or with semiconducting quantum dots for color-multiplexing. This study provides a new approach for the fabrication of (small) DNA arrays and for the recording and quantitative evaluation of parallel hybridization reactions. In the second part of this work, the idea of combining SP optical and electrochemical characterization was extended to the tethered bilayer lipid membrane (tBLM) format. Tethered bilayer lipid membranes provide a versatile model platform for the study of many membrane-related processes. The thiolipids were first self-assembled on ultraflat gold substrates. Fusion of the monolayers with small unilamellar vesicles (SUVs) formed the distal layer, and the membranes thus obtained have sealing properties comparable to those of natural membranes. The fusion could be monitored optically by SPR as an increase in reflectivity (thickness) upon formation of the outer leaflet of the bilayer. With EIS, a drop in capacitance and a steady increase in resistance could be observed, leading to a tightly sealing membrane with low leakage currents. The assembly of tBLMs and the subsequent incorporation of membrane proteins were investigated with respect to their potential use as a biosensing system.
In the case of valinomycin, the potassium transport mediated by the ion carrier could be shown as a decrease in resistance upon increasing potassium concentration. Potential-mediated formation of membrane pores could be shown for the ion-channel-forming peptide alamethicin (Alm). It was shown that at high positive dc bias (cis negative) Alm channels stay at relatively low conductance levels and show higher permeability to potassium than to tetramethylammonium. The addition of the inhibitor amiloride can partially block the Alm channels, resulting in an increase of membrane resistance. tBLMs are robust and versatile model membrane architectures that can mimic certain properties of biological membranes. tBLMs with incorporated lipopolysaccharide (LPS) and lipid A, mimicking bacterial membranes, were used to probe the interactions of antibodies against LPS and to investigate the binding and incorporation of the small antimicrobial peptide V4. The influence of membrane composition and charge on the behavior of V4 was also probed. This study demonstrates the possibility of using the tBLM platform to record and evaluate the efficiency or potency of numerous synthesized antimicrobial peptides as potential drug candidates.
Abstract:
Complex network analysis is a very popular topic in computer science. Unfortunately, these networks, extracted from different contexts, are usually very large and their analysis may be very complicated: computing metrics on these structures can be very expensive. Among all metrics, we analyse the extraction of subnetworks called communities: groups of nodes that probably play the same role within the whole structure. Community extraction is an interesting operation in many different fields (biology, economics, ...). In this work we present a parallel community detection algorithm that can operate on networks with a huge number of nodes and edges. After an introduction to graph theory and high performance computing, we explain our design strategies and our implementation. Then we show a performance evaluation carried out on a distributed-memory architecture, namely the IBM BlueGene/Q supercomputer "Fermi" at the CINECA supercomputing center, Italy, and we comment on our results.
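To make the notion of a community concrete, here is a minimal sketch of label propagation, a classic sequential community detection heuristic; it is shown only for illustration and is not the distributed algorithm developed in the thesis, which targets far larger graphs on BlueGene/Q. The toy edge list is invented.

    import random
    from collections import Counter, defaultdict

    def label_propagation(edges, max_sweeps=20):
        # Every node repeatedly adopts the most frequent label among its
        # neighbours; groups sharing a final label are the communities.
        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        label = {v: v for v in adj}
        nodes = list(adj)
        for _ in range(max_sweeps):
            random.shuffle(nodes)          # random visiting order
            changed = False
            for v in nodes:
                counts = Counter(label[u] for u in adj[v])
                top = max(counts.values())
                pick = random.choice([l for l, c in counts.items() if c == top])
                if pick != label[v]:
                    label[v] = pick
                    changed = True
            if not changed:
                break
        groups = defaultdict(list)
        for v, l in label.items():
            groups[l].append(v)
        return list(groups.values())

    # Two triangles joined by a single edge -> usually detected as two communities.
    print(label_propagation([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]))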
Abstract:
This thesis deals with heterogeneous architectures in standard workstations. Heterogeneous architectures represent an appealing alternative to traditional supercomputers because they are based on commodity components fabricated in large quantities; hence their price-performance ratio is unparalleled in the world of high performance computing (HPC). In particular, different aspects related to the performance and power consumption of heterogeneous architectures have been explored. The thesis initially focuses on an efficient implementation of a parallel application whose execution time is dominated by a high number of floating point instructions. It then touches on the central problem of efficient management of power peaks in heterogeneous computing systems. Finally it discusses a memory-bound problem, where the execution time is dominated by memory latency. Specifically, the following main contributions have been carried out. First, a novel framework for the design and analysis of solar fields for Central Receiver Systems (CRS) has been developed. The implementation, based on a desktop workstation equipped with multiple Graphics Processing Units (GPUs), is motivated by the need for an accurate and fast simulation environment for studying mirror imperfections and non-planar geometries. Secondly, a power-aware scheduling algorithm for heterogeneous CPU-GPU architectures, based on an efficient distribution of the computing workload to the resources, has been realized. The scheduler manages the resources of several computing nodes with a view to reducing the peak power. This contribution has two main benefits: the approach reduces the supply cost due to high peak power whilst having negligible impact on the parallelism of the computing nodes, and, from another point of view, the developed model allows designers to increase the number of cores without increasing the capacity of the power supply unit. Finally, an implementation for efficient graph exploration on reconfigurable architectures is presented. The purpose is to accelerate graph exploration, reducing the number of random memory accesses.
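As a rough illustration of the power-aware scheduling idea, the toy sketch below greedily places the most power-hungry task on the node with the lowest accumulated draw, keeping the per-node peak (and hence the required power-supply capacity) low. The task list, power estimates and node count are hypothetical, and this simple heuristic is not the thesis's actual scheduler.

    # Hypothetical task list: (name, estimated power draw in watts).
    tasks = [("solar_raytrace", 220.0), ("fft", 120.0), ("bfs", 90.0), ("render", 200.0)]
    NODES = 3

    # Greedy heuristic: largest task first, always on the least-loaded node.
    node_power = [0.0] * NODES
    schedule = {n: [] for n in range(NODES)}
    for name, power in sorted(tasks, key=lambda t: -t[1]):
        target = min(range(NODES), key=node_power.__getitem__)
        node_power[target] += power
        schedule[target].append(name)

    print(schedule, "per-node peak estimate:", node_power)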
Abstract:
During the last few decades an unprecedented technological growth has been at the center of the embedded systems design landscape, with Moore's Law being the leading factor of this trend. Today, in fact, an ever-increasing number of cores can be integrated on the same die, marking the transition from state-of-the-art multi-core chips to the new many-core design paradigm. Despite the extraordinarily high computing power, the complexity of many-core chips opens the door to several challenges. As a result of the increased silicon density of modern Systems-on-a-Chip (SoC), the design space exploration needed to find the best design has exploded, and hardware designers are in fact facing the problem of a huge design space. Virtual Platforms have always been used to enable hardware-software co-design, but today they face the huge complexity of both hardware and software systems. In this thesis two different research works on Virtual Platforms are presented: the first one is intended for the hardware developer, to easily allow complex cycle-accurate simulations of many-core SoCs. The second work exploits the parallel computing power of off-the-shelf General Purpose Graphics Processing Units (GPGPUs), with the goal of increasing simulation speed. The term Virtualization can be used in the context of many-core systems not only to refer to the aforementioned hardware emulation tools (Virtual Platforms), but also for two other main purposes: 1) to help the programmer achieve the maximum possible performance of an application, by hiding the complexity of the underlying hardware; and 2) to efficiently exploit the highly parallel hardware of many-core chips in environments with multiple active Virtual Machines. This thesis is focused on virtualization techniques with the goal of mitigating, and where possible overcoming, some of the challenges introduced by the many-core design paradigm.
Abstract:
We show a method for parallelizing top-down dynamic programs in a straightforward way by a careful choice of a lock-free shared hash table implementation and randomization of the order in which the dynamic program computes its subproblems. This generic approach is applied to dynamic programs for knapsack, shortest paths, and RNA structure alignment, as well as to a state-of-the-art solution for minimizing the maximum number of open stacks. Experimental results are provided on three different modern multicore architectures which show that this parallelization is effective and reasonably scalable. In particular, we obtain over 10 times speedup for 32 threads on the open stacks problem.
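A minimal Python sketch of the idea, assuming a 0/1 knapsack instance: several threads share a memo dictionary and shuffle the order in which child subproblems are evaluated, so concurrent workers tend to populate different entries. In CPython the GIL prevents real speedup, and the plain dict merely stands in for the lock-free hash table used in the paper; the instance data are invented.

    import random
    import threading

    # Invented instance data, purely for illustration.
    weights = [4, 2, 3, 1, 6, 5, 7, 2]
    values = [7, 3, 5, 1, 9, 8, 11, 2]
    CAPACITY = 15

    memo = {}  # shared memo table; stands in for the lock-free hash table

    def best(i, cap):
        # Best value achievable with items i.. and remaining capacity cap.
        if i == len(weights) or cap == 0:
            return 0
        hit = memo.get((i, cap))
        if hit is not None:
            return hit
        # Randomising the order in which children are evaluated makes the
        # concurrent workers diverge onto different subproblems.
        children = [lambda: best(i + 1, cap)]
        if weights[i] <= cap:
            children.append(lambda: values[i] + best(i + 1, cap - weights[i]))
        random.shuffle(children)
        result = max(f() for f in children)
        memo[(i, cap)] = result  # benign race: every writer stores the same value
        return result

    workers = [threading.Thread(target=best, args=(0, CAPACITY)) for _ in range(4)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    print(memo[(0, CAPACITY)])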
Abstract:
Membrane systems are computationally equivalent to Turing machines. However, their distributed and massively parallel nature yields polynomial solutions to problems whose traditional solutions are non-polynomial. It is therefore very important to develop dedicated hardware and software implementations that exploit these two features of membrane systems. In distributed implementations of P systems, the communication bottleneck problem arises: when the number of membranes grows, the network becomes congested. The purpose of distributed architectures is to reach a compromise between the massively parallel character of the system and the time needed for an evolution step, i.e., the transition from one configuration of the system to the next, thereby addressing the communication bottleneck problem. The goal of this paper is twofold. Firstly, to survey in a systematic and uniform way the main results regarding the way membranes can be placed on processors in order to obtain a software/hardware simulation of P systems in a distributed environment. Secondly, we improve some results about the membrane dissolution problem, prove that it is connected, and discuss the possibility of simulating this property in the distributed model. All this yields an improvement in the implementation of the system's parallelism, since it increases the parallelism of the external communication among processors. The proposed ideas improve on previous architectures for tackling the communication bottleneck problem, reducing the total time of an evolution step, increasing the number of membranes that can run on a processor and reducing the number of processors.
Abstract:
The appearance of radix-$2^{2}$ was a milestone in the design of pipelined FFT hardware architectures. Later, radix-$2^{2}$ was extended to radix-$2^{k}$. However, radix-$2^{k}$ had only been proposed for single-path delay feedback (SDF) architectures, not for feedforward ones, also called multi-path delay commutator (MDC) architectures. This paper presents radix-$2^{k}$ feedforward (MDC) FFT architectures. In feedforward architectures, radix-$2^{k}$ can be used for any number of parallel samples that is a power of two. Furthermore, both decimation-in-frequency (DIF) and decimation-in-time (DIT) decompositions can be used. In addition, the designs can achieve very high throughputs, which makes them suitable for the most demanding applications. Indeed, the proposed radix-$2^{k}$ feedforward architectures require fewer hardware resources than parallel feedback ones, also called multi-path delay feedback (MDF), when several samples in parallel must be processed. As a result, the proposed radix-$2^{k}$ feedforward architectures not only offer an attractive solution for current applications, but also open up a new research line on feedforward structures.
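The sketch below is a plain-Python iterative radix-2 decimation-in-frequency FFT, included only to make the DIF decomposition concrete: a radix-$2^{2}$ design groups such stages in pairs so that the rotations in every other stage degenerate to $\pm 1$ and $\pm j$, which is the property the hardware architectures above exploit. It is a software illustration under that assumption, not a model of the SDF or MDC datapaths.

    import cmath

    def fft_dif_radix2(x):
        # Iterative radix-2 decimation-in-frequency FFT; len(x) must be a power of two.
        n = len(x)
        a = list(x)
        span = n
        while span > 1:
            half = span // 2
            for start in range(0, n, span):
                for k in range(half):
                    w = cmath.exp(-2j * cmath.pi * k / span)   # twiddle factor
                    u, v = a[start + k], a[start + k + half]
                    a[start + k] = u + v
                    a[start + k + half] = (u - v) * w
            span = half
        # DIF leaves the output in bit-reversed order; undo it.
        bits = n.bit_length() - 1
        out = [0j] * n
        for i in range(n):
            out[int(format(i, "0{}b".format(bits))[::-1], 2)] = a[i]
        return out

    print(fft_dif_radix2([1, 0, 0, 0, 0, 0, 0, 0]))  # impulse -> flat spectrum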
Abstract:
This paper presents a theoretical analysis and an optimization method for envelope amplifiers. Highly efficient envelope amplifiers based on a switching converter in parallel or in series with a linear regulator have been analyzed and optimized. The results of the optimization process are presented, and the two architectures are compared regarding their complexity and efficiency. The proposed optimization method is based on prior knowledge of the transmitted signal type (OFDM, WCDMA, ...) and can be applied to any signal type as long as the envelope probability distribution is known. Finally, it is shown that the analyzed architectures have an inherent efficiency limit.
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-03
Abstract:
The Lattice Solid Model has been used successfully as a virtual laboratory to simulate the fracturing of rocks, the dynamics of faults, earthquakes and gouge processes. However, results from those simulations show that, in order to make the next step towards more realistic experiments, it will be necessary to use models containing a significantly larger number of particles than current models. Thus, those simulations will require a greatly increased amount of computational resources. Whereas the computing power provided by single processors can be expected to increase according to Moore's law, i.e., to double every 18-24 months, parallel computers can provide significantly larger computing power today. In order to make this computing power available for the simulation of the microphysics of earthquakes, a parallel version of the Lattice Solid Model has been implemented. Benchmarks using large models with several million particles have shown that the parallel implementation of the Lattice Solid Model can achieve a high parallel efficiency of about 80% for large numbers of processors on different computer architectures.
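(For reference, parallel efficiency here follows the usual definition $E(p) = T_1 / (p\,T_p)$, where $T_1$ is the single-processor runtime and $T_p$ the runtime on $p$ processors; an efficiency of about 80% therefore corresponds to a speedup of roughly $0.8\,p$. The notation is the standard one, not taken from the paper.)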
Abstract:
Despite the insight gained from 2-D particle models, and given that the dynamics of crustal faults occur in 3-D space, the question remains: how do 3-D fault gouge dynamics differ from those in 2-D? Traditionally, 2-D modeling has been preferred over 3-D simulations because of the computational cost of solving 3-D problems. However, modern high performance computing architectures, combined with a parallel implementation of the Lattice Solid Model (LSM), provide the opportunity to explore 3-D fault micro-mechanics and to advance understanding of the effective constitutive relations of fault gouge layers. In this paper, macroscopic friction values from 2-D and 3-D LSM simulations, performed on an SGI Altix 3700 super-cluster, are compared. Two rectangular elastic blocks of bonded particles, with a rough fault plane and separated by a region of randomly sized non-bonded gouge particles, are sheared in opposite directions by normally loaded driving plates. The results demonstrate that the gouge particles in the 3-D models undergo significant out-of-plane motion during shear. The 3-D models also exhibit a higher mean macroscopic friction than the 2-D models for varying values of interparticle friction. 2-D LSM gouge models have previously been shown to exhibit accelerating energy release in simulated earthquake cycles, supporting the Critical Point hypothesis. The 3-D models are shown to also display accelerating energy release, and good fits of power-law time-to-failure functions to the cumulative energy release are obtained.
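(The power-law time-to-failure form commonly used for such fits is $E(t) = A + B\,(t_f - t)^m$, where $E$ is the cumulative energy release, $t_f$ the failure time and $A$, $B$, $m$ fitted constants; the exact parameterisation used in the paper may differ.)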
Abstract:
Very large spatially referenced datasets, for example those derived from satellite-based sensors that sample across the globe or from large monitoring networks of individual sensors, are becoming increasingly common and more widely available for use in environmental decision making. In large or dense sensor networks, huge quantities of data can be collected over small time periods. In many applications the generation of maps, or predictions at specific locations, from the data in (near) real-time is crucial. Geostatistical operations such as interpolation are vital in this map-generation process, and in emergency situations the resulting predictions need to be available almost instantly, so that decision makers can make informed decisions and define risk and evacuation zones. It is also helpful, when analysing data in less time-critical applications, for example when interacting directly with the data for exploratory analysis, that the algorithms are responsive within a reasonable time frame. Performing geostatistical analysis on such large spatial datasets can present a number of problems, particularly when maximum likelihood methods are used. Although the storage requirements scale only linearly with the number of observations in the dataset, the computational complexity in terms of memory and speed scales quadratically and cubically, respectively. Most modern commodity hardware has at least two processor cores, if not more. Other mechanisms for parallel computation, such as Grid-based systems, are also becoming increasingly available. However, currently there seems to be little interest in exploiting this extra processing power within the context of geostatistics. In this paper we review the existing parallel approaches for geostatistics. By recognising that different natural parallelisms exist and can be exploited depending on whether the dataset is sparsely or densely sampled with respect to the range of variation, we introduce two contrasting novel implementations of parallel algorithms based on approximating the data likelihood, extending the methods of Vecchia [1988] and Tresp [2000]. Using parallel maximum likelihood variogram estimation and parallel prediction algorithms, we show that computational time can be significantly reduced. We demonstrate this with both sparsely sampled and densely sampled data on a variety of architectures, ranging from the common dual-core processor found in many modern desktop computers to large multi-node supercomputers. To highlight the strengths and weaknesses of the different methods, we employ synthetic data sets and go on to show how the methods allow maximum-likelihood-based inference on the exhaustive Walker Lake data set.
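To illustrate the kind of likelihood approximation referred to above, the sketch below implements a simple Vecchia-style composite likelihood in Python: each observation is conditioned only on its m nearest predecessors, so no n-by-n covariance matrix is ever formed, and the per-observation terms are independent and could be evaluated in parallel. The exponential covariance, zero-mean assumption and parameter values are placeholders, not the models used in the paper.

    import numpy as np

    def exp_cov(h, sill=1.0, range_=10.0, nugget=1e-4):
        # Exponential covariance with a small nugget added at zero distance.
        return sill * np.exp(-h / range_) + nugget * (h == 0.0)

    def vecchia_loglik(coords, y, m=10):
        # Vecchia-style approximation: condition each (zero-mean) observation
        # only on its m nearest predecessors in the chosen ordering.
        n = len(y)
        ll = 0.0
        for i in range(n):
            prev = np.arange(i)
            if prev.size > m:
                d = np.linalg.norm(coords[prev] - coords[i], axis=1)
                prev = prev[np.argsort(d)[:m]]
            if prev.size == 0:
                var = exp_cov(0.0)
                ll += -0.5 * (np.log(2 * np.pi * var) + y[i] ** 2 / var)
                continue
            C = exp_cov(np.linalg.norm(coords[prev][:, None] - coords[prev][None, :], axis=2))
            c = exp_cov(np.linalg.norm(coords[prev] - coords[i], axis=1))
            w = np.linalg.solve(C, c)
            mu = w @ y[prev]                       # conditional mean
            var = exp_cov(0.0) - w @ c             # conditional variance
            ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
        return ll

    rng = np.random.default_rng(0)
    pts = rng.uniform(0, 100, size=(500, 2))
    obs = rng.standard_normal(500)
    print(vecchia_loglik(pts, obs, m=10))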
Abstract:
We describe a novel and potentially important tool for candidate subunit vaccine selection through in silico reverse-vaccinology. A set of Bayesian networks able to make individual predictions for specific subcellular locations is implemented in three pipelines with different architectures: a parallel implementation with a confidence level-based decision engine and two serial implementations with a hierarchical decision structure, one initially rooted by prediction between membrane types and another rooted by soluble versus membrane prediction. The parallel pipeline outperformed the serial pipeline, but took twice as long to execute. The soluble-rooted serial pipeline outperformed the membrane-rooted predictor. Assessment using genomic test sets was more equivocal, as many more predictions are made by the parallel pipeline, yet the serial pipeline identifies 22 more of the 74 proteins of known location.
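A schematic sketch of the two pipeline shapes, with hypothetical predictor functions and an invented 0.8 confidence threshold standing in for the trained Bayesian networks:

    def parallel_pipeline(seq, predictors, threshold=0.8):
        # Run every location-specific predictor and keep the most confident
        # call, provided it clears the confidence threshold.
        scores = {loc: p(seq) for loc, p in predictors.items()}
        loc, conf = max(scores.items(), key=lambda kv: kv[1])
        return loc if conf >= threshold else "unknown"

    def serial_pipeline(seq, is_soluble, soluble_predictor, membrane_predictor):
        # Soluble-rooted hierarchy: one root decision, then a specialised predictor.
        return soluble_predictor(seq) if is_soluble(seq) else membrane_predictor(seq)

    # Dummy predictors, purely for demonstration.
    demo = {"cytoplasm": lambda s: 0.6, "outer membrane": lambda s: 0.9}
    print(parallel_pipeline("MKKT...", demo))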
Abstract:
In the field of Transition P system implementation, it has been established that it is very important to know in advance how long the application of evolution rules in membranes takes. Moreover, having time estimates for rule application in membranes makes it possible to take important decisions related to hardware/software architecture design. The work presented here introduces an algorithm for applying active evolution rules in Transition P systems which is based on active rules elimination. The algorithm complies with the requirements of being non-deterministic and massively parallel and, most importantly, it is time-delimited because it depends only on the number of membrane evolution rules.
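A generic two-phase sketch of non-deterministic, maximally parallel rule application is given below; it conveys why the cost of a step can be bounded by the number of rules rather than the number of objects, but it is not the paper's active-rules-elimination algorithm, and the membrane contents and rules are invented for illustration.

    import random
    from collections import Counter

    # Invented membrane contents and evolution rules (LHS -> RHS multisets).
    contents = Counter({"a": 6, "b": 4})
    rules = [
        (Counter({"a": 2}), Counter({"b": 1})),
        (Counter({"a": 1, "b": 1}), Counter({"c": 1})),
    ]

    def max_applications(lhs, pool):
        return min(pool[s] // n for s, n in lhs.items())

    def evolution_step(contents, rules):
        # Phase 1 applies each rule a random number of times (non-determinism);
        # phase 2 tops every rule up to its maximum (maximality). Each rule is
        # visited at most twice, so the loop count depends only on the rules.
        pool = contents.copy()       # objects still unconsumed in this step
        produced = Counter()
        order = rules[:]
        random.shuffle(order)
        for phase in (1, 2):
            for lhs, rhs in order:
                m = max_applications(lhs, pool)
                if m == 0:
                    continue
                k = random.randint(1, m) if phase == 1 else m
                for s, n in lhs.items():
                    pool[s] -= n * k
                for s, n in rhs.items():
                    produced[s] += n * k
        return pool + produced       # contents after the evolution step

    print(evolution_step(contents, rules))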
Abstract:
In this paper, we develop a fast implementation of a hyperspectral coded aperture (HYCA) algorithm on different platforms using OpenCL, an open standard for parallel programming on heterogeneous systems which includes a wide variety of devices, from dense multicore systems by major manufacturers such as Intel or ARM to accelerators such as graphics processing units (GPUs), field programmable gate arrays (FPGAs), the Intel Xeon Phi and other custom devices. Our proposed implementation of HYCA significantly reduces its computational cost. Our experiments have been conducted using simulated data and reveal considerable acceleration factors. This kind of implementation, using the same descriptive language on different architectures, is very important in order to realistically assess the possibility of using heterogeneous platforms for efficient hyperspectral image processing in real remote sensing missions.
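As a flavour of how the same OpenCL source runs unchanged on CPUs, GPUs or accelerators, here is a minimal element-wise kernel using pyopencl; it is a generic portability example under that assumption, not the HYCA implementation itself.

    import numpy as np
    import pyopencl as cl

    # One kernel source, any available OpenCL device (CPU, GPU, accelerator).
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    src = """
    __kernel void scale(__global const float *x, __global float *y, const float a) {
        int i = get_global_id(0);
        y[i] = a * x[i];
    }
    """
    prg = cl.Program(ctx, src).build()

    x = np.random.rand(1 << 20).astype(np.float32)
    mf = cl.mem_flags
    x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
    y_buf = cl.Buffer(ctx, mf.WRITE_ONLY, x.nbytes)

    prg.scale(queue, x.shape, None, x_buf, y_buf, np.float32(2.0))

    y = np.empty_like(x)
    cl.enqueue_copy(queue, y, y_buf)
    print(np.allclose(y, 2.0 * x))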