933 resultados para Parallel computing
Resumo:
One challenge on data assimilation (DA) methods is how the error covariance for the model state is computed. Ensemble methods have been proposed for producing error covariance estimates, as error is propagated in time using the non-linear model. Variational methods, on the other hand, use the concepts of control theory, whereby the state estimate is optimized from both the background and the measurements. Numerical optimization schemes are applied which solve the problem of memory storage and huge matrix inversion needed by classical Kalman filter methods. Variational Ensemble Kalman filter (VEnKF), as a method inspired the Variational Kalman Filter (VKF), enjoys the benefits from both ensemble methods and variational methods. It avoids filter inbreeding problems which emerge when the ensemble spread underestimates the true error covariance. In VEnKF this is tackled by resampling the ensemble every time measurements are available. One advantage of VEnKF over VKF is that it needs neither tangent linear code nor adjoint code. In this thesis, VEnKF has been applied to a two-dimensional shallow water model simulating a dam-break experiment. The model is a public code with water height measurements recorded in seven stations along the 21:2 m long 1:4 m wide flume’s mid-line. Because the data were too sparse to assimilate the 30 171 model state vector, we chose to interpolate the data both in time and in space. The results of the assimilation were compared with that of a pure simulation. We have found that the results revealed by the VEnKF were more realistic, without numerical artifacts present in the pure simulation. Creating a wrapper code for a model and DA scheme might be challenging, especially when the two were designed independently or are poorly documented. In this thesis we have presented a non-intrusive approach of coupling the model and a DA scheme. An external program is used to send and receive information between the model and DA procedure using files. The advantage of this method is that the model code changes needed are minimal, only a few lines which facilitate input and output. Apart from being simple to coupling, the approach can be employed even if the two were written in different programming languages, because the communication is not through code. The non-intrusive approach is made to accommodate parallel computing by just telling the control program to wait until all the processes have ended before the DA procedure is invoked. It is worth mentioning the overhead increase caused by the approach, as at every assimilation cycle both the model and the DA procedure have to be initialized. Nonetheless, the method can be an ideal approach for a benchmark platform in testing DA methods. The non-intrusive VEnKF has been applied to a multi-purpose hydrodynamic model COHERENS to assimilate Total Suspended Matter (TSM) in lake Säkylän Pyhäjärvi. The lake has an area of 154 km2 with an average depth of 5:4 m. Turbidity and chlorophyll-a concentrations from MERIS satellite images for 7 days between May 16 and July 6 2009 were available. The effect of the organic matter has been computationally eliminated to obtain TSM data. Because of computational demands from both COHERENS and VEnKF, we have chosen to use 1 km grid resolution. The results of the VEnKF have been compared with the measurements recorded at an automatic station located at the North-Western part of the lake. However, due to TSM data sparsity in both time and space, it could not be well matched. The use of multiple automatic stations with real time data is important to elude the time sparsity problem. With DA, this will help in better understanding the environmental hazard variables for instance. We have found that using a very high ensemble size does not necessarily improve the results, because there is a limit whereby additional ensemble members add very little to the performance. Successful implementation of the non-intrusive VEnKF and the ensemble size limit for performance leads to an emerging area of Reduced Order Modeling (ROM). To save computational resources, running full-blown model in ROM is avoided. When the ROM is applied with the non-intrusive DA approach, it might result in a cheaper algorithm that will relax computation challenges existing in the field of modelling and DA.
Resumo:
The research described in this thesis was motivated by the need of a robust model capable of representing 3D data obtained with 3D sensors, which are inherently noisy. In addition, time constraints have to be considered as these sensors are capable of providing a 3D data stream in real time. This thesis proposed the use of Self-Organizing Maps (SOMs) as a 3D representation model. In particular, we proposed the use of the Growing Neural Gas (GNG) network, which has been successfully used for clustering, pattern recognition and topology representation of multi-dimensional data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models, without considering time constraints. It is proposed a hardware implementation leveraging the computing power of modern GPUs, which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). The proposed methods were applied to different problem and applications in the area of computer vision such as the recognition and localization of objects, visual surveillance or 3D reconstruction.
Resumo:
Virtual screening (VS) methods can considerably aid clinical research, predicting how ligands interact with drug targets. Most VS methods suppose a unique binding site for the target, but it has been demonstrated that diverse ligands interact with unrelated parts of the target and many VS methods do not take into account this relevant fact. This problem is circumvented by a novel VS methodology named BINDSURF that scans the whole protein surface in order to find new hotspots, where ligands might potentially interact with, and which is implemented in last generation massively parallel GPU hardware, allowing fast processing of large ligand databases. BINDSURF can thus be used in drug discovery, drug design, drug repurposing and therefore helps considerably in clinical research. However, the accuracy of most VS methods and concretely BINDSURF is constrained by limitations in the scoring function that describes biomolecular interactions, and even nowadays these uncertainties are not completely understood. In order to improve accuracy of the scoring functions used in BINDSURF we propose a hybrid novel approach where neural networks (NNET) and support vector machines (SVM) methods are trained with databases of known active (drugs) and inactive compounds, being this information exploited afterwards to improve BINDSURF VS predictions.
Resumo:
En este documento se expondrá una implementación del problema del viajante de comercio usando una implementación personalizada de un mapa auto-organizado basándose en soluciones anteriores y adaptándolas a la arquitectura CUDA, haciendo a la vez una comparativa de la implementación eficiente en CUDA C/C++ con la implementación de las funciones de GPU incluidas en el Parallel Computing Toolbox de Matlab. La solución que se da reduce en casi un cuarto las iteraciones necesarias para llegar a una solución buena del problema mencionado, además de la mejora inminente del uso de las arquitecturas paralelas. En esta solución se estudia la mejora en tiempo que se consigue con el uso específico de la memoria compartida, siendo esta una de las herramientas más potentes para mejorar el rendimiento. En lo referente a los tiempos de ejecución, se llega a concluir que la mejor solución es el lanzamiento de un kernel de CUDA desde Matlab a través de la funcionalidad incluida en el Parallel Computing Toolbox.
Resumo:
Various mechanisms have been proposed to explain extreme waves or rogue waves in an oceanic environment including directional focusing, dispersive focusing, wave-current interaction, and nonlinear modulational instability. The Benjamin-Feir instability (nonlinear modulational instability), however, is considered to be one of the primary mechanisms for rogue-wave occurrence. The nonlinear Schrodinger equation is a well-established approximate model based on the same assumptions as required for the derivation of the Benjamin-Feir theory. Solutions of the nonlinear Schrodinger equation, including new rogue-wave type solutions are presented in the author's dissertation work. The solutions are obtained by using a predictive eigenvalue map based predictor-corrector procedure developed by the author. Features of the predictive map are explored and the influences of certain parameter variations are investigated. The solutions are rescaled to match the length scales of waves generated in a wave tank. Based on the information provided by the map and the details of physical scaling, a framework is developed that can serve as a basis for experimental investigations into a variety of extreme waves as well localizations in wave fields. To derive further fundamental insights into the complexity of extreme wave conditions, Smoothed Particle Hydrodynamics (SPH) simulations are carried out on an advanced Graphic Processing Unit (GPU) based parallel computational platform. Free surface gravity wave simulations have successfully characterized water-wave dispersion in the SPH model while demonstrating extreme energy focusing and wave growth in both linear and nonlinear regimes. A virtual wave tank is simulated wherein wave motions can be excited from either side. Focusing of several wave trains and isolated waves has been simulated. With properly chosen parameters, dispersion effects are observed causing a chirped wave train to focus and exhibit growth. By using the insights derived from the study of the nonlinear Schrodinger equation, modulational instability or self-focusing has been induced in a numerical wave tank and studied through several numerical simulations. Due to the inherent dissipative nature of SPH models, simulating persistent progressive waves can be problematic. This issue has been addressed and an observation-based solution has been provided. The efficacy of SPH in modeling wave focusing can be critical to further our understanding and predicting extreme wave phenomena through simulations. A deeper understanding of the mechanisms underlying extreme energy localization phenomena can help facilitate energy harnessing and serve as a basis to predict and mitigate the impact of energy focusing.
Resumo:
Nowadays, new computers generation provides a high performance that enables to build computationally expensive computer vision applications applied to mobile robotics. Building a map of the environment is a common task of a robot and is an essential part to allow the robots to move through these environments. Traditionally, mobile robots used a combination of several sensors from different technologies. Lasers, sonars and contact sensors have been typically used in any mobile robotic architecture, however color cameras are an important sensor due to we want the robots to use the same information that humans to sense and move through the different environments. Color cameras are cheap and flexible but a lot of work need to be done to give robots enough visual understanding of the scenes. Computer vision algorithms are computational complex problems but nowadays robots have access to different and powerful architectures that can be used for mobile robotics purposes. The advent of low-cost RGB-D sensors like Microsoft Kinect which provide 3D colored point clouds at high frame rates made the computer vision even more relevant in the mobile robotics field. The combination of visual and 3D data allows the systems to use both computer vision and 3D processing and therefore to be aware of more details of the surrounding environment. The research described in this thesis was motivated by the need of scene mapping. Being aware of the surrounding environment is a key feature in many mobile robotics applications from simple robotic navigation to complex surveillance applications. In addition, the acquisition of a 3D model of the scenes is useful in many areas as video games scene modeling where well-known places are reconstructed and added to game systems or advertising where once you get the 3D model of one room the system can add furniture pieces using augmented reality techniques. In this thesis we perform an experimental study of the state-of-the-art registration methods to find which one fits better to our scene mapping purposes. Different methods are tested and analyzed on different scene distributions of visual and geometry appearance. In addition, this thesis proposes two methods for 3d data compression and representation of 3D maps. Our 3D representation proposal is based on the use of Growing Neural Gas (GNG) method. This Self-Organizing Maps (SOMs) has been successfully used for clustering, pattern recognition and topology representation of various kind of data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models without considering time constraints. Self-organising neural models have the ability to provide a good representation of the input space. In particular, the Growing Neural Gas (GNG) is a suitable model because of its flexibility, rapid adaptation and excellent quality of representation. However, this type of learning is time consuming, specially for high-dimensional input data. Since real applications often work under time constraints, it is necessary to adapt the learning process in order to complete it in a predefined time. This thesis proposes a hardware implementation leveraging the computing power of modern GPUs which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). Our proposed geometrical 3D compression method seeks to reduce the 3D information using plane detection as basic structure to compress the data. This is due to our target environments are man-made and therefore there are a lot of points that belong to a plane surface. Our proposed method is able to get good compression results in those man-made scenarios. The detected and compressed planes can be also used in other applications as surface reconstruction or plane-based registration algorithms. Finally, we have also demonstrated the goodness of the GPU technologies getting a high performance implementation of a CAD/CAM common technique called Virtual Digitizing.
Resumo:
Visualization of vector fields plays an important role in research activities nowadays -- Web applications allow a fast, multi-platform and multi-device access to data, which results in the need of optimized applications to be implemented in both high-performance and low-performance devices -- Point trajectory calculation procedures usually perform repeated calculations due to the fact that several points might lie over the same trajectory -- This paper presents a new methodology to calculate point trajectories over highly-dense and uniformly-distributed grid of points in which the trajectories are forced to lie over the points in the grid -- Its advantages rely on a highly parallel computing architecture implementation and in the reduction of the computational effort to calculate the stream paths since unnecessary calculations are avoided, reusing data through iterations -- As case study, the visualization of oceanic currents through in the web platform is presented and analyzed, using WebGL as the parallel computing architecture and the rendering Application Programming Interface
Resumo:
In the multi-core CPU world, transactional memory (TM)has emerged as an alternative to lock-based programming for thread synchronization. Recent research proposes the use of TM in GPU architectures, where a high number of computing threads, organized in SIMT fashion, requires an effective synchronization method. In contrast to CPUs, GPUs offer two memory spaces: global memory and local memory. The local memory space serves as a shared scratch-pad for a subset of the computing threads, and it is used by programmers to speed-up their applications thanks to its low latency. Prior work from the authors proposed a lightweight hardware TM (HTM) support based in the local memory, modifying the SIMT execution model and adding a conflict detection mechanism. An efficient implementation of these features is key in order to provide an effective synchronization mechanism at the local memory level. After a quick description of the main features of our HTM design for GPU local memory, in this work we gather together a number of proposals designed with the aim of improving those mechanisms with high impact on performance. Firstly, the SIMT execution model is modified to increase the parallelism of the application when transactions must be serialized in order to make forward progress. Secondly, the conflict detection mechanism is optimized depending on application characteristics, such us the read/write sets, the probability of conflict between transactions and the existence of read-only transactions. As these features can be present in hardware simultaneously, it is a task of the compiler and runtime to determine which ones are more important for a given application. This work includes a discussion on the analysis to be done in order to choose the best configuration solution.
Resumo:
En el campo de la medicina clínica es crucial poder determinar la seguridad y la eficacia de los fármacos actuales y además acelerar el descubrimiento de nuevos compuestos activos. Para ello se llevan a cabo ensayos de laboratorio, que son métodos muy costosos y que requieren mucho tiempo. Sin embargo, la bioinformática puede facilitar enormemente la investigación clínica para los fines mencionados, ya que proporciona la predicción de la toxicidad de los fármacos y su actividad en enfermedades nuevas, así como la evolución de los compuestos activos descubiertos en ensayos clínicos. Esto se puede lograr gracias a la disponibilidad de herramientas de bioinformática y métodos de cribado virtual por ordenador (CV) que permitan probar todas las hipótesis necesarias antes de realizar los ensayos clínicos, tales como el docking estructural, mediante el programa BINDSURF. Sin embargo, la precisión de la mayoría de los métodos de CV se ve muy restringida a causa de las limitaciones presentes en las funciones de afinidad o scoring que describen las interacciones biomoleculares, e incluso hoy en día estas incertidumbres no se conocen completamente. En este trabajo abordamos este problema, proponiendo un nuevo enfoque en el que las redes neuronales se entrenan con información relativa a bases de datos de compuestos conocidos (proteínas diana y fármacos), y se aprovecha después el método para incrementar la precisión de las predicciones de afinidad del método de CV BINDSURF.
Resumo:
With hundreds of millions of users reporting locations and embracing mobile technologies, Location Based Services (LBSs) are raising new challenges. In this dissertation, we address three emerging problems in location services, where geolocation data plays a central role. First, to handle the unprecedented growth of generated geolocation data, existing location services rely on geospatial database systems. However, their inability to leverage combined geographical and textual information in analytical queries (e.g. spatial similarity joins) remains an open problem. To address this, we introduce SpsJoin, a framework for computing spatial set-similarity joins. SpsJoin handles combined similarity queries that involve textual and spatial constraints simultaneously. LBSs use this system to tackle different types of problems, such as deduplication, geolocation enhancement and record linkage. We define the spatial set-similarity join problem in a general case and propose an algorithm for its efficient computation. Our solution utilizes parallel computing with MapReduce to handle scalability issues in large geospatial databases. Second, applications that use geolocation data are seldom concerned with ensuring the privacy of participating users. To motivate participation and address privacy concerns, we propose iSafe, a privacy preserving algorithm for computing safety snapshots of co-located mobile devices as well as geosocial network users. iSafe combines geolocation data extracted from crime datasets and geosocial networks such as Yelp. In order to enhance iSafe's ability to compute safety recommendations, even when crime information is incomplete or sparse, we need to identify relationships between Yelp venues and crime indices at their locations. To achieve this, we use SpsJoin on two datasets (Yelp venues and geolocated businesses) to find venues that have not been reviewed and to further compute the crime indices of their locations. Our results show a statistically significant dependence between location crime indices and Yelp features. Third, review centered LBSs (e.g., Yelp) are increasingly becoming targets of malicious campaigns that aim to bias the public image of represented businesses. Although Yelp actively attempts to detect and filter fraudulent reviews, our experiments showed that Yelp is still vulnerable. Fraudulent LBS information also impacts the ability of iSafe to provide correct safety values. We take steps toward addressing this problem by proposing SpiDeR, an algorithm that takes advantage of the richness of information available in Yelp to detect abnormal review patterns. We propose a fake venue detection solution that applies SpsJoin on Yelp and U.S. housing datasets. We validate the proposed solutions using ground truth data extracted by our experiments and reviews filtered by Yelp.
Resumo:
In the presented thesis work, the meshfree method with distance fields was coupled with the lattice Boltzmann method to obtain solutions of fluid-structure interaction problems. The thesis work involved development and implementation of numerical algorithms, data structure, and software. Numerical and computational properties of the coupling algorithm combining the meshfree method with distance fields and the lattice Boltzmann method were investigated. Convergence and accuracy of the methodology was validated by analytical solutions. The research was focused on fluid-structure interaction solutions in complex, mesh-resistant domains as both the lattice Boltzmann method and the meshfree method with distance fields are particularly adept in these situations. Furthermore, the fluid solution provided by the lattice Boltzmann method is massively scalable, allowing extensive use of cutting edge parallel computing resources to accelerate this phase of the solution process. The meshfree method with distance fields allows for exact satisfaction of boundary conditions making it possible to exactly capture the effects of the fluid field on the solid structure.
Resumo:
A composite SaaS (Software as a Service) is a software that is comprised of several software components and data components. The composite SaaS placement problem is to determine where each of the components should be deployed in a cloud computing environment such that the performance of the composite SaaS is optimal. From the computational point of view, the composite SaaS placement problem is a large-scale combinatorial optimization problem. Thus, an Iterative Cooperative Co-evolutionary Genetic Algorithm (ICCGA) was proposed. The ICCGA can find reasonable quality of solutions. However, its computation time is noticeably slow. Aiming at improving the computation time, we propose an unsynchronized Parallel Cooperative Co-evolutionary Genetic Algorithm (PCCGA) in this paper. Experimental results have shown that the PCCGA not only has quicker computation time, but also generates better quality of solutions than the ICCGA.
Resumo:
Tridiagonal diagonally dominant linear systems arise in many scientific and engineering applications. The standard Thomas algorithm for solving such systems is inherently serial forming a bottleneck in computation. Algorithms such as cyclic reduction and SPIKE reduce a single large tridiagonal system into multiple small independent systems which can be solved in parallel. We have developed portable cyclic reduction and SPIKE algorithm OpenCL implementations with the intent to target a range of co-processors in a heterogeneous computing environment including Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs) and other multi-core processors. In this paper, we evaluate these designs in the context of solver performance, resource efficiency and numerical accuracy.
Resumo:
The contour tree is a topological abstraction of a scalar field that captures evolution in level set connectivity. It is an effective representation for visual exploration and analysis of scientific data. We describe a work-efficient, output sensitive, and scalable parallel algorithm for computing the contour tree of a scalar field defined on a domain that is represented using either an unstructured mesh or a structured grid. A hybrid implementation of the algorithm using the GPU and multi-core CPU can compute the contour tree of an input containing 16 million vertices in less than ten seconds with a speedup factor of upto 13. Experiments based on an implementation in a multi-core CPU environment show near-linear speedup for large data sets.
Resumo:
Negabinary is a component of the positional number system. A complete set of negabinary arithmetic operations are presented, including the basic addition/subtraction logic, the two-step carry-free addition/subtraction algorithm based on negabinary signed-digit (NSD) representation, parallel multiplication, and the fast conversion from NSD to the normal negabinary in the carry-look-ahead mode. All the arithmetic operations can be performed with binary logic. By programming the binary reference bits, addition and subtraction can be realized in parallel with the same binary logic functions. This offers a technique to perform space-variant arithmetic-logic functions with space-invariant instructions. Multiplication can be performed in the tree structure and it is simpler than the modified signed-digit (MSD) counterpart. The parallelism of the algorithms is very suitable for optical implementation. Correspondingly, a general-purpose optical logic system using an electron trapping device is suggested. Various complex logic functions can be performed by programming the illumination of the data arrays without additional temporal latency of the intermediate results. The system can be compact. These properties make the proposed negabinary arithmetic-logic system a strong candidate for future applications in digital optical computing with the development of smart pixel arrays. (C) 1999 Society of Photo-Optical Instrumentation Engineers. [S0091-3286(99)00803-X].