246 resultados para virtualised GPU


Relevância:

10.00% 10.00%

Publicador:

Resumo:

This thesis develops and tests various transient and steady-state computational models such as direct numerical simulation (DNS), large eddy simulation (LES), filtered unsteady Reynolds-averaged Navier-Stokes (URANS) and steady Reynolds-averaged Navier-Stokes (RANS) with and without magnetic field to investigate turbulent flows in canonical as well as in the nozzle and mold geometries of the continuous casting process. The direct numerical simulations are first performed in channel, square and 2:1 aspect rectangular ducts to investigate the effect of magnetic field on turbulent flows. The rectangular duct is a more practical geometry for continuous casting nozzle and mold and has the option of applying magnetic field either perpendicular to broader side or shorter side. This work forms the part of a graphic processing unit (GPU) based CFD code (CU-FLOW) development for magnetohydrodynamic (MHD) turbulent flows. The DNS results revealed interesting effects of the magnetic field and its orientation on primary, secondary flows (instantaneous and mean), Reynolds stresses, turbulent kinetic energy (TKE) budgets, momentum budgets and frictional losses, besides providing DNS database for two-wall bounded square and rectangular duct MHD turbulent flows. Further, the low- and high-Reynolds number RANS models (k-ε and Reynolds stress models) are developed and tested with DNS databases for channel and square duct flows with and without magnetic field. The MHD sink terms in k- and ε-equations are implemented as proposed by Kenjereš and Hanjalić using a user defined function (UDF) in FLUENT. This work revealed varying accuracies of different RANS models at different levels. This work is useful for industry to understand the accuracies of these models, including continuous casting. After realizing the accuracy and computational cost of RANS models, the steady-state k-ε model is then combined with the particle image velocimetry (PIV) and impeller probe velocity measurements in a 1/3rd scale water model to study the flow quality coming out of the well- and mountain-bottom nozzles and the effect of stopper-rod misalignment on fluid flow. The mountain-bottom nozzle was found more prone to the longtime asymmetries and higher surface velocities. The left misalignment of stopper gave higher surface velocity on the right leading to significantly large number of vortices forming behind the nozzle on the left. Later, the transient and steady-state models such as LES, filtered URANS and steady RANS models are combined with ultrasonic Doppler velocimetry (UDV) measurements in a GaInSn model of typical continuous casting process. LES-CU-LOW is the fastest and the most accurate model owing to much finer mesh and a smaller timestep. This work provided a good understanding on the performance of these models. The behavior of instantaneous flows, Reynolds stresses and proper orthogonal decomposition (POD) analysis quantified the nozzle bottom swirl and its importance on the turbulent flow in the mold. Afterwards, the aforementioned work in GaInSn model is extended with electromagnetic braking (EMBr) to help optimize a ruler-type brake and its location for the continuous casting process. The magnetic field suppressed turbulence and promoted vortical structures with their axis aligned with the magnetic field suggesting tendency towards 2-d turbulence. The stronger magnetic field at the nozzle well and around the jet region created large scale and lower frequency flow behavior by suppressing nozzle bottom swirl and its front-back alternation. Based on this work, it is advised to avoid stronger magnetic field around jet and nozzle bottom to get more stable and less defect prone flow.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Due to the growth of design size and complexity, design verification is an important aspect of the Logic Circuit development process. The purpose of verification is to validate that the design meets the system requirements and specification. This is done by either functional or formal verification. The most popular approach to functional verification is the use of simulation based techniques. Using models to replicate the behaviour of an actual system is called simulation. In this thesis, a software/data structure architecture without explicit locks is proposed to accelerate logic gate circuit simulation. We call thus system ZSIM. The ZSIM software architecture simulator targets low cost SIMD multi-core machines. Its performance is evaluated on the Intel Xeon Phi and 2 other machines (Intel Xeon and AMD Opteron). The aim of these experiments is to: • Verify that the data structure used allows SIMD acceleration, particularly on machines with gather instructions ( section 5.3.1). • Verify that, on sufficiently large circuits, substantial gains could be made from multicore parallelism ( section 5.3.2 ). • Show that a simulator using this approach out-performs an existing commercial simulator on a standard workstation ( section 5.3.3 ). • Show that the performance on a cheap Xeon Phi card is competitive with results reported elsewhere on much more expensive super-computers ( section 5.3.5 ). To evaluate the ZSIM, two types of test circuits were used: 1. Circuits from the IWLS benchmark suit [1] which allow direct comparison with other published studies of parallel simulators.2. Circuits generated by a parametrised circuit synthesizer. The synthesizer used an algorithm that has been shown to generate circuits that are statistically representative of real logic circuits. The synthesizer allowed testing of a range of very large circuits, larger than the ones for which it was possible to obtain open source files. The experimental results show that with SIMD acceleration and multicore, ZSIM gained a peak parallelisation factor of 300 on Intel Xeon Phi and 11 on Intel Xeon. With only SIMD enabled, ZSIM achieved a maximum parallelistion gain of 10 on Intel Xeon Phi and 4 on Intel Xeon. Furthermore, it was shown that this software architecture simulator running on a SIMD machine is much faster than, and can handle much bigger circuits than a widely used commercial simulator (Xilinx) running on a workstation. The performance achieved by ZSIM was also compared with similar pre-existing work on logic simulation targeting GPUs and supercomputers. It was shown that ZSIM simulator running on a Xeon Phi machine gives comparable simulation performance to the IBM Blue Gene supercomputer at very much lower cost. The experimental results have shown that the Xeon Phi is competitive with simulation on GPUs and allows the handling of much larger circuits than have been reported for GPU simulation. When targeting Xeon Phi architecture, the automatic cache management of the Xeon Phi, handles and manages the on-chip local store without any explicit mention of the local store being made in the architecture of the simulator itself. However, targeting GPUs, explicit cache management in program increases the complexity of the software architecture. Furthermore, one of the strongest points of the ZSIM simulator is its portability. Note that the same code was tested on both AMD and Xeon Phi machines. The same architecture that efficiently performs on Xeon Phi, was ported into a 64 core NUMA AMD Opteron. To conclude, the two main achievements are restated as following: The primary achievement of this work was proving that the ZSIM architecture was faster than previously published logic simulators on low cost platforms. The secondary achievement was the development of a synthetic testing suite that went beyond the scale range that was previously publicly available, based on prior work that showed the synthesis technique is valid.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Abstract: Medical image processing in general and brain image processing in particular are computationally intensive tasks. Luckily, their use can be liberalized by means of techniques such as GPU programming. In this article we study NiftyReg, a brain image processing library with a GPU implementation using CUDA, and analyse different possible ways of further optimising the existing codes. We will focus on fully using the memory hierarchy and on exploiting the computational power of the CPU. The ideas that lead us towards the different attempts to change and optimize the code will be shown as hypotheses, which we will then test empirically using the results obtained from running the application. Finally, for each set of related optimizations we will study the validity of the obtained results in terms of both performance and the accuracy of the resulting images.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Virtual screening (VS) methods can considerably aid clinical research, predicting how ligands interact with drug targets. Most VS methods suppose a unique binding site for the target, but it has been demonstrated that diverse ligands interact with unrelated parts of the target and many VS methods do not take into account this relevant fact. This problem is circumvented by a novel VS methodology named BINDSURF that scans the whole protein surface in order to find new hotspots, where ligands might potentially interact with, and which is implemented in last generation massively parallel GPU hardware, allowing fast processing of large ligand databases. BINDSURF can thus be used in drug discovery, drug design, drug repurposing and therefore helps considerably in clinical research. However, the accuracy of most VS methods and concretely BINDSURF is constrained by limitations in the scoring function that describes biomolecular interactions, and even nowadays these uncertainties are not completely understood. In order to improve accuracy of the scoring functions used in BINDSURF we propose a hybrid novel approach where neural networks (NNET) and support vector machines (SVM) methods are trained with databases of known active (drugs) and inactive compounds, being this information exploited afterwards to improve BINDSURF VS predictions.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

En este documento se expondrá una implementación del problema del viajante de comercio usando una implementación personalizada de un mapa auto-organizado basándose en soluciones anteriores y adaptándolas a la arquitectura CUDA, haciendo a la vez una comparativa de la implementación eficiente en CUDA C/C++ con la implementación de las funciones de GPU incluidas en el Parallel Computing Toolbox de Matlab. La solución que se da reduce en casi un cuarto las iteraciones necesarias para llegar a una solución buena del problema mencionado, además de la mejora inminente del uso de las arquitecturas paralelas. En esta solución se estudia la mejora en tiempo que se consigue con el uso específico de la memoria compartida, siendo esta una de las herramientas más potentes para mejorar el rendimiento. En lo referente a los tiempos de ejecución, se llega a concluir que la mejor solución es el lanzamiento de un kernel de CUDA desde Matlab a través de la funcionalidad incluida en el Parallel Computing Toolbox.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Current trends in broadband mobile networks are addressed towards the placement of different capabilities at the edge of the mobile network in a centralised way. On one hand, the split of the eNB between baseband processing units and remote radio headers makes it possible to process some of the protocols in centralised premises, likely with virtualised resources. On the other hand, mobile edge computing makes use of processing and storage capabilities close to the air interface in order to deploy optimised services with minimum delay. The confluence of both trends is a hot topic in the definition of future 5G networks. The full centralisation of both technologies in cloud data centres imposes stringent requirements to the fronthaul connections in terms of throughput and latency. Therefore, all those cells with limited network access would not be able to offer these types of services. This paper proposes a solution for these cases, based on the placement of processing and storage capabilities close to the remote units, which is especially well suited for the deployment of clusters of small cells. The proposed cloud-enabled small cells include a highly efficient microserver with a limited set of virtualised resources offered to the cluster of small cells. As a result, a light data centre is created and commonly used for deploying centralised eNB and mobile edge computing functionalities. The paper covers the proposed architecture, with special focus on the integration of both aspects, and possible scenarios of application.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Various mechanisms have been proposed to explain extreme waves or rogue waves in an oceanic environment including directional focusing, dispersive focusing, wave-current interaction, and nonlinear modulational instability. The Benjamin-Feir instability (nonlinear modulational instability), however, is considered to be one of the primary mechanisms for rogue-wave occurrence. The nonlinear Schrodinger equation is a well-established approximate model based on the same assumptions as required for the derivation of the Benjamin-Feir theory. Solutions of the nonlinear Schrodinger equation, including new rogue-wave type solutions are presented in the author's dissertation work. The solutions are obtained by using a predictive eigenvalue map based predictor-corrector procedure developed by the author. Features of the predictive map are explored and the influences of certain parameter variations are investigated. The solutions are rescaled to match the length scales of waves generated in a wave tank. Based on the information provided by the map and the details of physical scaling, a framework is developed that can serve as a basis for experimental investigations into a variety of extreme waves as well localizations in wave fields. To derive further fundamental insights into the complexity of extreme wave conditions, Smoothed Particle Hydrodynamics (SPH) simulations are carried out on an advanced Graphic Processing Unit (GPU) based parallel computational platform. Free surface gravity wave simulations have successfully characterized water-wave dispersion in the SPH model while demonstrating extreme energy focusing and wave growth in both linear and nonlinear regimes. A virtual wave tank is simulated wherein wave motions can be excited from either side. Focusing of several wave trains and isolated waves has been simulated. With properly chosen parameters, dispersion effects are observed causing a chirped wave train to focus and exhibit growth. By using the insights derived from the study of the nonlinear Schrodinger equation, modulational instability or self-focusing has been induced in a numerical wave tank and studied through several numerical simulations. Due to the inherent dissipative nature of SPH models, simulating persistent progressive waves can be problematic. This issue has been addressed and an observation-based solution has been provided. The efficacy of SPH in modeling wave focusing can be critical to further our understanding and predicting extreme wave phenomena through simulations. A deeper understanding of the mechanisms underlying extreme energy localization phenomena can help facilitate energy harnessing and serve as a basis to predict and mitigate the impact of energy focusing.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Nowadays, new computers generation provides a high performance that enables to build computationally expensive computer vision applications applied to mobile robotics. Building a map of the environment is a common task of a robot and is an essential part to allow the robots to move through these environments. Traditionally, mobile robots used a combination of several sensors from different technologies. Lasers, sonars and contact sensors have been typically used in any mobile robotic architecture, however color cameras are an important sensor due to we want the robots to use the same information that humans to sense and move through the different environments. Color cameras are cheap and flexible but a lot of work need to be done to give robots enough visual understanding of the scenes. Computer vision algorithms are computational complex problems but nowadays robots have access to different and powerful architectures that can be used for mobile robotics purposes. The advent of low-cost RGB-D sensors like Microsoft Kinect which provide 3D colored point clouds at high frame rates made the computer vision even more relevant in the mobile robotics field. The combination of visual and 3D data allows the systems to use both computer vision and 3D processing and therefore to be aware of more details of the surrounding environment. The research described in this thesis was motivated by the need of scene mapping. Being aware of the surrounding environment is a key feature in many mobile robotics applications from simple robotic navigation to complex surveillance applications. In addition, the acquisition of a 3D model of the scenes is useful in many areas as video games scene modeling where well-known places are reconstructed and added to game systems or advertising where once you get the 3D model of one room the system can add furniture pieces using augmented reality techniques. In this thesis we perform an experimental study of the state-of-the-art registration methods to find which one fits better to our scene mapping purposes. Different methods are tested and analyzed on different scene distributions of visual and geometry appearance. In addition, this thesis proposes two methods for 3d data compression and representation of 3D maps. Our 3D representation proposal is based on the use of Growing Neural Gas (GNG) method. This Self-Organizing Maps (SOMs) has been successfully used for clustering, pattern recognition and topology representation of various kind of data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models without considering time constraints. Self-organising neural models have the ability to provide a good representation of the input space. In particular, the Growing Neural Gas (GNG) is a suitable model because of its flexibility, rapid adaptation and excellent quality of representation. However, this type of learning is time consuming, specially for high-dimensional input data. Since real applications often work under time constraints, it is necessary to adapt the learning process in order to complete it in a predefined time. This thesis proposes a hardware implementation leveraging the computing power of modern GPUs which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). Our proposed geometrical 3D compression method seeks to reduce the 3D information using plane detection as basic structure to compress the data. This is due to our target environments are man-made and therefore there are a lot of points that belong to a plane surface. Our proposed method is able to get good compression results in those man-made scenarios. The detected and compressed planes can be also used in other applications as surface reconstruction or plane-based registration algorithms. Finally, we have also demonstrated the goodness of the GPU technologies getting a high performance implementation of a CAD/CAM common technique called Virtual Digitizing.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Dissertação (mestrado)–Universidade de Brasília, Universidade UnB de Planaltina, Programa de Pós-Graduação em Ciência de Materiais, 2015.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Visualization of vector fields plays an important role in research activities nowadays -- Web applications allow a fast, multi-platform and multi-device access to data, which results in the need of optimized applications to be implemented in both high-performance and low-performance devices -- Point trajectory calculation procedures usually perform repeated calculations due to the fact that several points might lie over the same trajectory -- This paper presents a new methodology to calculate point trajectories over highly-dense and uniformly-distributed grid of points in which the trajectories are forced to lie over the points in the grid -- Its advantages rely on a highly parallel computing architecture implementation and in the reduction of the computational effort to calculate the stream paths since unnecessary calculations are avoided, reusing data through iterations -- As case study, the visualization of oceanic currents through in the web platform is presented and analyzed, using WebGL as the parallel computing architecture and the rendering Application Programming Interface

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This work focuses on the creation and applications of a dynamic simulation software in order to study the hard metal structure (WC-Co). The technological ground used to increase the GPU hardware capacity was Geforce 9600 GT along with the PhysX chip created to make games more realistic. The software simulates the three-dimensional carbide structure to the shape of a cubic box where tungsten carbide (WC) are modeled as triangular prisms and truncated triangular prisms. The program was proven effective regarding checking testes, ranging from calculations of parameter measures such as the capacity to increase the number of particles simulated dynamically. It was possible to make an investigation of both the mean parameters and distributions stereological parameters used to characterize the carbide structure through cutting plans. Grounded on the cutting plans concerning the analyzed structures, we have investigated the linear intercepts, the intercepts to the area, and the perimeter section of the intercepted grains as well as the binder phase to the structure by calculating the mean value and distribution of the free path. As literature shows almost consensually that the distribution of the linear intercepts is lognormal, this suggests that the grain distribution is also lognormal. Thus, a routine was developed regarding the program which made possible a more detailed research on this issue. We have observed that it is possible, under certain values for the parameters which define the shape and size of the Prismatic grain to find out the distribution to the linear intercepts that approach the lognormal shape. Regarding a number of developed simulations, we have observed that the distribution curves of the linear and area intercepts as well as the perimeter section are consistent with studies on static computer simulation to these parameters.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents the implementation of a high quality real-time 3D video system intended for 3D videoconferencing -- Basically, the system is able to extract depth information from a pair of images coming from a short-baseline camera setup -- The system is based on the use of a variant of the adaptive support-weight algorithm to be applied on GPU-based architectures -- The reason to do it is to get real-time results without compromising accuracy and also to reduce costs by using commodity hardware -- The complete system runs over the GStreamer multimedia software platform to make it even more flexible -- Moreover, an autoestereoscopic display has been used as the end-up terminal for 3D content visualization

Relevância:

10.00% 10.00%

Publicador:

Resumo:

En esta tesis doctoral se exponen los fundamentos teóricos necesarios en el diseño de esquemas numéricos de volúmenes finitos para sistemas hiperbólicos no conservativos de una y dos dimensiones. Para el caso unidimensional se repasan los conceptos de esquema camino-conservativo y esquema bien equilibrado, así como la extensión de los esquemas numéricos a alto orden, basados en la reconstrucción de estados. En particular, se presentan los esquemas de tipo PVM (Polynomial Viscosity Matrix), así como diversos esquemas de limitadores de flujo que resultan de la extensión natural del método WAF, utilizando como base algunos esquemas de tipo PVM. Para el caso bidimensional se aborda el diseño de esquemas numéricos camino-conservativos y bien equilibrados de volúmenes finitos para sistemas hiperbólicos no conservativos y su extensión a alto orden, en particular se presenta una reconstrucción de estados de tercer orden compacta y que resulta de la combinación WENO de paraboloides y planos. 
 Se presenta además el desarrollo de métodos numéricos para el sistema de aguas someras bidimensional de una capa. En particular se definen esquemas de primer orden de tipo HLL y FORCE y su extensión a alto orden, un método de limitadores de flujo basado en el esquema HLL-WAF, así como su implementación en arquitecturas de tipo GPU, usando el entorno de programación CUDA. A continuación, se presenta un esquema numérico de orden uno para el sistema de aguas someras de una capa bidimensional en coordenadas esféricas (longitud/latitud), así como la extensión natural del método de limitadores de flujo presentado en el Capítulo 3 a este sistema. Finalmente, se presenta la validación del esquema de limitadores de flujo mediante la simulación de tsunamis reales, y la comparación con datos de campo.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

After a decade evolving in the High Performance Computing arena, GPU-equipped supercomputers have con- quered the top500 and green500 lists, providing us unprecedented levels of computational power and memory bandwidth. This year, major vendors have introduced new accelerators based on 3D memory, like Xeon Phi Knights Landing by Intel and Pascal architecture by Nvidia. This paper reviews hardware features of those new HPC accelerators and unveils potential performance for scientific applications, with an emphasis on Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM) used by commercial products according to roadmaps already announced.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The intersection of network function virtualisation (NFV) technologies and big data has the potential of revolutionising today's telecommunication networks from deployment to operations resulting in significant reductions in capital expenditure (CAPEX) and operational expenditure, as well as cloud vendor and additional revenue growths for the operators. One of the contributions of this article is the comparisons of the requirements for big data and network virtualisation and the formulation of the key performance indicators for the distributed big data NFVs at the operator's infrastructures. Big data and virtualisation are highly interdependent and their intersections and dependencies are analysed and the potential optimisation gains resulted from open interfaces between big data and carrier networks NFV functional blocks for an adaptive environment are then discussed. Another contribution of this article is a comprehensive discussion on open interface recommendations which enables global collaborative and scalable virtualised big data applications.