991 resultados para graphics processing unit (GPU)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

For broadcasting purposes MIXED REALITY, the combination of real and virtual scene content, has become ubiquitous nowadays. Mixed Reality recording still requires expensive studio setups and is often limited to simple color keying. We present a system for Mixed Reality applications which uses depth keying and provides threedimensional mixing of real and artificial content. It features enhanced realism through automatic shadow computation which we consider a core issue to obtain realism and a convincing visual perception, besides the correct alignment of the two modalities and correct occlusion handling. Furthermore we present a possibility to support placement of virtual content in the scene. Core feature of our system is the incorporation of a TIME-OF-FLIGHT (TOF)-camera device. This device delivers real-time depth images of the environment at a reasonable resolution and quality. This camera is used to build a static environment model and it also allows correct handling of mutual occlusions between real and virtual content, shadow computation and enhanced content planning. The presented system is inexpensive, compact, mobile, flexible and provides convenient calibration procedures. Chroma-keying is replaced by depth-keying which is efficiently performed on the GRAPHICS PROCESSING UNIT (GPU) by the usage of an environment model and the current ToF-camera image. Automatic extraction and tracking of dynamic scene content is herewith performed and this information is used for planning and alignment of virtual content. An additional sustainable feature is that depth maps of the mixed content are available in real-time, which makes the approach suitable for future 3DTV productions. The presented paper gives an overview of the whole system approach including camera calibration, environment model generation, real-time keying and mixing of virtual and real content, shadowing for virtual content and dynamic object tracking for content planning.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a non-rigid free-from 2D-3D registration approach using statistical deformation model (SDM). In our approach the SDM is first constructed from a set of training data using a non-rigid registration algorithm based on b-spline free-form deformation to encode a priori information about the underlying anatomy. A novel intensity-based non-rigid 2D-3D registration algorithm is then presented to iteratively fit the 3D b-spline-based SDM to the 2D X-ray images of an unseen subject, which requires a computationally expensive inversion of the instantiated deformation in each iteration. In this paper, we propose to solve this challenge with a fast B-spline pseudo-inversion algorithm that is implemented on graphics processing unit (GPU). Experiments conducted on C-arm and X-ray images of cadaveric femurs demonstrate the efficacy of the present approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Self-organising neural models have the ability to provide a good representation of the input space. In particular the Growing Neural Gas (GNG) is a suitable model because of its flexibility, rapid adaptation and excellent quality of representation. However, this type of learning is time-consuming, especially for high-dimensional input data. Since real applications often work under time constraints, it is necessary to adapt the learning process in order to complete it in a predefined time. This paper proposes a Graphics Processing Unit (GPU) parallel implementation of the GNG with Compute Unified Device Architecture (CUDA). In contrast to existing algorithms, the proposed GPU implementation allows the acceleration of the learning process keeping a good quality of representation. Comparative experiments using iterative, parallel and hybrid implementations are carried out to demonstrate the effectiveness of CUDA implementation. The results show that GNG learning with the proposed implementation achieves a speed-up of 6× compared with the single-threaded CPU implementation. GPU implementation has also been applied to a real application with time constraints: acceleration of 3D scene reconstruction for egomotion, in order to validate the proposal.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Femtosecond laser microfabrication has emerged over the last decade as a 3D flexible technology in photonics. Numerical simulations provide an important insight into spatial and temporal beam and pulse shaping during the course of extremely intricate nonlinear propagation (see e.g. [1,2]). Electromagnetics of such propagation is typically described in the form of the generalized Non-Linear Schrdinger Equation (NLSE) coupled with Drude model for plasma [3]. In this paper we consider a multi-threaded parallel numerical solution for a specific model which describes femtosecond laser pulse propagation in transparent media [4, 5]. However our approach can be extended to similar models. The numerical code is implemented in NVIDIA Graphics Processing Unit (GPU) which provides an effitient hardware platform for multi-threded computing. We compare the performance of the described below parallel code implementated for GPU using CUDA programming interface [3] with a serial CPU version used in our previous papers [4,5]. © 2011 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an approach to create what we have called a Unified Sentiment Lexicon (USL). This approach aims at aligning, unifying, and expanding the set of sentiment lexicons which are available on the web in order to increase their robustness of coverage. One problem related to the task of the automatic unification of different scores of sentiment lexicons is that there are multiple lexical entries for which the classification of positive, negative, or neutral {P, Z, N} depends on the unit of measurement used in the annotation methodology of the source sentiment lexicon. Our USL approach computes the unified strength of polarity of each lexical entry based on the Pearson correlation coefficient which measures how correlated lexical entries are with a value between 1 and -1, where 1 indicates that the lexical entries are perfectly correlated, 0 indicates no correlation, and -1 means they are perfectly inversely correlated and so is the UnifiedMetrics procedure for CPU and GPU, respectively. Another problem is the high processing time required for computing all the lexical entries in the unification task. Thus, the USL approach computes a subset of lexical entries in each of the 1344 GPU cores and uses parallel processing in order to unify 155802 lexical entries. The results of the analysis conducted using the USL approach show that the USL has 95.430 lexical entries, out of which there are 35.201 considered to be positive, 22.029 negative, and 38.200 neutral. Finally, the runtime was 10 minutes for 95.430 lexical entries; this allows a reduction of the time computing for the UnifiedMetrics by 3 times.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Endmember extraction (EE) is a fundamental and crucial task in hyperspectral unmixing. Among other methods vertex component analysis ( VCA) has become a very popular and useful tool to unmix hyperspectral data. VCA is a geometrical based method that extracts endmember signatures from large hyperspectral datasets without the use of any a priori knowledge about the constituent spectra. Many Hyperspectral imagery applications require a response in real time or near-real time. Thus, to met this requirement this paper proposes a parallel implementation of VCA developed for graphics processing units. The impact on the complexity and on the accuracy of the proposed parallel implementation of VCA is examined using both simulated and real hyperspectral datasets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Biomédica

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the present work the benefits of using graphics processing units (GPU) to aid the design of complex geometry profile extrusion dies, are studied. For that purpose, a3Dfinite volume based code that employs unstructured meshes to solve and couple the continuity, momentum and energy conservation equations governing the fluid flow, together with aconstitutive equation, was used. To evaluate the possibility of reducing the calculation time spent on the numerical calculations, the numerical code was parallelized in the GPU, using asimple programing approach without complex memory manipulations. For verificationpurposes, simulations were performed for three benchmark problems: Poiseuille flow, lid-driven cavity flow and flow around acylinder. Subsequently, the code was used on the design of two real life extrusion dies for the production of a medical catheter and a wood plastic composite decking profile. To evaluate the benefits, the results obtained with the GPU parallelized code were compared, in terms of speedup, with a serial implementation of the same code, that traditionally runs on the central processing unit (CPU). The results obtained show that, even with the simple parallelization approach employed, it was possible to obtain a significant reduction of the computation times.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Pós-graduação em Ciência da Computação - IBILCE

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Técnicas de reconhecimento de padrões tem como principal objetivo classificar um conjunto de amostras, sendo o processo de aprendizado a fase de maior consumo de tempo. O problema pode piorar em ferramentas de classificação interativas, o que pode ser inaceitável para grandes bases de dados. Um exemplo de classificador é o baseado em Floresta de Caminhos Ótimos [8] - OPF. Dado que muitos trabalhos tem sido orientados à implementação de algoritmos de reconhecimento de padrões em ambiente General Purpose Graphics Processing Unit - GPGPU, o presente estudo objetivou a implementação da etapa de treinamento do classificador Floresta de Caminhos Ótimos em CUDA, visando aumentar a sua eficiência. A otimização do classificador em CUDA demonstrou uma fase de treinamento mais rápida que a versão original.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Microprocessori basati su singolo processore (CPU), hanno visto una rapida crescita di performances ed un abbattimento dei costi per circa venti anni. Questi microprocessori hanno portato una potenza di calcolo nell’ordine del GFLOPS (Giga Floating Point Operation per Second) sui PC Desktop e centinaia di GFLOPS su clusters di server. Questa ascesa ha portato nuove funzionalità nei programmi, migliori interfacce utente e tanti altri vantaggi. Tuttavia questa crescita ha subito un brusco rallentamento nel 2003 a causa di consumi energetici sempre più elevati e problemi di dissipazione termica, che hanno impedito incrementi di frequenza di clock. I limiti fisici del silicio erano sempre più vicini. Per ovviare al problema i produttori di CPU (Central Processing Unit) hanno iniziato a progettare microprocessori multicore, scelta che ha avuto un impatto notevole sulla comunità degli sviluppatori, abituati a considerare il software come una serie di comandi sequenziali. Quindi i programmi che avevano sempre giovato di miglioramenti di prestazioni ad ogni nuova generazione di CPU, non hanno avuto incrementi di performance, in quanto essendo eseguiti su un solo core, non beneficiavano dell’intera potenza della CPU. Per sfruttare appieno la potenza delle nuove CPU la programmazione concorrente, precedentemente utilizzata solo su sistemi costosi o supercomputers, è diventata una pratica sempre più utilizzata dagli sviluppatori. Allo stesso tempo, l’industria videoludica ha conquistato una fetta di mercato notevole: solo nel 2013 verranno spesi quasi 100 miliardi di dollari fra hardware e software dedicati al gaming. Le software houses impegnate nello sviluppo di videogames, per rendere i loro titoli più accattivanti, puntano su motori grafici sempre più potenti e spesso scarsamente ottimizzati, rendendoli estremamente esosi in termini di performance. Per questo motivo i produttori di GPU (Graphic Processing Unit), specialmente nell’ultimo decennio, hanno dato vita ad una vera e propria rincorsa alle performances che li ha portati ad ottenere dei prodotti con capacità di calcolo vertiginose. Ma al contrario delle CPU che agli inizi del 2000 intrapresero la strada del multicore per continuare a favorire programmi sequenziali, le GPU sono diventate manycore, ovvero con centinaia e centinaia di piccoli cores che eseguono calcoli in parallelo. Questa immensa capacità di calcolo può essere utilizzata in altri campi applicativi? La risposta è si e l’obiettivo di questa tesi è proprio quello di constatare allo stato attuale, in che modo e con quale efficienza pùo un software generico, avvalersi dell’utilizzo della GPU invece della CPU.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many methodologies dealing with prediction or simulation of soft tissue deformations on medical image data require preprocessing of the data in order to produce a different shape representation that complies with standard methodologies, such as mass–spring networks, finite element method s (FEM). On the other hand, methodologies working directly on the image space normally do not take into account mechanical behavior of tissues and tend to lack physics foundations driving soft tissue deformations. This chapter presents a method to simulate soft tissue deformations based on coupled concepts from image analysis and mechanics theory. The proposed methodology is based on a robust stochastic approach that takes into account material properties retrieved directly from the image, concepts from continuum mechanics and FEM. The optimization framework is solved within a hierarchical Markov random field (HMRF) which is implemented on the graphics processor unit (GPU See Graphics processing unit ).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Photocopy of typescript.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hyperspectral imaging can be used for object detection and for discriminating between different objects based on their spectral characteristics. One of the main problems of hyperspectral data analysis is the presence of mixed pixels, due to the low spatial resolution of such images. This means that several spectrally pure signatures (endmembers) are combined into the same mixed pixel. Linear spectral unmixing follows an unsupervised approach which aims at inferring pure spectral signatures and their material fractions at each pixel of the scene. The huge data volumes acquired by such sensors put stringent requirements on processing and unmixing methods. This paper proposes an efficient implementation of a unsupervised linear unmixing method on GPUs using CUDA. The method finds the smallest simplex by solving a sequence of nonsmooth convex subproblems using variable splitting to obtain a constraint formulation, and then applying an augmented Lagrangian technique. The parallel implementation of SISAL presented in this work exploits the GPU architecture at low level, using shared memory and coalesced accesses to memory. The results herein presented indicate that the GPU implementation can significantly accelerate the method's execution over big datasets while maintaining the methods accuracy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Remote hyperspectral sensors collect large amounts of data per flight usually with low spatial resolution. It is known that the bandwidth connection between the satellite/airborne platform and the ground station is reduced, thus a compression onboard method is desirable to reduce the amount of data to be transmitted. This paper presents a parallel implementation of an compressive sensing method, called parallel hyperspectral coded aperture (P-HYCA), for graphics processing units (GPU) using the compute unified device architecture (CUDA). This method takes into account two main properties of hyperspectral dataset, namely the high correlation existing among the spectral bands and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. Experimental results conducted using synthetic and real hyperspectral datasets on two different GPU architectures by NVIDIA: GeForce GTX 590 and GeForce GTX TITAN, reveal that the use of GPUs can provide real-time compressive sensing performance. The achieved speedup is up to 20 times when compared with the processing time of HYCA running on one core of the Intel i7-2600 CPU (3.4GHz), with 16 Gbyte memory.