969 resultados para Graphics processing units


Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this paper we present a scalable software architecture for on-line multi-camera video processing, that guarantees a good trade off between computational power, scalability and flexibility. The software system is modular and its main blocks are the Processing Units (PUs), and the Central Unit. The Central Unit works as a supervisor of the running PUs and each PU manages the acquisition phase and the processing phase. Furthermore, an approach to easily parallelize the desired processing application has been presented. In this paper, as case study, we apply the proposed software architecture to a multi-camera system in order to efficiently manage multiple 2D object detection modules in a real-time scenario. System performance has been evaluated under different load conditions such as number of cameras and image sizes. The results show that the software architecture scales well with the number of camera and can easily works with different image formats respecting the real time constraints. Moreover, the parallelization approach can be used in order to speed up the processing tasks with a low level of overhead

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Shading reduces the power output of a photovoltaic (PV) system. The design engineering of PV systems requires modeling and evaluating shading losses. Some PV systems are affected by complex shading scenes whose resulting PV energy losses are very difficult to evaluate with current modeling tools. Several specialized PV design and simulation software include the possibility to evaluate shading losses. They generally possess a Graphical User Interface (GUI) through which the user can draw a 3D shading scene, and then evaluate its corresponding PV energy losses. The complexity of the objects that these tools can handle is relatively limited. We have created a software solution, 3DPV, which allows evaluating the energy losses induced by complex 3D scenes on PV generators. The 3D objects can be imported from specialized 3D modeling software or from a 3D object library. The shadows cast by this 3D scene on the PV generator are then directly evaluated from the Graphics Processing Unit (GPU). Thanks to the recent development of GPUs for the video game industry, the shadows can be evaluated with a very high spatial resolution that reaches well beyond the PV cell level, in very short calculation times. A PV simulation model then translates the geometrical shading into PV energy output losses. 3DPV has been implemented using WebGL, which allows it to run directly from a Web browser, without requiring any local installation from the user. This also allows taken full benefits from the information already available from Internet, such as the 3D object libraries. This contribution describes, step by step, the method that allows 3DPV to evaluate the PV energy losses caused by complex shading. We then illustrate the results of this methodology to several application cases that are encountered in the world of PV systems design. Keywords: 3D, modeling, simulation, GPU, shading, losses, shadow mapping, solar, photovoltaic, PV, WebGL

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Tool path generation is one of the most complex problems in Computer Aided Manufacturing. Although some efficient strategies have been developed, most of them are only useful for standard machining. However, the algorithms used for tool path computation demand a higher computation performance, which makes the implementation on many existing systems very slow or even impractical. Hardware acceleration is an incremental solution that can be cleanly added to these systems while keeping everything else intact. It is completely transparent to the user. The cost is much lower and the development time is much shorter than replacing the computers by faster ones. This paper presents an optimisation that uses a specific graphic hardware approach using the power of multi-core Graphic Processing Units (GPUs) in order to improve the tool path computation. This improvement is applied on a highly accurate and robust tool path generation algorithm. The paper presents, as a case of study, a fully implemented algorithm used for turning lathe machining of shoe lasts. A comparative study will show the gain achieved in terms of total computing time. The execution time is almost two orders of magnitude faster than modern PCs.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

El avance en la potencia de cómputo en nuestros días viene dado por la paralelización del procesamiento, dadas las características que disponen las nuevas arquitecturas de hardware. Utilizar convenientemente este hardware impacta en la aceleración de los algoritmos en ejecución (programas). Sin embargo, convertir de forma adecuada el algoritmo en su forma paralela es complejo, y a su vez, esta forma, es específica para cada tipo de hardware paralelo. En la actualidad los procesadores de uso general más comunes son los multicore, procesadores paralelos, también denominados Symmetric Multi-Processors (SMP). Hoy en día es difícil hallar un procesador para computadoras de escritorio que no tengan algún tipo de paralelismo del caracterizado por los SMP, siendo la tendencia de desarrollo, que cada día nos encontremos con procesadores con mayor numero de cores disponibles. Por otro lado, los dispositivos de procesamiento de video (Graphics Processor Units - GPU), a su vez, han ido desarrollando su potencia de cómputo por medio de disponer de múltiples unidades de procesamiento dentro de su composición electrónica, a tal punto que en la actualidad no es difícil encontrar placas de GPU con capacidad de 200 a 400 hilos de procesamiento paralelo. Estos procesadores son muy veloces y específicos para la tarea que fueron desarrollados, principalmente el procesamiento de video. Sin embargo, como este tipo de procesadores tiene muchos puntos en común con el procesamiento científico, estos dispositivos han ido reorientándose con el nombre de General Processing Graphics Processor Unit (GPGPU). A diferencia de los procesadores SMP señalados anteriormente, las GPGPU no son de propósito general y tienen sus complicaciones para uso general debido al límite en la cantidad de memoria que cada placa puede disponer y al tipo de procesamiento paralelo que debe realizar para poder ser productiva su utilización. Los dispositivos de lógica programable, FPGA, son dispositivos capaces de realizar grandes cantidades de operaciones en paralelo, por lo que pueden ser usados para la implementación de algoritmos específicos, aprovechando el paralelismo que estas ofrecen. Su inconveniente viene derivado de la complejidad para la programación y el testing del algoritmo instanciado en el dispositivo. Ante esta diversidad de procesadores paralelos, el objetivo de nuestro trabajo está enfocado en analizar las características especificas que cada uno de estos tienen, y su impacto en la estructura de los algoritmos para que su utilización pueda obtener rendimientos de procesamiento acordes al número de recursos utilizados y combinarlos de forma tal que su complementación sea benéfica. Específicamente, partiendo desde las características del hardware, determinar las propiedades que el algoritmo paralelo debe tener para poder ser acelerado. Las características de los algoritmos paralelos determinará a su vez cuál de estos nuevos tipos de hardware son los mas adecuados para su instanciación. En particular serán tenidos en cuenta el nivel de dependencia de datos, la necesidad de realizar sincronizaciones durante el procesamiento paralelo, el tamaño de datos a procesar y la complejidad de la programación paralela en cada tipo de hardware.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This thesis describes advances in the characterisation, calibration and data processing of optical coherence tomography (OCT) systems. Femtosecond (fs) laser inscription was used for producing OCT-phantoms. Transparent materials are generally inert to infra-red radiations, but with fs lasers material modification occurs via non-linear processes when the highly focused light source interacts with the materials. This modification is confined to the focal volume and is highly reproducible. In order to select the best inscription parameters, combination of different inscription parameters were tested, using three fs laser systems, with different operating properties, on a variety of materials. This facilitated the understanding of the key characteristics of the produced structures with the aim of producing viable OCT-phantoms. Finally, OCT-phantoms were successfully designed and fabricated in fused silica. The use of these phantoms to characterise many properties (resolution, distortion, sensitivity decay, scan linearity) of an OCT system was demonstrated. Quantitative methods were developed to support the characterisation of an OCT system collecting images from phantoms and also to improve the quality of the OCT images. Characterisation methods include the measurement of the spatially variant resolution (point spread function (PSF) and modulation transfer function (MTF)), sensitivity and distortion. Processing of OCT data is a computer intensive process. Standard central processing unit (CPU) based processing might take several minutes to a few hours to process acquired data, thus data processing is a significant bottleneck. An alternative choice is to use expensive hardware-based processing such as field programmable gate arrays (FPGAs). However, recently graphics processing unit (GPU) based data processing methods have been developed to minimize this data processing and rendering time. These processing techniques include standard-processing methods which includes a set of algorithms to process the raw data (interference) obtained by the detector and generate A-scans. The work presented here describes accelerated data processing and post processing techniques for OCT systems. The GPU based processing developed, during the PhD, was later implemented into a custom built Fourier domain optical coherence tomography (FD-OCT) system. This system currently processes and renders data in real time. Processing throughput of this system is currently limited by the camera capture rate. OCTphantoms have been heavily used for the qualitative characterization and adjustment/ fine tuning of the operating conditions of OCT system. Currently, investigations are under way to characterize OCT systems using our phantoms. The work presented in this thesis demonstrate several novel techniques of fabricating OCT-phantoms and accelerating OCT data processing using GPUs. In the process of developing phantoms and quantitative methods, a thorough understanding and practical knowledge of OCT and fs laser processing systems was developed. This understanding leads to several novel pieces of research that are not only relevant to OCT but have broader importance. For example, extensive understanding of the properties of fs inscribed structures will be useful in other photonic application such as making of phase mask, wave guides and microfluidic channels. Acceleration of data processing with GPUs is also useful in other fields.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

* The following text has been originally published in the Proceedings of the Language Recourses and Evaluation Conference held in Lisbon, Portugal, 2004, under the title of "Towards Intelligent Written Cultural Heritage Processing - Lexical processing". I present here a revised contribution of the aforementioned paper and I add here the latest efforts done in the Center for Computational Linguistic in Prague in the field under discussion.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Machine vision represents a particularly attractive solution for sensing and detecting potential collision-course targets due to the relatively low cost, size, weight, and power requirements of the sensors involved (as opposed to radar). This paper describes the development and evaluation of a vision-based collision detection algorithm suitable for fixed-wing aerial robotics. The system was evaluated using highly realistic vision data of the moments leading up to a collision. Based on the collected data, our detection approaches were able to detect targets at distances ranging from 400m to about 900m. These distances (with some assumptions about closing speeds and aircraft trajectories) translate to an advanced warning of between 8-10 seconds ahead of impact, which approaches the 12.5 second response time recommended for human pilots. We make use of the enormous potential of graphic processing units to achieve processing rates of 30Hz (for images of size 1024-by- 768). Currently, integration in the final platform is under way.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The emergence of pseudo-marginal algorithms has led to improved computational efficiency for dealing with complex Bayesian models with latent variables. Here an unbiased estimator of the likelihood replaces the true likelihood in order to produce a Bayesian algorithm that remains on the marginal space of the model parameter (with latent variables integrated out), with a target distribution that is still the correct posterior distribution. Very efficient proposal distributions can be developed on the marginal space relative to the joint space of model parameter and latent variables. Thus psuedo-marginal algorithms tend to have substantially better mixing properties. However, for pseudo-marginal approaches to perform well, the likelihood has to be estimated rather precisely. This can be difficult to achieve in complex applications. In this paper we propose to take advantage of multiple central processing units (CPUs), that are readily available on most standard desktop computers. Here the likelihood is estimated independently on the multiple CPUs, with the ultimate estimate of the likelihood being the average of the estimates obtained from the multiple CPUs. The estimate remains unbiased, but the variability is reduced. We compare and contrast two different technologies that allow the implementation of this idea, both of which require a negligible amount of extra programming effort. The superior performance of this idea over the standard approach is demonstrated on simulated data from a stochastic volatility model.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A "self-exciting" market is one in which the probability of observing a crash increases in response to the occurrence of a crash. It essentially describes cases where the initial crash serves to weaken the system to some extent, making subsequent crashes more likely. This thesis investigates if equity markets possess this property. A self-exciting extension of the well-known jump-based Bates (1996) model is used as the workhorse model for this thesis, and a particle-filtering algorithm is used to facilitate estimation by means of maximum likelihood. The estimation method is developed so that option prices are easily included in the dataset, leading to higher quality estimates. Equilibrium arguments are used to price the risks associated with the time-varying crash probability, and in turn to motivate a risk-neutral system for use in option pricing. The option pricing function for the model is obtained via the application of widely-used Fourier techniques. An application to S&P500 index returns and a panel of S&P500 index option prices reveals evidence of self excitation.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The nonlinear problem of steady free-surface flow past a submerged source is considered as a case study for three-dimensional ship wave problems. Of particular interest is the distinctive wedge-shaped wave pattern that forms on the surface of the fluid. By reformulating the governing equations with a standard boundary-integral method, we derive a system of nonlinear algebraic equations that enforce a singular integro-differential equation at each midpoint on a two-dimensional mesh. Our contribution is to solve the system of equations with a Jacobian-free Newton-Krylov method together with a banded preconditioner that is carefully constructed with entries taken from the Jacobian of the linearised problem. Further, we are able to utilise graphics processing unit acceleration to significantly increase the grid refinement and decrease the run-time of our solutions in comparison to schemes that are presently employed in the literature. Our approach provides opportunities to explore the nonlinear features of three-dimensional ship wave patterns, such as the shape of steep waves close to their limiting configuration, in a manner that has been possible in the two-dimensional analogue for some time.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The evolution of technological systems is hindered by systemic components, referred to as reverse salients, which fail to deliver the necessary level of technological performance thereby inhibiting the performance delivery of the system as a whole. This paper develops a performance gap measure of reverse salience and applies this measurement in the study of the PC (personal computer) technological system, focusing on the evolutions of firstly the CPU (central processing unit) and PC game sub-systems, and secondly the GPU (graphics processing unit) and PC game sub-systems. The measurement of the temporal behavior of reverse salience indicates that the PC game sub-system is the reverse salient, continuously trailing behind the technological performance of the CPU and GPU sub-systems from 1996 through 2006. The technological performance of the PC game sub-system as a reverse salient trails that of the CPU sub-system by up to 2300 MHz with a gradually decreasing performance disparity in recent years. In contrast, the dynamics of the PC game sub-system as a reverse salient trails the GPU sub-system with an ever increasing performance gap throughout the timeframe of analysis. In addition, we further discuss the research and managerial implications of our findings.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A distributed system is a collection of networked autonomous processing units which must work in a cooperative manner. Currently, large-scale distributed systems, such as various telecommunication and computer networks, are abundant and used in a multitude of tasks. The field of distributed computing studies what can be computed efficiently in such systems. Distributed systems are usually modelled as graphs where nodes represent the processors and edges denote communication links between processors. This thesis concentrates on the computational complexity of the distributed graph colouring problem. The objective of the graph colouring problem is to assign a colour to each node in such a way that no two nodes connected by an edge share the same colour. In particular, it is often desirable to use only a small number of colours. This task is a fundamental symmetry-breaking primitive in various distributed algorithms. A graph that has been coloured in this manner using at most k different colours is said to be k-coloured. This work examines the synchronous message-passing model of distributed computation: every node runs the same algorithm, and the system operates in discrete synchronous communication rounds. During each round, a node can communicate with its neighbours and perform local computation. In this model, the time complexity of a problem is the number of synchronous communication rounds required to solve the problem. It is known that 3-colouring any k-coloured directed cycle requires at least ½(log* k - 3) communication rounds and is possible in ½(log* k + 7) communication rounds for all k ≥ 3. This work shows that for any k ≥ 3, colouring a k-coloured directed cycle with at most three colours is possible in ½(log* k + 3) rounds. In contrast, it is also shown that for some values of k, colouring a directed cycle with at most three colours requires at least ½(log* k + 1) communication rounds. Furthermore, in the case of directed rooted trees, reducing a k-colouring into a 3-colouring requires at least log* k + 1 rounds for some k and possible in log* k + 3 rounds for all k ≥ 3. The new positive and negative results are derived using computational methods, as the existence of distributed colouring algorithms corresponds to the colourability of so-called neighbourhood graphs. The colourability of these graphs is analysed using Boolean satisfiability (SAT) solvers. Finally, this thesis shows that similar methods are applicable in capturing the existence of distributed algorithms for other graph problems, such as the maximal matching problem.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Real-time simulation of deformable solids is essential for some applications such as biological organ simulations for surgical simulators. In this work, deformable solids are approximated to be linear elastic, and an easy and straight forward numerical technique, the Finite Point Method (FPM), is used to model three dimensional linear elastostatics. Graphics Processing Unit (GPU) is used to accelerate computations. Results show that the Finite Point Method, together with GPU, can compute three dimensional linear elastostatic responses of solids at rates suitable for real-time graphics, for solids represented by reasonable number of points.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Real-time simulation of deformable solids is essential for some applications such as biological organ simulations for surgical simulators. In this work, deformable solids are approximated to be linear elastic, and an easy and straight forward numerical technique, the Finite Point Method (FPM), is used to model three dimensional linear elastostatics. Graphics Processing Unit (GPU) is used to accelerate computations. Results show that the Finite Point Method, together with GPU, can compute three dimensional linear elastostatic responses of solids at rates suitable for real-time graphics, for solids represented by reasonable number of points.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this work, first a Fortran code is developed for three dimensional linear elastostatics using constant boundary elements; the code is based on a MATLAB code developed by the author earlier. Next, the code is parallelized using BLACS, MPI, and ScaLAPACK. Later, the parallelized code is used to demonstrate the usefulness of the Boundary Element Method (BEM) as applied to the realtime computational simulation of biological organs, while focusing on the speed and accuracy offered by BEM. A computer cluster is used in this part of the work. The commercial software package ANSYS is used to obtain the `exact' solution against which the solution from BEM is compared; analytical solutions, wherever available, are also used to establish the accuracy of BEM. A pig liver is the biological organ considered. Next, instead of the computer cluster, a Graphics Processing Unit (GPU) is used as the parallel hardware. Results indicate that BEM is an interesting choice for the simulation of biological organs. Although the use of BEM for the simulation of biological organs is not new, the results presented in the present study are not found elsewhere in the literature. Also, a serial MATLAB code, and both serial and parallel versions of a Fortran code, which can solve three dimensional (3D) linear elastostatic problems using constant boundary elements, are provided as supplementary files that can be freely downloaded.