17 results for Graphics processing units

at Universidad Politécnica de Madrid


Relevance:

100.00%

Abstract:

This paper presents an approach to create what we have called a Unified Sentiment Lexicon (USL). This approach aims at aligning, unifying, and expanding the set of sentiment lexicons available on the web in order to increase the robustness of their coverage. One problem in automatically unifying the scores of different sentiment lexicons is that, for many lexical entries, the classification as positive, negative, or neutral {P, Z, N} depends on the unit of measurement used in the annotation methodology of the source sentiment lexicon. Our USL approach computes the unified strength of polarity of each lexical entry based on the Pearson correlation coefficient, which measures how correlated lexical entries are with a value between 1 and -1, where 1 indicates that the lexical entries are perfectly correlated, 0 indicates no correlation, and -1 means they are perfectly inversely correlated; this computation is implemented in the UnifiedMetrics procedure for both CPU and GPU. Another problem is the high processing time required for computing all the lexical entries in the unification task. Thus, the USL approach computes a subset of lexical entries in each of the 1344 GPU cores and uses parallel processing in order to unify 155,802 lexical entries. The results of the analysis conducted using the USL approach show that the USL has 95,430 lexical entries, of which 35,201 are considered positive, 22,029 negative, and 38,200 neutral. Finally, the runtime was 10 minutes for 95,430 lexical entries, which reduces the computing time of the UnifiedMetrics procedure by a factor of 3.
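As a rough illustration of the correlation measure named in this abstract, the sketch below computes the Pearson coefficient for two lists of polarity scores. The function name `pearson` and the sample scores are hypothetical; this is not the UnifiedMetrics procedure itself, whose details the abstract does not give.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all left unnormalized (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical polarity scores of the same four entries in two lexicons.
lexicon_a = [0.9, 0.1, -0.7, 0.4]
lexicon_b = [0.8, 0.0, -0.6, 0.5]
r = pearson(lexicon_a, lexicon_b)
print(f"correlation between the two lexicons: {r:.3f}")
```

Each pair of score lists can be processed independently of the others, which is the property that would make this kind of computation a candidate for splitting across GPU cores.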

Relevance:

100.00%

Abstract:

The evolution of smartphones, all equipped with digital cameras, is driving a growing demand for ever more complex applications that rely on real-time computer vision algorithms. However, video signals are only increasing in size, whereas the performance of single-core processors has stagnated in the past few years. Consequently, new computer vision algorithms will need to be parallel so that they can run on multiple processors and be computationally scalable. One of the most promising classes of processors nowadays can be found in graphics processing units (GPUs). These devices offer a high degree of parallelism, excellent numerical performance and increasing versatility, which makes them attractive for scientific computing. In this thesis, we explore two computer vision applications whose high computational complexity precludes them from running in real time on traditional uniprocessors. However, we show that by parallelizing subtasks and implementing them on a GPU, both applications attain their goals of running at interactive frame rates. In addition, we propose a technique for fast evaluation of arbitrarily complex functions, specially designed for GPU implementation. First, we explore the application of depth-image-based rendering techniques to the unusual configuration of two convergent, wide-baseline cameras, in contrast to the narrow-baseline, parallel cameras usually used in 3D TV. By using a backward mapping approach with a depth inpainting scheme based on median filters, we show that these techniques are adequate for free viewpoint video applications. In addition, we show that referring depth information to a global reference system is ill-advised and should be avoided. Then, we propose a background subtraction system based on kernel density estimation techniques. These techniques are well suited to modelling complex scenes featuring multimodal backgrounds, but have seen little use due to their huge computational and memory complexity. The proposed system, implemented in real time on a GPU, features novel proposals for dynamic kernel bandwidth estimation for the background model, selective update of the background model, update of the position of reference samples of the foreground model using a multi-region particle filter, and automatic selection of regions of interest to reduce computational cost. The results, evaluated on several databases and compared to other state-of-the-art algorithms, demonstrate the high quality and versatility of our proposal. Finally, we propose a general method for the approximation of arbitrarily complex functions using continuous piecewise linear functions, specially formulated for GPU implementation by leveraging the texture filtering units, which are normally unused for numerical computation. Our proposal features a rigorous mathematical analysis of the approximation error as a function of the number of samples, as well as a method to obtain a near-optimal partition of the domain of the function to minimize the approximation error.
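To make the kernel density estimation idea concrete, here is a minimal NumPy sketch of per-pixel background subtraction. It assumes a fixed Gaussian bandwidth and a fixed density threshold, both hypothetical values; the dynamic bandwidth estimation, selective model update and particle filter described in the thesis are not reproduced.

```python
import numpy as np

def kde_foreground_mask(frame, samples, bandwidth=10.0, threshold=1e-3):
    """Flag pixels whose value is unlikely under a per-pixel Gaussian KDE.

    frame:   (H, W) grayscale image
    samples: (N, H, W) recent background samples for every pixel
    """
    frame = np.asarray(frame, dtype=np.float64)
    samples = np.asarray(samples, dtype=np.float64)
    diff = frame[None, :, :] - samples            # (N, H, W) differences
    norm = 1.0 / (np.sqrt(2.0 * np.pi) * bandwidth)
    # Average of N Gaussian kernels centred on the stored samples.
    density = norm * np.exp(-0.5 * (diff / bandwidth) ** 2).mean(axis=0)
    return density < threshold                    # True means foreground

# Hypothetical data: a flat background and one bright moving pixel.
background = np.full((5, 4, 4), 100.0)
current = np.full((4, 4), 100.0)
current[1, 2] = 250.0
print(kde_foreground_mask(current, background))
```

Every pixel is evaluated independently and the cost grows with the number of stored samples, which is consistent with the abstract's remark that the technique is expensive yet well suited to a GPU.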

Relevance:

100.00%

Abstract:

Graphics Processing Units have become a booster for the microelectronics industry. However, due to intellectual property issues, there is a serious lack of information on the implementation details of the hardware architecture behind GPUs. For instance, the way textures are handled and decompressed in a GPU to reduce bandwidth usage has never been dealt with in depth from a hardware point of view. This work presents a comparative study of the hardware implementation of different texture decompression algorithms for both conventional (PCs and video game consoles) and mobile platforms. Circuit synthesis is performed targeting both a reconfigurable hardware platform and a 90 nm standard cell library. Area-delay trade-offs have been extensively analyzed, which allows us to compare the complexity of decompressors and thus determine the suitability of algorithms for systems with limited hardware resources.

Relevance:

100.00%

Abstract:

This paper outlines the problems found in the parallelization of SPH (Smoothed Particle Hydrodynamics) algorithms using Graphics Processing Units. Results of several parallel GPU implementations are shown in terms of speed-up and scalability compared to the sequential CPU codes. The most problematic stage in the GPU-SPH algorithms is the one responsible for locating neighboring particles and building the vectors where this information is stored, since these specific algorithms raise many difficulties for data-level parallelization. Because neighbor location using linked lists does not show enough data-level parallelism, two new approaches have been proposed to minimize bank conflicts in the writing and subsequent reading of the neighbor lists. The first strategy proposes an efficient coordination between CPU and GPU, using GPU algorithms for those stages that allow a straightforward parallelization, and sequential CPU algorithms for those instructions that involve some kind of vector reduction. This coordination provides a relatively orderly reading of the neighbor lists in the interactions stage, achieving a speed-up factor of x47 in this stage. However, since the construction of the neighbor lists is quite expensive, the overall speed-up achieved is x41. The second strategy seeks to maximize the use of the GPU in the neighbor location process by executing a specific vector sorting algorithm that allows some data-level parallelism. Although this strategy has succeeded in improving the speed-up of the neighbor location stage, the global speed-up of the interactions stage falls, due to inefficient reading of the neighbor vectors. Some changes to these strategies are proposed, aimed at maximizing the computational load of the GPU and using the GPU texture units, in order to reach the maximum speed-up for such codes. Different practical applications have been added to the mentioned GPU codes. First, the classical dam-break problem is studied. Second, the wave impact of the sloshing fluid contained in LNG vessel tanks is also simulated as a practical example of particle methods.
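The sorting-based neighbor search mentioned in this abstract can be sketched as follows. This is a generic 2D cell-list built by sorting particle indices by grid cell, written in NumPy for readability; it is not the paper's GPU algorithm, and the function names, the square grid and the assumption that the search radius does not exceed the cell size are all choices made for this example.

```python
import numpy as np

def build_cell_index(positions, cell_size, grid_dim):
    """Sort particles by grid cell so each cell's particles are contiguous."""
    cells = np.floor(positions / cell_size).astype(np.int64)
    cell_id = cells[:, 1] * grid_dim + cells[:, 0]
    order = np.argsort(cell_id, kind="stable")
    sorted_ids = cell_id[order]
    # order[start[c]:start[c + 1]] lists the particles that fall in cell c.
    start = np.searchsorted(sorted_ids, np.arange(grid_dim * grid_dim + 1))
    return order, start

def neighbors(i, positions, cell_size, grid_dim, order, start, radius):
    """Indices of particles within `radius` of particle i (excluding i).

    Assumes radius <= cell_size, so only the 3x3 block of cells is scanned.
    """
    cx, cy = (int(c) for c in np.floor(positions[i] / cell_size))
    found = []
    for ny in range(max(cy - 1, 0), min(cy + 2, grid_dim)):
        for nx in range(max(cx - 1, 0), min(cx + 2, grid_dim)):
            c = ny * grid_dim + nx
            for j in order[start[c]:start[c + 1]]:
                if j != i and np.linalg.norm(positions[j] - positions[i]) <= radius:
                    found.append(int(j))
    return sorted(found)

# Hypothetical particles in the unit square, on a 4 x 4 grid of 0.25-wide cells.
pos = np.array([[0.1, 0.1], [0.2, 0.1], [0.9, 0.9], [0.3, 0.1]])
order, start = build_cell_index(pos, 0.25, 4)
print(neighbors(0, pos, 0.25, 4, order, start, 0.25))
```

Unlike a linked list, the sorted layout stores each cell's particles contiguously, so a parallel implementation can read them with regular memory accesses.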

Relevance:

100.00%

Abstract:

The analysis of complex nonlinear systems is often carried out using simpler piecewise linear representations. A principled and practical technique is proposed to linearize and evaluate arbitrary continuous nonlinear functions using polygonal (continuous piecewise linear) models under the L1 norm. A thorough error analysis is developed to guide the optimal design of two kinds of polygonal approximations in the asymptotic case of a large budget of evaluation subintervals N. The method allows the user to obtain the level of linearization (N) for a target approximation error and vice versa. It is suitable for, but not limited to, an efficient implementation on modern Graphics Processing Units (GPUs), allowing real-time performance of computationally demanding applications. The quality and efficiency of the technique have been measured in detail on two nonlinear functions that are widely used in many areas of scientific computing and are expensive to evaluate.
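The trade-off between N and the approximation error can be illustrated with the classical textbook bound for linear interpolation on a uniform grid: the maximum error is at most (b - a)^2 * max|f''| / (8 N^2). Note that this is a worst-case (uniform norm) bound used here only as a stand-in; the paper's own analysis is under the L1 norm and is not reproduced. All function names are hypothetical.

```python
import math

def interpolation_error_bound(a, b, max_abs_f2, n):
    """Classical bound on the max error of linear interpolation, n subintervals."""
    return (b - a) ** 2 * max_abs_f2 / (8.0 * n ** 2)

def subintervals_for_error(a, b, max_abs_f2, target_error):
    """Smallest n for which the classical bound meets the target error."""
    h = math.sqrt(8.0 * target_error / max_abs_f2)   # largest admissible step
    return max(1, math.ceil((b - a) / h))

def max_interp_error(f, a, b, n, probes=50):
    """Measure the largest sampled error of the uniform polygonal approximation."""
    h = (b - a) / n
    worst = 0.0
    for i in range(n):
        x0 = a + i * h
        y0, y1 = f(x0), f(x0 + h)
        for k in range(probes + 1):
            t = k / probes
            approx = (1.0 - t) * y0 + t * y1
            worst = max(worst, abs(f(x0 + t * h) - approx))
    return worst

# exp on [0, 1]: the second derivative is at most e on this interval.
n = subintervals_for_error(0.0, 1.0, math.e, 1e-4)
print(n, max_interp_error(math.exp, 0.0, 1.0, n))
```

The two helper functions answer both directions of the question: the error guaranteed by a given N, and the N needed for a target error.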

Relevance:

100.00%

Abstract:

Sitting at the interface between Engineering, Computer Science and Biology, Computational Neuron Mechanics appears as a new interdisciplinary field potentially able to tackle clinical problems from a new perspective. This field is multiscale by nature, ranging from the nanoscale (e.g., tubulin dimers) to the macroscale (e.g., brain tissue), and aims at tackling problems that are complex, and sometimes impossible, to study through experimental means. Computational modeling has been widely used in Neuroscience applications as diverse as neuronal growth and compound action potential propagation. However, in the majority of the modeling approaches in this field to date, the interactions between the cell and its surrounding medium/stimulus have rarely been explored. Despite the tremendous importance of this relationship in several medical challenges, such as traumatic brain injury (TBI), cancer and Alzheimer's disease (AD), a bridge between the electrophysiological-chemical and mechanical properties of neurons from the molecular scale to the cell level is still lacking. To this end, this research proposes a multiscale computational framework particularized for two representative scenarios: axon growth and electrophysiological-mechanical coupling of neurites. In the former case, the relation between the molecular constituents of the axon during its growth and its resulting mechanical properties is explored, whereas in the latter, a mechanical stimulus provokes functional deficits at the cell level as a consequence of its electrophysiological-chemical alterations. The computational modeling approach chosen in this work is the finite difference method (FDM), implemented in a new program called Neurite. Although the finite element method (FEM) is also explored as part of this research, the FDM provides the necessary flexibility and versatility to implement biological models, as well as the mathematical simplicity to extend them to large scale simulations with a low computational cost. Focusing first on the effect of electrophysiological-chemical properties on the mechanical properties, an adaptation of Neurite was developed to simulate microtubule polymerization in axonal growth and provide the axon mechanical properties as a function of microtubule occupancy. After calibrating the axon growth model against experimental results available in the literature, the mechanical characteristics can be tracked during the simulation. The axon mechanical properties show dramatic variations at the tip of the axon, where the growth cone supports the chemical and mechanical signaling. Based on the knowledge gained from the FDM scheme, and in order to go from 1D to 3D, this preliminary yet novel scheme paves the way for future studies with FEM. Focusing then on the effect of mechanical properties on the electrophysiological-chemical properties, Neurite was used to relate macroscopic mechanical loading to microscopic strains and strain rates, and to simulate the electrical signal propagation along neurites under mechanical loading. The simulations were calibrated against experimental results published in the literature, thus providing a model able to predict the alteration of neuronal electrophysiological function under external damaging load, and linking mechanical injuries to subsequent acute functional deficits. To undertake large scale simulations, although other state-of-the-art architectures based on many integrated cores (MICs) were considered, the explicit and implicit solvers were implemented for central processing units (CPUs) and graphics processing units (GPUs). Scalability studies were carried out for both implementations, showing promising results for extremely large scale simulations with GPUs. This thesis opens the way for future mechanical modeling approaches aimed at linking electrophysiological-chemical properties to mechanical properties. Its overarching goal is to enhance the bioengineering and medical communities' knowledge of neuronal mechanics and of the functional deficits arising from damage produced by direct mechanical insults, such as TBI, or by evolving neurodegenerative illnesses, such as AD.

Relevance:

100.00%

Abstract:

With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, as a function of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has rarely been explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells has only very recently been proposed (Jerusalem et al., 2013). In this paper, we present the implementation details of Neurite: the finite difference parallel program used in this reference. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computationally expensive calculations that increase in complexity as the network of simulated cells grows. The solvers implemented in Neurite (explicit and implicit) were therefore parallelized using graphics processing units in order to reduce the simulation cost of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics (Jerusalem et al., 2013). This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon, a segmented dendritic tree, and a damaged axon. The capabilities of the program to deal with large scale scenarios, segmented neuronal structures, and functional deficits under mechanical loading are specifically highlighted.
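For readers unfamiliar with explicit finite difference solvers for Cable Theory, the sketch below advances the standard passive cable equation, tau * dV/dt = lam^2 * d2V/dx2 - V, by one forward-Euler step with sealed (zero-flux) ends. It is not Neurite's code: the function name, the parameter values and the boundary treatment are assumptions, and the Hodgkin-Huxley active regions and the mechanical coupling are left out.

```python
import numpy as np

def cable_step(v, dt, dx, lam, tau):
    """One forward-Euler step of tau*dV/dt = lam^2 * d2V/dx2 - V, sealed ends.

    v is the membrane potential relative to rest at equally spaced nodes.
    """
    # Mirror ghost nodes enforce zero flux at both ends of the cable.
    padded = np.concatenate(([v[1]], v, [v[-2]]))
    laplacian = (padded[2:] - 2.0 * padded[1:-1] + padded[:-2]) / dx ** 2
    return v + (dt / tau) * (lam ** 2 * laplacian - v)

# Hypothetical nondimensional setup: a local depolarization spreads and decays.
v = np.zeros(11)
v[5] = 1.0
for _ in range(100):
    v = cable_step(v, dt=0.001, dx=0.1, lam=1.0, tau=1.0)
print(v.round(4))
```

Each node is updated from its two neighbors only, so all nodes can be updated in parallel, which is what makes explicit schemes of this kind natural candidates for a GPU. The time step must stay small relative to dx^2 for the scheme to remain stable.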

Relevance:

100.00%

Abstract:

Many computer vision and human-computer interaction applications developed in recent years need to evaluate complex and continuous mathematical functions as an essential step toward proper operation. However, rigorous evaluation of this kind of function often implies a very high computational cost, unacceptable in real-time applications. To alleviate this problem, functions are commonly approximated by simpler piecewise-polynomial representations. Following this idea, we propose a novel, efficient, and practical technique to evaluate complex and continuous functions using a nearly optimal design of two types of piecewise linear approximations in the case of a large budget of evaluation subintervals. To this end, we develop a thorough error analysis that yields asymptotically tight bounds to accurately quantify the approximation performance of both representations. It improves upon previous error estimates and allows the user to control the trade-off between the approximation error and the number of evaluation subintervals. To guarantee real-time operation, the method is suitable for, but not limited to, an efficient implementation on modern Graphics Processing Units (GPUs), where it outperforms previous alternative approaches by exploiting the fixed-function interpolation routines present in their texture units. The proposed technique suits any application requiring the evaluation of continuous functions. We have measured its quality and efficiency in detail on several functions, in particular the Gaussian function, because it is extensively used in many areas of computer vision and cybernetics and is expensive to evaluate.
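The lookup that a texture unit performs with linear filtering can be emulated on the CPU as a table of samples plus a linear blend between the two nearest entries. The sketch below does this for the Gaussian with a uniform table; the table size, the interval and the function names are assumptions, and the paper's nearly optimal (non-uniform) partition is not reproduced.

```python
import numpy as np

def gaussian(x):
    """Unnormalized Gaussian, the expensive function being tabulated."""
    return np.exp(-0.5 * x * x)

def build_table(f, a, b, n):
    """Sample f at n + 1 equally spaced points on [a, b]."""
    return f(np.linspace(a, b, n + 1))

def lerp_lookup(table, a, b, x):
    """Evaluate the polygonal approximation by blending adjacent samples."""
    x = np.asarray(x, dtype=np.float64)
    n = len(table) - 1
    u = (np.clip(x, a, b) - a) / (b - a) * n     # continuous table coordinate
    i = np.minimum(u.astype(np.int64), n - 1)    # index of the left sample
    t = u - i                                    # blend weight in [0, 1]
    return (1.0 - t) * table[i] + t * table[i + 1]

table = build_table(gaussian, -4.0, 4.0, 256)
xs = np.linspace(-4.0, 4.0, 10001)
worst = np.max(np.abs(lerp_lookup(table, -4.0, 4.0, xs) - gaussian(xs)))
print(f"largest sampled error with 256 subintervals: {worst:.2e}")
```

On a GPU the blend in the last line of `lerp_lookup` is what the fixed-function texture filtering hardware would do, so the shader only issues a single fetch.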

Relevance:

90.00%

Abstract:

In this paper we present a scalable software architecture for on-line multi-camera video processing that guarantees a good trade-off between computational power, scalability and flexibility. The software system is modular, and its main blocks are the Processing Units (PUs) and the Central Unit. The Central Unit works as a supervisor of the running PUs, and each PU manages the acquisition phase and the processing phase. Furthermore, an approach to easily parallelize the desired processing application is presented. As a case study, we apply the proposed software architecture to a multi-camera system in order to efficiently manage multiple 2D object detection modules in a real-time scenario. System performance has been evaluated under different load conditions, such as the number of cameras and the image sizes. The results show that the software architecture scales well with the number of cameras and can easily work with different image formats while respecting the real-time constraints. Moreover, the parallelization approach can be used to speed up the processing tasks with a low level of overhead.

Relevance:

90.00%

Abstract:

Shading reduces the power output of a photovoltaic (PV) system. The design engineering of PV systems requires modeling and evaluating shading losses. Some PV systems are affected by complex shading scenes whose resulting PV energy losses are very difficult to evaluate with current modeling tools. Several specialized PV design and simulation software packages can evaluate shading losses. They generally have a Graphical User Interface (GUI) through which the user can draw a 3D shading scene and then evaluate the corresponding PV energy losses. The complexity of the objects that these tools can handle is relatively limited. We have created a software solution, 3DPV, which evaluates the energy losses induced by complex 3D scenes on PV generators. The 3D objects can be imported from specialized 3D modeling software or from a 3D object library. The shadows cast by this 3D scene on the PV generator are then evaluated directly on the Graphics Processing Unit (GPU). Thanks to the recent development of GPUs for the video game industry, the shadows can be evaluated with a very high spatial resolution that reaches well beyond the PV cell level, in very short calculation times. A PV simulation model then translates the geometrical shading into PV energy output losses. 3DPV has been implemented using WebGL, which allows it to run directly in a Web browser, without requiring any local installation by the user. This also allows it to take full advantage of the information already available on the Internet, such as 3D object libraries. This contribution describes, step by step, the method that allows 3DPV to evaluate the PV energy losses caused by complex shading. We then apply this methodology to several application cases encountered in PV system design. Keywords: 3D, modeling, simulation, GPU, shading, losses, shadow mapping, solar, photovoltaic, PV, WebGL

Relevance:

80.00%

Abstract:

To perceive a coherent environment, incomplete or overlapping visual forms must be integrated into meaningful coherent percepts, a process referred to as "Gestalt" formation or perceptual completion. Increasing evidence suggests that this process engages oscillatory neuronal activity in a distributed neuronal assembly. A separate line of evidence suggests that Gestalt formation requires top-down feedback from higher order brain regions to early visual cortex. Here we combine magnetoencephalography (MEG) and effective connectivity analysis in the frequency domain to specifically address the effective coupling between sources of oscillatory brain activity during Gestalt formation. We demonstrate that perceptual completion of two-tone "Mooney" faces induces increased gamma frequency band power (55–71 Hz) in human early visual, fusiform and parietal cortices. Within this distributed neuronal assembly fusiform and parietal gamma oscillators are coupled by forward and backward connectivity during Mooney face perception, indicating reciprocal influences of gamma activity between these higher order visual brain regions. Critically, gamma band oscillations in early visual cortex are modulated by top-down feedback connectivity from both fusiform and parietal cortices. Thus, we provide a mechanistic account of Gestalt perception in which gamma oscillations in feature sensitive and spatial attention-relevant brain regions reciprocally drive one another and convey global stimulus aspects to local processing units at low levels of the sensory hierarchy by top-down feedback. Our data therefore support the notion of inverse hierarchical processing within the visual system underlying awareness of coherent percepts.

Relevance:

80.00%

Abstract:

Purpose: A fully three-dimensional (3D) massively parallelizable list-mode ordered-subsets expectation-maximization (LM-OSEM) reconstruction algorithm has been developed for high-resolution PET cameras. System response probabilities are calculated online from a set of parameters derived from Monte Carlo simulations. The shape of a system response for a given line of response (LOR) has been shown to be asymmetrical around the LOR. This work has focused on the development of efficient region-search techniques to sample the system response probabilities. These techniques are suitable for asymmetric kernel models, including elliptical Gaussian models, and allow for high accuracy and high parallelization efficiency. The novel region-search scheme using variable kernel models is applied in the proposed PET reconstruction algorithm. Methods: A novel region-search technique has been used to sample the probability density function over a small dynamic subset of the field of view that constitutes the region of response (ROR). The ROR is identified around the LOR by searching for any voxel within a dynamically calculated contour. The contour condition is currently defined as a fixed threshold over the posterior probability, and arbitrary kernel models can be applied using a numerical approach. The processing of the LORs is distributed in batches among the available computing devices; individual LORs are then processed within different processing units. In this way, both multicore and multiple many-core processing units can be efficiently exploited. Tests have been conducted with probability models that take into account the noncolinearity, positron range, and crystal penetration effects, which produced tubes of response with varying elliptical sections whose axes were a function of the crystal's thickness and the angle of incidence of the given LOR. 
The algorithm treats the probability model as a 3D scalar field defined within a reference system aligned with the ideal LOR. Results: This new technique provides superior image quality in terms of signal-to-noise ratio as compared with the histogram-mode method based on precomputed system matrices available for a commercial small animal scanner. Reconstruction times can be kept low with the use of multicore and many-core architectures, including multiple graphics processing units. Conclusions: A highly parallelizable LM reconstruction method has been proposed based on Monte Carlo simulations and new parallelization techniques aimed at improving the reconstruction speed and the image signal-to-noise ratio of a given OSEM algorithm. The method has been validated using simulated and real phantoms. A special advantage of the new method is the possibility of dynamically defining the cut-off threshold over the calculated probabilities, thus allowing direct control over the trade-off between speed and quality during the reconstruction.
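
The region-of-response idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the unnormalized elliptical Gaussian kernel with fixed widths `sigma_u` and `sigma_v`, and the brute-force scan over every voxel are all assumptions made for illustration. The abstract describes a dedicated region search and Monte Carlo derived parameters instead. The sketch only shows how a kernel evaluated in a frame aligned with the LOR, combined with a fixed threshold, selects a small set of voxels.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)

def lor_frame(p0, p1):
    """Orthonormal frame (u, v, w) with w pointing along the LOR from p0 to p1."""
    w = _unit(_sub(p1, p0))
    # Pick a helper axis that is not parallel to w.
    helper = (1.0, 0.0, 0.0) if abs(w[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _unit(_cross(w, helper))
    v = _cross(w, u)
    return u, v, w

def region_of_response(p0, p1, grid_shape, voxel_size, sigma_u, sigma_v, threshold):
    """Return {voxel index: kernel value} for voxels at or above the threshold.

    Hypothetical kernel: an unnormalized elliptical Gaussian across the LOR,
    with widths sigma_u and sigma_v along the two transverse axes.
    """
    u, v, w = lor_frame(p0, p1)
    chord = _sub(p1, p0)
    length = math.sqrt(_dot(chord, chord))
    ror = {}
    nx, ny, nz = grid_shape
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                center = ((i + 0.5) * voxel_size,
                          (j + 0.5) * voxel_size,
                          (k + 0.5) * voxel_size)
                d = _sub(center, p0)
                t = _dot(d, w)  # position along the LOR
                if t < 0.0 or t > length:
                    continue
                du, dv = _dot(d, u), _dot(d, v)  # transverse offsets
                weight = math.exp(-0.5 * ((du / sigma_u) ** 2 + (dv / sigma_v) ** 2))
                if weight >= threshold:
                    ror[(i, j, k)] = weight
    return ror
```

Raising `threshold` shrinks the returned region, which mirrors the speed versus quality trade-off mentioned in the conclusions: fewer voxels per LOR means less work but a coarser sampling of the system response.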

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The latest generation of consumer electronic devices is endowed with Augmented Reality (AR) tools. These tools require fast and efficient moving object detection strategies in order to carry out higher-level object analysis tasks. We propose a lightweight spatio-temporal non-parametric background-foreground modeling strategy, implemented on a General Purpose Graphics Processing Unit (GPGPU), which provides real-time high-quality results in a great variety of scenarios and is suitable for AR applications.
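
As a rough illustration of what a non-parametric, spatio-temporal background model can look like, the sketch below classifies each pixel with a Gaussian kernel density estimate built from recent frames and a small spatial neighborhood. The abstract does not specify the authors' model, so everything here is an assumption: the function names, the grayscale input, the neighborhood `radius`, the `bandwidth`, and the `threshold`.

```python
import math

def kde_likelihood(value, samples, bandwidth):
    """Gaussian kernel density estimate of `value` given past `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((value - s) / bandwidth) ** 2) for s in samples)

def foreground_mask(frame, history, radius=1, bandwidth=10.0, threshold=1e-3):
    """Mark a pixel as foreground when its background likelihood is low.

    frame   : 2D list of gray levels for the current image
    history : list of earlier 2D frames with the same shape
    Samples come from a (2*radius+1)^2 window in every past frame, which is
    one simple (assumed) way to combine spatial and temporal information.
    """
    rows, cols = len(frame), len(frame[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            samples = [
                past[rr][cc]
                for past in history
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            mask[r][c] = kde_likelihood(frame[r][c], samples, bandwidth) < threshold
    return mask
```

Each pixel is classified independently of the others, so the two outer loops could in principle be mapped to one GPU thread per pixel, which is the kind of structure that suits a GPGPU implementation.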

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In the last decade, high-frequency telecommunication systems have evolved enormously. Frequency bands, user bandwidths, modulation techniques and other electrical characteristics are constantly changing, following the evolution of technology and the emergence of new applications. The architectures of modern transceivers differ from traditional ones. Many of the functions conventionally performed by analog circuitry have gradually been assigned to digital signal processors, so the boundaries between baseband and RF functionality are blurred. In addition, modern digital wireless transceivers are capable of supporting high-speed data protocols, and therefore require a high scale of integration for many of the subsystems that make up the different stages. One of the goals of this research work is to investigate new configurations in the development of RF demonstrators (a receiver and a transmitter) and transponders for communications and military purposes, respectively. Part of the work was carried out within the framework of the TECRAIL project, in which an LTE physical layer demonstrator was implemented to assess the viability of the LTE standard in the railway environment. In the military field, in connection with the radar calibration project (CALRADAR), significant work was done on the calibration of Doppler ballistic radars: their precision was carefully analyzed, and the Doppler generating unit of an electronic standard for calibrating these radars was developed. This Doppler unit is responsible for the high frequency resolution of the radar target generator that was built. A complete analysis of the system uncertainties was also carried out in order to optimize the calibration process and assure its accuracy.
In a second phase, solutions were proposed for the development of electro-optical (photonic) devices for communications applications. Owing to their advantages, these devices are considered enabling technologies for future RF and microwave devices and subsystems. Some software-defined radio demands could be met by the proposed circuit concepts, which are based on photonically tunable elements with dynamically programmable parameters. A contribution was also made to the design and simulation of wideband bandpass filters with Hairpin topology, which are compact and can be easily integrated into microwave circuits for a wide range of applications in communications and military systems. As an important final contribution, a proposal was presented to equalize and improve the transmission of discrete timing signals between the TRMs and other processing units in the state-of-the-art SEOSAR/PAZ satellite. After an exhaustive analysis, the optimal configuration of the high-speed data transmission buses, based on a transceiver network, was obtained.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The aim of this project is to evaluate the performance improvement obtained by parallelizing image processing algorithms for execution on a graphics card (graphics processing unit). Once the algorithms to be studied had been selected, each was first implemented in C++ under the sequential paradigm. These implementations were then taken as the basis for parallel versions written with the CUDA (Compute Unified Device Architecture) technology developed by NVIDIA. Next, a graphical user interface (GUI) was developed in Visual C# to make the tool easier to use. Finally, the performance of each algorithm was measured in terms of parallel execution time and speedup by processing a set of images of different sizes.
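
The speedup metric mentioned in this abstract is conventionally the sequential execution time divided by the parallel execution time. The sketch below shows one way such a measurement could be organized. It is only an illustration in Python: the project itself used C++ and CUDA, and the helper names `time_call`, `speedup` and `invert_image` are hypothetical stand-ins, not part of the project.

```python
import time

def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time (in seconds) over several runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def speedup(t_sequential, t_parallel):
    """Speedup = sequential time / parallel time (values above 1 mean faster)."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_sequential / t_parallel

def invert_image(pixels):
    """Stand-in sequential workload: invert an 8-bit grayscale image."""
    return [[255 - p for p in row] for row in pixels]
```

In use, `time_call` would be applied once to the sequential routine and once to the parallel one for each image size, and the two results passed to `speedup`. Taking the best of several runs is one common way to reduce timing noise; reporting the mean is an equally reasonable choice.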