961 resultados para General-purpose computing on graphics processing units (GPGPU)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Microprocessori basati su singolo processore (CPU), hanno visto una rapida crescita di performances ed un abbattimento dei costi per circa venti anni. Questi microprocessori hanno portato una potenza di calcolo nell’ordine del GFLOPS (Giga Floating Point Operation per Second) sui PC Desktop e centinaia di GFLOPS su clusters di server. Questa ascesa ha portato nuove funzionalità nei programmi, migliori interfacce utente e tanti altri vantaggi. Tuttavia questa crescita ha subito un brusco rallentamento nel 2003 a causa di consumi energetici sempre più elevati e problemi di dissipazione termica, che hanno impedito incrementi di frequenza di clock. I limiti fisici del silicio erano sempre più vicini. Per ovviare al problema i produttori di CPU (Central Processing Unit) hanno iniziato a progettare microprocessori multicore, scelta che ha avuto un impatto notevole sulla comunità degli sviluppatori, abituati a considerare il software come una serie di comandi sequenziali. Quindi i programmi che avevano sempre giovato di miglioramenti di prestazioni ad ogni nuova generazione di CPU, non hanno avuto incrementi di performance, in quanto essendo eseguiti su un solo core, non beneficiavano dell’intera potenza della CPU. Per sfruttare appieno la potenza delle nuove CPU la programmazione concorrente, precedentemente utilizzata solo su sistemi costosi o supercomputers, è diventata una pratica sempre più utilizzata dagli sviluppatori. Allo stesso tempo, l’industria videoludica ha conquistato una fetta di mercato notevole: solo nel 2013 verranno spesi quasi 100 miliardi di dollari fra hardware e software dedicati al gaming. Le software houses impegnate nello sviluppo di videogames, per rendere i loro titoli più accattivanti, puntano su motori grafici sempre più potenti e spesso scarsamente ottimizzati, rendendoli estremamente esosi in termini di performance. Per questo motivo i produttori di GPU (Graphic Processing Unit), specialmente nell’ultimo decennio, hanno dato vita ad una vera e propria rincorsa alle performances che li ha portati ad ottenere dei prodotti con capacità di calcolo vertiginose. Ma al contrario delle CPU che agli inizi del 2000 intrapresero la strada del multicore per continuare a favorire programmi sequenziali, le GPU sono diventate manycore, ovvero con centinaia e centinaia di piccoli cores che eseguono calcoli in parallelo. Questa immensa capacità di calcolo può essere utilizzata in altri campi applicativi? La risposta è si e l’obiettivo di questa tesi è proprio quello di constatare allo stato attuale, in che modo e con quale efficienza pùo un software generico, avvalersi dell’utilizzo della GPU invece della CPU.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The research described in this thesis was motivated by the need of a robust model capable of representing 3D data obtained with 3D sensors, which are inherently noisy. In addition, time constraints have to be considered as these sensors are capable of providing a 3D data stream in real time. This thesis proposed the use of Self-Organizing Maps (SOMs) as a 3D representation model. In particular, we proposed the use of the Growing Neural Gas (GNG) network, which has been successfully used for clustering, pattern recognition and topology representation of multi-dimensional data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models, without considering time constraints. It is proposed a hardware implementation leveraging the computing power of modern GPUs, which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). The proposed methods were applied to different problem and applications in the area of computer vision such as the recognition and localization of objects, visual surveillance or 3D reconstruction.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Biomédica

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays, new computers generation provides a high performance that enables to build computationally expensive computer vision applications applied to mobile robotics. Building a map of the environment is a common task of a robot and is an essential part to allow the robots to move through these environments. Traditionally, mobile robots used a combination of several sensors from different technologies. Lasers, sonars and contact sensors have been typically used in any mobile robotic architecture, however color cameras are an important sensor due to we want the robots to use the same information that humans to sense and move through the different environments. Color cameras are cheap and flexible but a lot of work need to be done to give robots enough visual understanding of the scenes. Computer vision algorithms are computational complex problems but nowadays robots have access to different and powerful architectures that can be used for mobile robotics purposes. The advent of low-cost RGB-D sensors like Microsoft Kinect which provide 3D colored point clouds at high frame rates made the computer vision even more relevant in the mobile robotics field. The combination of visual and 3D data allows the systems to use both computer vision and 3D processing and therefore to be aware of more details of the surrounding environment. The research described in this thesis was motivated by the need of scene mapping. Being aware of the surrounding environment is a key feature in many mobile robotics applications from simple robotic navigation to complex surveillance applications. In addition, the acquisition of a 3D model of the scenes is useful in many areas as video games scene modeling where well-known places are reconstructed and added to game systems or advertising where once you get the 3D model of one room the system can add furniture pieces using augmented reality techniques. In this thesis we perform an experimental study of the state-of-the-art registration methods to find which one fits better to our scene mapping purposes. Different methods are tested and analyzed on different scene distributions of visual and geometry appearance. In addition, this thesis proposes two methods for 3d data compression and representation of 3D maps. Our 3D representation proposal is based on the use of Growing Neural Gas (GNG) method. This Self-Organizing Maps (SOMs) has been successfully used for clustering, pattern recognition and topology representation of various kind of data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models without considering time constraints. Self-organising neural models have the ability to provide a good representation of the input space. In particular, the Growing Neural Gas (GNG) is a suitable model because of its flexibility, rapid adaptation and excellent quality of representation. However, this type of learning is time consuming, specially for high-dimensional input data. Since real applications often work under time constraints, it is necessary to adapt the learning process in order to complete it in a predefined time. This thesis proposes a hardware implementation leveraging the computing power of modern GPUs which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). Our proposed geometrical 3D compression method seeks to reduce the 3D information using plane detection as basic structure to compress the data. This is due to our target environments are man-made and therefore there are a lot of points that belong to a plane surface. Our proposed method is able to get good compression results in those man-made scenarios. The detected and compressed planes can be also used in other applications as surface reconstruction or plane-based registration algorithms. Finally, we have also demonstrated the goodness of the GPU technologies getting a high performance implementation of a CAD/CAM common technique called Virtual Digitizing.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent advances in the massively parallel computational abilities of graphical processing units (GPUs) have increased their use for general purpose computation, as companies look to take advantage of big data processing techniques. This has given rise to the potential for malicious software targeting GPUs, which is of interest to forensic investigators examining the operation of software. The ability to carry out reverse-engineering of software is of great importance within the security and forensics elds, particularly when investigating malicious software or carrying out forensic analysis following a successful security breach. Due to the complexity of the Nvidia CUDA (Compute Uni ed Device Architecture) framework, it is not clear how best to approach the reverse engineering of a piece of CUDA software. We carry out a review of the di erent binary output formats which may be encountered from the CUDA compiler, and their implications on reverse engineering. We then demonstrate the process of carrying out disassembly of an example CUDA application, to establish the various techniques available to forensic investigators carrying out black-box disassembly and reverse engineering of CUDA binaries. We show that the Nvidia compiler, using default settings, leaks useful information. Finally, we demonstrate techniques to better protect intellectual property in CUDA algorithm implementations from reverse engineering.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an approach to create what we have called a Unified Sentiment Lexicon (USL). This approach aims at aligning, unifying, and expanding the set of sentiment lexicons which are available on the web in order to increase their robustness of coverage. One problem related to the task of the automatic unification of different scores of sentiment lexicons is that there are multiple lexical entries for which the classification of positive, negative, or neutral {P, Z, N} depends on the unit of measurement used in the annotation methodology of the source sentiment lexicon. Our USL approach computes the unified strength of polarity of each lexical entry based on the Pearson correlation coefficient which measures how correlated lexical entries are with a value between 1 and -1, where 1 indicates that the lexical entries are perfectly correlated, 0 indicates no correlation, and -1 means they are perfectly inversely correlated and so is the UnifiedMetrics procedure for CPU and GPU, respectively. Another problem is the high processing time required for computing all the lexical entries in the unification task. Thus, the USL approach computes a subset of lexical entries in each of the 1344 GPU cores and uses parallel processing in order to unify 155802 lexical entries. The results of the analysis conducted using the USL approach show that the USL has 95.430 lexical entries, out of which there are 35.201 considered to be positive, 22.029 negative, and 38.200 neutral. Finally, the runtime was 10 minutes for 95.430 lexical entries; this allows a reduction of the time computing for the UnifiedMetrics by 3 times.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ein System in einem metastabilen Zustand muss eine bestimmte Barriere in derrnfreien Energie überwinden um einen Tropfen der stabilen Phase zu formen.rnHerkömmliche Untersuchungen nehmen hierbei kugelförmige Tropfen an. Inrnanisotropen Systemen (wie z.B. Kristallen) ist diese Annahme aber nicht ange-rnbracht. Bei tiefen Temperaturen wirkt sich die Anisotropie des Systems starkrnauf die freie Energie ihrer Oberfläche aus. Diese Wirkung wird oberhalb derrnAufrauungstemperatur T R schwächer. Das Ising-Modell ist ein einfaches Mo-rndell, welches eine solche Anisotropie aufweist. Wir führen großangelegte Sim-rnulationen durch, um die Effekte, die mit einer endlichen Simulationsbox ein-rnhergehen, sowie statistische Ungenauigkeiten möglichst klein zu halten. DasrnAusmaß der Simulationen die benötigt werden um sinnvolle Ergebnisse zu pro-rnduzieren, erfordert die Entwicklung eines skalierbaren Simulationsprogrammsrnfür das Ising-Modell, welcher auf verschiedenen parallelen Architekturen (z.B.rnGrafikkarten) verwendet werden kann. Plattformunabhängigkeit wird durch ab-rnstrakte Schnittstellen erreicht, welche plattformspezifische Implementierungs-rndetails verstecken. Wir benutzen eine Systemgeometrie die es erlaubt eine Ober-rnfläche mit einem variablen Winkel zur Kristallebene zu untersuchen. Die Ober-rnfläche ist in Kontakt mit einer harten Wand, wobei der Kontaktwinkel Θ durchrnein Oberflächenfeld eingestellt werden kann. Wir leiten eine Differenzialglei-rnchung ab, welche das Verhalten der freien Energie der Oberfläche in einemrnanisotropen System beschreibt. Kombiniert mit thermodynamischer Integrationrnkann die Gleichung benutzt werden, um die anisotrope Oberflächenspannungrnüber einen großen Winkelbereich zu integrieren. Vergleiche mit früheren Mes-rnsungen in anderen Geometrien und anderen Methoden zeigen hohe Überein-rnstimung und Genauigkeit, welche vor allem durch die im Vergleich zu früherenrnMessungen wesentlich größeren Simulationsdomänen erreicht wird. Die Temper-rnaturabhängigkeit der Oberflächensteifheit κ wird oberhalb von T R durch diernKrümmung der freien Energie der Oberfläche für kleine Winkel gemessen. DiesernMessung lässt sich mit Simulationsergebnissen in der Literatur vergleichen undrnhat bessere Übereinstimmung mit theoretischen Voraussagen über das Skalen-rnverhalten von κ. Darüber hinaus entwickeln wir ein Tieftemperatur-Modell fürrndas Verhalten um Θ = 90 Grad weit unterhalb von T R. Der Winkel bleibt bis zu einemrnkritischen Feld H C quasi null; oberhalb des kritischen Feldes steigt der Winkelrnrapide an. H C wird mit der freien Energie einer Stufe in Verbindung gebracht,rnwas es ermöglicht, das kritische Verhalten dieser Größe zu analysieren. Die harternWand muss in die Analyse einbezogen werden. Durch den Vergleich freier En-rnergien bei geschickt gewählten Systemgrößen ist es möglich, den Beitrag derrnKontaktlinie zur freien Energie in Abhängigkeit von Θ zu messen. Diese Anal-rnyse wird bei verschiedenen Temperaturen durchgeführt. Im letzten Kapitel wirdrneine 2D Fluiddynamik Simulation für Grafikkarten parallelisiert, welche u. a.rnbenutzt werden kann um die Dynamik der Atmosphäre zu simulieren. Wir im-rnplementieren einen parallelen Evolution Galerkin Operator und erreichen

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Durante los últimos años ha sido creciente el uso de las unidades de procesamiento gráfico, más conocidas como GPU (Graphic Processing Unit), en aplicaciones de propósito general, dejando a un lado el objetivo para el que fueron creadas y que no era otro que el renderizado de gráficos por computador. Este crecimiento se debe en parte a la evolución que han experimentado estos dispositivos durante este tiempo y que les ha dotado de gran potencia de cálculo, consiguiendo que su uso se extienda desde ordenadores personales a grandes cluster. Este hecho unido a la proliferación de sensores RGB-D de bajo coste ha hecho que crezca el número de aplicaciones de visión que hacen uso de esta tecnología para la resolución de problemas, así como también para el desarrollo de nuevas aplicaciones. Todas estas mejoras no solamente se han realizado en la parte hardware, es decir en los dispositivos, sino también en la parte software con la aparición de nuevas herramientas de desarrollo que facilitan la programación de estos dispositivos GPU. Este nuevo paradigma se acuñó como Computación de Propósito General sobre Unidades de Proceso Gráfico (General-Purpose computation on Graphics Processing Units, GPGPU). Los dispositivos GPU se clasifican en diferentes familias, en función de las distintas características hardware que poseen. Cada nueva familia que aparece incorpora nuevas mejoras tecnológicas que le permite conseguir mejor rendimiento que las anteriores. No obstante, para sacar un rendimiento óptimo a un dispositivo GPU es necesario configurarlo correctamente antes de usarlo. Esta configuración viene determinada por los valores asignados a una serie de parámetros del dispositivo. Por tanto, muchas de las implementaciones que hoy en día hacen uso de los dispositivos GPU para el registro denso de nubes de puntos 3D, podrían ver mejorado su rendimiento con una configuración óptima de dichos parámetros, en función del dispositivo utilizado. Es por ello que, ante la falta de un estudio detallado del grado de afectación de los parámetros GPU sobre el rendimiento final de una implementación, se consideró muy conveniente la realización de este estudio. Este estudio no sólo se realizó con distintas configuraciones de parámetros GPU, sino también con diferentes arquitecturas de dispositivos GPU. El objetivo de este estudio es proporcionar una herramienta de decisión que ayude a los desarrolladores a la hora implementar aplicaciones para dispositivos GPU. Uno de los campos de investigación en los que más prolifera el uso de estas tecnologías es el campo de la robótica ya que tradicionalmente en robótica, sobre todo en la robótica móvil, se utilizaban combinaciones de sensores de distinta naturaleza con un alto coste económico, como el láser, el sónar o el sensor de contacto, para obtener datos del entorno. Más tarde, estos datos eran utilizados en aplicaciones de visión por computador con un coste computacional muy alto. Todo este coste, tanto el económico de los sensores utilizados como el coste computacional, se ha visto reducido notablemente gracias a estas nuevas tecnologías. Dentro de las aplicaciones de visión por computador más utilizadas está el registro de nubes de puntos. Este proceso es, en general, la transformación de diferentes nubes de puntos a un sistema de coordenadas conocido. Los datos pueden proceder de fotografías, de diferentes sensores, etc. Se utiliza en diferentes campos como son la visión artificial, la imagen médica, el reconocimiento de objetos y el análisis de imágenes y datos de satélites. El registro se utiliza para poder comparar o integrar los datos obtenidos en diferentes mediciones. En este trabajo se realiza un repaso del estado del arte de los métodos de registro 3D. Al mismo tiempo, se presenta un profundo estudio sobre el método de registro 3D más utilizado, Iterative Closest Point (ICP), y una de sus variantes más conocidas, Expectation-Maximization ICP (EMICP). Este estudio contempla tanto su implementación secuencial como su implementación paralela en dispositivos GPU, centrándose en cómo afectan a su rendimiento las distintas configuraciones de parámetros GPU. Como consecuencia de este estudio, también se presenta una propuesta para mejorar el aprovechamiento de la memoria de los dispositivos GPU, permitiendo el trabajo con nubes de puntos más grandes, reduciendo el problema de la limitación de memoria impuesta por el dispositivo. El funcionamiento de los métodos de registro 3D utilizados en este trabajo depende en gran medida de la inicialización del problema. En este caso, esa inicialización del problema consiste en la correcta elección de la matriz de transformación con la que se iniciará el algoritmo. Debido a que este aspecto es muy importante en este tipo de algoritmos, ya que de él depende llegar antes o no a la solución o, incluso, no llegar nunca a la solución, en este trabajo se presenta un estudio sobre el espacio de transformaciones con el objetivo de caracterizarlo y facilitar la elección de la transformación inicial a utilizar en estos algoritmos.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

General-purpose computing devices allow us to (1) customize computation after fabrication and (2) conserve area by reusing expensive active circuitry for different functions in time. We define RP-space, a restricted domain of the general-purpose architectural space focussed on reconfigurable computing architectures. Two dominant features differentiate reconfigurable from special-purpose architectures and account for most of the area overhead associated with RP devices: (1) instructions which tell the device how to behave, and (2) flexible interconnect which supports task dependent dataflow between operations. We can characterize RP-space by the allocation and structure of these resources and compare the efficiencies of architectural points across broad application characteristics. Conventional FPGAs fall at one extreme end of this space and their efficiency ranges over two orders of magnitude across the space of application characteristics. Understanding RP-space and its consequences allows us to pick the best architecture for a task and to search for more robust design points in the space. Our DPGA, a fine- grained computing device which adds small, on-chip instruction memories to FPGAs is one such design point. For typical logic applications and finite- state machines, a DPGA can implement tasks in one-third the area of a traditional FPGA. TSFPGA, a variant of the DPGA which focuses on heavily time-switched interconnect, achieves circuit densities close to the DPGA, while reducing typical physical mapping times from hours to seconds. Rigid, fabrication-time organization of instruction resources significantly narrows the range of efficiency for conventional architectures. To avoid this performance brittleness, we developed MATRIX, the first architecture to defer the binding of instruction resources until run-time, allowing the application to organize resources according to its needs. Our focus MATRIX design point is based on an array of 8-bit ALU and register-file building blocks interconnected via a byte-wide network. With today's silicon, a single chip MATRIX array can deliver over 10 Gop/s (8-bit ops). On sample image processing tasks, we show that MATRIX yields 10-20x the computational density of conventional processors. Understanding the cost structure of RP-space helps us identify these intermediate architectural points and may provide useful insight more broadly in guiding our continual search for robust and efficient general-purpose computing structures.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La evolución de los teléfonos móviles inteligentes, dotados de cámaras digitales, está provocando una creciente demanda de aplicaciones cada vez más complejas que necesitan algoritmos de visión artificial en tiempo real; puesto que el tamaño de las señales de vídeo no hace sino aumentar y en cambio el rendimiento de los procesadores de un solo núcleo se ha estancado, los nuevos algoritmos que se diseñen para visión artificial han de ser paralelos para poder ejecutarse en múltiples procesadores y ser computacionalmente escalables. Una de las clases de procesadores más interesantes en la actualidad se encuentra en las tarjetas gráficas (GPU), que son dispositivos que ofrecen un alto grado de paralelismo, un excelente rendimiento numérico y una creciente versatilidad, lo que los hace interesantes para llevar a cabo computación científica. En esta tesis se exploran dos aplicaciones de visión artificial que revisten una gran complejidad computacional y no pueden ser ejecutadas en tiempo real empleando procesadores tradicionales. En cambio, como se demuestra en esta tesis, la paralelización de las distintas subtareas y su implementación sobre una GPU arrojan los resultados deseados de ejecución con tasas de refresco interactivas. Asimismo, se propone una técnica para la evaluación rápida de funciones de complejidad arbitraria especialmente indicada para su uso en una GPU. En primer lugar se estudia la aplicación de técnicas de síntesis de imágenes virtuales a partir de únicamente dos cámaras lejanas y no paralelas—en contraste con la configuración habitual en TV 3D de cámaras cercanas y paralelas—con información de color y profundidad. Empleando filtros de mediana modificados para la elaboración de un mapa de profundidad virtual y proyecciones inversas, se comprueba que estas técnicas son adecuadas para una libre elección del punto de vista. Además, se demuestra que la codificación de la información de profundidad con respecto a un sistema de referencia global es sumamente perjudicial y debería ser evitada. Por otro lado se propone un sistema de detección de objetos móviles basado en técnicas de estimación de densidad con funciones locales. Este tipo de técnicas es muy adecuada para el modelado de escenas complejas con fondos multimodales, pero ha recibido poco uso debido a su gran complejidad computacional. El sistema propuesto, implementado en tiempo real sobre una GPU, incluye propuestas para la estimación dinámica de los anchos de banda de las funciones locales, actualización selectiva del modelo de fondo, actualización de la posición de las muestras de referencia del modelo de primer plano empleando un filtro de partículas multirregión y selección automática de regiones de interés para reducir el coste computacional. Los resultados, evaluados sobre diversas bases de datos y comparados con otros algoritmos del estado del arte, demuestran la gran versatilidad y calidad de la propuesta. Finalmente se propone un método para la aproximación de funciones arbitrarias empleando funciones continuas lineales a tramos, especialmente indicada para su implementación en una GPU mediante el uso de las unidades de filtraje de texturas, normalmente no utilizadas para cómputo numérico. La propuesta incluye un riguroso análisis matemático del error cometido en la aproximación en función del número de muestras empleadas, así como un método para la obtención de una partición cuasióptima del dominio de la función para minimizar el error. ABSTRACT The evolution of smartphones, all equipped with digital cameras, is driving a growing demand for ever more complex applications that need to rely on real-time computer vision algorithms. However, video signals are only increasing in size, whereas the performance of single-core processors has somewhat stagnated in the past few years. Consequently, new computer vision algorithms will need to be parallel to run on multiple processors and be computationally scalable. One of the most promising classes of processors nowadays can be found in graphics processing units (GPU). These are devices offering a high parallelism degree, excellent numerical performance and increasing versatility, which makes them interesting to run scientific computations. In this thesis, we explore two computer vision applications with a high computational complexity that precludes them from running in real time on traditional uniprocessors. However, we show that by parallelizing subtasks and implementing them on a GPU, both applications attain their goals of running at interactive frame rates. In addition, we propose a technique for fast evaluation of arbitrarily complex functions, specially designed for GPU implementation. First, we explore the application of depth-image–based rendering techniques to the unusual configuration of two convergent, wide baseline cameras, in contrast to the usual configuration used in 3D TV, which are narrow baseline, parallel cameras. By using a backward mapping approach with a depth inpainting scheme based on median filters, we show that these techniques are adequate for free viewpoint video applications. In addition, we show that referring depth information to a global reference system is ill-advised and should be avoided. Then, we propose a background subtraction system based on kernel density estimation techniques. These techniques are very adequate for modelling complex scenes featuring multimodal backgrounds, but have not been so popular due to their huge computational and memory complexity. The proposed system, implemented in real time on a GPU, features novel proposals for dynamic kernel bandwidth estimation for the background model, selective update of the background model, update of the position of reference samples of the foreground model using a multi-region particle filter, and automatic selection of regions of interest to reduce computational cost. The results, evaluated on several databases and compared to other state-of-the-art algorithms, demonstrate the high quality and versatility of our proposal. Finally, we propose a general method for the approximation of arbitrarily complex functions using continuous piecewise linear functions, specially formulated for GPU implementation by leveraging their texture filtering units, normally unused for numerical computation. Our proposal features a rigorous mathematical analysis of the approximation error in function of the number of samples, as well as a method to obtain a suboptimal partition of the domain of the function to minimize approximation error.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper studies the effects of the diffusion of a General Purpose Technology (GPT) that spreads first within the developed North country of its origin, and then to a developing South country. In the developed general equilibrium growth model, each final good can be produced by one of two technologies. Each technology is characterized by a specific labor complemented by a specific set of intermediate goods, which are enhanced periodically by Schumpeterian R&D activities. When quality reaches a threshold level, a GPT arises in one of the technologies and spreads first to the other technology within the North. Then, it propagates to the South, following a similar sequence. Since diffusion is not even, neither intra- nor inter-country, the GPT produces successive changes in the direction of technological knowledge and in inter- and intra-country wage inequality. Through this mechanism the different observed paths of wage inequality can be accommodated.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Floating-point computing with more than one TFLOP of peak performance is already a reality in recent Field-Programmable Gate Arrays (FPGA). General-Purpose Graphics Processing Units (GPGPU) and recent many-core CPUs have also taken advantage of the recent technological innovations in integrated circuit (IC) design and had also dramatically improved their peak performances. In this paper, we compare the trends of these computing architectures for high-performance computing and survey these platforms in the execution of algorithms belonging to different scientific application domains. Trends in peak performance, power consumption and sustained performances, for particular applications, show that FPGAs are increasing the gap to GPUs and many-core CPUs moving them away from high-performance computing with intensive floating-point calculations. FPGAs become competitive for custom floating-point or fixed-point representations, for smaller input sizes of certain algorithms, for combinational logic problems and parallel map-reduce problems. © 2014 Technical University of Munich (TUM).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Purpose: The objective of this study is to investigate the feasibility of detecting and quantifying 3D cerebrovascular wall motion from a single 3D rotational x-ray angiography (3DRA) acquisition within a clinically acceptable time and computing from the estimated motion field for the further biomechanical modeling of the cerebrovascular wall. Methods: The whole motion cycle of the cerebral vasculature is modeled using a 4D B-spline transformation, which is estimated from a 4D to 2D + t image registration framework. The registration is performed by optimizing a single similarity metric between the entire 2D + t measured projection sequence and the corresponding forward projections of the deformed volume at their exact time instants. The joint use of two acceleration strategies, together with their implementation on graphics processing units, is also proposed so as to reach computation times close to clinical requirements. For further characterizing vessel wall properties, an approximation of the wall thickness changes is obtained through a strain calculation. Results: Evaluation on in silico and in vitro pulsating phantom aneurysms demonstrated an accurate estimation of wall motion curves. In general, the error was below 10% of the maximum pulsation, even in the situation when substantial inhomogeneous intensity pattern was present. Experiments on in vivo data provided realistic aneurysm and vessel wall motion estimates, whereas in regions where motion was neither visible nor anatomically possible, no motion was detected. The use of the acceleration strategies enabled completing the estimation process for one entire cycle in 5-10 min without degrading the overall performance. The strain map extracted from our motion estimation provided a realistic deformation measure of the vessel wall. Conclusions: The authors' technique has demonstrated that it can provide accurate and robust 4D estimates of cerebrovascular wall motion within a clinically acceptable time, although it has to be applied to a larger patient population prior to possible wide application to routine endovascular procedures. In particular, for the first time, this feasibility study has shown that in vivo cerebrovascular motion can be obtained intraprocedurally from a 3DRA acquisition. Results have also shown the potential of performing strain analysis using this imaging modality, thus making possible for the future modeling of biomechanical properties of the vascular wall.