947 resultados para parallel processing


Relevância:

70.00% 70.00%

Publicador:

Resumo:

In Fourier domain optical coherence tomography (FD-OCT), a large amount of interference data needs to be resampled from the wavelength domain to the wavenumber domain prior to Fourier transformation. We present an approach to optimize this data processing, using a graphics processing unit (GPU) and parallel processing algorithms. We demonstrate an increased processing and rendering rate over that previously reported by using GPU paged memory to render data in the GPU rather than copying back to the CPU. This avoids unnecessary and slow data transfer, enabling a processing and display rate of well over 524,000 A-scan/s for a single frame. To the best of our knowledge this is the fastest processing demonstrated to date and the first time that FD-OCT processing and rendering has been demonstrated entirely on a GPU.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This study provides evidence for a Stroop-like interference effect in word recognition. Based on phonologic and semantic properties of simple words, participants who performed a same/different word-recognition task exhibited a significant response latency increase when word pairs (e.g., POLL, ROD) featured a comparison word (POLL) that was a homonym of a synonym (pole) of the target word (ROD). These results support a parallel-processing framework of lexical decision making, in which activation of the pathways to word recognition may occur at different levels automatically and in parallel. A subset of simple words that are also brand names was examined and exhibited this same interference. Implications for word recognition theory and practical implications for strategic marketing are discussed.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A parallel method for dynamic partitioning of unstructured meshes is described. The method employs a new iterative optimisation technique which both balances the workload and attempts to minimise the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more quickly. Perhaps more importantly, the algorithm results in only a small fraction of the amount of data migration compared to the static partitioners.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

OctVCE is a cartesian cell CFD code produced especially for numerical simulations of shock and blast wave interactions with complex geometries. Virtual Cell Embedding (VCE) was chosen as its cartesian cell kernel as it is simple to code and sufficient for practical engineering design problems. This also makes the code much more ‘user-friendly’ than structured grid approaches as the gridding process is done automatically. The CFD methodology relies on a finite-volume formulation of the unsteady Euler equations and is solved using a standard explicit Godonov (MUSCL) scheme. Both octree-based adaptive mesh refinement and shared-memory parallel processing capability have also been incorporated. For further details on the theory behind the code, see the companion report 2007/12.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A central problem in visual perception concerns how humans perceive stable and uniform object colors despite variable lighting conditions (i.e. color constancy). One solution is to 'discount' variations in lighting across object surfaces by encoding color contrasts, and utilize this information to 'fill in' properties of the entire object surface. Implicit in this solution is the caveat that the color contrasts defining object boundaries must be distinguished from the spurious color fringes that occur naturally along luminance-defined edges in the retinal image (i.e. optical chromatic aberration). In the present paper, we propose that the neural machinery underlying color constancy is complemented by an 'error-correction' procedure which compensates for chromatic aberration, and suggest that error-correction may be linked functionally to the experimentally induced illusory colored aftereffects known as McCollough effects (MEs). To test these proposals, we develop a neural network model which incorporates many of the receptive-field (RF) profiles of neurons in primate color vision. The model is composed of two parallel processing streams which encode complementary sets of stimulus features: one stream encodes color contrasts to facilitate filling-in and color constancy; the other stream selectively encodes (spurious) color fringes at luminance boundaries, and learns to inhibit the filling-in of these colors within the first stream. Computer simulations of the model illustrate how complementary color-spatial interactions between error-correction and filling-in operations (a) facilitate color constancy, (b) reveal functional links between color constancy and the ME, and (c) reconcile previously reported anomalies in the local (edge) and global (spreading) properties of the ME. We discuss the broader implications of these findings by considering the complementary functional roles performed by RFs mediating color-spatial interactions in the primate visual system. (C) 2002 Elsevier Science Ltd. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper proposes a wireless EEG acquisition platform based on Open Multimedia Architecture Platform (OMAP) embedded system. A high-impedance active dry electrode was tested for improving the scalp- electrode interface. It was used the sigma-delta ADS1298 analog-to-digital converter, and developed a “kernelspace” character driver to manage the communications between the converter unit and the OMAP’s ARM core. The acquired EEG signal data is processed by a “userspace” application, which accesses the driver’s memory, saves the data to a SD-card and transmits them through a wireless TCP/IP-socket to a PC. The electrodes were tested through the alpha wave replacement phenomenon. The experimental results presented the expected alpha rhythm (8-13 Hz) reactiveness to the eyes opening task. The driver spends about 725 μs to acquire and store the data samples. The application takes about 244 μs to get the data from the driver and 1.4 ms to save it in the SD-card. A WiFi throughput of 12.8Mbps was measured which results in a transmission time of 5 ms for 512 kb of data. The embedded system consumes about 200 mAh when wireless off and 400 mAh when it is on. The system exhibits a reliable performance to record EEG signals and transmit them wirelessly. Besides the microcontroller-based architectures, the proposed platform demonstrates that powerful ARM processors running embedded operating systems can be programmed with real-time constrains at the kernel level in order to control hardware, while maintaining their parallel processing abilities in high level software applications.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Consider the problem of scheduling sporadic tasks on a multiprocessor platform under mutual exclusion constraints. We present an approach which appears promising for allowing large amounts of parallel task executions and still ensures low amounts of blocking.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Nos últimos anos começaram a ser vulgares os computadores dotados de multiprocessadores e multi-cores. De modo a aproveitar eficientemente as novas características desse hardware começaram a surgir ferramentas para facilitar o desenvolvimento de software paralelo, através de linguagens e frameworks, adaptadas a diferentes linguagens. Com a grande difusão de redes de alta velocidade, tal como Gigabit Ethernet e a última geração de redes Wi-Fi, abre-se a oportunidade de, além de paralelizar o processamento entre processadores e cores, poder em simultâneo paralelizá-lo entre máquinas diferentes. Ao modelo que permite paralelizar processamento localmente e em simultâneo distribuí-lo para máquinas que também têm capacidade de o paralelizar, chamou-se “modelo paralelo distribuído”. Nesta dissertação foram analisadas técnicas e ferramentas utilizadas para fazer programação paralela e o trabalho que está feito dentro da área de programação paralela e distribuída. Tendo estes dois factores em consideração foi proposta uma framework que tenta aplicar a simplicidade da programação paralela ao conceito paralelo distribuído. A proposta baseia-se na disponibilização de uma framework em Java com uma interface de programação simples, de fácil aprendizagem e legibilidade que, de forma transparente, é capaz de paralelizar e distribuir o processamento. Apesar de simples, existiu um esforço para a tornar configurável de forma a adaptar-se ao máximo de situações possível. Nesta dissertação serão exploradas especialmente as questões relativas à execução e distribuição de trabalho, e a forma como o código é enviado de forma automática pela rede, para outros nós cooperantes, evitando assim a instalação manual das aplicações em todos os nós da rede. Para confirmar a validade deste conceito e das ideias defendidas nesta dissertação foi implementada esta framework à qual se chamou DPF4j (Distributed Parallel Framework for JAVA) e foram feitos testes e retiradas métricas para verificar a existência de ganhos de performance em relação às soluções já existentes.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper proposes and reports the development of an open source solution for the integrated management of Infrastructure as a Service (IaaS) cloud computing resources, through the use of a common API taxonomy, to incorporate open source and proprietary platforms. This research included two surveys on open source IaaS platforms (OpenNebula, OpenStack and CloudStack) and a proprietary platform (Parallels Automation for Cloud Infrastructure - PACI) as well as on IaaS abstraction solutions (jClouds, Libcloud and Deltacloud), followed by a thorough comparison to determine the best approach. The adopted implementation reuses the Apache Deltacloud open source abstraction framework, which relies on the development of software driver modules to interface with different IaaS platforms, and involved the development of a new Deltacloud driver for PACI. The resulting interoperable solution successfully incorporates OpenNebula, OpenStack (reuses pre-existing drivers) and PACI (includes the developed Deltacloud PACI driver) nodes and provides a Web dashboard and a Representational State Transfer (REST) interface library. The results of the exchanged data payload and time response tests performed are presented and discussed. The conclusions show that open source abstraction tools like Deltacloud allow the modular and integrated management of IaaS platforms (open source and proprietary), introduce relevant time and negligible data overheads and, as a result, can be adopted by Small and Medium-sized Enterprise (SME) cloud providers to circumvent the vendor lock-in problem whenever service response time is not critical.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Sparse matrix-vector multiplication (SMVM) is a fundamental operation in many scientific and engineering applications. In many cases sparse matrices have thousands of rows and columns where most of the entries are zero, while non-zero data is spread over the matrix. This sparsity of data locality reduces the effectiveness of data cache in general-purpose processors quite reducing their performance efficiency when compared to what is achieved with dense matrix multiplication. In this paper, we propose a parallel processing solution for SMVM in a many-core architecture. The architecture is tested with known benchmarks using a ZYNQ-7020 FPGA. The architecture is scalable in the number of core elements and limited only by the available memory bandwidth. It achieves performance efficiencies up to almost 70% and better performances than previous FPGA designs.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Trabalho apresentado no âmbito do Mestrado em Engenharia Informática, como requisito parcial para obtenção do grau de Mestre em Engenharia Informática

Relevância:

60.00% 60.00%

Publicador:

Resumo:

El avance en la potencia de cómputo en nuestros días viene dado por la paralelización del procesamiento, dadas las características que disponen las nuevas arquitecturas de hardware. Utilizar convenientemente este hardware impacta en la aceleración de los algoritmos en ejecución (programas). Sin embargo, convertir de forma adecuada el algoritmo en su forma paralela es complejo, y a su vez, esta forma, es específica para cada tipo de hardware paralelo. En la actualidad los procesadores de uso general más comunes son los multicore, procesadores paralelos, también denominados Symmetric Multi-Processors (SMP). Hoy en día es difícil hallar un procesador para computadoras de escritorio que no tengan algún tipo de paralelismo del caracterizado por los SMP, siendo la tendencia de desarrollo, que cada día nos encontremos con procesadores con mayor numero de cores disponibles. Por otro lado, los dispositivos de procesamiento de video (Graphics Processor Units - GPU), a su vez, han ido desarrollando su potencia de cómputo por medio de disponer de múltiples unidades de procesamiento dentro de su composición electrónica, a tal punto que en la actualidad no es difícil encontrar placas de GPU con capacidad de 200 a 400 hilos de procesamiento paralelo. Estos procesadores son muy veloces y específicos para la tarea que fueron desarrollados, principalmente el procesamiento de video. Sin embargo, como este tipo de procesadores tiene muchos puntos en común con el procesamiento científico, estos dispositivos han ido reorientándose con el nombre de General Processing Graphics Processor Unit (GPGPU). A diferencia de los procesadores SMP señalados anteriormente, las GPGPU no son de propósito general y tienen sus complicaciones para uso general debido al límite en la cantidad de memoria que cada placa puede disponer y al tipo de procesamiento paralelo que debe realizar para poder ser productiva su utilización. Los dispositivos de lógica programable, FPGA, son dispositivos capaces de realizar grandes cantidades de operaciones en paralelo, por lo que pueden ser usados para la implementación de algoritmos específicos, aprovechando el paralelismo que estas ofrecen. Su inconveniente viene derivado de la complejidad para la programación y el testing del algoritmo instanciado en el dispositivo. Ante esta diversidad de procesadores paralelos, el objetivo de nuestro trabajo está enfocado en analizar las características especificas que cada uno de estos tienen, y su impacto en la estructura de los algoritmos para que su utilización pueda obtener rendimientos de procesamiento acordes al número de recursos utilizados y combinarlos de forma tal que su complementación sea benéfica. Específicamente, partiendo desde las características del hardware, determinar las propiedades que el algoritmo paralelo debe tener para poder ser acelerado. Las características de los algoritmos paralelos determinará a su vez cuál de estos nuevos tipos de hardware son los mas adecuados para su instanciación. En particular serán tenidos en cuenta el nivel de dependencia de datos, la necesidad de realizar sincronizaciones durante el procesamiento paralelo, el tamaño de datos a procesar y la complejidad de la programación paralela en cada tipo de hardware. Today´s advances in high-performance computing are driven by parallel processing capabilities of available hardware architectures. These architectures enable the acceleration of algorithms when thes ealgorithms are properly parallelized and exploit the specific processing power of the underneath architecture. Most current processors are targeted for general pruposes and integrate several processor cores on a single chip, resulting in what is known as a Symmetric Multiprocessing (SMP) unit. Nowadays even desktop computers make use of multicore processors. Meanwhile, the industry trend is to increase the number of integrated rocessor cores as technology matures. On the other hand, Graphics Processor Units (GPU), originally designed to handle only video processing, have emerged as interesting alternatives to implement algorithm acceleration. Current available GPUs are able to implement from 200 to 400 threads for parallel processing. Scientific computing can be implemented in these hardware thanks to the programability of new GPUs that have been denoted as General Processing Graphics Processor Units (GPGPU).However, GPGPU offer little memory with respect to that available for general-prupose processors; thus, the implementation of algorithms need to be addressed carefully. Finally, Field Programmable Gate Arrays (FPGA) are programmable devices which can implement hardware logic with low latency, high parallelism and deep pipelines. Thes devices can be used to implement specific algorithms that need to run at very high speeds. However, their programmability is harder that software approaches and debugging is typically time-consuming. In this context where several alternatives for speeding up algorithms are available, our work aims at determining the main features of thes architectures and developing the required know-how to accelerate algorithm execution on them. We look at identifying those algorithms that may fit better on a given architecture as well as compleme

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we investigate various algorithms for performing Fast Fourier Transformation (FFT)/Inverse Fast Fourier Transformation (IFFT), and proper techniques for maximizing the FFT/IFFT execution speed, such as pipelining or parallel processing, and use of memory structures with pre-computed values (look up tables -LUT) or other dedicated hardware components (usually multipliers). Furthermore, we discuss the optimal hardware architectures that best apply to various FFT/IFFT algorithms, along with their abilities to exploit parallel processing with minimal data dependences of the FFT/IFFT calculations. An interesting approach that is also considered in this paper is the application of the integrated processing-in-memory Intelligent RAM (IRAM) chip to high speed FFT/IFFT computing. The results of the assessment study emphasize that the execution speed of the FFT/IFFT algorithms is tightly connected to the capabilities of the FFT/IFFT hardware to support the provided parallelism of the given algorithm. Therefore, we suggest that the basic Discrete Fourier Transform (DFT)/Inverse Discrete Fourier Transform (IDFT) can also provide high performances, by utilizing a specialized FFT/IFFT hardware architecture that can exploit the provided parallelism of the DFT/IDF operations. The proposed improvements include simplified multiplications over symbols given in polar coordinate system, using sinе and cosine look up tables, and an approach for performing parallel addition of N input symbols.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we investigate various algorithms for performing Fast Fourier Transformation (FFT)/Inverse Fast Fourier Transformation (IFFT), and proper techniquesfor maximizing the FFT/IFFT execution speed, such as pipelining or parallel processing, and use of memory structures with pre-computed values (look up tables -LUT) or other dedicated hardware components (usually multipliers). Furthermore, we discuss the optimal hardware architectures that best apply to various FFT/IFFT algorithms, along with their abilities to exploit parallel processing with minimal data dependences of the FFT/IFFT calculations. An interesting approach that is also considered in this paper is the application of the integrated processing-in-memory Intelligent RAM (IRAM) chip to high speed FFT/IFFT computing. The results of the assessment study emphasize that the execution speed of the FFT/IFFT algorithms is tightly connected to the capabilities of the FFT/IFFT hardware to support the provided parallelism of the given algorithm. Therefore, we suggest that the basic Discrete Fourier Transform (DFT)/Inverse Discrete Fourier Transform (IDFT) can also provide high performances, by utilizing a specialized FFT/IFFT hardware architecture that can exploit the provided parallelism of the DFT/IDF operations. The proposed improvements include simplified multiplications over symbols given in polar coordinate system, using sinе and cosine look up tables,and an approach for performing parallel addition of N input symbols.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This note shows how to set up a Linux cluster for MPI parallel processing using ParallelKnoppix, a bootable CD.