43 resultados para Parallel application


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia Mecânica

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recent integrated circuit technologies have opened the possibility to design parallel architectures with hundreds of cores on a single chip. The design space of these parallel architectures is huge with many architectural options. Exploring the design space gets even more difficult if, beyond performance and area, we also consider extra metrics like performance and area efficiency, where the designer tries to design the architecture with the best performance per chip area and the best sustainable performance. In this paper we present an algorithm-oriented approach to design a many-core architecture. Instead of doing the design space exploration of the many core architecture based on the experimental execution results of a particular benchmark of algorithms, our approach is to make a formal analysis of the algorithms considering the main architectural aspects and to determine how each particular architectural aspect is related to the performance of the architecture when running an algorithm or set of algorithms. The architectural aspects considered include the number of cores, the local memory available in each core, the communication bandwidth between the many-core architecture and the external memory and the memory hierarchy. To exemplify the approach we did a theoretical analysis of a dense matrix multiplication algorithm and determined an equation that relates the number of execution cycles with the architectural parameters. Based on this equation a many-core architecture has been designed. The results obtained indicate that a 100 mm(2) integrated circuit design of the proposed architecture, using a 65 nm technology, is able to achieve 464 GFLOPs (double precision floating-point) for a memory bandwidth of 16 GB/s. This corresponds to a performance efficiency of 71 %. Considering a 45 nm technology, a 100 mm(2) chip attains 833 GFLOPs which corresponds to 84 % of peak performance These figures are better than those obtained by previous many-core architectures, except for the area efficiency which is limited by the lower memory bandwidth considered. The results achieved are also better than those of previous state-of-the-art many-cores architectures designed specifically to achieve high performance for matrix multiplication.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Amorphous and crystalline sputtered boron carbide thin films have a very high hardness even surpassing that of bulk crystalline boron carbide (≈41 GPa). However, magnetron sputtered B-C films have high friction coefficients (C.o.F) which limit their industrial application. Nanopatterning of materials surfaces has been proposed as a solution to decrease the C.o.F. The contact area of the nanopatterned surfaces is decreased due to the nanometre size of the asperities which results in a significant reduction of adhesion and friction. In the present work, the surface of amorphous and polycrystalline B-C thin films deposited by magnetron sputtering was nanopatterned using infrared femtosecond laser radiation. Successive parallel laser tracks 10 μm apart were overlapped in order to obtain a processed area of about 3 mm2. Sinusoidal-like undulations with the same spatial period as the laser tracks were formed on the surface of the amorphous boron carbide films after laser processing. The undulations amplitude increases with increasing laser fluence. The formation of undulations with a 10 μm period was also observed on the surface of the crystalline boron carbide film processed with a pulse energy of 72 μJ. The amplitude of the undulations is about 10 times higher than in the amorphous films processed at the same pulse energy due to the higher roughness of the films and consequent increase in laser radiation absorption. LIPSS formation on the surface of the films was achieved for the three B-C films under study. However, LIPSS are formed under different circumstances. Processing of the amorphous films at low fluence (72 μJ) results in LIPSS formation only on localized spots on the film surface. LIPSS formation was also observed on the top of the undulations formed after laser processing with 78 μJ of the amorphous film deposited at 800 °C. Finally, large-area homogeneous LIPSS coverage of the boron carbide crystalline films surface was achieved within a large range of laser fluences although holes are also formed at higher laser fluences.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The synthesis of nanocomposite materials combining titanate nanofibers (TNF) with nanocrystalline ZnS and Bi2S3 semiconductors is described in this work. The TNF were produced via hydrothermal synthesis and sensitized with the semiconductor nanoparticles, through a single-source precursor decomposition method. ZnS and Bi2S3 nanoparticles were successfully grown onto the TNF's surface and Bi2S3-ZnS/TNF nanocomposite materials with different layouts. The samples' photocatalytic performance was first evaluated through the production of the hydroxyl radical using terephthalic acid as probe molecule. All the tested samples show photocatalytic ability for the production of this oxidizing species. Afterwards, the samples were investigated for the removal of methylene blue. The nanocomposite materials with best adsorption ability were the ZnS/TNF and Bi2S3ZnS/TNF. The dye removal was systematically studied, and the most promising results were obtained considering a sequential combination of an adsorption-photocatalytic degradation process using the Bi2S3ZnS/TNF powder as a highly adsorbent and photocatalyst material. (C) 2015 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The interest of the study on the implementation of expanded agglomerated cork as exterior wall covering derives from two critical factors in a perspective of sustainable development: the use of a product consisting of a renewable natural material-cork-and the concern to contribute to greater sustainability in construction. The study aims to assess the feasibility of its use by analyzing the corresponding behaviour under different conditions. Since this application is relatively recent, only about ten years old, there is still much to learn about the reliability of its long-term properties. In this context, this study aims to deepen and approach aspects, some of them poorly studied and even unknown, that deal with characteristics that will make the agglomerate a good choice for exterior wall covering. The analysis of these and other characteristics is being performed by testing both under actual exposure conditions, on an experimental cell at LNEC, and on laboratory. In this paper the main laboratory tests are presented and the obtained results are compared with the outcome of the field study. © (2015) Trans Tech Publications, Switzerland.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Even though Software Transactional Memory (STM) is one of the most promising approaches to simplify concurrent programming, current STM implementations incur significant overheads that render them impractical for many real-sized programs. The key insight of this work is that we do not need to use the same costly barriers for all the memory managed by a real-sized application, if only a small fraction of the memory is under contention lightweight barriers may be used in this case. In this work, we propose a new solution based on an approach of adaptive object metadata (AOM) to promote the use of a fast path to access objects that are not under contention. We show that this approach is able to make the performance of an STM competitive with the best fine-grained lock-based approaches in some of the more challenging benchmarks. (C) 2015 Elsevier Inc. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Single processor architectures are unable to provide the required performance of high performance embedded systems. Parallel processing based on general-purpose processors can achieve these performances with a considerable increase of required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations interconnected with a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and number of interfacing ports. The architecture was tested in a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than that achieved with a parallel generalpurpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Hyperspectral imaging has become one of the main topics in remote sensing applications, which comprise hundreds of spectral bands at different (almost contiguous) wavelength channels over the same area generating large data volumes comprising several GBs per flight. This high spectral resolution can be used for object detection and for discriminate between different objects based on their spectral characteristics. One of the main problems involved in hyperspectral analysis is the presence of mixed pixels, which arise when the spacial resolution of the sensor is not able to separate spectrally distinct materials. Spectral unmixing is one of the most important task for hyperspectral data exploitation. However, the unmixing algorithms can be computationally very expensive, and even high power consuming, which compromises the use in applications under on-board constraints. In recent years, graphics processing units (GPUs) have evolved into highly parallel and programmable systems. Specifically, several hyperspectral imaging algorithms have shown to be able to benefit from this hardware taking advantage of the extremely high floating-point processing performance, compact size, huge memory bandwidth, and relatively low cost of these units, which make them appealing for onboard data processing. In this paper, we propose a parallel implementation of an augmented Lagragian based method for unsupervised hyperspectral linear unmixing on GPUs using CUDA. The method called simplex identification via split augmented Lagrangian (SISAL) aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The efficient implementation of SISAL method presented in this work exploits the GPU architecture at low level, using shared memory and coalesced accesses to memory.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Remote hyperspectral sensors collect large amounts of data per flight usually with low spatial resolution. It is known that the bandwidth connection between the satellite/airborne platform and the ground station is reduced, thus a compression onboard method is desirable to reduce the amount of data to be transmitted. This paper presents a parallel implementation of an compressive sensing method, called parallel hyperspectral coded aperture (P-HYCA), for graphics processing units (GPU) using the compute unified device architecture (CUDA). This method takes into account two main properties of hyperspectral dataset, namely the high correlation existing among the spectral bands and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. Experimental results conducted using synthetic and real hyperspectral datasets on two different GPU architectures by NVIDIA: GeForce GTX 590 and GeForce GTX TITAN, reveal that the use of GPUs can provide real-time compressive sensing performance. The achieved speedup is up to 20 times when compared with the processing time of HYCA running on one core of the Intel i7-2600 CPU (3.4GHz), with 16 Gbyte memory.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

One of the main problems of hyperspectral data analysis is the presence of mixed pixels due to the low spatial resolution of such images. Linear spectral unmixing aims at inferring pure spectral signatures and their fractions at each pixel of the scene. The huge data volumes acquired by hyperspectral sensors put stringent requirements on processing and unmixing methods. This letter proposes an efficient implementation of the method called simplex identification via split augmented Lagrangian (SISAL) which exploits the graphics processing unit (GPU) architecture at low level using Compute Unified Device Architecture. SISAL aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The proposed implementation is performed in a pixel-by-pixel fashion using coalesced accesses to memory and exploiting shared memory to store temporary data. Furthermore, the kernels have been optimized to minimize the threads divergence, therefore achieving high GPU occupancy. The experimental results obtained for the simulated and real hyperspectral data sets reveal speedups up to 49 times, which demonstrates that the GPU implementation can significantly accelerate the method's execution over big data sets while maintaining the methods accuracy.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Parallel hyperspectral unmixing problem is considered in this paper. A semisupervised approach is developed under the linear mixture model, where the abundance's physical constraints are taken into account. The proposed approach relies on the increasing availability of spectral libraries of materials measured on the ground instead of resorting to endmember extraction methods. Since Libraries are potentially very large and hyperspectral datasets are of high dimensionality a parallel implementation in a pixel-by-pixel fashion is derived to properly exploits the graphics processing units (GPU) architecture at low level, thus taking full advantage of the computational power of GPUs. Experimental results obtained for real hyperspectral datasets reveal significant speedup factors, up to 164 times, with regards to optimized serial implementation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Many Hyperspectral imagery applications require a response in real time or near-real time. To meet this requirement this paper proposes a parallel unmixing method developed for graphics processing units (GPU). This method is based on the vertex component analysis (VCA), which is a geometrical based method highly parallelizable. VCA is a very fast and accurate method that extracts endmember signatures from large hyperspectral datasets without the use of any a priori knowledge about the constituent spectra. Experimental results obtained for simulated and real hyperspectral datasets reveal considerable acceleration factors, up to 24 times.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, a new parallel method for sparse spectral unmixing of remotely sensed hyperspectral data on commodity graphics processing units (GPUs) is presented. A semi-supervised approach is adopted, which relies on the increasing availability of spectral libraries of materials measured on the ground instead of resorting to endmember extraction methods. This method is based on the spectral unmixing by splitting and augmented Lagrangian (SUNSAL) that estimates the material's abundance fractions. The parallel method is performed in a pixel-by-pixel fashion and its implementation properly exploits the GPU architecture at low level, thus taking full advantage of the computational power of GPUs. Experimental results obtained for simulated and real hyperspectral datasets reveal significant speedup factors, up to 1 64 times, with regards to optimized serial implementation.