27 resultados para Distributed shared memory

em Repositório Científico do Instituto Politécnico de Lisboa - Portugal


Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents a new parallel implementation of a previously hyperspectral coded aperture (HYCA) algorithm for compressive sensing on graphics processing units (GPUs). HYCA method combines the ideas of spectral unmixing and compressive sensing exploiting the high spatial correlation that can be observed in the data and the generally low number of endmembers needed in order to explain the data. The proposed implementation exploits the GPU architecture at low level, thus taking full advantage of the computational power of GPUs using shared memory and coalesced accesses to memory. The proposed algorithm is evaluated not only in terms of reconstruction error but also in terms of computational performance using two different GPU architectures by NVIDIA: GeForce GTX 590 and GeForce GTX TITAN. Experimental results using real data reveals signficant speedups up with regards to serial implementation.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Hyperspectral imaging can be used for object detection and for discriminating between different objects based on their spectral characteristics. One of the main problems of hyperspectral data analysis is the presence of mixed pixels, due to the low spatial resolution of such images. This means that several spectrally pure signatures (endmembers) are combined into the same mixed pixel. Linear spectral unmixing follows an unsupervised approach which aims at inferring pure spectral signatures and their material fractions at each pixel of the scene. The huge data volumes acquired by such sensors put stringent requirements on processing and unmixing methods. This paper proposes an efficient implementation of a unsupervised linear unmixing method on GPUs using CUDA. The method finds the smallest simplex by solving a sequence of nonsmooth convex subproblems using variable splitting to obtain a constraint formulation, and then applying an augmented Lagrangian technique. The parallel implementation of SISAL presented in this work exploits the GPU architecture at low level, using shared memory and coalesced accesses to memory. The results herein presented indicate that the GPU implementation can significantly accelerate the method's execution over big datasets while maintaining the methods accuracy.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Hyperspectral imaging has become one of the main topics in remote sensing applications, which comprise hundreds of spectral bands at different (almost contiguous) wavelength channels over the same area generating large data volumes comprising several GBs per flight. This high spectral resolution can be used for object detection and for discriminate between different objects based on their spectral characteristics. One of the main problems involved in hyperspectral analysis is the presence of mixed pixels, which arise when the spacial resolution of the sensor is not able to separate spectrally distinct materials. Spectral unmixing is one of the most important task for hyperspectral data exploitation. However, the unmixing algorithms can be computationally very expensive, and even high power consuming, which compromises the use in applications under on-board constraints. In recent years, graphics processing units (GPUs) have evolved into highly parallel and programmable systems. Specifically, several hyperspectral imaging algorithms have shown to be able to benefit from this hardware taking advantage of the extremely high floating-point processing performance, compact size, huge memory bandwidth, and relatively low cost of these units, which make them appealing for onboard data processing. In this paper, we propose a parallel implementation of an augmented Lagragian based method for unsupervised hyperspectral linear unmixing on GPUs using CUDA. The method called simplex identification via split augmented Lagrangian (SISAL) aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The efficient implementation of SISAL method presented in this work exploits the GPU architecture at low level, using shared memory and coalesced accesses to memory.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

One of the main problems of hyperspectral data analysis is the presence of mixed pixels due to the low spatial resolution of such images. Linear spectral unmixing aims at inferring pure spectral signatures and their fractions at each pixel of the scene. The huge data volumes acquired by hyperspectral sensors put stringent requirements on processing and unmixing methods. This letter proposes an efficient implementation of the method called simplex identification via split augmented Lagrangian (SISAL) which exploits the graphics processing unit (GPU) architecture at low level using Compute Unified Device Architecture. SISAL aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The proposed implementation is performed in a pixel-by-pixel fashion using coalesced accesses to memory and exploiting shared memory to store temporary data. Furthermore, the kernels have been optimized to minimize the threads divergence, therefore achieving high GPU occupancy. The experimental results obtained for the simulated and real hyperspectral data sets reveal speedups up to 49 times, which demonstrates that the GPU implementation can significantly accelerate the method's execution over big data sets while maintaining the methods accuracy.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A dissertation submitted in fulfillment of the requirements to the degree of Master in Computer Science and Computer Engineering

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a distributed model predictive control (DMPC) for indoor thermal comfort that simultaneously optimizes the consumption of a limited shared energy resource. The control objective of each subsystem is to minimize the heating/cooling energy cost while maintaining the indoor temperature and used power inside bounds. In a distributed coordinated environment, the control uses multiple dynamically decoupled agents (one for each subsystem/house) aiming to achieve satisfaction of coupling constraints. According to the hourly power demand profile, each house assigns a priority level that indicates how much is willing to bid in auction for consume the limited clean resource. This procedure allows the bidding value vary hourly and consequently, the agents order to access to the clean energy also varies. Despite of power constraints, all houses have also thermal comfort constraints that must be fulfilled. The system is simulated with several houses in a distributed environment.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a distributed predictive control methodology for indoor thermal comfort that optimizes the consumption of a limited shared energy resource using an integrated demand-side management approach that involves a power price auction and an appliance loads allocation scheme. The control objective for each subsystem (house or building) aims to minimize the energy cost while maintaining the indoor temperature inside comfort limits. In a distributed coordinated multi-agent ecosystem, each house or building control agent achieves its objectives while sharing, among them, the available energy through the introduction of particular coupling constraints in their underlying optimization problem. Coordination is maintained by a daily green energy auction bring in a demand-side management approach. Also the implemented distributed MPC algorithm is described and validated with simulation studies.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Even though Software Transactional Memory (STM) is one of the most promising approaches to simplify concurrent programming, current STM implementations incur significant overheads that render them impractical for many real-sized programs. The key insight of this work is that we do not need to use the same costly barriers for all the memory managed by a real-sized application, if only a small fraction of the memory is under contention lightweight barriers may be used in this case. In this work, we propose a new solution based on an approach of adaptive object metadata (AOM) to promote the use of a fast path to access objects that are not under contention. We show that this approach is able to make the performance of an STM competitive with the best fine-grained lock-based approaches in some of the more challenging benchmarks. (C) 2015 Elsevier Inc. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

One of the most efficient approaches to generate the side information (SI) in distributed video codecs is through motion compensated frame interpolation where the current frame is estimated based on past and future reference frames. However, this approach leads to significant spatial and temporal variations in the correlation noise between the source at the encoder and the SI at the decoder. In such scenario, it would be useful to design an architecture where the SI can be more robustly generated at the block level, avoiding the creation of SI frame regions with lower correlation, largely responsible for some coding efficiency losses. In this paper, a flexible framework to generate SI at the block level in two modes is presented: while the first mode corresponds to a motion compensated interpolation (MCI) technique, the second mode corresponds to a motion compensated quality enhancement (MCQE) technique where a low quality Intra block sent by the encoder is used to generate the SI by doing motion estimation with the help of the reference frames. The novel MCQE mode can be overall advantageous from the rate-distortion point of view, even if some rate has to be invested in the low quality Intra coding blocks, for blocks where the MCI produces SI with lower correlation. The overall solution is evaluated in terms of RD performance with improvements up to 2 dB, especially for high motion video sequences and long Group of Pictures (GOP) sizes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Motion compensated frame interpolation (MCFI) is one of the most efficient solutions to generate side information (SI) in the context of distributed video coding. However, it creates SI with rather significant motion compensated errors for some frame regions while rather small for some other regions depending on the video content. In this paper, a low complexity Infra mode selection algorithm is proposed to select the most 'critical' blocks in the WZ frame and help the decoder with some reliable data for those blocks. For each block, the novel coding mode selection algorithm estimates the encoding rate for the Intra based and WZ coding modes and determines the best coding mode while maintaining a low encoder complexity. The proposed solution is evaluated in terms of rate-distortion performance with improvements up to 1.2 dB regarding a WZ coding mode only solution.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The advances made in channel-capacity codes, such as turbo codes and low-density parity-check (LDPC) codes, have played a major role in the emerging distributed source coding paradigm. LDPC codes can be easily adapted to new source coding strategies due to their natural representation as bipartite graphs and the use of quasi-optimal decoding algorithms, such as belief propagation. This paper tackles a relevant scenario in distributedvideo coding: lossy source coding when multiple side information (SI) hypotheses are available at the decoder, each one correlated with the source according to different correlation noise channels. Thus, it is proposed to exploit multiple SI hypotheses through an efficient joint decoding technique withmultiple LDPC syndrome decoders that exchange information to obtain coding efficiency improvements. At the decoder side, the multiple SI hypotheses are created with motion compensated frame interpolation and fused together in a novel iterative LDPC based Slepian-Wolf decoding algorithm. With the creation of multiple SI hypotheses and the proposed decoding algorithm, bitrate savings up to 8.0% are obtained for similar decoded quality.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we propose a new technique that can identify transaction-local memory (i.e. captured memory), in managed environments, while having a low runtime overhead. We implemented our proposal in a well known STM framework (Deuce) and we tested it in STMBench7 with two different STMs: TL2 and LSA. In both STMs the performance improved significantly (4 times and 2.6 times, respectively). Moreover, running the STAMP benchmarks with our approach shows improvements of 7 times in the best case for the Vacation application.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In distributed video coding, motion estimation is typically performed at the decoder to generate the side information, increasing the decoder complexity while providing low complexity encoding in comparison with predictive video coding. Motion estimation can be performed once to create the side information or several times to refine the side information quality along the decoding process. In this paper, motion estimation is performed at the decoder side to generate multiple side information hypotheses which are adaptively and dynamically combined, whenever additional decoded information is available. The proposed iterative side information creation algorithm is inspired in video denoising filters and requires some statistics of the virtual channel between each side information hypothesis and the original data. With the proposed denoising algorithm for side information creation, a RD performance gain up to 1.2 dB is obtained for the same bitrate.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Low-density parity-check (LDPC) codes are nowadays one of the hottest topics in coding theory, notably due to their advantages in terms of bit error rate performance and low complexity. In order to exploit the potential of the Wyner-Ziv coding paradigm, practical distributed video coding (DVC) schemes should use powerful error correcting codes with near-capacity performance. In this paper, new ways to design LDPC codes for the DVC paradigm are proposed and studied. The new LDPC solutions rely on merging parity-check nodes, which corresponds to reduce the number of rows in the parity-check matrix. This allows to change gracefully the compression ratio of the source (DCT coefficient bitplane) according to the correlation between the original and the side information. The proposed LDPC codes reach a good performance for a wide range of source correlations and achieve a better RD performance when compared to the popular turbo codes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Processes are a central entity in enterprise collaboration. Collaborative processes need to be executed and coordinated in a distributed Computational platform where computers are connected through heterogeneous networks and systems. Life cycle management of such collaborative processes requires a framework able to handle their diversity based on different computational and communication requirements. This paper proposes a rational for such framework, points out key requirements and proposes it strategy for a supporting technological infrastructure. Beyond the portability of collaborative process definitions among different technological bindings, a framework to handle different life cycle phases of those definitions is presented and discussed. (c) 2007 Elsevier Ltd. All rights reserved.