21 resultados para scalable parallel programming
em Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Resumo:
A novel high throughput and scalable unified architecture for the computation of the transform operations in video codecs for advanced standards is presented in this paper. This structure can be used as a hardware accelerator in modern embedded systems to efficiently compute all the two-dimensional 4 x 4 and 2 x 2 transforms of the H.264/AVC standard. Moreover, its highly flexible design and hardware efficiency allows it to be easily scaled in terms of performance and hardware cost to meet the specific requirements of any given video coding application. Experimental results obtained using a Xilinx Virtex-5 FPGA demonstrated the superior performance and hardware efficiency levels provided by the proposed structure, which presents a throughput per unit of area relatively higher than other similar recently published designs targeting the H.264/AVC standard. Such results also showed that, when integrated in a multi-core embedded system, this architecture provides speedup factors of about 120x concerning pure software implementations of the transform algorithms, therefore allowing the computation, in real-time, of all the above mentioned transforms for Ultra High Definition Video (UHDV) sequences (4,320 x 7,680 @ 30 fps).
Resumo:
This paper is on the problem of short-term hydro, scheduling, particularly concerning head-dependent cascaded hydro systems. We propose a novel mixed-integer quadratic programming approach, considering not only head-dependency, but also discontinuous operating regions and discharge ramping constraints. Thus, an enhanced short-term hydro scheduling is provided due to the more realistic modeling presented in this paper. Numerical results from two case studies, based on Portuguese cascaded hydro systems, illustrate the proficiency of the proposed approach.
Resumo:
International Conference with Peer Review 2012 IEEE International Conference in Geoscience and Remote Sensing Symposium (IGARSS), 22-27 July 2012, Munich, Germany
Resumo:
Conferência: IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors (ASAP)- Jun 05-07, 2013
Resumo:
Relatório do Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia de Electrónica e Telecomunicações
Resumo:
This paper proposes an efficient scalable Residue Number System (RNS) architecture supporting moduli sets with an arbitrary number of channels, allowing to achieve larger dynamic range and a higher level of parallelism. The proposed architecture allows the forward and reverse RNS conversion, by reusing the arithmetic channel units. The arithmetic operations supported at the channel level include addition, subtraction, and multiplication with accumulation capability. For the reverse conversion two algorithms are considered, one based on the Chinese Remainder Theorem and the other one on Mixed-Radix-Conversion, leading to implementations optimized for delay and required circuit area. With the proposed architecture a complete and compact RNS platform is achieved. Experimental results suggest gains of 17 % in the delay in the arithmetic operations, with an area reduction of 23 % regarding the RNS state of the art. When compared with a binary system the proposed architecture allows to perform the same computation 20 times faster alongside with only 10 % of the circuit area resources.
Resumo:
The growing heterogeneity of networks, devices and consumption conditions asks for flexible and adaptive video coding solutions. The compression power of the HEVC standard and the benefits of the distributed video coding paradigm allow designing novel scalable coding solutions with improved error robustness and low encoding complexity while still achieving competitive compression efficiency. In this context, this paper proposes a novel scalable video coding scheme using a HEVC Intra compliant base layer and a distributed coding approach in the enhancement layers (EL). This design inherits the HEVC compression efficiency while providing low encoding complexity at the enhancement layers. The temporal correlation is exploited at the decoder to create the EL side information (SI) residue, an estimation of the original residue. The EL encoder sends only the data that cannot be inferred at the decoder, thus exploiting the correlation between the original and SI residues; however, this correlation must be characterized with an accurate correlation model to obtain coding efficiency improvements. Therefore, this paper proposes a correlation modeling solution to be used at both encoder and decoder, without requiring a feedback channel. Experiments results confirm that the proposed scalable coding scheme has lower encoding complexity and provides BD-Rate savings up to 3.43% in comparison with the HEVC Intra scalable extension under development. © 2014 IEEE.
Resumo:
As high dynamic range video is gaining popularity, video coding solutions able to efficiently provide both low and high dynamic range video, notably with a single bitstream, are increasingly important. While simulcasting can provide both dynamic range videos at the cost of some compression efficiency penalty, bit-depth scalable video coding can provide a better trade-off between compression efficiency, adaptation flexibility and computational complexity. Considering the widespread use of H.264/AVC video, this paper proposes a H.264/AVC backward compatible bit-depth scalable video coding solution offering a low dynamic range base layer and two high dynamic range enhancement layers with different qualities, at low complexity. Experimental results show that the proposed solution has an acceptable rate-distortion performance penalty regarding the HDR H.264/AVC single-layer coding solution.
Resumo:
This letter presents a new parallel method for hyperspectral unmixing composed by the efficient combination of two popular methods: vertex component analysis (VCA) and sparse unmixing by variable splitting and augmented Lagrangian (SUNSAL). First, VCA extracts the endmember signatures, and then, SUNSAL is used to estimate the abundance fractions. Both techniques are highly parallelizable, which significantly reduces the computing time. A design for the commodity graphics processing units of the two methods is presented and evaluated. Experimental results obtained for simulated and real hyperspectral data sets reveal speedups up to 100 times, which grants real-time response required by many remotely sensed hyperspectral applications.
Resumo:
Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially but others can be executed concurrently or in parallel on clusters, grids or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. For developing an application using MapReduce there is a need to install/configure/access specific frameworks such as Apache Hadoop or Elastic MapReduce in Amazon Cloud. It would be desirable to provide more flexibility in adjusting such configurations according to the application characteristics. Furthermore the composition of the multiple phases of a data analytic application requires the specification of all the phases and their orchestration. The original MapReduce model and environment lacks flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as subworkflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. In the paper we present experimental results when using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.
Resumo:
A stochastic programming approach is proposed in this paper for the development of offering strategies for a wind power producer. The optimization model is characterized by making the analysis of several scenarios and treating simultaneously two kinds of uncertainty: wind power and electricity market prices. The approach developed allows evaluating alternative production and offers strategies to submit to the electricity market with the ultimate goal of maximizing profits. An innovative comparative study is provided, where the imbalances are treated differently. Also, an application to two new realistic case studies is presented. Finally, conclusions are duly drawn.
Resumo:
Even though Software Transactional Memory (STM) is one of the most promising approaches to simplify concurrent programming, current STM implementations incur significant overheads that render them impractical for many real-sized programs. The key insight of this work is that we do not need to use the same costly barriers for all the memory managed by a real-sized application, if only a small fraction of the memory is under contention lightweight barriers may be used in this case. In this work, we propose a new solution based on an approach of adaptive object metadata (AOM) to promote the use of a fast path to access objects that are not under contention. We show that this approach is able to make the performance of an STM competitive with the best fine-grained lock-based approaches in some of the more challenging benchmarks. (C) 2015 Elsevier Inc. All rights reserved.
Resumo:
Single processor architectures are unable to provide the required performance of high performance embedded systems. Parallel processing based on general-purpose processors can achieve these performances with a considerable increase of required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations interconnected with a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and number of interfacing ports. The architecture was tested in a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than that achieved with a parallel generalpurpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.
Resumo:
Sparse matrix-vector multiplication (SMVM) is a fundamental operation in many scientific and engineering applications. In many cases sparse matrices have thousands of rows and columns where most of the entries are zero, while non-zero data is spread over the matrix. This sparsity of data locality reduces the effectiveness of data cache in general-purpose processors quite reducing their performance efficiency when compared to what is achieved with dense matrix multiplication. In this paper, we propose a parallel processing solution for SMVM in a many-core architecture. The architecture is tested with known benchmarks using a ZYNQ-7020 FPGA. The architecture is scalable in the number of core elements and limited only by the available memory bandwidth. It achieves performance efficiencies up to almost 70% and better performances than previous FPGA designs.
Resumo:
Hyperspectral imaging has become one of the main topics in remote sensing applications, which comprise hundreds of spectral bands at different (almost contiguous) wavelength channels over the same area generating large data volumes comprising several GBs per flight. This high spectral resolution can be used for object detection and for discriminate between different objects based on their spectral characteristics. One of the main problems involved in hyperspectral analysis is the presence of mixed pixels, which arise when the spacial resolution of the sensor is not able to separate spectrally distinct materials. Spectral unmixing is one of the most important task for hyperspectral data exploitation. However, the unmixing algorithms can be computationally very expensive, and even high power consuming, which compromises the use in applications under on-board constraints. In recent years, graphics processing units (GPUs) have evolved into highly parallel and programmable systems. Specifically, several hyperspectral imaging algorithms have shown to be able to benefit from this hardware taking advantage of the extremely high floating-point processing performance, compact size, huge memory bandwidth, and relatively low cost of these units, which make them appealing for onboard data processing. In this paper, we propose a parallel implementation of an augmented Lagragian based method for unsupervised hyperspectral linear unmixing on GPUs using CUDA. The method called simplex identification via split augmented Lagrangian (SISAL) aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The efficient implementation of SISAL method presented in this work exploits the GPU architecture at low level, using shared memory and coalesced accesses to memory.