806 resultados para structured parallel computations
Resumo:
Multicore platforms have transformed parallelism into a main concern. Parallel programming models are being put forward to provide a better approach for application programmers to expose the opportunities for parallelism by pointing out potentially parallel regions within tasks, leaving the actual and dynamic scheduling of these regions onto processors to be performed at runtime, exploiting the maximum amount of parallelism. It is in this context that this paper proposes a scheduling approach that combines the constant-bandwidth server abstraction with a priority-aware work-stealing load balancing scheme which, while ensuring isolation among tasks, enables parallel tasks to be executed on more than one processor at a given time instant.
Resumo:
The recent trends of chip architectures with higher number of heterogeneous cores, and non-uniform memory/non-coherent caches, brings renewed attention to the use of Software Transactional Memory (STM) as a fundamental building block for developing parallel applications. Nevertheless, although STM promises to ease concurrent and parallel software development, it relies on the possibility of aborting conflicting transactions to maintain data consistency, which impacts on the responsiveness and timing guarantees required by embedded real-time systems. In these systems, contention delays must be (efficiently) limited so that the response times of tasks executing transactions are upper-bounded and task sets can be feasibly scheduled. In this paper we assess the use of STM in the development of embedded real-time software, defending that the amount of contention can be reduced if read-only transactions access recent consistent data snapshots, progressing in a wait-free manner. We show how the required number of versions of a shared object can be calculated for a set of tasks. We also outline an algorithm to manage conflicts between update transactions that prevents starvation.
Resumo:
Over the last three decades, computer architects have been able to achieve an increase in performance for single processors by, e.g., increasing clock speed, introducing cache memories and using instruction level parallelism. However, because of power consumption and heat dissipation constraints, this trend is going to cease. In recent times, hardware engineers have instead moved to new chip architectures with multiple processor cores on a single chip. With multi-core processors, applications can complete more total work than with one core alone. To take advantage of multi-core processors, parallel programming models are proposed as promising solutions for more effectively using multi-core processors. This paper discusses some of the existent models and frameworks for parallel programming, leading to outline a draft parallel programming model for Ada.
Resumo:
The definition and programming of distributed applications has become a major research issue due to the increasing availability of (large scale) distributed platforms and the requirements posed by the economical globalization. However, such a task requires a huge effort due to the complexity of the distributed environments: large amount of users may communicate and share information across different authority domains; moreover, the “execution environment” or “computations” are dynamic since the number of users and the computational infrastructure change in time. Grid environments, in particular, promise to be an answer to deal with such complexity, by providing high performance execution support to large amount of users, and resource sharing across different organizations. Nevertheless, programming in Grid environments is still a difficult task. There is a lack of high level programming paradigms and support tools that may guide the application developer and allow reusability of state-of-the-art solutions. Specifically, the main goal of the work presented in this thesis is to contribute to the simplification of the development cycle of applications for Grid environments by bringing structure and flexibility to three stages of that cycle through a commonmodel. The stages are: the design phase, the execution phase, and the reconfiguration phase. The common model is based on the manipulation of patterns through pattern operators, and the division of both patterns and operators into two categories, namely structural and behavioural. Moreover, both structural and behavioural patterns are first class entities at each of the aforesaid stages. At the design phase, patterns can be manipulated like other first class entities such as components. This allows a more structured way to build applications by reusing and composing state-of-the-art patterns. At the execution phase, patterns are units of execution control: it is possible, for example, to start or stop and to resume the execution of a pattern as a single entity. At the reconfiguration phase, patterns can also be manipulated as single entities with the additional advantage that it is possible to perform a structural reconfiguration while keeping some of the behavioural constraints, and vice-versa. For example, it is possible to replace a behavioural pattern, which was applied to some structural pattern, with another behavioural pattern. In this thesis, besides the proposal of the methodology for distributed application development, as sketched above, a definition of a relevant set of pattern operators was made. The methodology and the expressivity of the pattern operators were assessed through the development of several representative distributed applications. To support this validation, a prototype was designed and implemented, encompassing some relevant patterns and a significant part of the patterns operators defined. This prototype was based in the Triana environment; Triana supports the development and deployment of distributed applications in the Grid through a dataflow-based programming model. Additionally, this thesis also presents the analysis of a mapping of some operators for execution control onto the Distributed Resource Management Application API (DRMAA). This assessment confirmed the suitability of the proposed model, as well as the generality and flexibility of the defined pattern operators
Resumo:
We focus on large-scale and dense deeply embedded systems where, due to the large amount of information generated by all nodes, even simple aggregate computations such as the minimum value (MIN) of the sensor readings become notoriously expensive to obtain. Recent research has exploited a dominance-based medium access control(MAC) protocol, the CAN bus, for computing aggregated quantities in wired systems. For example, MIN can be computed efficiently and an interpolation function which approximates sensor data in an area can be obtained efficiently as well. Dominance-based MAC protocols have recently been proposed for wireless channels and these protocols can be expected to be used for achieving highly scalable aggregate computations in wireless systems. But no experimental demonstration is currently available in the research literature. In this paper, we demonstrate that highly scalable aggregate computations in wireless networks are possible. We do so by (i) building a new wireless hardware platform with appropriate characteristics for making dominance-based MAC protocols efficient, (ii) implementing dominance-based MAC protocols on this platform, (iii) implementing distributed algorithms for aggregate computations (MIN, MAX, Interpolation) using the new implementation of the dominance-based MAC protocol and (iv) performing experiments to prove that such highly scalable aggregate computations in wireless networks are possible.
Resumo:
In this paper, we focus on large-scale and dense Cyber- Physical Systems, and discuss methods that tightly integrate communication and computing with the underlying physical environment. We present Physical Dynamic Priority Dominance ((PD)2) protocol that exemplifies a key mechanism to devise low time-complexity communication protocols for large-scale networked sensor systems. We show that using this mechanism, one can compute aggregate quantities such as the maximum or minimum of sensor readings in a time-complexity that is equivalent to essentially one message exchange. We also illustrate the use of this mechanism in a more complex task of computing the interpolation of smooth as well as non-smooth sensor data in very low timecomplexity.
Resumo:
This letter presents a new parallel method for hyperspectral unmixing composed by the efficient combination of two popular methods: vertex component analysis (VCA) and sparse unmixing by variable splitting and augmented Lagrangian (SUNSAL). First, VCA extracts the endmember signatures, and then, SUNSAL is used to estimate the abundance fractions. Both techniques are highly parallelizable, which significantly reduces the computing time. A design for the commodity graphics processing units of the two methods is presented and evaluated. Experimental results obtained for simulated and real hyperspectral data sets reveal speedups up to 100 times, which grants real-time response required by many remotely sensed hyperspectral applications.
Resumo:
Trabalho apresentado no âmbito do Mestrado em Engenharia Informática, como requisito parcial para obtenção do grau de Mestre em Engenharia Informática
Resumo:
Dissertação apresentada para obtenção do Grau de Doutor em Informática Pela Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia
Resumo:
Dissertação apresentada para a obtenção do Grau de Doutor em Informática pela Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia.
Resumo:
Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially but others can be executed concurrently or in parallel on clusters, grids or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. For developing an application using MapReduce there is a need to install/configure/access specific frameworks such as Apache Hadoop or Elastic MapReduce in Amazon Cloud. It would be desirable to provide more flexibility in adjusting such configurations according to the application characteristics. Furthermore the composition of the multiple phases of a data analytic application requires the specification of all the phases and their orchestration. The original MapReduce model and environment lacks flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as subworkflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. In the paper we present experimental results when using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.
Resumo:
Dissertação apresentada na Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa para obtenção do grau de Mestre em Engenharia Informática
Resumo:
IEEE International Symposium on Circuits and Systems, pp. 724 – 727, Seattle, EUA
Resumo:
Single processor architectures are unable to provide the required performance of high performance embedded systems. Parallel processing based on general-purpose processors can achieve these performances with a considerable increase of required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations interconnected with a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and number of interfacing ports. The architecture was tested in a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than that achieved with a parallel generalpurpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.
Resumo:
Hyperspectral imaging has become one of the main topics in remote sensing applications, which comprise hundreds of spectral bands at different (almost contiguous) wavelength channels over the same area generating large data volumes comprising several GBs per flight. This high spectral resolution can be used for object detection and for discriminate between different objects based on their spectral characteristics. One of the main problems involved in hyperspectral analysis is the presence of mixed pixels, which arise when the spacial resolution of the sensor is not able to separate spectrally distinct materials. Spectral unmixing is one of the most important task for hyperspectral data exploitation. However, the unmixing algorithms can be computationally very expensive, and even high power consuming, which compromises the use in applications under on-board constraints. In recent years, graphics processing units (GPUs) have evolved into highly parallel and programmable systems. Specifically, several hyperspectral imaging algorithms have shown to be able to benefit from this hardware taking advantage of the extremely high floating-point processing performance, compact size, huge memory bandwidth, and relatively low cost of these units, which make them appealing for onboard data processing. In this paper, we propose a parallel implementation of an augmented Lagragian based method for unsupervised hyperspectral linear unmixing on GPUs using CUDA. The method called simplex identification via split augmented Lagrangian (SISAL) aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The efficient implementation of SISAL method presented in this work exploits the GPU architecture at low level, using shared memory and coalesced accesses to memory.