Digital Library

79 results for openmp

TProf: An energy profiler for task-parallel programs

Abstract:

We present TProf, an energy profiling tool for OpenMP-like task-parallel programs. To compute the energy consumed by each task in a parallel application, TProf dynamically traces the parallel execution and uses a novel technique to estimate the per-task energy consumption. To achieve this estimation, TProf apportions the total processor energy among cores and overcomes the limitation of current works which would otherwise make parallel accounting impossible to achieve. We demonstrate the value of TProf by characterizing a set of task parallel programs, where we find that data locality, memory access patterns and task working sets are responsible for significant variance in energy consumption between seemingly homogeneous tasks. In addition, we identify opportunities for fine-grain energy optimization by applying per-task Dynamic Voltage and Frequency Scaling (DVFS).
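
The abstract does not describe how TProf divides processor energy among cores, so the following is only a sketch of one simple possibility: splitting a measured package energy in proportion to each core's busy time. The function name `apportion_energy`, its parameters, and the proportional-share rule are all assumptions made for illustration and should not be read as TProf's actual technique.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split a measured package energy (in joules) among
   cores in proportion to how long each core was busy. This proportional
   rule is an illustrative assumption, not the method used by TProf. */
void apportion_energy(double package_joules, const double *busy_seconds,
                      double *core_joules, int n_cores)
{
    assert(busy_seconds != NULL && core_joules != NULL && n_cores > 0);

    /* Total busy time across all cores. */
    double total = 0.0;
    for (int i = 0; i < n_cores; ++i)
        total += busy_seconds[i];

    /* Each core receives its share; idle systems get zero everywhere. */
    for (int i = 0; i < n_cores; ++i)
        core_joules[i] = total > 0.0
            ? package_joules * busy_seconds[i] / total
            : 0.0;
}
```

A per-task figure could then be derived by summing the share of the core a task ran on over the intervals in which the task was active, although that step depends on tracing details the abstract does not give.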

Runtime Support for Adaptive Power Capping on Heterogeneous SoCs

Abstract:

Power capping is a fundamental method for reducing the energy consumption of a wide range of modern computing environments, ranging from mobile embedded systems to datacentres. Unfortunately, maximising performance and system efficiency under static power caps remains challenging, while maximising performance under dynamic power caps has been largely unexplored. We present an adaptive power capping method that reduces the power consumption and maximises the performance of heterogeneous SoCs for mobile and server platforms. Our technique combines power capping with coordinated DVFS, data partitioning and core allocations on a heterogeneous SoC with ARM processors and FPGA resources. We design our framework as a run-time system based on OpenMP and OpenCL to utilise the heterogeneous resources. We evaluate it through five data-parallel benchmarks on the Xilinx SoC, which allows full voltage and frequency control. Our experiments show a significant performance boost of 30% under dynamic power caps with concurrent execution on ARM and FPGA, compared to a naive separate approach.

Design and Qualitative/Quantitative Analysis of Multi-Agent Spatial Simulation Library

Abstract:

Thesis (Master's)--University of Washington, 2012

A framework for the development of parallel and distributed real-time embedded systems

Abstract:

Embedded real-time applications increasingly have high computation requirements that must be completed within specific deadlines, but that show highly variable patterns depending on the set of data available at a given instant. The current trend of providing parallel processing in the embedded domain yields higher processing power; however, it does not address the variability in the processing pattern. Dimensioning each device for its worst-case scenario implies lower average utilization and more available, but unusable, processing capacity in the overall system. A solution to this problem is to extend the parallel execution of the applications, allowing networked nodes to distribute the workload, in peak situations, to neighbour nodes. In this context, this report proposes a framework to develop parallel and distributed real-time embedded applications, transparently using OpenMP and the Message Passing Interface (MPI), within a programming model based on OpenMP. The technical report also devises an integrated timing model, which enables structured reasoning on the timing behaviour of these hybrid architectures.

Evaluating the state of the art of parallel programming systems

Abstract:

This paper describes our plans to evaluate the present state of affairs concerning parallel programming and its systems. Three subprojects are proposed: a survey among programmers and scientists, a comparison of parallel programming systems using a standard set of test programs, and a wiki resource for the parallel programming community - the Parawiki. We would like to invite you to participate and turn these subprojects into true community efforts.

Observations on the Publicity and Usage of Parallel Programming Systems and Languages: A Survey Approach

Abstract:

In this publication, we report on an online survey that was carried out among parallel programmers. More than 250 people worldwide have submitted answers to our questions, and their responses are analyzed here. Although not statistically sound, the data we provide give useful insights about which parallel programming systems and languages are known and in actual use. For instance, the collected data indicate that for our survey group MPI and (to a lesser extent) C are the most widely used parallel programming system and language, respectively.

Nested parallelism for multi-core HPC systems using Java

Abstract:

Since its introduction in 1993, the Message Passing Interface (MPI) has become a de facto standard for writing High Performance Computing (HPC) applications on clusters and Massively Parallel Processors (MPPs). The recent emergence of multi-core processor systems presents a new challenge for established parallel programming paradigms, including those based on MPI. This paper presents a new Java messaging system called MPJ Express. Using this system, we exploit multiple levels of parallelism - messaging and threading - to improve application performance on multi-core processors. We refer to our approach as nested parallelism. This MPI-like Java library can support nested parallelism by using Java or Java OpenMP (JOMP) threads within an MPJ Express process. Practicality of this approach is assessed by porting to Java a massively parallel structure formation code from Cosmology called Gadget-2. We introduce nested parallelism in the Java version of the simulation code and report good speed-ups. To the best of our knowledge it is the first time this kind of hybrid parallelism is demonstrated in a high performance Java application. (C) 2009 Elsevier Inc. All rights reserved.

Análise de desempenho da rede neural artificial do tipo multilayer perceptron na era multicore

Abstract:

Artificial neural networks are usually applied to solve complex problems. For more complex problems, increasing the number of layers and neurons makes it possible to achieve greater functional efficiency. Nevertheless, this leads to a greater computational effort. The response time is an important factor in the decision to use neural networks in some systems. Many argue that the computational cost is higher in the training period. However, this phase is performed only once. Once the network is trained, it is necessary to use the existing computational resources efficiently. In the multicore era, the problem boils down to efficient use of all available processing cores. However, it is necessary to consider the overhead of parallel computing. In this sense, this paper proposes a modular structure that proved to be more suitable for parallel implementations. It proposes to parallelize the feedforward process of an artificial neural network (ANN) of the multilayer perceptron (MLP) type, implemented with OpenMP on a shared memory computer architecture. The research consists of testing and analyzing execution times. Speedup, efficiency and parallel scalability are analyzed. In the proposed approach, by reducing the number of connections between remote neurons, the response time of the network decreases and, consequently, so does the total execution time. The time required for communication and synchronization is directly linked to the number of remote neurons in the network, and so it is necessary to investigate which distribution of remote connections is best.
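
As a rough illustration of what parallelizing a feedforward pass with OpenMP can look like, the sketch below computes one fully connected layer and shares the loop over output neurons among threads. It is not the modular structure proposed in the paper: the function name `layer_forward`, the row-major weight layout, and the ReLU activation are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Forward pass of one fully connected layer: out = relu(W * in + b).
   W is row-major with n_out rows and n_in columns (assumed layout).
   ReLU is an assumed activation; the abstract does not name one. */
void layer_forward(const double *W, const double *b, const double *in,
                   double *out, int n_in, int n_out)
{
    assert(W != NULL && b != NULL && in != NULL && out != NULL);
    assert(n_in > 0 && n_out > 0);

    /* Each output neuron only reads shared inputs and writes its own
       slot, so iterations are independent. If the compiler is run
       without OpenMP support the pragma is ignored and the loop runs
       serially with the same result. */
    #pragma omp parallel for
    for (int j = 0; j < n_out; ++j) {
        double sum = b[j];
        for (int i = 0; i < n_in; ++i)
            sum += W[(size_t)j * (size_t)n_in + (size_t)i] * in[i];
        out[j] = sum > 0.0 ? sum : 0.0;
    }
}
```

With GCC or Clang, OpenMP is typically enabled with `-fopenmp`. For small layers the cost of starting and synchronizing threads can outweigh the gain, which is the overhead the abstract says must be taken into account.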

Escalabilidade Paralela de um Algoritmo de Migração Reversa no Tempo (RTM) Pré-empilhamento

Abstract:

The seismic method is of extreme importance in geophysics. Mainly associated with oil exploration, this line of research concentrates most of the investment in this area. The acquisition, processing and interpretation of seismic data are the parts that make up a seismic study. Seismic processing in particular is focused on imaging that represents the geological structures in the subsurface. Seismic processing has evolved significantly in recent decades due to the demands of the oil industry, and also due to technological advances in hardware that brought higher storage and digital information processing capabilities, which enabled the development of more sophisticated processing algorithms such as those that use parallel architectures. One of the most important steps in seismic processing is imaging. Migration of seismic data is one of the techniques used for imaging, with the goal of obtaining a seismic section image that represents the geological structures as accurately and faithfully as possible. The result of migration is a 2D or 3D image in which it is possible to identify faults and salt domes among other structures of interest, such as potential hydrocarbon reservoirs. However, a migration performed with quality and accuracy may be a very time-consuming process, due to the mathematical algorithm heuristics and the extensive amount of data inputs and outputs involved, and may take days, weeks and even months of uninterrupted execution on supercomputers, representing large computational and financial costs that could make these methods unfeasible. Aiming at performance improvement, this work parallelized the core of a Reverse Time Migration (RTM) algorithm using the Open Multi-Processing (OpenMP) parallel programming model, due to the large computational effort required by this migration technique. Furthermore, analyses of speedup and efficiency were performed and, ultimately, the degree of algorithmic scalability was identified with respect to the technological advancement expected of future processors.
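
For reference, speedup and efficiency are conventionally defined from wall-clock times as S = T_serial / T_parallel and E = S / p, where p is the number of threads. The helpers below are a minimal sketch of those standard definitions; the function names are hypothetical and the abstract does not state how the timings were collected.

```c
#include <assert.h>

/* Speedup: how many times faster the parallel run is than the serial one. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial >= 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Efficiency: speedup divided by the number of threads used.
   A value of 1.0 corresponds to ideal linear scaling. */
double efficiency(double t_serial, double t_parallel, int n_threads)
{
    assert(n_threads > 0);
    return speedup(t_serial, t_parallel) / (double)n_threads;
}
```

For instance, a run that takes 8 s serially and 4 s on 4 threads has a speedup of 2 and an efficiency of 0.5. Tracking how efficiency falls as the thread count grows is one common way to judge scalability.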

C programs for solving the time-dependent Gross-Pitaevskii equation in a fully anisotropic trap

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Low-energy electron collisions with glycine

Abstract:

We report cross sections for elastic electron scattering by gas phase glycine (neutral form), obtained with the Schwinger multichannel method. The present results are the first obtained with a new implementation that combines parallelization with OpenMP directives and pseudopotentials. The position of the well known π* shape resonance ranged from 2.3 eV to 2.8 eV depending on the polarization model and conformer. For the most stable isomer, the present result (2.4 eV) is in fair agreement with electron transmission spectroscopy assignments (1.93 ± 0.05 eV) and available calculations. Our results also point to a shape resonance around 9.5 eV in the A' symmetry that would be weakly coupled to vibrations of the hydroxyl group. Since electron attachment to a broad and lower lying σ* orbital located on the OH bond has been suggested as the underlying mechanism leading to dissociative electron attachment at low energies, we searched for a shape resonance at ~4 eV. Though we obtained cross sections with the target molecule at the equilibrium geometry and with stretched OH bond lengths, least-squares fits to the calculated eigenphase sums did not reveal signatures of this anion state (though, in principle, it could be hidden in the large background). The low energy (~1 eV) integral cross section strongly scales as the bond length is stretched, and this could indicate a virtual state pole, since dipole supported bound states are not expected at the geometries addressed here. (C) 2012 American Institute of Physics. [http://dx.doi.org/10.1063/1.3687345]

IDM - A New Parallel Methodology to Calculate the Determinant of Matrices of the Order n, with Computational Complexity O(n)

Abstract:

This paper presents a new parallel methodology for calculating the determinant of matrices of the order n, with computational complexity O(n), using the Gauss-Jordan Elimination Method and Chio's Rule as references. We intend to present our step-by-step methodology using clear mathematical language, where we will demonstrate how to calculate the determinant of a matrix of the order n in an analytical format. We will also present a computational model with one sequential algorithm and one parallel algorithm using a pseudo-code.
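
The abstract names Gauss-Jordan elimination as a reference point but does not give the IDM algorithm itself. For context only, the sketch below shows conventional Gaussian elimination with partial pivoting, which costs O(n^3) operations in total, with the independent row updates of each step shared among OpenMP threads. It is a baseline under stated assumptions, not the paper's method, and the function name `det_gauss` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Determinant of an n x n row-major matrix by Gaussian elimination with
   partial pivoting. The input matrix is overwritten. This is the classic
   O(n^3) baseline, not the IDM methodology from the paper. */
double det_gauss(double *a, int n)
{
    assert(a != NULL && n > 0);
    double det = 1.0;

    for (int k = 0; k < n; ++k) {
        /* Pick the row with the largest absolute value in column k. */
        int p = k;
        double best = a[k * n + k] < 0.0 ? -a[k * n + k] : a[k * n + k];
        for (int i = k + 1; i < n; ++i) {
            double v = a[i * n + k] < 0.0 ? -a[i * n + k] : a[i * n + k];
            if (v > best) { best = v; p = i; }
        }
        if (best == 0.0)
            return 0.0;            /* singular matrix */

        /* A row swap flips the sign of the determinant. */
        if (p != k) {
            for (int j = 0; j < n; ++j) {
                double tmp = a[k * n + j];
                a[k * n + j] = a[p * n + j];
                a[p * n + j] = tmp;
            }
            det = -det;
        }
        det *= a[k * n + k];

        /* Rows below the pivot are updated independently of each other,
           so this loop can be divided among threads. The inner loop runs
           from the last column down to column k so that a[i*n+k] is read
           unchanged until the final iteration. */
        #pragma omp parallel for
        for (int i = k + 1; i < n; ++i) {
            double f = a[i * n + k] / a[k * n + k];
            for (int j = n - 1; j >= k; --j)
                a[i * n + j] -= f * a[k * n + j];
        }
    }
    return det;
}
```

Only the row updates within one elimination step run in parallel; the steps themselves remain sequential because each pivot depends on the previous update.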

Electron Interactions with Disulfide Bridges

Abstract:

Because of its electronic properties, sulfur plays a major role in a variety of metabolic processes and, more generally, in the chemistry of life. In particular, S-S bridges between cysteines are present in the amino acid backbone of proteins. Protein disulfide radical anions may decay following different paths through competing intra- and intermolecular routes, including bond cleavage, disproportionation, protein-protein cross linking, and electron transfer. Indeed, electron capture dissociation (ECD) mass spectrometry studies have shown that capture of low-energy (<0.2 eV) electrons by multiply protonated proteins is followed by dissociation of S-S bonds holding two peptide chains together. In view of the importance of organic sulfur chemistry, we report on electron interactions with disulfide bridges. To study these interactions we used as prototypes the molecules dimethyl sulfide [(CH3)2S] and dimethyl disulfide [(H3C)S2(CH3)]. We seek to better understand the electron-induced cleavage of the disulfide bond. To explore dissociative processes we performed electron scattering calculations with the Schwinger Multichannel Method with pseudopotentials (SMCPP), recently parallelized with OpenMP directives and optimized with Basic Linear Algebra Subprograms (BLAS) and LAPACK routines. Elastic cross sections obtained for different S-S bond lengths indicate stabilization of the anion formed by electron attachment to a σ*SS antibonding orbital, such that dissociation would be expected.

Paralelización del algoritmo Monte Carlo Ray Tracing para Comunicaciones Ópticas Submarinas

Abstract:

Máster Universitario en Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI)

Paralelización de un algoritmo de ray tracing para arrays de mililentes

Abstract:

Máster Universitario en Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI)
