Biblioteca Digital

882 resultados para parallel processing systems

Estudos de algumas ferramentas de coleta e visualização de dados e desempenho de aplicações paralelas no ambiente MPI

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The last years have presented an increase in the acceptance and adoption of the parallel processing, as much for scientific computation of high performance as for applications of general intention. This acceptance has been favored mainly for the development of environments with massive parallel processing (MPP - Massively Parallel Processing) and of the distributed computation. A common point between distributed systems and MPPs architectures is the notion of message exchange, that allows the communication between processes. An environment of message exchange consists basically of a communication library that, acting as an extension of the programming languages that allow to the elaboration of applications parallel, such as C, C++ and Fortran. In the development of applications parallel, a basic aspect is on to the analysis of performance of the same ones. Several can be the metric ones used in this analysis: time of execution, efficiency in the use of the processing elements, scalability of the application with respect to the increase in the number of processors or to the increase of the instance of the treat problem. The establishment of models or mechanisms that allow this analysis can be a task sufficiently complicated considering parameters and involved degrees of freedom in the implementation of the parallel application. An joined alternative has been the use of collection tools and visualization of performance data, that allow the user to identify to points of strangulation and sources of inefficiency in an application. For an efficient visualization one becomes necessary to identify and to collect given relative to the execution of the application, stage this called instrumentation. In this work it is presented, initially, a study of the main techniques used in the collection of the performance data, and after that a detailed analysis of the main available tools is made that can be used in architectures parallel of the type to cluster Beowulf with Linux on X86 platform being used libraries of communication based in applications MPI - Message Passing Interface, such as LAM and MPICH. This analysis is validated on applications parallel bars that deal with the problems of the training of neural nets of the type perceptrons using retro-propagation. The gotten conclusions show to the potentiality and easinesses of the analyzed tools.

Escalabilidade Paralela de um Algoritmo de Migração Reversa no Tempo (RTM) Pré-empilhamento

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The seismic method is of extreme importance in geophysics. Mainly associated with oil exploration, this line of research focuses most of all investment in this area. The acquisition, processing and interpretation of seismic data are the parts that instantiate a seismic study. Seismic processing in particular is focused on the imaging that represents the geological structures in subsurface. Seismic processing has evolved significantly in recent decades due to the demands of the oil industry, and also due to the technological advances of hardware that achieved higher storage and digital information processing capabilities, which enabled the development of more sophisticated processing algorithms such as the ones that use of parallel architectures. One of the most important steps in seismic processing is imaging. Migration of seismic data is one of the techniques used for imaging, with the goal of obtaining a seismic section image that represents the geological structures the most accurately and faithfully as possible. The result of migration is a 2D or 3D image which it is possible to identify faults and salt domes among other structures of interest, such as potential hydrocarbon reservoirs. However, a migration fulfilled with quality and accuracy may be a long time consuming process, due to the mathematical algorithm heuristics and the extensive amount of data inputs and outputs involved in this process, which may take days, weeks and even months of uninterrupted execution on the supercomputers, representing large computational and financial costs, that could derail the implementation of these methods. Aiming at performance improvement, this work conducted the core parallelization of a Reverse Time Migration (RTM) algorithm, using the parallel programming model Open Multi-Processing (OpenMP), due to the large computational effort required by this migration technique. Furthermore, analyzes such as speedup, efficiency were performed, and ultimately, the identification of the algorithmic scalability degree with respect to the technological advancement expected by future processors

Avaliação da execução de aplicações orientadas à dados na arquitetura de redes em chip IPNoSys

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The increasing complexity of integrated circuits has boosted the development of communications architectures like Networks-on-Chip (NoCs), as an architecture; alternative for interconnection of Systems-on-Chip (SoC). Networks-on-Chip complain for component reuse, parallelism and scalability, enhancing reusability in projects of dedicated applications. In the literature, lots of proposals have been made, suggesting different configurations for networks-on-chip architectures. Among all networks-on-chip considered, the architecture of IPNoSys is a non conventional one, since it allows the execution of operations, while the communication process is performed. This study aims to evaluate the execution of data-flow based applications on IPNoSys, focusing on their adaptation against the design constraints. Data-flow based applications are characterized by the flowing of continuous stream of data, on which operations are executed. We expect that these type of applications can be improved when running on IPNoSys, because they have a programming model similar to the execution model of this network. By observing the behavior of these applications when running on IPNoSys, were performed changes in the execution model of the network IPNoSys, allowing the implementation of an instruction level parallelism. For these purposes, analysis of the implementations of dataflow applications were performed and compared

A W-matrix methodology for solving sparse network equations on multiprocessor computers

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper describes a methodology for solving efficiently the sparse network equations on multiprocessor computers. The methodology is based on the matrix inverse factors (W-matrix) approach to the direct solution phase of A(x) = b systems. A partitioning scheme of W-matrix , based on the leaf-nodes of the factorization path tree, is proposed. The methodology allows the performance of all the updating operations on vector b in parallel, within each partition, using a row-oriented processing. The approach takes advantage of the processing power of the individual processors. Performance results are presented and discussed.

A heuristic method for reactive power planning

Relevância:

90.00% 90.00%

Publicador:

Resumo:

An approach for solving reactive power planning problems is presented, which is based on binary search techniques and the use of a special heuristic to obtain a discrete solution. Two versions were developed, one to run on conventional (sequential) computers and the other to run on a distributed memory (hypercube) machine. This latter parallel processing version employs an asynchronous programming model. Once the set of candidate buses has been defined, the program gives the location and size of the reactive sources needed(if any) in keeping with operating and security constraints.

The Owner Share scheduler for a distributed system

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In large distributed systems, where shared resources are owned by distinct entities, there is a need to reflect resource ownership in resource allocation. An appropriate resource management system should guarantee that resource's owners have access to a share of resources proportional to the share they provide. In order to achieve that some policies can be used for revoking access to resources currently used by other users. In this paper, a scheduling policy based in the concept of distributed ownership is introduced called Owner Share Enforcement Policy (OSEP). OSEP goal is to guarantee that owner do not have their jobs postponed for longer periods of time. We evaluate the results achieved with the application of this policy using metrics that describe policy violation, loss of capacity, policy cost and user satisfaction in environments with and without job checkpointing. We also evaluate and compare the OSEP policy with the Fair-Share policy, and from these results it is possible to capture the trade-offs from different ways to achieve fairness based on the user satisfaction. © 2009 IEEE.

Using GPU to exploit parallelism on cryptography

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this article we explore the NVIDIA graphical processing units (GPU) computational power in cryptography using CUDA (Compute Unified Device Architecture) technology. CUDA makes the general purpose computing easy using the parallel processing presents in GPUs. To do this, the NVIDIA GPUs architectures and CUDA are presented, besides cryptography concepts. Furthermore, we do the comparison between the versions executed in CPU with the parallel version of the cryptography algorithms Advanced Encryption Standard (AES) and Message-digest Algorithm 5 (MD5) wrote in CUDA. © 2011 AISTI.

Transformações de Lorentz e seus invariantes

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this paper, we will show the types of Lorentz transformations, from the most described in books, special Lorentz transformation that relates two inertial systems whose relative velocities are directed along an axis of the respective bases systems. However, we will see a peculiarity that goes unnoticed in this transformation, although they have reported in many books a parallel between the transformation inertial systems, due to the fact that the speed is parallel to an axis, it is actually a semi-parallel processing. The next transformation that we will see is one in which a system moves with a relative speed that has arbitrary direction with respect to a given system, we will show that this transformation may be appointed as non-rotational Lorentz transformation. Before obtain, the later type of transformation, the rotational Lorentz transformation, which is the interface between Special Relativity and General Relativity, we will describe the systems to be rotated, not just inertial systems, show what the characteristics are that define the non-rotational and rotational transformations. The in last topic of this chapter we will also show how the idea of Thoma’s theorythat uses this transformation to create what he defines as the proper coordinate axes of the particleused to obtain the factor 1/2 electron spin. In the last chapter we show how the Lorentz invariants are obtained, quantities measures that are also in different Lorentz reference, with the focus on mass that has erroneously been described in many books, that varies according to the agreement reference system

A rigorous approach to facilitate and guarantee the correctness of the genetic testing management in human genome information systems

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Abstract Background Recent medical and biological technology advances have stimulated the development of new testing systems that have been providing huge, varied amounts of molecular and clinical data. Growing data volumes pose significant challenges for information processing systems in research centers. Additionally, the routines of genomics laboratory are typically characterized by high parallelism in testing and constant procedure changes. Results This paper describes a formal approach to address this challenge through the implementation of a genetic testing management system applied to human genome laboratory. We introduced the Human Genome Research Center Information System (CEGH) in Brazil, a system that is able to support constant changes in human genome testing and can provide patients updated results based on the most recent and validated genetic knowledge. Our approach uses a common repository for process planning to ensure reusability, specification, instantiation, monitoring, and execution of processes, which are defined using a relational database and rigorous control flow specifications based on process algebra (ACP). The main difference between our approach and related works is that we were able to join two important aspects: 1) process scalability achieved through relational database implementation, and 2) correctness of processes using process algebra. Furthermore, the software allows end users to define genetic testing without requiring any knowledge about business process notation or process algebra. Conclusions This paper presents the CEGH information system that is a Laboratory Information Management System (LIMS) based on a formal framework to support genetic testing management for Mendelian disorder studies. We have proved the feasibility and showed usability benefits of a rigorous approach that is able to specify, validate, and perform genetic testing using easy end user interfaces.

Multi Processor Systems On Chip with Configurable Hardware Acceleration

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The evolution of the electronics embedded applications forces electronics systems designers to match their ever increasing requirements. This evolution pushes the computational power of digital signal processing systems, as well as the energy required to accomplish the computations, due to the increasing mobility of such applications. Current approaches used to match these requirements relies on the adoption of application specific signal processors. Such kind of devices exploits powerful accelerators, which are able to match both performance and energy requirements. On the other hand, the too high specificity of such accelerators often results in a lack of flexibility which affects non-recurrent engineering costs, time to market, and market volumes too. The state of the art mainly proposes two solutions to overcome these issues with the ambition of delivering reasonable performance and energy efficiency: reconfigurable computing and multi-processors computing. All of these solutions benefits from the post-fabrication programmability, that definitively results in an increased flexibility. Nevertheless, the gap between these approaches and dedicated hardware is still too high for many application domains, especially when targeting the mobile world. In this scenario, flexible and energy efficient acceleration can be achieved by merging these two computational paradigms, in order to address all the above introduced constraints. This thesis focuses on the exploration of the design and application spectrum of reconfigurable computing, exploited as application specific accelerators for multi-processors systems on chip. More specifically, it introduces a reconfigurable digital signal processor featuring a heterogeneous set of reconfigurable engines, and a homogeneous multi-core system, exploiting three different flavours of reconfigurable and mask-programmable technologies as implementation platform for applications specific accelerators. In this work, the various trade-offs concerning the utilization multi-core platforms and the different configuration technologies are explored, characterizing the design space of the proposed approach in terms of programmability, performance, energy efficiency and manufacturing costs.

HIGH PERFORMANCE, LOW COST SUBSPACE DECOMPOSITION AND POLYNOMIAL ROOTING FOR REAL TIME DIRECTION OF ARRIVAL ESTIMATION: ANALYSIS AND IMPLEMENTATION

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This thesis develops high performance real-time signal processing modules for direction of arrival (DOA) estimation for localization systems. It proposes highly parallel algorithms for performing subspace decomposition and polynomial rooting, which are otherwise traditionally implemented using sequential algorithms. The proposed algorithms address the emerging need for real-time localization for a wide range of applications. As the antenna array size increases, the complexity of signal processing algorithms increases, making it increasingly difficult to satisfy the real-time constraints. This thesis addresses real-time implementation by proposing parallel algorithms, that maintain considerable improvement over traditional algorithms, especially for systems with larger number of antenna array elements. Singular value decomposition (SVD) and polynomial rooting are two computationally complex steps and act as the bottleneck to achieving real-time performance. The proposed algorithms are suitable for implementation on field programmable gated arrays (FPGAs), single instruction multiple data (SIMD) hardware or application specific integrated chips (ASICs), which offer large number of processing elements that can be exploited for parallel processing. The designs proposed in this thesis are modular, easily expandable and easy to implement. Firstly, this thesis proposes a fast converging SVD algorithm. The proposed method reduces the number of iterations it takes to converge to correct singular values, thus achieving closer to real-time performance. A general algorithm and a modular system design are provided making it easy for designers to replicate and extend the design to larger matrix sizes. Moreover, the method is highly parallel, which can be exploited in various hardware platforms mentioned earlier. A fixed point implementation of proposed SVD algorithm is presented. The FPGA design is pipelined to the maximum extent to increase the maximum achievable frequency of operation. The system was developed with the objective of achieving high throughput. Various modern cores available in FPGAs were used to maximize the performance and details of these modules are presented in detail. Finally, a parallel polynomial rooting technique based on Newton’s method applicable exclusively to root-MUSIC polynomials is proposed. Unique characteristics of root-MUSIC polynomial’s complex dynamics were exploited to derive this polynomial rooting method. The technique exhibits parallelism and converges to the desired root within fixed number of iterations, making this suitable for polynomial rooting of large degree polynomials. We believe this is the first time that complex dynamics of root-MUSIC polynomial were analyzed to propose an algorithm. In all, the thesis addresses two major bottlenecks in a direction of arrival estimation system, by providing simple, high throughput, parallel algorithms.

DESIGN AND IMPLEMENT DYNAMIC PROGRAMMING BASED DISCRETE POWER LEVEL SMART HOME SCHEDULING USING FPGA

Relevância:

90.00% 90.00%

Publicador:

Resumo:

With the development and capabilities of the Smart Home system, people today are entering an era in which household appliances are no longer just controlled by people, but also operated by a Smart System. This results in a more efficient, convenient, comfortable, and environmentally friendly living environment. A critical part of the Smart Home system is Home Automation, which means that there is a Micro-Controller Unit (MCU) to control all the household appliances and schedule their operating times. This reduces electricity bills by shifting amounts of power consumption from the on-peak hour consumption to the off-peak hour consumption, in terms of different “hour price”. In this paper, we propose an algorithm for scheduling multi-user power consumption and implement it on an FPGA board, using it as the MCU. This algorithm for discrete power level tasks scheduling is based on dynamic programming, which could find a scheduling solution close to the optimal one. We chose FPGA as our system’s controller because FPGA has low complexity, parallel processing capability, a large amount of I/O interface for further development and is programmable on both software and hardware. In conclusion, it costs little time running on FPGA board and the solution obtained is good enough for the consumers.

A Parallel Solver for the Time-Periodic Navier–Stokes Equations

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We investigate parallel algorithms for the solution of the Navier–Stokes equations in space-time. For periodic solutions, the discretized problem can be written as a large non-linear system of equations. This system of equations is solved by a Newton iteration. The Newton correction is computed using a preconditioned GMRES solver. The parallel performance of the algorithm is illustrated.

Cycling in the nucleus: regulation of RNA 3' processing and nuclear organization of replication-dependent histone genes.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The histones which pack new DNA during the S phase of animal cells are made from mRNAs that are cleaved at their 3' end but not polyadenylated. Some of the factors used in this reaction are unique to it while others are shared with the polyadenylation process that generates all other mRNAs. Recent work has begun to shed light on how the cell manages the assignment of these common components to the two 3' processing systems, and how it achieves their cell cycle-regulation and recruitment to the histone pre-mRNA. Moreover, recent and older findings reveal multiple connections between the nuclear organization of histone genes, their transcription and 3' end processing as well as the control of cell proliferation.

Parallel and Distributed Data Management. Introduction

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The manipulation and handling of an ever increasing volume of data by current data-intensive applications require novel techniques for e?cient data management. Despite recent advances in every aspect of data management (storage, access, querying, analysis, mining), future applications are expected to scale to even higher degrees, not only in terms of volumes of data handled but also in terms of users and resources, often making use of multiple, pre-existing autonomous, distributed or heterogeneous resources.

«
1
2
...
5
6
7
8
9
10
11
...
58
59
»