877 resultados para Processamento paralelo


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Clusters de computadores são geralmente utilizados para se obter alto desempenho na execução de aplicações paralelas. Sua utilização tem aumentado significativamente ao longo dos anos e resulta hoje em uma presença de quase 60% entre as 500 máquinas mais rápidas do mundo. Embora a utilização de clusters seja bastante difundida, a tarefa de monitoramento de recursos dessas máquinas é considerada complexa. Essa complexidade advém do fato de existirem diferentes configurações de software e hardware que podem ser caracterizadas como cluster. Diferentes configurações acabam por fazer com que o administrador de um cluster necessite de mais de uma ferramenta de monitoramento para conseguir obter informações suficientes para uma tomada de decisão acerca de eventuais problemas que possam estar acontecendo no seu cluster. Outra situação que demonstra a complexidade da tarefa de monitoramento acontece quando o desenvolvedor de aplicações paralelas necessita de informações relativas ao ambiente de execução da sua aplicação para entender melhor o seu comportamento. A execução de aplicações paralelas em ambientes multi-cluster e grid juntamente com a necessidade de informações externas à aplicação é outra situação que necessita da tarefa de monitoramento. Em todas essas situações, verifica-se a existência de múltiplas fontes de dados independentes e que podem ter informações relacionadas ou complementares. O objetivo deste trabalho é propor um modelo de integração de dados que pode se adaptar a diferentes fontes de informação e gerar como resultado informações integradas que sejam passíveis de uma visualização conjunta por alguma ferramenta. Esse modelo é baseado na depuração offline de aplicações paralelas e é dividido em duas etapas: a coleta de dados e uma posterior integração das informações. Um protótipo baseado nesse modelo de integração é descrito neste trabalho Esse protótipo utiliza como fontes de informação as ferramentas de monitoramento de cluster Ganglia e Performance Co-Pilot, bibliotecas de rastreamento de aplicações DECK e MPI e uma instrumentação do Sistema operacional Linux para registrar as trocas de contexto de um conjunto de processos. Pajé é a ferramenta escolhida para a visualização integrada das informações. Os resultados do processo de integração de dados pelo protótipo apresentado neste trabalho são caracterizados em três tipos: depuração de aplicações DECK, depuração de aplicações MPI e monitoramento de cluster. Ao final do texto, são delineadas algumas conclusões e contribuições desse trabalho, assim como algumas sugestões de trabalhos futuros.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Estratégias para descoberta de recursos permitem a localização automática de dispositivos e serviços em rede, e seu estudo é motivado pelo elevado enriquecimento computacional dos ambientes com os quais interage-se. Essa situação se deve principalmente à popularização de dispositivos pessoais móveis e de infra-estruturas de comunicação baseadas em redes sem-fio. Associado à rede fixa, esse ambiente computacional proporciona um novo paradigma conhecido como computação pervasiva. No escopo de estudo da computação pervasiva, o Grupo de Processamento Paralelo e Distribuído da Universidade Federal do Rio Grande do Sul desenvolve o projeto ISAM. Este engloba frentes de pesquisa que tratam tanto da programação de aplicações pervasivas como também do suporte à execução dessas. Esse suporte é provido pelo middleware EXEHDA, o qual disponibiliza um conjunto de serviços que podem ser utilizados por essas aplicações ou por outros serviços do ambiente de execução. Essa dissertação aborda especificamente o Pervasive Discovery Service (PerDiS), o qual atua como um mecanismo para descoberta de recursos no ambiente pervasivo proporcionado pelo ISAM. A concepção do PerDiS baseou-se na identificação dos principais requisitos de uma solução para descoberta de recursos apropriada para utilização em um cenário de computação pervasiva Resumidamente, os requisitos identificados nessa pesquisa e considerados pelo PerDiS tratam de questões relacionadas aos seguintes aspectos: a) utilização de informações do contexto de execução, b) utilização de estratégias para manutenção automática da consistência, c) expressividade na descrição de recursos e critérios de pesquisa, d) possibilidade de interoperabilidade com outras estratégias de descoberta, e) suporte à descoberta de recursos em larga-escala, e f) utilização de preferências por usuário. A arquitetura PerDiS para descoberta de recursos utiliza em sua concepção outros serviços disponibilizados pelo ambiente de execução do ISAM para atingir seus objetivos, e ao mesmo tempo provê um serviço que também pode ser utilizado por esses. O modelo proposto é validado através da implementação de um protótipo, integrado à plataforma ISAM. Os resultados obtidos mostram que o PerDiS é apropriado para utilização em ambientes pervasivos, mesmo considerando os desafios impostos por esse paradigma.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The number of applications based on embedded systems grows significantly every year, even with the fact that embedded systems have restrictions, and simple processing units, the performance of these has improved every day. However the complexity of applications also increase, a better performance will always be necessary. So even such advances, there are cases, which an embedded system with a single unit of processing is not sufficient to achieve the information processing in real time. To improve the performance of these systems, an implementation with parallel processing can be used in more complex applications that require high performance. The idea is to move beyond applications that already use embedded systems, exploring the use of a set of units processing working together to implement an intelligent algorithm. The number of existing works in the areas of parallel processing, systems intelligent and embedded systems is wide. However works that link these three areas to solve any problem are reduced. In this context, this work aimed to use tools available for FPGA architectures, to develop a platform with multiple processors to use in pattern classification with artificial neural networks

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The metaheuristics techiniques are known to solve optimization problems classified as NP-complete and are successful in obtaining good quality solutions. They use non-deterministic approaches to generate solutions that are close to the optimal, without the guarantee of finding the global optimum. Motivated by the difficulties in the resolution of these problems, this work proposes the development of parallel hybrid methods using the reinforcement learning, the metaheuristics GRASP and Genetic Algorithms. With the use of these techniques, we aim to contribute to improved efficiency in obtaining efficient solutions. In this case, instead of using the Q-learning algorithm by reinforcement learning, just as a technique for generating the initial solutions of metaheuristics, we use it in a cooperative and competitive approach with the Genetic Algorithm and GRASP, in an parallel implementation. In this context, was possible to verify that the implementations in this study showed satisfactory results, in both strategies, that is, in cooperation and competition between them and the cooperation and competition between groups. In some instances were found the global optimum, in others theses implementations reach close to it. In this sense was an analyze of the performance for this proposed approach was done and it shows a good performance on the requeriments that prove the efficiency and speedup (gain in speed with the parallel processing) of the implementations performed

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This study shows the implementation and the embedding of an Artificial Neural Network (ANN) in hardware, or in a programmable device, as a field programmable gate array (FPGA). This work allowed the exploration of different implementations, described in VHDL, of multilayer perceptrons ANN. Due to the parallelism inherent to ANNs, there are disadvantages in software implementations due to the sequential nature of the Von Neumann architectures. As an alternative to this problem, there is a hardware implementation that allows to exploit all the parallelism implicit in this model. Currently, there is an increase in use of FPGAs as a platform to implement neural networks in hardware, exploiting the high processing power, low cost, ease of programming and ability to reconfigure the circuit, allowing the network to adapt to different applications. Given this context, the aim is to develop arrays of neural networks in hardware, a flexible architecture, in which it is possible to add or remove neurons, and mainly, modify the network topology, in order to enable a modular network of fixed-point arithmetic in a FPGA. Five synthesis of VHDL descriptions were produced: two for the neuron with one or two entrances, and three different architectures of ANN. The descriptions of the used architectures became very modular, easily allowing the increase or decrease of the number of neurons. As a result, some complete neural networks were implemented in FPGA, in fixed-point arithmetic, with a high-capacity parallel processing

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The last years have presented an increase in the acceptance and adoption of the parallel processing, as much for scientific computation of high performance as for applications of general intention. This acceptance has been favored mainly for the development of environments with massive parallel processing (MPP - Massively Parallel Processing) and of the distributed computation. A common point between distributed systems and MPPs architectures is the notion of message exchange, that allows the communication between processes. An environment of message exchange consists basically of a communication library that, acting as an extension of the programming languages that allow to the elaboration of applications parallel, such as C, C++ and Fortran. In the development of applications parallel, a basic aspect is on to the analysis of performance of the same ones. Several can be the metric ones used in this analysis: time of execution, efficiency in the use of the processing elements, scalability of the application with respect to the increase in the number of processors or to the increase of the instance of the treat problem. The establishment of models or mechanisms that allow this analysis can be a task sufficiently complicated considering parameters and involved degrees of freedom in the implementation of the parallel application. An joined alternative has been the use of collection tools and visualization of performance data, that allow the user to identify to points of strangulation and sources of inefficiency in an application. For an efficient visualization one becomes necessary to identify and to collect given relative to the execution of the application, stage this called instrumentation. In this work it is presented, initially, a study of the main techniques used in the collection of the performance data, and after that a detailed analysis of the main available tools is made that can be used in architectures parallel of the type to cluster Beowulf with Linux on X86 platform being used libraries of communication based in applications MPI - Message Passing Interface, such as LAM and MPICH. This analysis is validated on applications parallel bars that deal with the problems of the training of neural nets of the type perceptrons using retro-propagation. The gotten conclusions show to the potentiality and easinesses of the analyzed tools.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper analyzes the performance of a parallel implementation of Coupled Simulated Annealing (CSA) for the unconstrained optimization of continuous variables problems. Parallel processing is an efficient form of information processing with emphasis on exploration of simultaneous events in the execution of software. It arises primarily due to high computational performance demands, and the difficulty in increasing the speed of a single processing core. Despite multicore processors being easily found nowadays, several algorithms are not yet suitable for running on parallel architectures. The algorithm is characterized by a group of Simulated Annealing (SA) optimizers working together on refining the solution. Each SA optimizer runs on a single thread executed by different processors. In the analysis of parallel performance and scalability, these metrics were investigated: the execution time; the speedup of the algorithm with respect to increasing the number of processors; and the efficient use of processing elements with respect to the increasing size of the treated problem. Furthermore, the quality of the final solution was verified. For the study, this paper proposes a parallel version of CSA and its equivalent serial version. Both algorithms were analysed on 14 benchmark functions. For each of these functions, the CSA is evaluated using 2-24 optimizers. The results obtained are shown and discussed observing the analysis of the metrics. The conclusions of the paper characterize the CSA as a good parallel algorithm, both in the quality of the solutions and the parallel scalability and parallel efficiency

Relevância:

60.00% 60.00%

Publicador:

Resumo:

It bet on the next generation of computers as architecture with multiple processors and/or multicore processors. In this sense there are challenges related to features interconnection, operating frequency, the area on chip, power dissipation, performance and programmability. The mechanism of interconnection and communication it was considered ideal for this type of architecture are the networks-on-chip, due its scalability, reusability and intrinsic parallelism. The networks-on-chip communication is accomplished by transmitting packets that carry data and instructions that represent requests and responses between the processing elements interconnected by the network. The transmission of packets is accomplished as in a pipeline between the routers in the network, from source to destination of the communication, even allowing simultaneous communications between pairs of different sources and destinations. From this fact, it is proposed to transform the entire infrastructure communication of network-on-chip, using the routing mechanisms, arbitration and storage, in a parallel processing system for high performance. In this proposal, the packages are formed by instructions and data that represent the applications, which are executed on routers as well as they are transmitted, using the pipeline and parallel communication transmissions. In contrast, traditional processors are not used, but only single cores that control the access to memory. An implementation of this idea is called IPNoSys (Integrated Processing NoC System), which has an own programming model and a routing algorithm that guarantees the execution of all instructions in the packets, preventing situations of deadlock, livelock and starvation. This architecture provides mechanisms for input and output, interruption and operating system support. As proof of concept was developed a programming environment and a simulator for this architecture in SystemC, which allows configuration of various parameters and to obtain several results to evaluate it

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This work presents the concept, design and implementation of a MP-SoC platform, named STORM (MP-SoC DirecTory-Based PlatfORM). Currently the platform is composed of the following modules: SPARC V8 processor, GPOP processor, Cache module, Memory module, Directory module and two different modles of Network-on-Chip, NoCX4 and Obese Tree. All modules were implemented using SystemC, simulated and validated, individually or in group. The modules description is presented in details. For programming the platform in C it was implemented a SPARC assembler, fully compatible with gcc s generated assembly code. For the parallel programming it was implemented a library for mutex managing, using the due assembler s support. A total of 10 simulations of increasing complexity are presented for the validation of the presented concepts. The simulations include real parallel applications, such as matrix multiplication, Mergesort, KMP, Motion Estimation and DCT 2D

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The increasing complexity of integrated circuits has boosted the development of communications architectures like Networks-on-Chip (NoCs), as an architecture; alternative for interconnection of Systems-on-Chip (SoC). Networks-on-Chip complain for component reuse, parallelism and scalability, enhancing reusability in projects of dedicated applications. In the literature, lots of proposals have been made, suggesting different configurations for networks-on-chip architectures. Among all networks-on-chip considered, the architecture of IPNoSys is a non conventional one, since it allows the execution of operations, while the communication process is performed. This study aims to evaluate the execution of data-flow based applications on IPNoSys, focusing on their adaptation against the design constraints. Data-flow based applications are characterized by the flowing of continuous stream of data, on which operations are executed. We expect that these type of applications can be improved when running on IPNoSys, because they have a programming model similar to the execution model of this network. By observing the behavior of these applications when running on IPNoSys, were performed changes in the execution model of the network IPNoSys, allowing the implementation of an instruction level parallelism. For these purposes, analysis of the implementations of dataflow applications were performed and compared

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Pós-graduação em Engenharia Elétrica - FEIS

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)