853 results for parallel execution


Relevance: 20.00%

Abstract:

This thesis presents DCE, or Dynamic Conditional Execution, as an alternative for reducing the cost of mispredicted branches. The basic idea is to fetch all paths produced by a branch that obeys certain restrictions on complexity and size. As a result, fewer predictions are performed and, therefore, fewer branches are mispredicted. DCE fetches through selected branches, avoiding disruptions in the fetch flow when these branches are fetched. Both paths of a selected branch are executed, but only the correct path commits. In this thesis we propose an architecture to execute multiple paths of selected branches. Branches are selected based on size and other conditions, and both simple and complex branches can be dynamically predicated without requiring a special instruction set or special compiler optimizations. Furthermore, a technique to reduce part of the overhead generated by the execution of multiple paths is proposed. The performance gain reaches up to 12% when a Local predictor used in DCE is compared against a Global predictor used in the reference machine; when both machines use a Local predictor, the speedup averages 3-3.5%.
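The core idea of executing both paths and committing only the correct one has a software analogue in if-conversion, where a branch is replaced by computing both sides and selecting the result once the condition resolves. A minimal sketch of that analogy in C follows; the function names are illustrative, not taken from the thesis.

```c
#include <stdio.h>

/* Branchy version: the hardware must predict (x > 0) to keep fetching. */
static int branchy(int x) {
    if (x > 0)
        return x * 3;   /* taken path */
    else
        return x - 7;   /* fall-through path */
}

/* If-converted analogue of DCE: both paths are computed ("executed"),
 * and the resolved condition selects which result "commits". */
static int both_paths(int x) {
    int taken = x * 3;      /* path 1 executes unconditionally */
    int fall  = x - 7;      /* path 2 executes unconditionally */
    int cond  = (x > 0);    /* branch resolves after both paths ran */
    return cond ? taken : fall;  /* only the correct result commits */
}

int main(void) {
    for (int x = -2; x <= 2; x++)
        printf("%d -> branchy=%d both=%d\n", x, branchy(x), both_paths(x));
    return 0;
}
```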

Relevance: 20.00%

Abstract:

Dynamic Conditional Execution (DCE) is an alternative for reducing the costs associated with mispredicted branches. The basic idea is to fetch all paths produced by a branch that obeys certain restrictions on complexity and size. As a consequence, fewer predictions are made and, thus, fewer branches are mispredicted. However, like other multipath solutions, DCE requires a more complex control structure. In the DCE architecture, several replicas of the same instruction are observed being dispatched to the functional units, blocking resources that could be used by other instructions. These replicas are generated after the convergence point of the multiple executing paths and are necessary to guarantee correct semantics between data-dependent instructions. Moreover, DCE keeps producing replicas until the branch that spawned the paths is resolved, so an entire section of code may be replicated, reducing performance. A natural alternative to this problem is to reuse the sections (or traces) that are replicated. The goal of this work is to analyze and evaluate the effectiveness of value reuse in the DCE architecture. As will be shown, the reuse principle, at different granularities, can effectively mitigate the replica problem and yield performance gains.
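Value reuse of this kind is commonly modeled as a reuse buffer: a table mapping an instruction (or trace) and its input operands to a previously computed result, so a replica can be satisfied by a lookup instead of re-execution. A minimal, hypothetical sketch in C; the table layout and hash are illustrative assumptions, not the thesis's design.

```c
#include <stdint.h>

#define RB_SIZE 256  /* direct-mapped reuse buffer, illustrative size */

typedef struct {
    int      valid;
    uint32_t op;         /* opcode identifying the instruction  */
    int64_t  src1, src2; /* input operands at the time of reuse */
    int64_t  result;     /* previously computed result          */
} ReuseEntry;

static ReuseEntry rb[RB_SIZE];

static unsigned rb_index(uint32_t op, int64_t a, int64_t b) {
    return (unsigned)((op ^ (uint64_t)a ^ ((uint64_t)b << 1)) % RB_SIZE);
}

/* Returns 1 on a reuse hit and writes *out; 0 means execute normally. */
static int rb_lookup(uint32_t op, int64_t a, int64_t b, int64_t *out) {
    ReuseEntry *e = &rb[rb_index(op, a, b)];
    if (e->valid && e->op == op && e->src1 == a && e->src2 == b) {
        *out = e->result;   /* replica satisfied without re-execution */
        return 1;
    }
    return 0;
}

static void rb_insert(uint32_t op, int64_t a, int64_t b, int64_t r) {
    ReuseEntry *e = &rb[rb_index(op, a, b)];
    e->valid = 1; e->op = op; e->src1 = a; e->src2 = b; e->result = r;
}
```

On a hit, the replica's result comes straight from the table; on a miss, the instruction executes normally and rb_insert records the outcome for later replicas.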

Relevance: 20.00%

Abstract:

This research was conducted as a case study of two cultural institutions in the French and Brazilian contexts. The Centre Pompidou is a state-funded presidential museum project whose mission is to make modern art, in all its expressions, accessible to the general public. Sesc Pompeia is a multidisciplinary center for culture and sport, funded by tax money and managed by the Federação do Comércio, dedicated to offering informal education through the cultivation of mind and body. The study examines whether resource dependence and power theories can be used to conceptualize the relationship that the Centre Pompidou and Sesc Pompeia have with their financial stakeholders. More specifically, it discusses the extent to which the degree of dependence influences the institutions' management strategy, aiming to answer the following question: what strategies do the institutions adopt to reduce their dependence on their main financial stakeholders? Finally, some practical management implications are drawn from the parallel between the strategies of the two institutions.

Relevance: 20.00%

Abstract:

Health is at the center of the national and international debate on how the public policies adopted by government bodies must evolve. The State is therefore obliged to execute programs that provide everyone with concrete actions to safeguard the right to health. From this perspective, the objective of this research is to assess the implications of unpaid committed expenditures ("restos a pagar") for public health management in Mato Grosso from 2008 to 2014. Based on documentary, bibliographic, and field research, it was observed that the State is trapped in a vicious cycle of carrying these liabilities forward. The expenses held back over the period kept growing, harming the financial execution of Mato Grosso's priority health programs. According to the data, programmatic financial execution fell from a level rated excellent in 2008, at 92%, to merely regular in 2014, at 66%. The case study shows that Mato Grosso cannot achieve excellent results in implementing its society's interests while its credibility with creditors is shaken: by postponing its financial commitments without respecting, or being able to execute, the approved budget, the State ends up acquiring goods and contracting services through emergency mechanisms that raise the cost of public procurement and strengthen the companies' hold over budget execution. Besides deteriorating budgetary and financial planning and creating veritable parallel budgets, the excess of expenses pushed from the fiscal year in which they should occur into subsequent years degraded the quality of the public health services delivered by the State, hindering the realization of this fundamental right, indispensable to life.

Relevance: 20.00%

Abstract:

One of the challenges presented by the current conjuncture in global companies is to recognize and understand that culture and the levels of Power Distance in organizations across different countries contribute significantly to the failure or success of their strategies. Aligning the implementation and execution of new project strategies with the success of the organization as a whole, rather than of an individual part of it, is an important step toward reducing the impact of Power Distance (PDI) on the success of business strategies. Companies at odds with this understanding create boundaries that widen organizational chasms, also considering relevant aspects such as FSAs (Firm-Specific Advantages) and CSAs (Country-Specific Advantages). It is also important that organizations based in countries or regions with low Power Distance (PDI) between individuals be more flexible and prepared to ask for and listen to suggestions from regional and local offices. Thus, the purpose of this study is to highlight the elements of effective strategy implementation, considering the relevant aspects at all levels of global corporate culture that explain the influence of power distance when implementing new strategies, and to minimize the impact of this internal business relationship. The study also recognizes that other corporate and cultural aspects are relevant to the success of business strategies, such as the lack of alignment between global and regional/local organizations, the need for competent leadership, and the challenges posed by the distance between hierarchical levels (Headquarters and Regional Office), among the various causes that prevent the successful execution of global strategies. Finally, we show that strategy execution cannot be treated as a construction created solely by Headquarters or by a single Board; it needs to be understood as a system that interacts with its surroundings.

Relevance: 20.00%

Abstract:

Particle Swarm Optimization is a metaheuristic conceived to simulate the behavior of a flock of birds in flight, whose movement is locally random but globally determined. The technique has been widely used to address nonlinear continuous problems and is still little explored for discrete problems. This paper presents how the metaheuristic operates and proposes strategies for applying it to discrete optimization problems in both parallel and sequential forms of execution. Computational experiments were performed on TSP instances of up to 3038 nodes selected from the TSPLIB library, showing that the parallel methods improve on their sequential versions in both execution time and results.
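For reference, the canonical continuous PSO update that such a discrete adaptation starts from, with the per-particle work parallelized in the style the abstract describes, might look as follows in C with OpenMP. This is a generic sketch of standard PSO on a sample objective (the sphere function), not the authors' TSP-specific operators; swarm size and coefficients are illustrative.

```c
#include <stdlib.h>
#include <math.h>
#include <omp.h>

#define N_PART 64    /* particles  */
#define DIM    16    /* dimensions */
#define ITERS  1000

static double sphere(const double *x) {          /* sample objective */
    double s = 0.0;
    for (int d = 0; d < DIM; d++) s += x[d] * x[d];
    return s;
}

static double frand(unsigned *seed) {            /* uniform in [0,1) */
    return (double)rand_r(seed) / ((double)RAND_MAX + 1.0);
}

int main(void) {
    static double x[N_PART][DIM], v[N_PART][DIM], pbest[N_PART][DIM];
    static double pbest_f[N_PART];
    double gbest[DIM], gbest_f = INFINITY;
    const double w = 0.729, c1 = 1.49445, c2 = 1.49445;

    for (int i = 0; i < N_PART; i++) {           /* initialization */
        unsigned seed = 1234u + i;
        for (int d = 0; d < DIM; d++) {
            x[i][d] = 10.0 * frand(&seed) - 5.0;
            v[i][d] = 0.0;
            pbest[i][d] = x[i][d];
        }
        pbest_f[i] = sphere(x[i]);
    }

    for (int it = 0; it < ITERS; it++) {
        /* sequential reduction of the global best from personal bests */
        for (int i = 0; i < N_PART; i++)
            if (pbest_f[i] < gbest_f) {
                gbest_f = pbest_f[i];
                for (int d = 0; d < DIM; d++) gbest[d] = pbest[i][d];
            }

        /* update and evaluation: one particle per thread, no shared writes */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N_PART; i++) {
            unsigned seed = (unsigned)(it * N_PART + i + 1);
            for (int d = 0; d < DIM; d++) {
                v[i][d] = w * v[i][d]
                        + c1 * frand(&seed) * (pbest[i][d] - x[i][d])
                        + c2 * frand(&seed) * (gbest[d]    - x[i][d]);
                x[i][d] += v[i][d];
            }
            double f = sphere(x[i]);             /* costly step in practice */
            if (f < pbest_f[i]) {
                pbest_f[i] = f;
                for (int d = 0; d < DIM; d++) pbest[i][d] = x[i][d];
            }
        }
    }
    return 0;
}
```

The parallel gain comes from the evaluation loop, which dominates when the objective (here a stand-in for a TSP tour cost) is expensive.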

Relevance: 20.00%

Abstract:

Much effort has been devoted by the scientific community to tasks involving the locomotion of mobile robots. To execute this kind of task, the robot must be given the ability to navigate the environment safely, that is, without colliding with objects, which requires strategies for detecting obstacles. In this work, we address this problem by proposing a system that collects sensory information and estimates the likelihood of obstacles in the mobile robot's path. Stereo cameras mounted parallel to each other on a structure coupled to the robot are employed as the main sensory device, making it possible to generate a disparity map. Code optimizations and a strategy for data reduction and abstraction are applied to the images, yielding a substantial gain in execution time. This allows the high-level decision processes to perform obstacle avoidance in real time. The system can be employed both when the robot is remotely operated and when it must generate trajectories on its own (the autonomous case).
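A disparity map from a parallel stereo pair is typically computed by block matching: for each pixel in the left image, search along the same scanline of the right image for the best-matching window and record the horizontal shift. A minimal sum-of-absolute-differences (SAD) sketch in C, with illustrative image, window, and search sizes; the paper's own optimizations and data-reduction strategy are not reproduced here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <limits.h>

#define W        320   /* image width, illustrative  */
#define H        240   /* image height, illustrative */
#define HALF_WIN 3     /* 7x7 matching window        */
#define MAX_DISP 32    /* horizontal search range    */

/* left/right: rectified grayscale images; disp: output disparity map.
 * With parallel cameras, corresponding points lie on the same scanline,
 * so the search is purely horizontal. */
void block_match(const uint8_t left[H][W], const uint8_t right[H][W],
                 uint8_t disp[H][W])
{
    for (int y = HALF_WIN; y < H - HALF_WIN; y++) {
        for (int x = MAX_DISP + HALF_WIN; x < W - HALF_WIN; x++) {
            int best_d = 0;
            long best_sad = LONG_MAX;
            for (int d = 0; d < MAX_DISP; d++) {
                long sad = 0;
                for (int dy = -HALF_WIN; dy <= HALF_WIN; dy++)
                    for (int dx = -HALF_WIN; dx <= HALF_WIN; dx++)
                        sad += labs((long)left[y + dy][x + dx] -
                                    (long)right[y + dy][x + dx - d]);
                if (sad < best_sad) { best_sad = sad; best_d = d; }
            }
            /* nearer obstacles produce larger disparities */
            disp[y][x] = (uint8_t)best_d;
        }
    }
}
```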

Relevance: 20.00%

Abstract:

Recent years have seen growing acceptance and adoption of parallel processing, both for high-performance scientific computing and for general-purpose applications. This acceptance has been favored mainly by the development of massively parallel processing (MPP) environments and of distributed computing. A point common to distributed systems and MPP architectures is the notion of message passing, which allows communication between processes. A message-passing environment consists basically of a communication library that, acting as an extension of programming languages such as C, C++, and Fortran, allows parallel applications to be written. A fundamental aspect of developing parallel applications is performance analysis, for which several metrics can be used: execution time, efficiency in the use of the processing elements, and scalability of the application with respect to the number of processors or to the size of the problem instance. Establishing models or mechanisms for this analysis can be quite complicated, given the parameters and degrees of freedom involved in implementing a parallel application. One common alternative is the use of tools for collecting and visualizing performance data, which let the user identify bottlenecks and sources of inefficiency in an application. Efficient visualization requires identifying and collecting data on the application's execution, a stage called instrumentation. This work presents, first, a study of the main techniques used to collect performance data, followed by a detailed analysis of the main tools available for Beowulf-style parallel clusters running Linux on the x86 platform with MPI (Message Passing Interface) communication libraries such as LAM and MPICH. The analysis is validated on parallel applications that train perceptron-type neural networks using backpropagation. The conclusions show the potential and ease of use of the analyzed tools.
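Instrumentation at its simplest means timestamping regions of code on each rank and aggregating the measurements. A minimal, generic example in C using only standard MPI calls, independent of the specific trace tools the work analyzes; the workload function is a placeholder.

```c
#include <stdio.h>
#include <mpi.h>

/* placeholder for the application's real work on each rank */
static double do_work(int rank) {
    double s = 0.0;
    for (long i = 0; i < 10000000L; i++) s += (rank + 1) * 1e-9;
    return s;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double t0 = MPI_Wtime();       /* timestamp before the region */
    double local = do_work(rank);
    double t1 = MPI_Wtime();       /* timestamp after the region  */

    double elapsed = t1 - t0, max_t;
    /* the slowest rank bounds the parallel execution time */
    MPI_Reduce(&elapsed, &max_t, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d ranks, slowest region time: %f s (result %f)\n",
               size, max_t, local);
    MPI_Finalize();
    return 0;
}
```

Visualization tools automate exactly this kind of collection, but at event granularity and with trace files rather than a single reduced number.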

Relevance: 20.00%

Abstract:

Artificial neural networks are usually applied to solve complex problems. For harder problems, increasing the number of layers and neurons can achieve greater functional efficiency, but this leads to greater computational effort. Response time is an important factor in the decision to use neural networks in some systems. Many argue that the computational cost is higher in the training period; however, that phase runs only once, and once the network is trained, the existing computational resources must still be used efficiently. In the multicore era, the problem boils down to the efficient use of all available processing cores, while accounting for the overhead of parallel computing. In this sense, this work proposes a modular structure that proved more suitable for parallel implementations: the feedforward pass of an MLP-type ANN is parallelized with OpenMP on a shared-memory computer architecture. The research consists of testing and analyzing execution times; speedup, efficiency, and parallel scalability are analyzed. In the proposed approach, reducing the number of connections between remote neurons decreases the network's response time and, consequently, the total execution time. Since the time required for communication and synchronization is directly linked to the number of remote neurons in the network, it is necessary to investigate the best distribution of remote connections.
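The parallel region in such an approach typically covers the neuron loop of each layer: every output neuron's weighted sum is independent, so threads can compute them concurrently, with a synchronization point between layers. A generic sketch in C with OpenMP; the sigmoid choice and flat weight layout are illustrative, not the paper's exact modular structure.

```c
#include <math.h>
#include <omp.h>

/* one feedforward layer: out[j] = sigmoid(b[j] + sum_i w[j][i] * in[i]) */
void layer_forward(int n_in, int n_out,
                   const double *in, const double *w,  /* w is n_out x n_in */
                   const double *b, double *out)
{
    /* each output neuron is independent: parallelize across neurons */
    #pragma omp parallel for schedule(static)
    for (int j = 0; j < n_out; j++) {
        double acc = b[j];
        for (int i = 0; i < n_in; i++)
            acc += w[(long)j * n_in + i] * in[i];
        out[j] = 1.0 / (1.0 + exp(-acc));   /* sigmoid activation */
    }
    /* the implicit barrier at the end of the parallel loop is the
     * inter-layer synchronization whose cost the abstract attributes
     * to connections between remote neurons */
}
```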

Relevance: 20.00%

Abstract:

The seismic method is of extreme importance in geophysics. Mainly associated with oil exploration, this line of research concentrates most of the investment in the area. A seismic study comprises the acquisition, processing, and interpretation of seismic data. Seismic processing in particular focuses on producing an image that represents the geological structures in the subsurface. It has evolved significantly in recent decades, driven by the demands of the oil industry and by hardware advances in storage and digital processing capability, which enabled more sophisticated processing algorithms, such as those that use parallel architectures. One of the most important steps in seismic processing is imaging. Migration of seismic data is one of the techniques used for imaging, with the goal of obtaining a seismic section image that represents the geological structures as accurately and faithfully as possible. The result of migration is a 2D or 3D image in which it is possible to identify faults and salt domes, among other structures of interest such as potential hydrocarbon reservoirs. However, a migration performed with quality and accuracy can be very time-consuming, due to the heuristics of the mathematical algorithms and the extensive amount of input and output data involved; it may take days, weeks, or even months of uninterrupted execution on supercomputers, representing large computational and financial costs that could derail the adoption of these methods. Aiming at performance improvement, this work parallelized the core of a Reverse Time Migration (RTM) algorithm using the Open Multi-Processing (OpenMP) parallel programming model, given the large computational effort required by this migration technique. Furthermore, speedup and efficiency analyses were performed and, ultimately, the degree of algorithmic scalability was identified with respect to the technological advances expected in future processors.
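The computational core of RTM is the repeated application of a finite-difference stencil that propagates the wavefield in time, and that stencil loop is the natural target for OpenMP. A simplified 2D acoustic, second-order-in-space sketch in C; grid sizes are illustrative, and the actual RTM core also handles sources, absorbing boundaries, and the imaging condition.

```c
#include <omp.h>

#define NX 1000
#define NZ 1000

/* one time step of the 2D acoustic wave equation:
 * next = 2*cur - prev + (v*dt/h)^2 * laplacian(cur) */
void wave_step(const float cur[NZ][NX], const float prev[NZ][NX],
               float next[NZ][NX], const float vel2[NZ][NX] /* (v*dt/h)^2 */)
{
    /* rows of `next` are independent given cur/prev: parallelize them */
    #pragma omp parallel for schedule(static)
    for (int z = 1; z < NZ - 1; z++) {
        for (int x = 1; x < NX - 1; x++) {
            float lap = cur[z][x - 1] + cur[z][x + 1]
                      + cur[z - 1][x] + cur[z + 1][x]
                      - 4.0f * cur[z][x];
            next[z][x] = 2.0f * cur[z][x] - prev[z][x] + vel2[z][x] * lap;
        }
    }
    /* in full RTM this step runs forward for the source wavefield and
     * backward in time for the receiver wavefield, with an imaging
     * condition correlating the two */
}
```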

Relevance: 20.00%

Abstract:

This paper analyzes the performance of a parallel implementation of Coupled Simulated Annealing (CSA) for the unconstrained optimization of continuous-variable problems. Parallel processing is an efficient form of information processing that emphasizes the exploitation of simultaneous events in the execution of software. It arises primarily from the demand for high computational performance and the difficulty of increasing the speed of a single processing core. Although multicore processors are easily found nowadays, several algorithms are not yet suitable for running on parallel architectures. The algorithm is characterized by a group of Simulated Annealing (SA) optimizers working together to refine the solution, each running on its own thread executed by a different processor. In the analysis of parallel performance and scalability, the following metrics were investigated: execution time; speedup with respect to the number of processors; and efficient use of the processing elements with respect to the size of the treated problem. Furthermore, the quality of the final solution was verified. The paper proposes a parallel version of CSA and an equivalent serial version, and analyzes both on 14 benchmark functions; for each function, CSA is evaluated using 2-24 optimizers. The results are presented and discussed in light of these metrics, and the conclusions characterize CSA as a good parallel algorithm, in terms of both solution quality and parallel scalability and efficiency.
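In CSA the SA optimizers are not independent: each one's acceptance decision is coupled to the current energies of the whole ensemble. The sketch below is a minimal parallel rendering in C with OpenMP, assuming one common form of the coupled acceptance rule (a normalized exponential of each optimizer's current energy); the objective, move size, and cooling schedule are illustrative, not the paper's settings.

```c
#include <stdlib.h>
#include <math.h>
#include <omp.h>

#define NOPT 8      /* number of coupled SA optimizers, one per thread */
#define DIM  10
#define ITER 20000

static const double TWO_PI = 6.283185307179586;

/* sample objective: Rastrigin, a common continuous benchmark */
static double f(const double *x) {
    double s = 10.0 * DIM;
    for (int d = 0; d < DIM; d++)
        s += x[d] * x[d] - 10.0 * cos(TWO_PI * x[d]);
    return s;
}

static double frand(unsigned *seed) {   /* uniform in [0,1) */
    return (double)rand_r(seed) / ((double)RAND_MAX + 1.0);
}

int main(void) {
    static double x[NOPT][DIM], e[NOPT];
    double t_acc = 1.0;                 /* acceptance temperature */

    for (int i = 0; i < NOPT; i++) {    /* random initialization */
        unsigned seed = 7u + i;
        for (int d = 0; d < DIM; d++) x[i][d] = 10.24 * frand(&seed) - 5.12;
        e[i] = f(x[i]);
    }

    for (int k = 0; k < ITER; k++) {
        /* coupling term computed from the current energies of ALL optimizers */
        double emax = e[0], gamma = 0.0;
        for (int i = 1; i < NOPT; i++) if (e[i] > emax) emax = e[i];
        for (int i = 0; i < NOPT; i++) gamma += exp((e[i] - emax) / t_acc);

        #pragma omp parallel for schedule(static)
        for (int i = 0; i < NOPT; i++) {   /* one SA optimizer per thread */
            unsigned seed = (unsigned)(k * NOPT + i + 1);
            double y[DIM];
            for (int d = 0; d < DIM; d++)  /* probe move */
                y[d] = x[i][d] + 0.1 * (2.0 * frand(&seed) - 1.0);
            double ey = f(y);
            /* coupled acceptance (assumed form): optimizers whose current
             * energy is worse relative to the ensemble accept more readily */
            double acc = exp((e[i] - emax) / t_acc) / gamma;
            if (ey < e[i] || frand(&seed) < acc) {
                for (int d = 0; d < DIM; d++) x[i][d] = y[d];
                e[i] = ey;
            }
        }
        t_acc *= 0.9999;                /* simple geometric cooling */
    }
    return 0;
}
```

The coupling term is recomputed once per iteration and read-only inside the parallel loop, which keeps the per-thread work independent, matching the one-optimizer-per-thread layout the abstract describes.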