935 resultados para parallel systems


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Communication and synchronization stand as the dual bottlenecks in the performance of parallel systems, and especially those that attempt to alleviate the programming burden by incurring overhead in these two domains. We formulate the notions of communicable memory and lazy barriers to help achieve efficient communication and synchronization. These concepts are developed in the context of BSPk, a toolkit library for programming networks of workstations|and other distributed memory architectures in general|based on the Bulk Synchronous Parallel (BSP) model. BSPk emphasizes efficiency in communication by minimizing local memory-to-memory copying, and in barrier synchronization by not forcing a process to wait unless it needs remote data. Both the message passing (MP) and distributed shared memory (DSM) programming styles are supported in BSPk. MP helps processes efficiently exchange short-lived unnamed data values, when the identity of either the sender or receiver is known to the other party. By contrast, DSM supports communication between processes that may be mutually anonymous, so long as they can agree on variable names in which to store shared temporary or long-lived data.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This article describes further evidence for a new neural network theory of biological motion perception that is called a Motion Boundary Contour System. This theory clarifies why parallel streams Vl-> V2 and Vl-> MT exist for static form and motion form processing among the areas Vl, V2, and MT of visual cortex. The Motion Boundary Contour System consists of several parallel copies, such that each copy is activated by a different range of receptive field sizes. Each copy is further subdivided into two hierarchically organized subsystems: a Motion Oriented Contrast Filter, or MOC Filter, for preprocessing moving images; and a Cooperative-Competitive Feedback Loop, or CC Loop, for generating emergent boundary segmentations of the filtered signals. The present article uses the MOC Filter to explain a variety of classical and recent data about short-range and long-range apparent motion percepts that have not yet been explained by alternative models. These data include split motion; reverse-contrast gamma motion; delta motion; visual inertia; group motion in response to a reverse-contrast Ternus display at short interstimulus intervals; speed-up of motion velocity as interfiash distance increases or flash duration decreases; dependence of the transition from element motion to group motion on stimulus duration and size; various classical dependencies between flash duration, spatial separation, interstimulus interval, and motion threshold known as Korte's Laws; and dependence of motion strength on stimulus orientation and spatial frequency. These results supplement earlier explanations by the model of apparent motion data that other models have not explained; a recent proposed solution of the global aperture problem, including explanations of motion capture and induced motion; an explanation of how parallel cortical systems for static form perception and motion form perception may develop, including a demonstration that these parallel systems are variations on a common cortical design; an explanation of why the geometries of static form and motion form differ, in particular why opposite orientations differ by 90°, whereas opposite directions differ by 180°, and why a cortical stream Vl -> V2 -> MT is needed; and a summary of how the main properties of other motion perception models can be assimilated into different parts of the Motion Boundary Contour System design.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper discusses load-balancing issues when using heterogeneous cluster computers. There is a growing trend towards the use of commodity microprocessor clusters. Although today's microprocessors have reached a theoretical peak performance in the range of one GFLOPS/s, heterogeneous clusters of commodity processors are amongst the most challenging parallel systems to programme efficiently. We will outline an approach for optimising the performance of parallel mesh-based applications for heterogeneous cluster computers and present case studies with the GeoFEM code. The focus is on application cost monitoring and load balancing using the DRAMA library.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Computational modelling of dynamic fluid–structure interaction (DFSI) is a considerable challenge. Our approach to this class of problems involves the use of a single software framework for all the phenomena involved, employing finite volume methods on unstructured meshes in three dimensions. This method enables time and space accurate calculations in a consistent manner. One key application of DFSI simulation is the analysis of the onset of flutter in aircraft wings, where the work of Yates et al. [Measured and Calculated Subsonic and Transonic Flutter Characteristics of a 45° degree Sweptback Wing Planform in Air and Freon-12 in the Langley Transonic Dynamic Tunnel. NASA Technical Note D-1616, 1963] on the AGARD 445.6 wing planform still provides the most comprehensive benchmark data available. This paper presents the results of a significant effort to model the onset of flutter for the AGARD 445.6 wing planform geometry. A series of key issues needs to be addressed for this computational approach. • The advantage of using a single mesh, in order to eliminate numerical problems when applying boundary conditions at the fluid-structure interface, is counteracted by the challenge of generating a suitably high quality mesh in both the fluid and structural domains. • The computational effort for this DFSI procedure, in terms of run time and memory requirements, is very significant. Practical simulations require even finer meshes and shorter time steps, requiring parallel implementation for operation on large, high performance parallel systems. • The consistency and completeness of the AGARD data in the public domain is inadequate for use in the validation of DFSI codes when predicting the onset of flutter.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This chapter discusses the code parallelization environment, where a number of tools that address the main tasks, such as code parallelization, debugging, and optimization are available. The parallelization tools include ParaWise and CAPO, which enable the near automatic parallelization of real world scientific application codes for shared and distributed memory-based parallel systems. The chapter discusses the use of ParaWise and CAPO to transform the original serial code into an equivalent parallel code that contains appropriate OpenMP directives. Additionally, as user involvement can introduce errors, a relative debugging tool (P2d2) is also available and can be used to perform near automatic relative debugging of an OpenMP program that has been parallelized either using the tools or manually. In order for these tools to be effective in parallelizing a range of applications, a high quality fully inter-procedural dependence analysis, as well as user interaction is vital to the generation of efficient parallel code and in the optimization of the backtracking and speculation process used in relative debugging. Results of parallelized NASA codes are discussed and show the benefits of using the environment.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Despite the apparent simplicity of the OpenMP directive shared memory programming model and the sophisticated dependence analysis and code generation capabilities of the ParaWise/CAPO tools, experience shows that a level of expertise is required to produce efficient parallel code. In a real world application the investigation of a single loop in a generated parallel code can soon become an in-depth inspection of numerous dependencies in many routines. The additional understanding of dependencies is also needed to effectively interpret the information provided and supply the required feedback. The ParaWise Expert Assistant has been developed to automate this investigation and present questions to the user about, and in the context of, their application code. In this paper, we demonstrate that knowledge of dependence information and OpenMP are no longer essential to produce efficient parallel code with the Expert Assistant. It is hoped that this will enable a far wider audience to use the tools and subsequently, exploit the benefits of large parallel systems.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The parallelization of real-world compute intensive Fortran application codes is generally not a trivial task. If the time to complete the parallelization is to be significantly reduced then an environment is needed that will assist the programmer in the various tasks of code parallelization. In this paper the authors present a code parallelization environment where a number of tools that address the main tasks such as code parallelization, debugging and optimization are available. The ParaWise and CAPO parallelization tools are discussed which enable the near automatic parallelization of real-world scientific application codes for shared and distributed memory-based parallel systems. As user involvement in the parallelization process can introduce errors, a relative debugging tool (P2d2) is also available and can be used to perform nearly automatic relative debugging of a program that has been parallelized using the tools. A high quality interprocedural dependence analysis as well as user-tool interaction are also highlighted and are vital to the generation of efficient parallel code and in the optimization of the backtracking and speculation process used in relative debugging. Results of benchmark and real-world application codes parallelized are presented and show the benefits of using the environment

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Hybrid multiprocessor architectures which combine re-configurable computing and multiprocessors on a chip are being proposed to transcend the performance of standard multi-core parallel systems. Both fine-grained and coarse-grained parallel algorithm implementations are feasible in such hybrid frameworks. A compositional strategy for designing fine-grained multi-phase regular processor arrays to target hybrid architectures is presented in this paper. The method is based on deriving component designs using classical regular array techniques and composing the components into a unified global design. Effective designs with phase-changes and data routing at run-time are characteristics of these designs. In order to describe the data transfer between phases, the concept of communication domain is introduced so that the producer–consumer relationship arising from multi-phase computation can be treated in a unified way as a data routing phase. This technique is applied to derive new designs of multi-phase regular arrays with different dataflow between phases of computation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Estuaries are coastal environments ephemeral life in geological time, derived from the drowning of the shoreline as a function of elevation relative sea level. Such parallel systems is characterized by having two sources of sediment, the river and the sea. The study area comprises the Acu River estuary, located on the northern coast of Rio Grande do Norte State, in a region of intense economic activity, mainly focused on the exploration of oil onshore and offshore, likely to accidental spills. In the oil sector are developed for salt production, shrimp farming, agriculture, fisheries and tourism, which by interacting with sensitive ecosystems, such as estuaries, may alter the natural conditions, thus making it an area susceptible to contamination is essential in understanding the morphodynamic variables that occur in this environment to obtain an environmental license. Information about the submarine relief the estuaries are of great importance for the planning of the activity of environmental monitoring, development and coastal systems, among others, allowing an easy management of risk areas, and assist in the creation of thematic maps of the main aspects of landscape. Morphodynamic studies were performed in this estuary in different seasonal periods in 2009 to observe and quantify morphological changes that have occurred and relate these to the hydrodynamic forcing from the river and its interaction with the tides. Thus, efforts in this area is possible to know the bottom morphology through records of good quality equipment acquired by high resolution geophysical (side-scan sonar and profiler current by doppler effect). The combination of these data enabled the identification of different forms of bed for the winter and summer that were framed in a lower flow regime and later may have been destroyed or modified forms of generating fund scheme than the number according Froude, with different characteristics due mainly to the variation of the depth and type of sedimentary material they are made, and other hydrodynamic parameters. Thus, these features background regions are printed in the channel, sandy banks and muddy plains that border the entire area

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Although cluster environments have an enormous potential processing power, real applications that take advantage of this power remain an elusive goal. This is due, in part, to the lack of understanding about the characteristics of the applications best suited for these environments. This paper focuses on Master/Slave applications for large heterogeneous clusters. It defines application, cluster and execution models to derive an analytic expression for the execution time. It defines speedup and derives speedup bounds based on the inherent parallelism of the application and the aggregated computing power of the cluster. The paper derives an analytical expression for efficiency and uses it to define scalability of the algorithm-cluster combination based on the isoefficiency metric. Furthermore, the paper establishes necessary and sufficient conditions for an algorithm-cluster combination to be scalable which are easy to verify and use in practice. Finally, it covers the impact of network contention as the number of processors grow. (C) 2007 Elsevier B.V. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Pós-graduação em Engenharia Elétrica - FEIS

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Performance studies of actual parallel systems usually tend to concéntrate on the effectiveness of a given implementation. This is often done in the absolute, without quantitave reference to the potential parallelism contained in the programs from the point of view of the execution paradigm. We feel that studying the parallelism inherent to the programs is interesting, as it gives information about the best possible behavior of any implementation and thus allows contrasting the results obtained. We propose a method for obtaining ideal speedups for programs through a combination of sequential or parallel execution and simulation, and the algorithms that allow implementing the method. Our approach is novel and, we argüe, more accurate than previously proposed methods, in that a crucial part of the data - the execution times of tasks - is obtained from actual executions, while speedup is computed by simulation. This allows obtaining speedup (and other) data under controlled and ideal assumptions regarding issues such as number of processor, scheduling algorithm and overheads, etc. The results obtained can be used for example to evalúate the ideal parallelism that a program contains for a given model of execution and to compare such "perfect" parallelism to that obtained by a given implementation of that model. We also present a tool, IDRA, which implements the proposed method, and results obtained with IDRA for benchmark programs, which are then compared with those obtained in actual executions on real parallel systems.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, we examine the issue of memory management in the parallel execution of logic programs. We concentrate on non-deterministic and-parallel schemes which we believe present a relatively general set of problems to be solved, including most of those encountered in the memory management of or-parallel systems. We present a distributed stack memory management model which allows flexible scheduling of goals. Previously proposed models (based on the "Marker model") are lacking in that they impose restrictions on the selection of goals to be executed or they may require consume a large amount of virtual memory. This paper first presents results which imply that the above mentioned shortcomings can have significant performance impacts. An extension of the Marker Model is then proposed which allows flexible scheduling of goals while keeping (virtual) memory consumption down. Measurements are presented which show the advantage of this solution. Methods for handling forward and backward execution, cut and roll back are discussed in the context of the proposed scheme. In addition, the paper shows how the same mechanism for flexible scheduling can be applied to allow the efficient handling of the very general form of suspension that can occur in systems which combine several types of and-parallelism and more sophisticated methods of executing logic programs. We believe that the results are applicable to many and- and or-parallel systems.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevância:

60.00% 60.00%

Publicador:

Resumo:

As the efficiency of parallel software increases it is becoming common to measure near linear speedup for many applications. For a problem size N on P processors then with software running at O(N=P ) the performance restrictions due to file i/o systems and mesh decomposition running at O(N) become increasingly apparent especially for large P . For distributed memory parallel systems an additional limit to scalability results from the finite memory size available for i/o scatter/gather operations. Simple strategies developed to address the scalability of scatter/gather operations for unstructured mesh based applications have been extended to provide scalable mesh decomposition through the development of a parallel graph partitioning code, JOSTLE [8]. The focus of this work is directed towards the development of generic strategies that can be incorporated into the Computer Aided Parallelisation Tools (CAPTools) project.