872 resultados para Parallel Computations
Resumo:
In this paper, we introduce an analytical technique based on queueing networks and Petri nets for making a performance analysis of dataflow computations when executed on the Manchester machine. This technique is also applicable for the analysis of parallel computations on multiprocessors. We characterize the parallelism in dataflow computations through a four-parameter characterization, namely, the minimum parallelism, the maximum parallelism, the average parallelism and the variance in parallelism. We observe through detailed investigation of our analytical models that the average parallelism is a good characterization of the dataflow computations only as long as the variance in parallelism is small. However, significant difference in performance measures will result when the variance in parallelism is comparable to or higher than the average parallelism.
Resumo:
We describe an approach aimed at addressing the issue of joint exploitation of control (stream) and data parallelism in a skeleton based parallel programming environment, based on annotations and refactoring. Annotations drive efficient implementation of a parallel computation. Refactoring is used to transform the associated skeleton tree into a more efficient, functionally equivalent skeleton tree. In most cases, cost models are used to drive the refactoring process. We show how sample use case applications/kernels may be optimized and discuss preliminary experiments with FastFlow assessing the theoretical results. © 2013 Springer-Verlag.
Resumo:
For communication-intensive parallel applications, the maximum degree of concurrency achievable is limited by the communication throughput made available by the network. In previous work [HPS94], we showed experimentally that the performance of certain parallel applications running on a workstation network can be improved significantly if a congestion control protocol is used to enhance network performance. In this paper, we characterize and analyze the communication requirements of a large class of supercomputing applications that fall under the category of fixed-point problems, amenable to solution by parallel iterative methods. This results in a set of interface and architectural features sufficient for the efficient implementation of the applications over a large-scale distributed system. In particular, we propose a direct link between the application and network layer, supporting congestion control actions at both ends. This in turn enhances the system's responsiveness to network congestion, improving performance. Measurements are given showing the efficacy of our scheme to support large-scale parallel computations.
Resumo:
We propose a methodology for optimizing the execution of data parallel (sub-)tasks on CPU and GPU cores of the same heterogeneous architecture. The methodology is based on two main components: i) an analytical performance model for scheduling tasks among CPU and GPU cores, such that the global execution time of the overall data parallel pattern is optimized; and ii) an autonomic module which uses the analytical performance model to implement the data parallel computations in a completely autonomic way, requiring no programmer intervention to optimize the computation across CPU and GPU cores. The analytical performance model uses a small set of simple parameters to devise a partitioning-between CPU and GPU cores-of the tasks derived from structured data parallel patterns/algorithmic skeletons. The model takes into account both hardware related and application dependent parameters. It computes the percentage of tasks to be executed on CPU and GPU cores such that both kinds of cores are exploited and performance figures are optimized. The autonomic module, implemented in FastFlow, executes a generic map (reduce) data parallel pattern scheduling part of the tasks to the GPU and part to CPU cores so as to achieve optimal execution time. Experimental results on state-of-the-art CPU/GPU architectures are shown that assess both performance model properties and autonomic module effectiveness. © 2013 IEEE.
Resumo:
In this work we show how automatic relative debugging can be used to find differences in computation between a correct serial program and an OpenMP parallel version of that program that does not yield correct results. Backtracking and re-execution are used to determine the first OpenMP parallel region that produces a difference in computation that may lead to an incorrect value the user has indicated. Our approach also lends itself to finding differences between parallel computations, where executing with M threads produces expected results but an N thread execution does not (M, N > 1, M ≠ N). OpenMP programs created using a parallelization tool are addressed by utilizing static analysis and directive information from the tool. Hand-parallelized programs, where OpenMP directives are inserted by the user, are addressed by performing data dependence and directive analysis.
Resumo:
When implementing autonomic management of multiple non-functional concerns a trade-off must be found between the ability to develop independently management of the individual concerns (following the separation of concerns principle) and the detection and resolution of conflicts that may arise when combining the independently developed management code. Here we discuss strategies to establish this trade-off and introduce a model checking based methodology aimed at simplifying the discovery and handling of conflicts arising from deployment-within the same parallel application-of independently developed management policies. Preliminary results are shown demonstrating the feasibility of the approach.
Resumo:
The motion instability is an important issue that occurs during the operation of towed underwater vehicles (TUV), which considerably affects the accuracy of high precision acoustic instrumentations housed inside the same. Out of the various parameters responsible for this, the disturbances from the tow-ship are the most significant one. The present study focus on the motion dynamics of an underwater towing system with ship induced disturbances as the input. The study focus on an innovative system called two-part towing. The methodology involves numerical modeling of the tow system, which consists of modeling of the tow-cables and vehicles formulation. Previous study in this direction used a segmental approach for the modeling of the cable. Even though, the model was successful in predicting the heave response of the tow-body, instabilities were observed in the numerical solution. The present study devises a simple approach called lumped mass spring model (LMSM) for the cable formulation. In this work, the traditional LMSM has been modified in two ways. First, by implementing advanced time integration procedures and secondly, use of a modified beam model which uses only translational degrees of freedoms for solving beam equation. A number of time integration procedures, such as Euler, Houbolt, Newmark and HHT-α were implemented in the traditional LMSM and the strength and weakness of each scheme were numerically estimated. In most of the previous studies, hydrodynamic forces acting on the tow-system such as drag and lift etc. are approximated as analytical expression of velocities. This approach restricts these models to use simple cylindrical shaped towed bodies and may not be applicable modern tow systems which are diversed in shape and complexity. Hence, this particular study, hydrodynamic parameters such as drag and lift of the tow-system are estimated using CFD techniques. To achieve this, a RANS based CFD code has been developed. Further, a new convection interpolation scheme for CFD simulation, called BNCUS, which is blend of cell based and node based formulation, was proposed in the study and numerically tested. To account for the fact that simulation takes considerable time in solving fluid dynamic equations, a dedicated parallel computing setup has been developed. Two types of computational parallelisms are explored in the current study, viz; the model for shared memory processors and distributed memory processors. In the present study, shared memory model was used for structural dynamic analysis of towing system, distributed memory one was devised in solving fluid dynamic equations.
Resumo:
n this paper we propose the use of Networks of Bio-inspired Processors (NBP) to model some biological phenomena within a computational framework. In particular, we propose the use of an extension of NBP named Network Evolutionary Processors Transducers to simulate chemical transformations of substances. Within a biological process, chemical transformations of substances are basic operations in the change of the state of the cell. Previously, it has been proved that NBP are computationally complete, that is, they are able to solve NP complete problems in linear time, using massively parallel computations. In addition, we propose a multilayer architecture that will allow us to design models of biological processes related to cellular communication as well as their implications in the metabolic pathways. Subsequently, these models can be applied not only to biological-cellular instances but, possibly, also to configure instances of interactive processes in many other fields like population interactions, ecological trophic networks, in dustrial ecosystems, etc.
Resumo:
A new method for solving some hard combinatorial optimization problems is suggested, admitting a certain reformulation. Considering such a problem, several different similar problems are prepared which have the same set of solutions. They are solved on computer in parallel until one of them will be solved, and that solution is accepted. Notwithstanding the evident overhead, the whole run-time could be significantly reduced due to dispersion of velocities of combinatorial search in regarded cases. The efficiency of this approach is investigated on the concrete problem of finding short solutions of non-deterministic system of linear logical equations.
Resumo:
The paper describes an extension of the cognitive architecture DUAL with a model of visual attention and perception. The goal of this attempt is to account for the construction and the categorization of object and scene representations derived from visual stimuli in the TextWorld microdomain. Low-level parallel computations are combined with an active serial deployment of visual attention enabling the construction of abstract symbolic representations. A limited-capacity short-term visual store holding information across attention shifts forms the core of the model interfacing between the low-level representation of the stimulus and DUAL’s semantic memory. The model is validated by comparing the results of a simulation with real data from an eye movement experiment with human subjects.
Resumo:
Parallel execution of computational mechanics codes requires efficient mesh-partitioning techniques. These mesh-partitioning techniques divide the mesh into specified number of submeshes of approximately the same size and at the same time, minimise the interface nodes of the submeshes. This paper describes a new mesh partitioning technique, employing Genetic Algorithms. The proposed algorithm operates on the deduced graph (dual or nodal graph) of the given finite element mesh rather than directly on the mesh itself. The algorithm works by first constructing a coarse graph approximation using an automatic graph coarsening method. The coarse graph is partitioned and the results are interpolated onto the original graph to initialise an optimisation of the graph partition problem. In practice, hierarchy of (usually more than two) graphs are used to obtain the final graph partition. The proposed partitioning algorithm is applied to graphs derived from unstructured finite element meshes describing practical engineering problems and also several example graphs related to finite element meshes given in the literature. The test results indicate that the proposed GA based graph partitioning algorithm generates high quality partitions and are superior to spectral and multilevel graph partitioning algorithms.
Resumo:
In this paper we consider hybrid (fast stochastic approximation and deterministic refinement) algorithms for Matrix Inversion (MI) and Solving Systems of Linear Equations (SLAE). Monte Carlo methods are used for the stochastic approximation, since it is known that they are very efficient in finding a quick rough approximation of the element or a row of the inverse matrix or finding a component of the solution vector. We show how the stochastic approximation of the MI can be combined with a deterministic refinement procedure to obtain MI with the required precision and further solve the SLAE using MI. We employ a splitting A = D – C of a given non-singular matrix A, where D is a diagonal dominant matrix and matrix C is a diagonal matrix. In our algorithm for solving SLAE and MI different choices of D can be considered in order to control the norm of matrix T = D –1C, of the resulting SLAE and to minimize the number of the Markov Chains required to reach given precision. Further we run the algorithms on a mini-Grid and investigate their efficiency depending on the granularity. Corresponding experimental results are presented.
Resumo:
In this paper we introduce a new algorithm, based on the successful work of Fathi and Alexandrov, on hybrid Monte Carlo algorithms for matrix inversion and solving systems of linear algebraic equations. This algorithm consists of two parts, approximate inversion by Monte Carlo and iterative refinement using a deterministic method. Here we present a parallel hybrid Monte Carlo algorithm, which uses Monte Carlo to generate an approximate inverse and that improves the accuracy of the inverse with an iterative refinement. The new algorithm is applied efficiently to sparse non-singular matrices. When we are solving a system of linear algebraic equations, Bx = b, the inverse matrix is used to compute the solution vector x = B(-1)b. We present results that show the efficiency of the parallel hybrid Monte Carlo algorithm in the case of sparse matrices.