41 results for parallel systems
Abstract:
This paper presents a parallel Two-Pass Hexagonal (TPA) algorithm that combines the Linear Hashtable Motion Estimation Algorithm (LHMEA) with Hexagonal Search (HEXBS) for motion estimation. In the TPA, Motion Vectors (MVs) are generated by the first-pass LHMEA and used as predictors for the second-pass HEXBS motion estimation, which searches only a small number of Macroblocks (MBs). We introduce the hashtable into video processing and complete a parallel implementation. We propose and evaluate parallel implementations of the LHMEA of TPA on clusters of workstations for real-time video compression. The paper discusses how parallel video coding on load-balanced multiprocessor systems can help, especially for motion estimation, and examines the effect of load balancing on performance. The performance of the algorithm is evaluated using standard video sequences, and the results are compared with current algorithms.
Abstract:
Space applications demand reliable systems. Autonomic computing defines such reliable systems as self-managing systems. The work reported in this paper combines agent-based and swarm robotic approaches, leading to swarm-array computing, a novel technique to achieve autonomy for distributed parallel computing systems. Two swarm-array computing approaches, based on swarms of computational resources and swarms of tasks, are explored. An FPGA is considered as the computing system. The feasibility of the two proposed approaches, which bind the computing system and the task together, is simulated on the SeSAm multi-agent simulator.
Abstract:
In this paper we consider hybrid (fast stochastic approximation and deterministic refinement) algorithms for Matrix Inversion (MI) and Solving Systems of Linear Equations (SLAE). Monte Carlo methods are used for the stochastic approximation, since they are known to be very efficient at finding a quick rough approximation of an element or a row of the inverse matrix, or a component of the solution vector. We show how the stochastic approximation of the MI can be combined with a deterministic refinement procedure to obtain the MI with the required precision and then solve the SLAE using the MI. We employ a splitting A = D - C of a given non-singular matrix A, where D is a diagonally dominant matrix and C is a diagonal matrix. In our algorithm for solving SLAE and MI, different choices of D can be considered in order to control the norm of the matrix T = D^(-1)C of the resulting SLAE and to minimize the number of Markov chains required to reach a given precision. Further, we run the algorithms on a mini-Grid and investigate their efficiency depending on the granularity. Corresponding experimental results are presented.
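The role of the norm of T in the abstract above can be illustrated with a small deterministic sketch. It sums the truncated Neumann series A^(-1) = sum_k T^k D^(-1), which is the series a Monte Carlo method would sample with Markov chains; here it is summed exactly for clarity. As an assumption made only for illustration, D is taken to be the diagonal part of A (a Jacobi-style splitting), which is not the splitting described in the abstract, where C is diagonal. The helper names `mat_mul` and `neumann_inverse` and the test matrix are hypothetical.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def neumann_inverse(A, terms=60):
    """Approximate A^(-1) by sum_{k < terms} T^k D^(-1), with T = D^(-1) C.

    Assumption (not from the abstract): D is the diagonal part of A and
    C = D - A, a Jacobi-style splitting chosen for illustration only.
    """
    n = len(A)
    d_inv = [1.0 / A[i][i] for i in range(n)]
    # T = D^(-1) C has zero diagonal and entries -A[i][j] / A[i][i] elsewhere.
    T = [[0.0 if i == j else -A[i][j] * d_inv[i] for j in range(n)] for i in range(n)]
    norm_T = max(sum(abs(v) for v in row) for row in T)  # infinity norm
    if norm_T >= 1.0:
        raise ValueError("series is not guaranteed to converge: ||T|| >= 1")
    term = [[d_inv[i] if i == j else 0.0 for j in range(n)] for i in range(n)]  # T^0 D^(-1)
    total = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        term = mat_mul(T, term)  # next term T^(k+1) D^(-1)
    return total, norm_T


# Hypothetical diagonally dominant test matrix.
A = [[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]]
approx_inv, norm_T = neumann_inverse(A)
product = mat_mul(A, approx_inv)
residual = max(abs(product[i][j] - (1.0 if i == j else 0.0))
               for i in range(3) for j in range(3))
```

The truncation error shrinks roughly like the norm of T raised to the number of terms, so a splitting that gives a smaller norm needs fewer terms (or, in the stochastic setting, shorter chains) for the same precision.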
Abstract:
In this paper we introduce a new algorithm, based on the successful work of Fathi and Alexandrov, on hybrid Monte Carlo algorithms for matrix inversion and solving systems of linear algebraic equations. This algorithm consists of two parts: approximate inversion by Monte Carlo and iterative refinement using a deterministic method. Here we present a parallel hybrid Monte Carlo algorithm, which uses Monte Carlo to generate an approximate inverse and then improves the accuracy of the inverse with an iterative refinement. The new algorithm is applied efficiently to sparse non-singular matrices. When we are solving a system of linear algebraic equations, Bx = b, the inverse matrix is used to compute the solution vector x = B^(-1)b. We present results that show the efficiency of the parallel hybrid Monte Carlo algorithm in the case of sparse matrices.
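The two-stage idea (rough inverse, then deterministic refinement, then x = B^(-1)b) can be sketched as follows. The abstract does not name the refinement method, so this sketch assumes the Newton-Schulz iteration X <- X(2I - BX) as a stand-in, and it uses a crude diagonal guess in place of the Monte Carlo estimate. The names `newton_schulz_refine` and `rough_inverse`, and the test system, are hypothetical.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def newton_schulz_refine(B, X0, steps=6):
    """Refine an approximate inverse X0 of B with X <- X (2I - B X).

    Assumed stand-in for the deterministic refinement stage. The residual
    I - B X is squared at every step, so the iteration converges when the
    initial residual has norm below 1.
    """
    n = len(B)
    X = [row[:] for row in X0]
    for _ in range(steps):
        BX = mat_mul(B, X)
        correction = [[(2.0 if i == j else 0.0) - BX[i][j] for j in range(n)]
                      for i in range(n)]
        X = mat_mul(X, correction)
    return X


# Hypothetical system B x = b whose exact solution is x = [1, 1, 1].
B = [[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]]
b = [5.0, 8.0, 8.0]

# Crude approximate inverse standing in for the Monte Carlo stage.
rough_inverse = [[0.25, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 1.0 / 6.0]]

refined = newton_schulz_refine(B, rough_inverse)
x = [sum(refined[i][j] * b[j] for j in range(3)) for i in range(3)]
```

Because the residual is squared at each step, a rough starting inverse is enough: only the first stage needs to be cheap, and the refinement supplies the accuracy.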
Abstract:
In models of complicated physical-chemical processes, operator splitting is very often applied in order to achieve sufficient accuracy as well as efficiency of the numerical solution. The recently rediscovered weighted splitting schemes have the great advantage of being parallelizable at the operator level, which allows us to reduce the computational time if parallel computers are used. In this paper, the computational times needed for the weighted splitting methods are studied in comparison with the sequential (S) splitting and the Marchuk-Strang (MSt) splitting, and are illustrated by numerical experiments performed with simplified versions of the Danish Eulerian model (DEM).
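The three splitting schemes named above can be compared on a toy problem. The sketch below solves u' = (A + B)u for two 2x2 matrices chosen (as an assumption, purely for illustration) so that each sub-operator has a closed-form exponential and the exact solution is known. It is not the Danish Eulerian model, and the function names are hypothetical. The weighted scheme is taken here to be the symmetrically weighted average of the two sequential orderings.

```python
import math


def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def mat_vec(M, v):
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def exp_A(t):
    """Exact exp(t*A) for the nilpotent matrix A = [[0, 1], [0, 0]]."""
    return [[1.0, t], [0.0, 1.0]]


def exp_B(t):
    """Exact exp(t*B) for the nilpotent matrix B = [[0, 0], [1, 0]]."""
    return [[1.0, 0.0], [t, 1.0]]


def sequential_step(tau):
    """Sequential splitting: solve with A, then with B."""
    return mat_mul(exp_B(tau), exp_A(tau))


def strang_step(tau):
    """Marchuk-Strang splitting: half step A, full step B, half step A."""
    return mat_mul(exp_A(tau / 2), mat_mul(exp_B(tau), exp_A(tau / 2)))


def weighted_step(tau):
    """Symmetrically weighted splitting: average of both orderings.

    The two products are independent, so they could run in parallel.
    """
    ab = mat_mul(exp_A(tau), exp_B(tau))
    ba = mat_mul(exp_B(tau), exp_A(tau))
    return [[0.5 * (ab[i][j] + ba[i][j]) for j in range(2)] for i in range(2)]


def splitting_error(step, n_steps=100, t_end=1.0):
    """Max-norm error at t_end against the exact solution [cosh t, sinh t]."""
    M = step(t_end / n_steps)
    u = [1.0, 0.0]
    for _ in range(n_steps):
        u = mat_vec(M, u)
    exact = [math.cosh(t_end), math.sinh(t_end)]
    return max(abs(a - b) for a, b in zip(u, exact))


err_sequential = splitting_error(sequential_step)
err_strang = splitting_error(strang_step)
err_weighted = splitting_error(weighted_step)
```

On this toy problem the sequential scheme is first-order accurate while the Strang and weighted schemes are second-order. The weighted scheme does about twice the work per step, but its two halves are independent, which is the operator-level parallelism the abstract refers to.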
Abstract:
Since its introduction in 1993, the Message Passing Interface (MPI) has become a de facto standard for writing High Performance Computing (HPC) applications on clusters and Massively Parallel Processors (MPPs). The recent emergence of multi-core processor systems presents a new challenge for established parallel programming paradigms, including those based on MPI. This paper presents a new Java messaging system called MPJ Express. Using this system, we exploit multiple levels of parallelism (messaging and threading) to improve application performance on multi-core processors. We refer to our approach as nested parallelism. This MPI-like Java library can support nested parallelism by using Java or Java OpenMP (JOMP) threads within an MPJ Express process. The practicality of this approach is assessed by porting to Java a massively parallel structure formation code from cosmology called Gadget-2. We introduce nested parallelism in the Java version of the simulation code and report good speed-ups. To the best of our knowledge, this is the first time this kind of hybrid parallelism has been demonstrated in a high performance Java application. (C) 2009 Elsevier Inc. All rights reserved.
Abstract:
The work reported in this paper is motivated by the need to apply autonomic computing concepts to parallel computing systems. Advancing on prior work based on intelligent cores [36], a swarm-array computing approach, this paper focuses on ‘Intelligent agents’, another swarm-array computing approach, in which the task to be executed on a parallel computing core is considered as a swarm of autonomous agents. A task is carried to a computing core by carrier agents and is seamlessly transferred between cores in the event of a predicted failure, thereby achieving the self-ware objectives of autonomic computing. The feasibility of the proposed swarm-array computing approach is validated on a multi-agent simulator.
Abstract:
This paper is concerned with the uniformization of a system of affine recurrence equations. This transformation is used in the design (or compilation) of highly parallel embedded systems (VLSI systolic arrays, signal processing filters, etc.). In this paper, we present and implement an automatic system to achieve uniformization of systems of affine recurrence equations. We unify the results from many earlier papers, develop some theoretical extensions, and then propose effective uniformization algorithms. Our results can be used in any high-level synthesis tool based on polyhedral representation of nested loop computations.
Abstract:
Recent research in multi-agent systems incorporates fault tolerance concepts. However, the research does not explore the extension and implementation of such ideas for large-scale parallel computing systems. The work reported in this paper investigates a swarm-array computing approach, namely ‘Intelligent Agents’. In the approach considered, a task to be executed on a parallel computing system is decomposed into sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information in the event of a predicted core/processor failure and to complete the task successfully. The agents hence contribute towards fault tolerance and towards building reliable systems. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator and by an implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.
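For readers unfamiliar with parallel reduction, the following sketch simulates the communication pattern of a binary-tree reduction in a single process. It does not use MPI and does not model the agents or the failure handling described above. The function name `tree_reduce` and the list of per-rank values are hypothetical, and each list slot stands for the partial result held by one process.

```python
import operator


def tree_reduce(values, op):
    """Combine per-rank values pairwise in rounds, as a tree reduction would.

    In round r, the rank at index i (a multiple of 2 * stride) receives the
    partial result of rank i + stride and combines it with its own. All
    combinations within a round are independent, so on a real cluster they
    could proceed concurrently.
    """
    if not values:
        raise ValueError("need at least one value")
    partial = list(values)
    n = len(partial)
    stride = 1
    rounds = 0
    while stride < n:
        for receiver in range(0, n - stride, 2 * stride):
            partial[receiver] = op(partial[receiver], partial[receiver + stride])
        stride *= 2
        rounds += 1
    return partial[0], rounds


# Eight simulated ranks, each holding one number.
total, rounds = tree_reduce([1, 2, 3, 4, 5, 6, 7, 8], operator.add)
```

With p simulated ranks the result reaches rank 0 after about log2(p) rounds, compared with p - 1 sequential combinations if a single rank collected every value in turn.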
Abstract:
A connection between a fuzzy neural network model and the mixture of experts network (MEN) modelling approach is established. Based on this linkage, two new neuro-fuzzy MEN construction algorithms are proposed to overcome the curse of dimensionality that is inherent in the majority of associative memory networks and/or other rule-based systems. The first construction algorithm employs a function selection manager module in an MEN system. The second construction algorithm is based on a new parallel learning algorithm in which each model rule is trained independently, and for which the parameter convergence property of the new learning method is established. As with the first approach, an expert selection criterion is utilised in this algorithm. These two construction methods are equivalent in their effectiveness in overcoming the curse of dimensionality by reducing the dimensionality of the regression vector, but the latter has the additional computational advantage of parallel processing. The proposed algorithms are analysed for effectiveness, followed by numerical examples that illustrate their efficacy for some difficult data-based modelling problems.
Abstract:
One major assumption in all orthogonal space-time block coding (O-STBC) schemes is that the channel remains static over the length of the code word. However, time-selective fading channels do exist, and in such cases conventional O-STBC detectors can suffer from a large error floor at high signal-to-noise ratio (SNR). As a sequel to the authors' previous papers on this subject, this paper aims to eliminate the error floor of the H(i)-coded O-STBC system (i = 3 and 4) by employing the techniques of: 1) zero forcing (ZF) and 2) parallel interference cancellation (PIC). It is shown that for an H(i)-coded system the PIC is a much better choice than the ZF in terms of both performance and computational complexity. Compared with the conventional H(i) detector, the PIC detector incurs a moderately higher computational complexity, but this can well be justified by the enormous improvement.
Abstract:
One major assumption in all orthogonal space-time block coding (O-STBC) schemes is that the channel remains static over the entire length of the codeword. However, time-selective fading channels do exist, and in such cases the conventional O-STBC detectors can suffer from a large error floor at high signal-to-noise ratio (SNR). This paper addresses this issue by introducing a parallel interference cancellation (PIC) based detector for the Gi coded systems (i=3 and 4).
Abstract:
All the orthogonal space-time block coding (O-STBC) schemes are based on the following assumption: the channel remains static over the entire length of the codeword. However, time selective fading channels do exist, and in many cases the conventional O-STBC detectors can suffer from a large error floor in the high signal-to-noise ratio (SNR) cases. This paper addresses such an issue by introducing a parallel interference cancellation (PIC) based detector for the Gi coded systems (i=3 and 4).
Abstract:
This paper proposes the subspace-based space-time (ST) dual-rate blind linear detectors for synchronous DS/CDMA systems, which can be viewed as the ST extension of our previously presented purely temporal dual-rate blind linear detectors. The theoretical analyses on their performances are also carried out. Finally, the two-stage ST blind detectors are presented, which combine the adaptive purely temporal dual-rate blind MMSE filters with the non-adaptive beamformer. Their adaptive stages with parallel structure converge much faster than the corresponding adaptive ST dual-rate blind MMSE detectors, while having a comparable computational complexity to the latter.