967 results for Parallel or distributed processing


Relevance:

40.00%

Publisher:

Abstract:

Recently, two approaches have been introduced that distribute the molecular fragment mining problem. The first approach applies a master/worker topology; the second, a completely distributed peer-to-peer system, solves the scalability problem caused by the bottleneck at the master node. However, in many real-world scenarios the participating computing nodes cannot communicate directly because of administrative policies such as security restrictions. Thus, potential computing power is not accessible to accelerate the mining run. To overcome this shortcoming, this work introduces a hierarchical topology of computing resources, which distributes the management over several levels and adapts to the natural structure of those multi-domain architectures. The most important aspect is the load balancing scheme, which has been designed and optimized for the hierarchical structure. The approach allows dynamic aggregation of heterogeneous computing resources and is applied to wide area network scenarios.

Relevance:

40.00%

Publisher:

Abstract:

This paper presents a parallel Two-Pass Hexagonal (TPA) algorithm that combines the Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS) for motion estimation. In the TPA, Motion Vectors (MVs) are generated by the first-pass LHMEA and used as predictors for the second-pass HEXBS motion estimation, which searches only a small number of Macroblocks (MBs). We introduce the hashtable into video processing and complete a parallel implementation. We propose and evaluate parallel implementations of the LHMEA of the TPA on clusters of workstations for real-time video compression. The paper discusses how parallel video coding on load-balanced multiprocessor systems can help, especially for motion estimation, and how load balancing improves performance. The performance of the algorithm is evaluated using standard video sequences, and the results are compared with current algorithms.

Relevance:

40.00%

Publisher:

Abstract:

This paper presents a parallel Two-Pass Hexagonal (TPA) algorithm that combines the Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS) for motion estimation. In the TPA, Motion Vectors (MVs) are generated by the first-pass LHMEA and used as predictors for the second-pass HEXBS motion estimation, which searches only a small number of Macroblocks (MBs). We introduce the hashtable into video processing and complete a parallel implementation. We propose and evaluate parallel implementations of the LHMEA of the TPA on clusters of workstations for real-time video compression. The paper discusses how parallel video coding on load-balanced multiprocessor systems can help, especially for motion estimation, and how load balancing improves performance. The performance of the algorithm is evaluated using standard video sequences, and the results are compared with current algorithms.

Relevance:

40.00%

Publisher:

Abstract:

This paper presents an improved parallel Two-Pass Hexagonal (TPA) algorithm that combines the Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS) for motion estimation. Motion Vectors (MVs) are generated by the first-pass LHMEA and used as predictors for the second-pass HEXBS motion estimation, which searches only a small number of Macroblocks (MBs). We introduce the hashtable into video processing and complete a parallel implementation. The hashtable structure of the LHMEA is improved compared with the original TPA and LHMEA. We propose and evaluate parallel implementations of the LHMEA of the TPA on clusters of workstations for real-time video compression. The implementation contains spatial and temporal approaches. The performance of the algorithm is evaluated using standard video sequences, and the results are compared with current algorithms.

Relevance:

40.00%

Publisher:

Abstract:

Sampling a given solid angle is a fundamental operation in realistic image synthesis, where the rendering equation describing light propagation in closed domains is solved. Monte Carlo methods for solving the rendering equation sample the solid angle subtended by the unit hemisphere or unit sphere in order to integrate the rendering equation numerically. In this work we consider the problem of generating uniformly distributed random samples over the hemisphere and sphere. Our aim is to construct and study a parallel sampling scheme for the hemisphere and sphere. First, we apply the symmetry property to partition the hemisphere and sphere. The solid angle subtended by a hemisphere is divided into a number of equal sub-domains. Each sub-domain represents the solid angle subtended by an orthogonal spherical triangle with fixed vertices and computable parameters. We then introduce two new algorithms for sampling orthogonal spherical triangles. Both algorithms are based on a transformation of the unit square. Like Arvo's algorithm for sampling an arbitrary spherical triangle, the suggested algorithms accommodate stratified sampling. We derive the necessary transformations for the algorithms. The first sampling algorithm generates a sample by mapping the unit square onto the orthogonal spherical triangle. The second algorithm directly computes the unit radius vector of a sampling point inside the orthogonal spherical triangle. The sampling of the whole hemisphere and sphere is performed in parallel for all sub-domains simultaneously by using the symmetry property of the partitioning. The applicability of the corresponding parallel sampling scheme to Monte Carlo and Quasi-Monte Carlo solution of the rendering equation is discussed.
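
As a point of reference for the idea of mapping the unit square onto a spherical domain, the following minimal sketch uses the textbook equal-area mapping of the unit square onto the whole upper hemisphere, combined with jittered stratification of the square. It is not the spherical-triangle algorithm described in the abstract; the function names `square_to_hemisphere` and `stratified_hemisphere_samples` are hypothetical and chosen only for illustration.

```python
import math
import random


def square_to_hemisphere(u1, u2):
    """Map (u1, u2) in the unit square to a unit vector with z >= 0.

    Standard equal-area mapping (an assumption of this sketch, not the
    paper's method): z is uniform in [0, 1] and the azimuth is uniform
    in [0, 2*pi), which yields directions uniform in solid angle.
    """
    z = u1
    r = math.sqrt(max(0.0, 1.0 - z * z))
    phi = 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), z)


def stratified_hemisphere_samples(n, rng):
    """Draw n*n samples, one jittered point per cell of an n-by-n grid."""
    samples = []
    for i in range(n):
        for j in range(n):
            u1 = (i + rng.random()) / n
            u2 = (j + rng.random()) / n
            samples.append(square_to_hemisphere(u1, u2))
    return samples


samples = stratified_hemisphere_samples(16, random.Random(0))
mean_z = sum(s[2] for s in samples) / len(samples)
print(len(samples), round(mean_z, 3))
```

Because each grid cell is sampled independently of the others, the cells could in principle be distributed over parallel workers, which is loosely analogous to the per-sub-domain parallelism the abstract describes.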

Relevance:

40.00%

Publisher:

Abstract:

The popularity of wireless local area networks (WLANs) has resulted in their dense deployment in many cities around the world. The increased interference among different WLANs severely degrades the achievable throughput. This problem is further exacerbated by the limited number of frequency channels available. This work proposes an improved distributed and dynamic channel assignment scheme that is simple to implement and does not depend on knowledge of the throughput function. It also allows each access point (AP) to switch asynchronously to the new best channel. Simulation results show that the proposed scheme converges much faster than similar previously reported work, with reductions in convergence time and channel switches of as much as 77.3% and 52.3%, respectively. When it is employed in dynamic environments, the throughput improves by up to 12.7%.
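
The abstract does not state the scheme's update rule, so the sketch below only illustrates the general idea of distributed channel assignment with a simple greedy rule: each AP, in turn, moves to the channel used by the fewest of its neighbours, and only if that strictly reduces its own conflicts. The interference metric (a count of co-channel neighbours), the function names and the example topology are all assumptions made for illustration, not details from the paper.

```python
def conflicts(ap, channel, neighbors, assignment):
    """Count the neighbours of `ap` that currently use `channel`."""
    return sum(1 for n in neighbors[ap] if assignment[n] == channel)


def assign_channels(neighbors, channels, initial, max_rounds=100):
    """Greedy local channel selection (hypothetical rule, see lead-in).

    Each AP looks only at its own neighbours and switches channel only
    when the switch strictly lowers its co-channel neighbour count.
    """
    assignment = dict(initial)
    for _ in range(max_rounds):
        changed = False
        for ap in neighbors:
            current = conflicts(ap, assignment[ap], neighbors, assignment)
            best = min(
                channels,
                key=lambda c: conflicts(ap, c, neighbors, assignment),
            )
            if conflicts(ap, best, neighbors, assignment) < current:
                assignment[ap] = best
                changed = True
        if not changed:
            break  # no AP can improve: the assignment is stable
    return assignment


# Three mutually interfering APs that all start on the same channel.
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
channels = [1, 6, 11]
result = assign_channels(neighbors, channels, {"A": 1, "B": 1, "C": 1})
print(result)
```

Each accepted switch lowers the total number of co-channel neighbour pairs, so this particular rule always terminates; the convergence figures quoted in the abstract refer to the authors' scheme, not to this sketch.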

Relevance:

40.00%

Publisher:

Abstract:

A parallel interference cancellation (PIC) detection scheme is proposed to suppress the impact of imperfect synchronisation. By treating as interference the extra components in the received signal caused by timing misalignment, the PIC detector not only offers much improved performance but also retains a low structural and computational complexity.

Relevance:

40.00%

Publisher:

Abstract:

Most research on distributed space-time block coding (D-STBC) has assumed that cooperative relay nodes are perfectly synchronised. Since such synchronisation is difficult to achieve in many practical systems, this paper proposes a simple yet optimum detector for the case of two relay nodes, which proves to be much more robust against timing misalignment than the conventional STBC detector.

Relevance:

40.00%

Publisher:

Abstract:

Most research on distributed space-time block coding (STBC) has so far focused on the case of two relay nodes and assumed that the relay nodes are perfectly synchronised at the symbol level. By applying STBC to 3- or 4-relay node systems, this paper shows that imperfect synchronisation causes significant performance degradation in the conventional detector. To address this, we propose a new STBC detection solution based on the principle of parallel interference cancellation (PIC). The PIC detector is moderate in computational complexity but very effective in suppressing the impact of imperfect synchronisation.

Relevance:

40.00%

Publisher:

Abstract:

This paper is concerned with the uniformization of a system of affine recurrence equations. This transformation is used in the design (or compilation) of highly parallel embedded systems (VLSI systolic arrays, signal processing filters, etc.). In this paper, we present and implement an automatic system to achieve uniformization of systems of affine recurrence equations. We unify the results from many earlier papers, develop some theoretical extensions, and then propose effective uniformization algorithms. Our results can be used in any high-level synthesis tool based on a polyhedral representation of nested loop computations.

Relevance:

40.00%

Publisher:

Abstract:

With the transition to multicore processors almost complete, the parallel processing community is seeking efficient ways to port legacy message passing applications to shared memory and multicore processors. MPJ Express is our reference implementation of Message Passing Interface (MPI)-like bindings for the Java language. Starting with the current release, the MPJ Express software can be configured in two modes: the multicore mode and the cluster mode. In the multicore mode, parallel Java applications execute on shared memory or multicore processors. In the cluster mode, Java applications parallelized using MPJ Express can be executed on distributed memory platforms such as compute clusters and clouds. The multicore device has been implemented using Java threads in order to satisfy the two main design goals of portability and performance. We also discuss the challenges of integrating the multicore device into the MPJ Express software. This turned out to be a challenging task because the parallel application executes in a single JVM in the multicore mode. In contrast, in the cluster mode the parallel user application executes in multiple JVMs. Because of these inherent architectural differences between the two modes, the MPJ Express runtime is modified to ensure correct semantics of the parallel program. Finally, we compare the performance of MPJ Express (multicore mode) with other C and Java message passing libraries (including mpiJava, MPJ/Ibis, MPICH2 and MPJ Express in cluster mode) on shared memory and multicore processors. We found that MPJ Express performs significantly better in the multicore mode than in the cluster mode. The MPJ Express software also performs better than other Java messaging libraries, including mpiJava and MPJ/Ibis, when used in the multicore mode on shared memory or multicore processors. We also demonstrate the effectiveness of the MPJ Express multicore device in Gadget-2, a massively parallel astrophysics N-body simulation code.

Relevance:

40.00%

Publisher:

Abstract:

The past decade has witnessed explosive growth of mobile subscribers and services. With the aim of providing better, swifter and cheaper services, radio network optimisation plays a crucial role but faces enormous challenges. The concept of Dynamic Network Optimisation (DNO) has therefore been introduced to adjust network configurations optimally and continuously in response to changes in network conditions and traffic. However, the realization of DNO has been seriously hindered by the bottleneck of optimisation speed. This paper presents an advanced distributed parallel solution that bridges the gap by accelerating the sophisticated proprietary network optimisation algorithm while maintaining optimisation quality and numerical consistency. The ariesoACP product from Arieso Ltd serves as the main platform for acceleration. The solution has been prototyped, implemented and tested. Results based on real projects exhibit high scalability and substantial acceleration, with average speed-ups of 2.5, 4.9 and 6.1 on distributed 5-core, 9-core and 16-core systems, respectively. This significantly outperforms other parallel solutions such as multi-threading. Furthermore, an augmented optimisation outcome, alongside high correctness and self-consistency, has also been achieved. Overall, this is a breakthrough towards the realization of DNO.
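
To put the reported speed-ups in perspective, the short sketch below computes the parallel efficiency (speed-up divided by core count) for the three configurations quoted in the abstract. It assumes the core counts can be used directly as the divisor; the abstract does not say whether any core is reserved for coordination, so the figures are only indicative.

```python
# Average speed-ups quoted in the abstract, keyed by number of cores.
speedups = {5: 2.5, 9: 4.9, 16: 6.1}


def parallel_efficiency(speedup, cores):
    """Parallel efficiency: achieved speed-up as a fraction of the ideal."""
    return speedup / cores


efficiencies = {
    cores: parallel_efficiency(s, cores) for cores, s in speedups.items()
}

for cores, eff in efficiencies.items():
    print(f"{cores} cores: speed-up {speedups[cores]}, efficiency {eff:.2f}")
```

Under that assumption the efficiency is about 0.50 on 5 cores, 0.54 on 9 cores and 0.38 on 16 cores.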

Relevance:

40.00%

Publisher:

Abstract:

A connection between a fuzzy neural network model and the mixture of experts network (MEN) modelling approach is established. Based on this linkage, two new neuro-fuzzy MEN construction algorithms are proposed to overcome the curse of dimensionality that is inherent in the majority of associative memory networks and other rule-based systems. The first construction algorithm employs a function selection manager module in an MEN system. The second construction algorithm is based on a new parallel learning algorithm in which each model rule is trained independently, and for which the parameter convergence property is established. As with the first approach, an expert selection criterion is utilised in this algorithm. The two construction methods are equally effective in overcoming the curse of dimensionality by reducing the dimensionality of the regression vector, but the latter has the additional computational advantage of parallel processing. The proposed algorithms are analysed for effectiveness, and numerical examples illustrate their efficacy on some difficult data-based modelling problems.