774 resultados para GPGPU Parallel Computing


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Large scale air pollution models are powerful tools, designed to meet the increasing demand in different environmental studies. The atmosphere is the most dynamic component of the environment, where the pollutants can be moved quickly on far distnce. Therefore the air pollution modeling must be done in a large computational domain. Moreover, all relevant physical, chemical and photochemical processes must be taken into account. In such complex models operator splitting is very often applied in order to achieve sufficient accuracy as well as efficiency of the numerical solution. The Danish Eulerian Model (DEM) is one of the most advanced such models. Its space domain (4800 × 4800 km) covers Europe, most of the Mediterian and neighboring parts of Asia and the Atlantic Ocean. Efficient parallelization is crucial for the performance and practical capabilities of this huge computational model. Different splitting schemes, based on the main processes mentioned above, have been implemented and tested with respect to accuracy and performance in the new version of DEM. Some numerical results of these experiments are presented in this paper.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Since its introduction in 1993, the Message Passing Interface (MPI) has become a de facto standard for writing High Performance Computing (HPC) applications on clusters and Massively Parallel Processors (MPPs). The recent emergence of multi-core processor systems presents a new challenge for established parallel programming paradigms, including those based on MPI. This paper presents a new Java messaging system called MPJ Express. Using this system, we exploit multiple levels of parallelism - messaging and threading - to improve application performance on multi-core processors. We refer to our approach as nested parallelism. This MPI-like Java library can support nested parallelism by using Java or Java OpenMP (JOMP) threads within an MPJ Express process. Practicality of this approach is assessed by porting to Java a massively parallel structure formation code from Cosmology called Gadget-2. We introduce nested parallelism in the Java version of the simulation code and report good speed-ups. To the best of our knowledge it is the first time this kind of hybrid parallelism is demonstrated in a high performance Java application. (C) 2009 Elsevier Inc. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

As consumers demand more functionality) from their electronic devices and manufacturers supply the demand then electrical power and clock requirements tend to increase, however reassessing system architecture can fortunately lead to suitable counter reductions. To maintain low clock rates and therefore reduce electrical power, this paper presents a parallel convolutional coder for the transmit side in many wireless consumer devices. The coder accepts a parallel data input and directly computes punctured convolutional codes without the need for a separate puncturing operation while the coded bits are available at the output of the coder in a parallel fashion. Also as the computation is in parallel then the coder can be clocked at 7 times slower than the conventional shift-register based convolutional coder (using DVB 7/8 rate). The presented coder is directly relevant to the design of modern low-power consumer devices

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we consider bilinear forms of matrix polynomials and show that these polynomials can be used to construct solutions for the problems of solving systems of linear algebraic equations, matrix inversion and finding extremal eigenvalues. An almost Optimal Monte Carlo (MAO) algorithm for computing bilinear forms of matrix polynomials is presented. Results for the computational costs of a balanced algorithm for computing the bilinear form of a matrix power is presented, i.e., an algorithm for which probability and systematic errors are of the same order, and this is compared with the computational cost for a corresponding deterministic method.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Fully connected cubic networks (FCCNs) are a class of newly proposed hierarchical interconnection networks for multicomputer systems, which enjoy the strengths of constant node degree and good expandability. The shortest path routing in FCCNs is an open problem. In this paper, we present an oblivious routing algorithm for n-level FCCN with N = 8(n) nodes, and prove that this algorithm creates a shortest path from the source to the destination. At the costs of both an O(N)-parallel-step off-line preprocessing phase and a list of size N stored at each node, the proposed algorithm is carried out at each related node in O(n) time. In some cases the proposed algorithm is superior to the one proposed by Chang and Wang in terms of the length of the routing path. This justifies the utility of our routing strategy. (C) 2006 Elsevier Inc. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A Blueprint for Affective Computing: A sourcebook and manual is the very first attempt to ground affective computing within the disciplines of psychology, affective neuroscience, and philosophy. This book illustrates the contributions of each of these disciplines to the development of the ever-growing field of affective computing. In addition, it demonstrates practical examples of cross-fertilization between disciplines in order to highlight the need for integration of computer science, engineering and the affective sciences.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose a bridge between two important parallel programming paradigms: data parallelism and communicating sequential processes (CSP). Data parallel pipelined architectures obtained with the Alpha language can be embedded in a control intensive application expressed in CSP-based Handel formalism. The interface is formally defined from the semantics of the languages Alpha and Handel. This work will ease the design of compute intensive applications on FPGAs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper is concerned with the uniformization of a system of afine recurrence equations. This transformation is used in the design (or compilation) of highly parallel embedded systems (VLSI systolic arrays, signal processing filters, etc.). In this paper, we present and implement an automatic system to achieve uniformization of systems of afine recurrence equations. We unify the results from many earlier papers, develop some theoretical extensions, and then propose effective uniformization algorithms. Our results can be used in any high level synthesis tool based on polyhedral representation of nested loop computations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Purpose – The purpose of this paper is to consider Turing's two tests for machine intelligence: the parallel-paired, three-participants game presented in his 1950 paper, and the “jury-service” one-to-one measure described two years later in a radio broadcast. Both versions were instantiated in practical Turing tests during the 18th Loebner Prize for artificial intelligence hosted at the University of Reading, UK, in October 2008. This involved jury-service tests in the preliminary phase and parallel-paired in the final phase. Design/methodology/approach – Almost 100 test results from the final have been evaluated and this paper reports some intriguing nuances which arose as a result of the unique contest. Findings – In the 2008 competition, Turing's 30 per cent pass rate is not achieved by any machine in the parallel-paired tests but Turing's modified prediction: “at least in a hundred years time” is remembered. Originality/value – The paper presents actual responses from “modern Elizas” to human interrogators during contest dialogues that show considerable improvement in artificial conversational entities (ACE). Unlike their ancestor – Weizenbaum's natural language understanding system – ACE are now able to recall, share information and disclose personal interests.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The work reported in this paper is motivated towards handling single node failures for parallel summation algorithms in computer clusters. An agent based approach is proposed in which a task to be executed is decomposed to sub-tasks and mapped onto agents that traverse computing nodes. The agents intercommunicate across computing nodes to share information during the event of a predicted node failure. Two single node failure scenarios are considered. The Message Passing Interface is employed for implementing the proposed approach. Quantitative results obtained from experiments reveal that the agent based approach can handle failures more efficiently than traditional failure handling approaches.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The increasing demand for cheaper-faster-better services anytime and anywhere has made radio network optimisation much more complex than ever before. In order to dynamically optimise the serving network, Dynamic Network Optimisation (DNO), is proposed as the ultimate solution and future trend. The realization of DNO, however, has been hindered by a significant bottleneck of the optimisation speed as the network complexity grows. This paper presents a multi-threaded parallel solution to accelerate complicated proprietary network optimisation algorithms, under a rigid condition of numerical consistency. ariesoACP product from Arieso Ltd serves as the platform for parallelisation. This parallel solution has been benchmarked and results exhibit a high scalability and a run-time reduction by 11% to 42% based on the technology, subscriber density and blocking rate of a given network in comparison with the original version. Further, it is highly essential that the parallel version produces equivalent optimisation quality in terms of identical optimisation outputs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A connection between a fuzzy neural network model with the mixture of experts network (MEN) modelling approach is established. Based on this linkage, two new neuro-fuzzy MEN construction algorithms are proposed to overcome the curse of dimensionality that is inherent in the majority of associative memory networks and/or other rule based systems. The first construction algorithm employs a function selection manager module in an MEN system. The second construction algorithm is based on a new parallel learning algorithm in which each model rule is trained independently, for which the parameter convergence property of the new learning method is established. As with the first approach, an expert selection criterion is utilised in this algorithm. These two construction methods are equivalent in their effectiveness in overcoming the curse of dimensionality by reducing the dimensionality of the regression vector, but the latter has the additional computational advantage of parallel processing. The proposed algorithms are analysed for effectiveness followed by numerical examples to illustrate their efficacy for some difficult data based modelling problems.