858 resultados para parallel search
Resumo:
Large scale air pollution models are powerful tools, designed to meet the increasing demand in different environmental studies. The atmosphere is the most dynamic component of the environment, where the pollutants can be moved quickly on far distnce. Therefore the air pollution modeling must be done in a large computational domain. Moreover, all relevant physical, chemical and photochemical processes must be taken into account. In such complex models operator splitting is very often applied in order to achieve sufficient accuracy as well as efficiency of the numerical solution. The Danish Eulerian Model (DEM) is one of the most advanced such models. Its space domain (4800 × 4800 km) covers Europe, most of the Mediterian and neighboring parts of Asia and the Atlantic Ocean. Efficient parallelization is crucial for the performance and practical capabilities of this huge computational model. Different splitting schemes, based on the main processes mentioned above, have been implemented and tested with respect to accuracy and performance in the new version of DEM. Some numerical results of these experiments are presented in this paper.
Resumo:
In the 1990s the Message Passing Interface Forum defined MPI bindings for Fortran, C, and C++. With the success of MPI these relatively conservative languages have continued to dominate in the parallel computing community. There are compelling arguments in favour of more modern languages like Java. These include portability, better runtime error checking, modularity, and multi-threading. But these arguments have not converted many HPC programmers, perhaps due to the scarcity of full-scale scientific Java codes, and the lack of evidence for performance competitive with C or Fortran. This paper tries to redress this situation by porting two scientific applications to Java. Both of these applications are parallelized using our thread-safe Java messaging system—MPJ Express. The first application is the Gadget-2 code, which is a massively parallel structure formation code for cosmological simulations. The second application uses the finite-domain time-difference method for simulations in the area of computational electromagnetics. We evaluate and compare the performance of the Java and C versions of these two scientific applications, and demonstrate that the Java codes can achieve performance comparable with legacy applications written in conventional HPC languages. Copyright © 2009 John Wiley & Sons, Ltd.
Resumo:
We introduce a classification-based approach to finding occluding texture boundaries. The classifier is composed of a set of weak learners, which operate on image intensity discriminative features that are defined on small patches and are fast to compute. A database that is designed to simulate digitized occluding contours of textured objects in natural images is used to train the weak learners. The trained classifier score is then used to obtain a probabilistic model for the presence of texture transitions, which can readily be used for line search texture boundary detection in the direction normal to an initial boundary estimate. This method is fast and therefore suitable for real-time and interactive applications. It works as a robust estimator, which requires a ribbon-like search region and can handle complex texture structures without requiring a large number of observations. We demonstrate results both in the context of interactive 2D delineation and of fast 3D tracking and compare its performance with other existing methods for line search boundary detection.
Resumo:
As consumers demand more functionality) from their electronic devices and manufacturers supply the demand then electrical power and clock requirements tend to increase, however reassessing system architecture can fortunately lead to suitable counter reductions. To maintain low clock rates and therefore reduce electrical power, this paper presents a parallel convolutional coder for the transmit side in many wireless consumer devices. The coder accepts a parallel data input and directly computes punctured convolutional codes without the need for a separate puncturing operation while the coded bits are available at the output of the coder in a parallel fashion. Also as the computation is in parallel then the coder can be clocked at 7 times slower than the conventional shift-register based convolutional coder (using DVB 7/8 rate). The presented coder is directly relevant to the design of modern low-power consumer devices
Resumo:
The work reported in this paper is motivated by the fact that there is a need to apply autonomic computing concepts to parallel computing systems. Advancing on prior work based on intelligent cores [36], a swarm-array computing approach, this paper focuses on ‘Intelligent agents’ another swarm-array computing approach in which the task to be executed on a parallel computing core is considered as a swarm of autonomous agents. A task is carried to a computing core by carrier agents and is seamlessly transferred between cores in the event of a predicted failure, thereby achieving self-ware objectives of autonomic computing. The feasibility of the proposed swarm-array computing approach is validated on a multi-agent simulator.
Resumo:
A novel Linear Hashtable Method Predicted Hexagonal Search (LHMPHS) method for block based motion compensation is proposed. Fast block matching algorithms use the origin as the initial search center, which often does not track motion very well. To improve the accuracy of the fast BMA's, we employ a predicted starting search point, which reflects the motion trend of the current block. The predicted search centre is found closer to the global minimum. Thus the center-biased BMA's can be used to find the motion vector more efficiently. The performance of the algorithm is evaluated by using standard video sequences, considers the three important metrics: The results show that the proposed algorithm enhances the accuracy of current hexagonal algorithms and is better than Full Search, Logarithmic Search etc.
Resumo:
This paper presents a novel two-pass algorithm constituted by Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS) for block base motion compensation. On the basis of research from previous algorithms, especially an on-the-edge motion estimation algorithm called hexagonal search (HEXBS), we propose the LHMEA and the Two-Pass Algorithm (TPA). We introduced hashtable into video compression. In this paper we employ LHMEA for the first-pass search in all the Macroblocks (MB) in the picture. Motion Vectors (MV) are then generated from the first-pass and are used as predictors for second-pass HEXBS motion estimation, which only searches a small number of MBs. The evaluation of the algorithm considers the three important metrics being time, compression rate and PSNR. The performance of the algorithm is evaluated by using standard video sequences and the results are compared to current algorithms, Experimental results show that the proposed algorithm can offer the same compression rate as the Full Search. LHMEA with TPA has significant improvement on HEXBS and shows a direction for improving other fast motion estimation algorithms, for example Diamond Search.
Resumo:
We propose a bridge between two important parallel programming paradigms: data parallelism and communicating sequential processes (CSP). Data parallel pipelined architectures obtained with the Alpha language can be embedded in a control intensive application expressed in CSP-based Handel formalism. The interface is formally defined from the semantics of the languages Alpha and Handel. This work will ease the design of compute intensive applications on FPGAs.
Resumo:
This paper is concerned with the uniformization of a system of afine recurrence equations. This transformation is used in the design (or compilation) of highly parallel embedded systems (VLSI systolic arrays, signal processing filters, etc.). In this paper, we present and implement an automatic system to achieve uniformization of systems of afine recurrence equations. We unify the results from many earlier papers, develop some theoretical extensions, and then propose effective uniformization algorithms. Our results can be used in any high level synthesis tool based on polyhedral representation of nested loop computations.
Resumo:
Purpose – The purpose of this paper is to consider Turing's two tests for machine intelligence: the parallel-paired, three-participants game presented in his 1950 paper, and the “jury-service” one-to-one measure described two years later in a radio broadcast. Both versions were instantiated in practical Turing tests during the 18th Loebner Prize for artificial intelligence hosted at the University of Reading, UK, in October 2008. This involved jury-service tests in the preliminary phase and parallel-paired in the final phase. Design/methodology/approach – Almost 100 test results from the final have been evaluated and this paper reports some intriguing nuances which arose as a result of the unique contest. Findings – In the 2008 competition, Turing's 30 per cent pass rate is not achieved by any machine in the parallel-paired tests but Turing's modified prediction: “at least in a hundred years time” is remembered. Originality/value – The paper presents actual responses from “modern Elizas” to human interrogators during contest dialogues that show considerable improvement in artificial conversational entities (ACE). Unlike their ancestor – Weizenbaum's natural language understanding system – ACE are now able to recall, share information and disclose personal interests.
Resumo:
Recent research in multi-agent systems incorporate fault tolerance concepts. However, the research does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely ‘Intelligent Agents’. In the approach considered a task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The agents hence contribute towards fault tolerance and towards building reliable systems. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator and implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.
Resumo:
A connection between a fuzzy neural network model with the mixture of experts network (MEN) modelling approach is established. Based on this linkage, two new neuro-fuzzy MEN construction algorithms are proposed to overcome the curse of dimensionality that is inherent in the majority of associative memory networks and/or other rule based systems. The first construction algorithm employs a function selection manager module in an MEN system. The second construction algorithm is based on a new parallel learning algorithm in which each model rule is trained independently, for which the parameter convergence property of the new learning method is established. As with the first approach, an expert selection criterion is utilised in this algorithm. These two construction methods are equivalent in their effectiveness in overcoming the curse of dimensionality by reducing the dimensionality of the regression vector, but the latter has the additional computational advantage of parallel processing. The proposed algorithms are analysed for effectiveness followed by numerical examples to illustrate their efficacy for some difficult data based modelling problems.