Biblioteca Digital

36 resultados para intel processor

Nested parallelism for multi-core HPC systems using Java

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Since its introduction in 1993, the Message Passing Interface (MPI) has become a de facto standard for writing High Performance Computing (HPC) applications on clusters and Massively Parallel Processors (MPPs). The recent emergence of multi-core processor systems presents a new challenge for established parallel programming paradigms, including those based on MPI. This paper presents a new Java messaging system called MPJ Express. Using this system, we exploit multiple levels of parallelism - messaging and threading - to improve application performance on multi-core processors. We refer to our approach as nested parallelism. This MPI-like Java library can support nested parallelism by using Java or Java OpenMP (JOMP) threads within an MPJ Express process. Practicality of this approach is assessed by porting to Java a massively parallel structure formation code from Cosmology called Gadget-2. We introduce nested parallelism in the Java version of the simulation code and report good speed-ups. To the best of our knowledge it is the first time this kind of hybrid parallelism is demonstrated in a high performance Java application. (C) 2009 Elsevier Inc. All rights reserved.

Processors (WO 2010/082067 A2)

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A processing system comprises a plurality of processors (12) and communication means (20) arranged to carry messages between the processors, wherein each of the processors (12) has an operating instruction memory field (32, 34, 36) arranged to hold stored operating instructions including a re-routing target address. Each processor is arranged to receive a message (38) including operating instructions including a target address. On receipt of the message, each processor is arranged to: check the target address in the message to determine whether it corresponds to an address associated with the processor; if the target address in the message does correspond to an address associated with the processor, to check the operating instructions in the message to determine whether the message is to be re-routed; and, if the message is to be re-routed, to replace operating instructions within the message with the stored operating instructions, and place the message on the communication means for delivery to the re-routing target address.

An MPI-based implementation of intelligent agents on clusters

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Recent research in multi-agent systems incorporate fault tolerance concepts, but does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The feasibility of the approach is validated by implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

Intelligent agents for fault tolerance: from multi-agent simulation to cluster-based implementation

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Recent research in multi-agent systems incorporate fault tolerance concepts, but does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator, and implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

A cluster-based implementation of a fault-tolerant parallel reduction algorithm using swarm-array computing

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Recent research in multi-agent systems incorporate fault tolerance concepts. However, the research does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely ‘Intelligent Agents’. In the approach considered a task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The agents hence contribute towards fault tolerance and towards building reliable systems. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator and implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

Parallelism in control

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A parallel structure is suggested for feedback control systems. Such a technique can be applied to either single or multi-sensor environments and is ideally suited for parallel processor implementation. The control input actually applied is based on a weighted summation of the different parallel controller values, the weightings being either fixed values or chosen by an adaptive decision-making mechanism. The effect of different controller combinations is a field now open to study.

Neural networks for power systems alarm handling

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A multi-layered architecture of self-organizing neural networks is being developed as part of an intelligent alarm processor to analyse a stream of power grid fault messages and provide a suggested diagnosis of the fault location. Feedback concerning the accuracy of the diagnosis is provided by an object-oriented grid simulator which acts as an external supervisor to the learning system. The utilization of artificial neural networks within this environment should result in a powerful generic alarm processor which will not require extensive training by a human expert to produce accurate results.

Microprocessor systems: an introduction

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The aim of this book is to provide and introduction to microprocessor systems, their operation and design. It covers those topics needed by engineers and computer scientists who are interested in applying microprocessors in practical situations, namely computer hardware including logic and interfacing, software, in particular high level and assembly language programming, and the design and testing of such systems. The fundamental principles of micrprocessor systems are described and these are illustrated with reference to two microprocessors, the 32-bit MC68020 from Motorola and a single chip microcomputer, the 8051 from Intel; and in addition, interfacing to the general purpose STE bus is described. The details of the processors and the bus are concentrated in three chapters, thus allowing the presentation of the material to be independent of the microprocessors if that is desired, and permitting the specific details to be found easily.

New fast SCFT algorithm applied to binary diblock copolymer/homopolymer blends

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We present an efficient strategy for mapping out the classical phase behavior of block copolymer systems using self-consistent field theory (SCFT). With our new algorithm, the complete solution of a classical block copolymer phase can be evaluated typically in a fraction of a second on a single-processor computer, even for highly segregated melts. This is accomplished by implementing the standard unit-cell approximation (UCA) for the cylindrical and spherical phases, and solving the resulting equations using a Bessel function expansion. Here the method is used to investigate blends of AB diblock copolymer and A homopolymer, concentrating on the situation where the two molecules are of similar size.

PDM 2006: Proceedings of the International Workshop on Data Mining

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.

Transputer based real time robot control

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A parallel processor architecture based on a communicating sequential processor chip, the transputer, is described. The architecture is easily linearly extensible to enable separate functions to be included in the controller. To demonstrate the power of the resulting controller some experimental results are presented comparing PID and full inverse dynamics on the first three joints of a Puma 560 robot. Also examined are some of the sample rate issues raised by the asynchronous updating of inertial parameters, and the need for full inverse dynamics at every sample interval is questioned.

Technical Note: The normal quantile transformation and its application in a flood forecasting system.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The Normal Quantile Transform (NQT) has been used in many hydrological and meteorological applications in order to make the Cumulated Distribution Function (CDF) of the observed, simulated and forecast river discharge, water level or precipitation data Gaussian. It is also the heart of the meta-Gaussian model for assessing the total predictive uncertainty of the Hydrological Uncertainty Processor (HUP) developed by Krzysztofowicz. In the field of geo-statistics this transformation is better known as the Normal-Score Transform. In this paper some possible problems caused by small sample sizes when applying the NQT in flood forecasting systems will be discussed and a novel way to solve the problem will be outlined by combining extreme value analysis and non-parametric regression methods. The method will be illustrated by examples of hydrological stream-flow forecasts.

Runge-Kutta IMEX schemes for the Horizontally Explicit/Vertically Implicit (HEVI) solution of wave equations

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Many operational weather forecasting centres use semi-implicit time-stepping schemes because of their good efficiency. However, as computers become ever more parallel, horizontally explicit solutions of the equations of atmospheric motion might become an attractive alternative due to the additional inter-processor communication of implicit methods. Implicit and explicit (IMEX) time-stepping schemes have long been combined in models of the atmosphere using semi-implicit, split-explicit or HEVI splitting. However, most studies of the accuracy and stability of IMEX schemes have been limited to the parabolic case of advection–diffusion equations. We demonstrate how a number of Runge–Kutta IMEX schemes can be used to solve hyperbolic wave equations either semi-implicitly or HEVI. A new form of HEVI splitting is proposed, UfPreb, which dramatically improves accuracy and stability of simulations of gravity waves in stratified flow. As a consequence it is found that there are HEVI schemes that do not lose accuracy in comparison to semi-implicit ones. The stability limits of a number of variations of trapezoidal implicit and some Runge–Kutta IMEX schemes are found and the schemes are tested on two vertical slice cases using the compressible Boussinesq equations split into various combinations of implicit and explicit terms. Some of the Runge–Kutta schemes are found to be beneficial over trapezoidal, especially since they damp high frequencies without dropping to first-order accuracy. We test schemes that are not formally accurate for stiff systems but in stiff limits (nearly incompressible) and find that they can perform well. The scheme ARK2(2,3,2) performs the best in the tests.

Fine-grained multi-phase array designs

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Hybrid multiprocessor architectures which combine re-configurable computing and multiprocessors on a chip are being proposed to transcend the performance of standard multi-core parallel systems. Both fine-grained and coarse-grained parallel algorithm implementations are feasible in such hybrid frameworks. A compositional strategy for designing fine-grained multi-phase regular processor arrays to target hybrid architectures is presented in this paper. The method is based on deriving component designs using classical regular array techniques and composing the components into a unified global design. Effective designs with phase-changes and data routing at run-time are characteristics of these designs. In order to describe the data transfer between phases, the concept of communication domain is introduced so that the producer–consumer relationship arising from multi-phase computation can be treated in a unified way as a data routing phase. This technique is applied to derive new designs of multi-phase regular arrays with different dataflow between phases of computation.

WCSC 2013: the 3rd World Chess Software Championship

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The 3rd World Chess Software Championship took place in Yokohama, Japan during August 2013. It pits chess engines against each other on a common hardware platform - in this instance, the Intel i7 2740 Ivy Bridge with 16GB RAM supporting a potential eight processing threads. It was narrowly won by HIARCS from JUNIOR and PANDIX with JONNY, SHREDDER and MERLIN taking the remaining places. Games, occasionally annotated, are available here.

«
1
2
3
»