894 resultados para HPC parallel computer architecture queues fault tolerance programmability ADAM


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The furious pace of Moore's Law is driving computer architecture into a realm where the the speed of light is the dominant factor in system latencies. The number of clock cycles to span a chip are increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has previously made migration an unserviceable solution so far. I present an architecture, implementation, and mechanisms that reduces the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an architecture (Multi-μ) being implemented to study and develop software based fault tolerant mechanisms for Real-Time Systems, using the Ada language (Ada 95) and Commercial Off-The-Shelf (COTS) components. Several issues regarding fault tolerance are presented and mechanisms to achieve fault tolerance by software active replication in Ada 95 are discussed. The Multi-μ architecture, based on a specifically proposed Fault Tolerance Manager (FTManager), is then described. Finally, some considerations are made about the work being done and essential future developments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work shows the design, simulation, and analysis of two optical interconnection networks for a Dataflow parallel computer architecture. To verify the optical interconnection network performance on the Dataflow architecture, we have analyzed the load balancing among the processors during the parallel programs executions. The load balancing is a very important parameter because it is directly associated to the dataflow parallelism degree. This article proves that optical interconnection networks designed with simple optical devices can provide efficiently the dataflow requirements of a high performance communication system.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Processor virtualization for process migration in distributed parallel computing systems has formed a significant component of research on load balancing. In contrast, the potential of processor virtualization for fault tolerance has been addressed minimally. The work reported in this paper is motivated towards extending concepts of processor virtualization towards ‘intelligent cores’ as a means to achieve fault tolerance in distributed parallel computing systems. Intelligent cores are an abstraction of the hardware processing cores, with the incorporation of cognitive capabilities, on which parallel tasks can be executed and migrated. When a processing core executing a task is predicted to fail the task being executed is proactively transferred onto another core. A parallel reduction algorithm incorporating concepts of intelligent cores is implemented on a computer cluster using Adaptive MPI and Charm ++. Preliminary results confirm the feasibility of the approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent research in multi-agent systems incorporate fault tolerance concepts, but does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator, and implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The term "Logic Programming" refers to a variety of computer languages and execution models which are based on the traditional concept of Symbolic Logic. The expressive power of these languages offers promise to be of great assistance in facing the programming challenges of present and future symbolic processing applications in Artificial Intelligence, Knowledge-based systems, and many other areas of computing. The sequential execution speed of logic programs has been greatly improved since the advent of the first interpreters. However, higher inference speeds are still required in order to meet the demands of applications such as those contemplated for next generation computer systems. The execution of logic programs in parallel is currently considered a promising strategy for attaining such inference speeds. Logic Programming in turn appears as a suitable programming paradigm for parallel architectures because of the many opportunities for parallel execution present in the implementation of logic programs. This dissertation presents an efficient parallel execution model for logic programs. The model is described from the source language level down to an "Abstract Machine" level suitable for direct implementation on existing parallel systems or for the design of special purpose parallel architectures. Few assumptions are made at the source language level and therefore the techniques developed and the general Abstract Machine design are applicable to a variety of logic (and also functional) languages. These techniques offer efficient solutions to several areas of parallel Logic Programming implementation previously considered problematic or a source of considerable overhead, such as the detection and handling of variable binding conflicts in AND-Parallelism, the specification of control and management of the execution tree, the treatment of distributed backtracking, and goal scheduling and memory management issues, etc. A parallel Abstract Machine design is offered, specifying data areas, operation, and a suitable instruction set. This design is based on extending to a parallel environment the techniques introduced by the Warren Abstract Machine, which have already made very fast and space efficient sequential systems a reality. Therefore, the model herein presented is capable of retaining sequential execution speed similar to that of high performance sequential systems, while extracting additional gains in speed by efficiently implementing parallel execution. These claims are supported by simulations of the Abstract Machine on sample programs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work presents a theoretical-graph method of determining the fault tolerance degree of the computer network interconnections and nodes. Experimental results received from simulations of this method over a distributed computing network environment are also presented.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hyperspectral imaging has become one of the main topics in remote sensing applications, which comprise hundreds of spectral bands at different (almost contiguous) wavelength channels over the same area generating large data volumes comprising several GBs per flight. This high spectral resolution can be used for object detection and for discriminate between different objects based on their spectral characteristics. One of the main problems involved in hyperspectral analysis is the presence of mixed pixels, which arise when the spacial resolution of the sensor is not able to separate spectrally distinct materials. Spectral unmixing is one of the most important task for hyperspectral data exploitation. However, the unmixing algorithms can be computationally very expensive, and even high power consuming, which compromises the use in applications under on-board constraints. In recent years, graphics processing units (GPUs) have evolved into highly parallel and programmable systems. Specifically, several hyperspectral imaging algorithms have shown to be able to benefit from this hardware taking advantage of the extremely high floating-point processing performance, compact size, huge memory bandwidth, and relatively low cost of these units, which make them appealing for onboard data processing. In this paper, we propose a parallel implementation of an augmented Lagragian based method for unsupervised hyperspectral linear unmixing on GPUs using CUDA. The method called simplex identification via split augmented Lagrangian (SISAL) aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The efficient implementation of SISAL method presented in this work exploits the GPU architecture at low level, using shared memory and coalesced accesses to memory.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An n-dimensional Mobius cube, 0MQ(n) or 1MQ(n), is a variation of n-dimensional cube Q(n) which possesses many attractive properties such as significantly smaller communication delay and stronger graph-embedding capabilities. In some practical situations, the fault tolerance of a distributed memory multiprocessor system can be measured more precisely by the connectivity of the underlying graph under forbidden fault set models. This article addresses the connectivity of 0MQ(n)/1MQ(n), under two typical forbidden fault set models. We first prove that the connectivity of 0MQ(n)/1MQ(n) is 2n - 2 when the fault set does not contain the neighborhood of any vertex as a subset. We then prove that the connectivity of 0MQ(n)/1MQ(n) is 3n - 5 provided that the neighborhood of any vertex as well as that of any edge cannot fail simultaneously These results demonstrate that 0MQ(n)/1MQ(n) has the same connectivity as Q(n) under either of the previous assumptions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fault tolerance allows a system to remain operational to some degree when some of its components fail. One of the most common fault tolerance mechanisms consists on logging the system state periodically, and recovering the system to a consistent state in the event of a failure. This paper describes a general fault tolerance logging-based mechanism, which can be layered over deterministic systems. Our proposal describes how a logging mechanism can recover the underlying system to a consistent state, even if an action or set of actions were interrupted mid-way, due to a server crash. We also propose different methods of storing the logging information, and describe how to deploy a fault tolerant master-slave cluster for information replication. We adapt our model to a previously proposed framework, which provided common relational features, like transactions with atomic, consistent, isolated and durable properties, to NoSQL database management systems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It is imperative to accept that failures can and will occur, even in meticulously designed distributed systems, and design proper measures to counter those failures. Passive replication minimises resource consumption by only activating redundant replicas in case of failures, as typically providing and applying state updates is less resource demanding than requesting execution. However, most existing solutions for passive fault tolerance are usually designed and configured at design time, explicitly and statically identifying the most critical components and their number of replicas, lacking the needed flexibility to handle the runtime dynamics of distributed component-based embedded systems. This paper proposes a cost-effective adaptive fault tolerance solution with a significant lower overhead compared to a strict active redundancy-based approach, achieving a high error coverage with the minimum amount of redundancy. The activation of passive replicas is coordinated through a feedback-based coordination model that reduces the complexity of the needed interactions among components until a new collective global service solution is determined, improving the overall maintainability and robustness of the system.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents SiMR, a simulator of the Rudimentary Machine designed to be used in a first course of computer architecture of Software Engineering and Computer Engineering programmes. The Rudimentary Machine contains all the basic elements in a RISC computer, and SiMR allows editing, assembling and executing programmes for this processor. SiMR is used at the Universitat Oberta de Catalunya as one of the most important resources in the Virtual Computing Architecture and Organisation Laboratory, since students work at home with the simulator and reports containing their work are automatically generated to be evaluated by lecturers. The results obtained from a survey show that most of the students consider SiMR as a highly necessary or even an indispensable resource to learn the basic concepts about computer architecture.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present an algorithm for the computation of reducible invariant tori of discrete dynamical systems that is suitable for tori of dimensions larger than 1. It is based on a quadratically convergent scheme that approximates, at the same time, the Fourier series of the torus, its Floquet transformation, and its Floquet matrix. The Floquet matrix describes the linearization of the dynamics around the torus and, hence, its linear stability. The algorithm presents a high degree of parallelism, and the computational effort grows linearly with the number of Fourier modes needed to represent the solution. For these reasons it is a very good option to compute quasi-periodic solutions with several basic frequencies. The paper includes some examples (flows) to show the efficiency of the method in a parallel computer. In these flows we compute invariant tori of dimensions up to 5, by taking suitable sections.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Technology scaling has proceeded into dimensions in which the reliability of manufactured devices is becoming endangered. The reliability decrease is a consequence of physical limitations, relative increase of variations, and decreasing noise margins, among others. A promising solution for bringing the reliability of circuits back to a desired level is the use of design methods which introduce tolerance against possible faults in an integrated circuit. This thesis studies and presents fault tolerance methods for network-onchip (NoC) which is a design paradigm targeted for very large systems-onchip. In a NoC resources, such as processors and memories, are connected to a communication network; comparable to the Internet. Fault tolerance in such a system can be achieved at many abstraction levels. The thesis studies the origin of faults in modern technologies and explains the classification to transient, intermittent and permanent faults. A survey of fault tolerance methods is presented to demonstrate the diversity of available methods. Networks-on-chip are approached by exploring their main design choices: the selection of a topology, routing protocol, and flow control method. Fault tolerance methods for NoCs are studied at different layers of the OSI reference model. The data link layer provides a reliable communication link over a physical channel. Error control coding is an efficient fault tolerance method especially against transient faults at this abstraction level. Error control coding methods suitable for on-chip communication are studied and their implementations presented. Error control coding loses its effectiveness in the presence of intermittent and permanent faults. Therefore, other solutions against them are presented. The introduction of spare wires and split transmissions are shown to provide good tolerance against intermittent and permanent errors and their combination to error control coding is illustrated. At the network layer positioned above the data link layer, fault tolerance can be achieved with the design of fault tolerant network topologies and routing algorithms. Both of these approaches are presented in the thesis together with realizations in the both categories. The thesis concludes that an optimal fault tolerance solution contains carefully co-designed elements from different abstraction levels