9 resultados para Parallel or distributed processing

em Massachusetts Institute of Technology


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider the often-studied problem of sorting, for a parallel computer. Given an input array distributed evenly over p processors, the task is to compute the sorted output array, also distributed over the p processors. Many existing algorithms take the approach of approximately load-balancing the output, leaving each processor with Θ(n/p) elements. However, in many cases, approximate load-balancing leads to inefficiencies in both the sorting itself and in further uses of the data after sorting. We provide a deterministic parallel sorting algorithm that uses parallel selection to produce any output distribution exactly, particularly one that is perfectly load-balanced. Furthermore, when using a comparison sort, this algorithm is 1-optimal in both computation and communication. We provide an empirical study that illustrates the efficiency of exact data splitting, and shows an improvement over two sample sort algorithms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Jellybean Machine is a scalable MIMD concurrent processor consisting of special purpose RISC processors loosely coupled into a low latency network. I have developed an operating system to provide the supportive environment required to efficiently coordinate the collective power of the distributed processing elements. The system services are developed in detail, and may be of interest to other designers of fine grain, distributed memory processing networks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a DHT-based grid resource indexing and discovery (DGRID) approach. With DGRID, resource-information data is stored on its own administrative domain and each domain, represented by an index server, is virtualized to several nodes (virtual servers) subjected to the number of resource types it has. Then, all nodes are arranged as a structured overlay network or distributed hash table (DHT). Comparing to existing grid resource indexing and discovery schemes, the benefits of DGRID include improving the security of domains, increasing the availability of data, and eliminating stale data.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Conventional parallel computer architectures do not provide support for non-uniformly distributed objects. In this thesis, I introduce sparsely faceted arrays (SFAs), a new low-level mechanism for naming regions of memory, or facets, on different processors in a distributed, shared memory parallel processing system. Sparsely faceted arrays address the disconnect between the global distributed arrays provided by conventional architectures (e.g. the Cray T3 series), and the requirements of high-level parallel programming methods that wish to use objects that are distributed over only a subset of processing elements. A sparsely faceted array names a virtual globally-distributed array, but actual facets are lazily allocated. By providing simple semantics and making efficient use of memory, SFAs enable efficient implementation of a variety of non-uniformly distributed data structures and related algorithms. I present example applications which use SFAs, and describe and evaluate simple hardware mechanisms for implementing SFAs. Keeping track of which nodes have allocated facets for a particular SFA is an important task that suggests the need for automatic memory management, including garbage collection. To address this need, I first argue that conventional tracing techniques such as mark/sweep and copying GC are inherently unscalable in parallel systems. I then present a parallel memory-management strategy, based on reference-counting, that is capable of garbage collecting sparsely faceted arrays. I also discuss opportunities for hardware support of this garbage collection strategy. I have implemented a high-level hardware/OS simulator featuring hardware support for sparsely faceted arrays and automatic garbage collection. I describe the simulator and outline a few of the numerous details associated with a "real" implementation of SFAs and SFA-aware garbage collection. Simulation results are used throughout this thesis in the evaluation of hardware support mechanisms.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A vernier offset is detected at once among straight lines, and reaction times are almost independent of the number of simultaneously presented stimuli (distractors), indicating parallel processing of vernier offsets. Reaction times for identifying a vernier offset to one side among verniers offset to the opposite side increase with the number of distractors, indicating serial processing. Even deviations below a photoreceptor diameter can be detected at once. The visual system thus attains positional accuracy below the photoreceptor diameter simultaneously at different positions. I conclude that deviation from straightness, or change of orientation, is detected in parallel over the visual field. Discontinuities or gradients in orientation may represent an elementary feature of vision.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper presents a model for the general flow in the neocortex. The basic process, called "sequence-seeking," is a search for a sequence of mappings or transformations, linking source and target representations. The search is bi-directional, "bottom-up" as well as "top-down," and it explores in parallel a large numbe rof alternative sequences. This operation is implemented in a structure termed "counter streams," in which multiple sequences are explored along two separate, complementary pathways which seeking to meet. The first part of the paper discusses the general sequence-seeking scheme and a number of related processes, such as the learning of successful sequences, context effects, and the use of "express lines" and partial matches. The second part discusses biological implications of the model in terms of connections within and between cortical areas. The model is compared with existing data, and a number of new predictions are proposed.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This report describes Processor Coupling, a mechanism for controlling multiple ALUs on a single integrated circuit to exploit both instruction-level and inter-thread parallelism. A compiler statically schedules individual threads to discover available intra-thread instruction-level parallelism. The runtime scheduling mechanism interleaves threads, exploiting inter-thread parallelism to maintain high ALU utilization. ALUs are assigned to threads on a cycle byscycle basis, and several threads can be active concurrently. Simulation results show that Processor Coupling performs well both on single threaded and multi-threaded applications. The experiments address the effects of memory latencies, function unit latencies, and communication bandwidth between function units.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Parallel shared-memory machines with hundreds or thousands of processor-memory nodes have been built; in the future we will see machines with millions or even billions of nodes. Associated with such large systems is a new set of design challenges. Many problems must be addressed by an architecture in order for it to be successful; of these, we focus on three in particular. First, a scalable memory system is required. Second, the network messaging protocol must be fault-tolerant. Third, the overheads of thread creation, thread management and synchronization must be extremely low. This thesis presents the complete system design for Hamal, a shared-memory architecture which addresses these concerns and is directly scalable to one million nodes. Virtual memory and distributed objects are implemented in a manner that requires neither inter-node synchronization nor the storage of globally coherent translations at each node. We develop a lightweight fault-tolerant messaging protocol that guarantees message delivery and idempotence across a discarding network. A number of hardware mechanisms provide efficient support for massive multithreading and fine-grained synchronization. Experiments are conducted in simulation, using a trace-driven network simulator to investigate the messaging protocol and a cycle-accurate simulator to evaluate the Hamal architecture. We determine implementation parameters for the messaging protocol which optimize performance. A discarding network is easier to design and can be clocked at a higher rate, and we find that with this protocol its performance can approach that of a non-discarding network. Our simulations of Hamal demonstrate the effectiveness of its thread management and synchronization primitives. In particular, we find register-based synchronization to be an extremely efficient mechanism which can be used to implement a software barrier with a latency of only 523 cycles on a 512 node machine.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Euterpe is a real-time computer system for the modeling of musical structures. It provides a formalism wherein familiar concepts of musical analysis may be readily expressed. This is verified by its application to the analysis of a wide variety of conventional forms of music: Gregorian chant, Mediaeval polyphony, Back counterpoint, and sonata form. It may be of further assistance in the real-time experiments in various techniques of thematic development. Finally, the system is endowed with sound-synthesis apparatus with which the user may prepare tapes for musical performances.