22 resultados para parallel-machine
Resumo:
This thesis describes the design and implementation of an integrated circuit and associated packaging to be used as the building block for the data routing network of a large scale shared memory multiprocessor system. A general purpose multiprocessor depends on high-bandwidth, low-latency communications between computing elements. This thesis describes the design and construction of RN1, a novel self-routing, enhanced crossbar switch as a CMOS VLSI chip. This chip provides the basic building block for a scalable pipelined routing network with byte-wide data channels. A series of RN1 chips can be cascaded with no additional internal network components to form a multistage fault-tolerant routing switch. The chip is designed to operate at clock frequencies up to 100Mhz using Hewlett-Packard's HP34 $1.2\\mu$ process. This aggressive performance goal demands that special attention be paid to optimization of the logic architecture and circuit design.
Resumo:
Techniques, suitable for parallel implementation, for robust 2D model-based object recognition in the presence of sensor error are studied. Models and scene data are represented as local geometric features and robust hypothesis of feature matchings and transformations is considered. Bounds on the error in the image feature geometry are assumed constraining possible matchings and transformations. Transformation sampling is introduced as a simple, robust, polynomial-time, and highly parallel method of searching the space of transformations to hypothesize feature matchings. Key to the approach is that error in image feature measurement is explicitly accounted for. A Connection Machine implementation and experiments on real images are presented.
Resumo:
Rapid judgments about the properties and spatial relations of objects are the crux of visually guided interaction with the world. Vision begins, however, with essentially pointwise representations of the scene, such as arrays of pixels or small edge fragments. For adequate time-performance in recognition, manipulation, navigation, and reasoning, the processes that extract meaningful entities from the pointwise representations must exploit parallelism. This report develops a framework for the fast extraction of scene entities, based on a simple, local model of parallel computation.sAn image chunk is a subset of an image that can act as a unit in the course of spatial analysis. A parallel preprocessing stage constructs a variety of simple chunks uniformly over the visual array. On the basis of these chunks, subsequent serial processes locate relevant scene components and assemble detailed descriptions of them rapidly. This thesis defines image chunks that facilitate the most potentially time-consuming operations of spatial analysis---boundary tracing, area coloring, and the selection of locations at which to apply detailed analysis. Fast parallel processes for computing these chunks from images, and chunk-based formulations of indexing, tracing, and coloring, are presented. These processes have been simulated and evaluated on the lisp machine and the connection machine.
Resumo:
Reconstructing a surface from sparse sensory data is a well known problem in computer vision. Early vision modules typically supply sparse depth, orientation and discontinuity information. The surface reconstruction module incorporates these sparse and possibly conflicting measurements of a surface into a consistent, dense depth map. The coupled depth/slope model developed here provides a novel computational solution to the surface reconstruction problem. This method explicitly computes dense slope representation as well as dense depth representations. This marked change from previous surface reconstruction algorithms allows a natural integration of orientation constraints into the surface description, a feature not easily incorporated into earlier algorithms. In addition, the coupled depth/ slope model generalizes to allow for varying amounts of smoothness at different locations on the surface. This computational model helps conceptualize the problem and leads to two possible implementations- analog and digital. The model can be implemented as an electrical or biological analog network since the only computations required at each locally connected node are averages, additions and subtractions. A parallel digital algorithm can be derived by using finite difference approximations. The resulting system of coupled equations can be solved iteratively on a mesh-pf-processors computer, such as the Connection Machine. Furthermore, concurrent multi-grid methods are designed to speed the convergence of this digital algorithm.
Resumo:
This report describes a knowledge-base system in which the information is stored in a network of small parallel processing elements ??de and link units ??ich are controlled by an external serial computer. This network is similar to the semantic network system of Quillian, but is much more tightly controlled. Such a network can perform certain critical deductions and searches very quickly; it avoids many of the problems of current systems, which must use complex heuristics to limit and guided their searches. It is argued (with examples) that the key operation in a knowledge-base system is the intersection of large explicit and semi-explicit sets. The parallel network system does this in a small, essentially constant number of cycles; a serial machine takes time proportional to the size of the sets, except in special cases.
Resumo:
Conventional parallel computer architectures do not provide support for non-uniformly distributed objects. In this thesis, I introduce sparsely faceted arrays (SFAs), a new low-level mechanism for naming regions of memory, or facets, on different processors in a distributed, shared memory parallel processing system. Sparsely faceted arrays address the disconnect between the global distributed arrays provided by conventional architectures (e.g. the Cray T3 series), and the requirements of high-level parallel programming methods that wish to use objects that are distributed over only a subset of processing elements. A sparsely faceted array names a virtual globally-distributed array, but actual facets are lazily allocated. By providing simple semantics and making efficient use of memory, SFAs enable efficient implementation of a variety of non-uniformly distributed data structures and related algorithms. I present example applications which use SFAs, and describe and evaluate simple hardware mechanisms for implementing SFAs. Keeping track of which nodes have allocated facets for a particular SFA is an important task that suggests the need for automatic memory management, including garbage collection. To address this need, I first argue that conventional tracing techniques such as mark/sweep and copying GC are inherently unscalable in parallel systems. I then present a parallel memory-management strategy, based on reference-counting, that is capable of garbage collecting sparsely faceted arrays. I also discuss opportunities for hardware support of this garbage collection strategy. I have implemented a high-level hardware/OS simulator featuring hardware support for sparsely faceted arrays and automatic garbage collection. I describe the simulator and outline a few of the numerous details associated with a "real" implementation of SFAs and SFA-aware garbage collection. Simulation results are used throughout this thesis in the evaluation of hardware support mechanisms.
Resumo:
Euterpe is a real-time computer system for the modeling of musical structures. It provides a formalism wherein familiar concepts of musical analysis may be readily expressed. This is verified by its application to the analysis of a wide variety of conventional forms of music: Gregorian chant, Mediaeval polyphony, Back counterpoint, and sonata form. It may be of further assistance in the real-time experiments in various techniques of thematic development. Finally, the system is endowed with sound-synthesis apparatus with which the user may prepare tapes for musical performances.