994 results for Multiprocessor computer architectures


Relevance: 100.00%

Abstract:

Thesis (M. S.)--University of Illinois at Urbana-Champaign.

Relevance: 100.00%

Abstract:

"This work was submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering, June 1967, and was supported in part by the AEC under Contract No. USAEC AT(11-1)1018.

Relevance: 100.00%

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance: 90.00%

Abstract:

Simultaneous consideration of both performance and reliability issues is important in the choice of computer architectures for real-time aerospace applications. One of the requirements for such a fault-tolerant computer system is the characteristic of graceful degradation. A shared and replicated resources computing system represents such an architecture. In this paper, a combinatorial model is used for the evaluation of the instruction execution rate of a degradable, replicated resources computing system such as a modular multiprocessor system. Next, a method is presented to evaluate the computation reliability of such a system utilizing a reliability graph model and the instruction execution rate. Finally, this computation reliability measure, which simultaneously describes both performance and reliability, is applied as a constraint in an architecture optimization model for such computing systems. Index Terms-Architecture optimization, computation
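The abstract does not reproduce the paper's combinatorial model; purely as a hypothetical illustration of the kind of calculation involved, the sketch below weights the instruction execution rate of each degraded configuration of an n-processor system by its binomial survival probability (all parameter values are made up).

```python
from math import comb

def expected_execution_rate(n_procs, p_up, rate_per_proc):
    """Expected instruction execution rate of a gracefully degrading
    multiprocessor: weight the rate of each configuration with k live
    processors by the binomial probability of that configuration.
    (Illustrative only; the paper's combinatorial model also accounts
    for shared, replicated resources.)"""
    total = 0.0
    for k in range(n_procs + 1):
        p_config = comb(n_procs, k) * p_up**k * (1 - p_up)**(n_procs - k)
        total += p_config * k * rate_per_proc
    return total

# Hypothetical numbers: 4 processors, each alive with probability 0.95,
# each delivering 2.0 MIPS when alive.
print(expected_execution_rate(4, 0.95, 2.0))  # ~7.6 MIPS
```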

Relevance: 90.00%

Abstract:

With the latest advances in the area of advanced computer architectures, we already see large-scale machines at the petascale level, and exascale computing is under discussion. All of these require efficient, scalable algorithms in order to bridge the performance gap. In this paper, examples of various approaches to designing scalable algorithms for such advanced architectures are given, and the corresponding properties of these algorithms are outlined and discussed. The examples include scalable algorithms applied to large-scale problems in areas such as Computational Biology and Environmental Modelling. The key properties of such advanced and scalable algorithms are outlined.

Relevance: 80.00%

Abstract:

Many novel computer architectures, such as array processors and multiprocessors, achieve high performance through the use of concurrency and exploit variations of the von Neumann model of computation. The effective utilization of such machines places special demands on programmers and their programming languages, such as structuring data into vectors or partitioning programs into concurrent processes. In comparison, the data flow model of computation demands only that the principle of structured programming be followed. A data flow program, often represented as a data flow graph, is a program that expresses a computation by indicating the data dependencies among operators. A data flow computer is a machine designed to take advantage of concurrency in data flow graphs by executing data-independent operations in parallel. In this paper, we discuss the design of a high-level language (DFL: Data Flow Language) suitable for data flow computers, and present some sample procedures in DFL. Implementation aspects are not discussed in detail, since no new problems are encountered there. DFL embodies the concepts of functional programming but in appearance closely resembles Pascal. The language is a better vehicle than the data flow graph for expressing a parallel algorithm. The compiler has been implemented on a DEC 1090 system in Pascal.
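DFL source is not shown in the abstract; the following minimal sketch (a hypothetical graph representation, not DFL syntax) illustrates the execution model described above: an operator fires as soon as all of its input operands are available, so data-independent operators could execute in parallel on a data flow machine.

```python
# Minimal data flow graph evaluator (illustrative, not DFL itself).
import operator

# graph: node -> (function, [input names]); the result is stored under the node name.
graph = {
    "t1": (operator.add, ["a", "b"]),    # t1 = a + b
    "t2": (operator.mul, ["a", "c"]),    # t2 = a * c   (independent of t1)
    "out": (operator.sub, ["t1", "t2"])  # out = t1 - t2
}

def run(graph, inputs):
    values = dict(inputs)
    pending = dict(graph)
    while pending:
        # An operator is ready when all of its operands have arrived.
        ready = [n for n, (_, args) in pending.items()
                 if all(a in values for a in args)]
        if not ready:
            raise ValueError("deadlock: unsatisfiable dependencies")
        for n in ready:                      # these could fire concurrently
            fn, args = pending.pop(n)
            values[n] = fn(*(values[a] for a in args))
    return values

print(run(graph, {"a": 2, "b": 3, "c": 4})["out"])  # (2+3) - (2*4) = -3
```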

Relevance: 80.00%

Abstract:

Knowledge about a program's worst-case execution time (WCET) is essential in validating real-time systems and helps in effective scheduling. One popular approach used in industry is to measure the execution time of program components on the target architecture and combine them using static analysis of the program. Measurements need to be taken in the least intrusive way in order to avoid affecting the accuracy of the estimated WCET. Several programs exhibit phase behavior, wherein the program's dynamic execution is observed to be composed of phases. Each phase is distinct from the others and exhibits homogeneous behavior with respect to cycles per instruction (CPI), data cache misses, etc. In this paper, we show that phase behavior has important implications for timing analysis. We make use of the homogeneity of a phase to reduce instrumentation overhead while ensuring that the accuracy of the WCET estimate is not largely affected. We propose a model for estimating WCET using static worst-case instruction counts of individual phases and a function of the measured average CPI. We describe a WCET analyzer built on this model which targets two different architectures. The WCET analyzer is observed to give safe estimates for most benchmarks considered in this paper, and the tightness of the WCET estimates is observed to improve for most benchmarks compared to Chronos, a well-known static WCET analyzer.
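The paper's exact estimation function is not given in the abstract; one plausible reading, sketched below with hypothetical phase data and a made-up safety margin, combines each phase's static worst-case instruction count with its measured average CPI.

```python
def wcet_estimate(phases, clock_hz, margin=1.1):
    """Rough per-phase WCET model (illustrative): sum over phases of
    worst-case instruction count x measured average CPI, scaled by a
    safety margin and converted to seconds at the given clock rate."""
    cycles = sum(p["wc_instr"] * p["avg_cpi"] for p in phases)
    return margin * cycles / clock_hz

# Hypothetical benchmark with two phases:
phases = [
    {"wc_instr": 1_200_000, "avg_cpi": 1.4},  # compute-bound phase
    {"wc_instr":   300_000, "avg_cpi": 3.2},  # memory-bound phase
]
print(f"{wcet_estimate(phases, clock_hz=400e6):.6f} s")  # ~0.00726 s with these made-up numbers
```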

Relevance: 80.00%

Abstract:

Computers and Thought are the two categories that together define Artificial Intelligence as a discipline. It is generally accepted that work in Artificial Intelligence over the last thirty years has had a strong influence on aspects of computer architectures. In this paper we also make the converse claim: that the state of computer architecture has been a strong influence on our models of thought. The von Neumann model of computation has led Artificial Intelligence in particular directions. Intelligence in biological systems is completely different. Recent work in behavior-based Artificial Intelligence has produced new models of intelligence that are much closer in spirit to biological systems. The non-von Neumann computational models they use share many characteristics with biological computation.

Relevance: 80.00%

Abstract:

Conventional parallel computer architectures do not provide support for non-uniformly distributed objects. In this thesis, I introduce sparsely faceted arrays (SFAs), a new low-level mechanism for naming regions of memory, or facets, on different processors in a distributed, shared memory parallel processing system. Sparsely faceted arrays address the disconnect between the global distributed arrays provided by conventional architectures (e.g. the Cray T3 series), and the requirements of high-level parallel programming methods that wish to use objects that are distributed over only a subset of processing elements. A sparsely faceted array names a virtual globally-distributed array, but actual facets are lazily allocated. By providing simple semantics and making efficient use of memory, SFAs enable efficient implementation of a variety of non-uniformly distributed data structures and related algorithms. I present example applications which use SFAs, and describe and evaluate simple hardware mechanisms for implementing SFAs. Keeping track of which nodes have allocated facets for a particular SFA is an important task that suggests the need for automatic memory management, including garbage collection. To address this need, I first argue that conventional tracing techniques such as mark/sweep and copying GC are inherently unscalable in parallel systems. I then present a parallel memory-management strategy, based on reference-counting, that is capable of garbage collecting sparsely faceted arrays. I also discuss opportunities for hardware support of this garbage collection strategy. I have implemented a high-level hardware/OS simulator featuring hardware support for sparsely faceted arrays and automatic garbage collection. I describe the simulator and outline a few of the numerous details associated with a "real" implementation of SFAs and SFA-aware garbage collection. Simulation results are used throughout this thesis in the evaluation of hardware support mechanisms.
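The thesis describes hardware and OS support; purely as a software illustration of the lazy-allocation idea, the toy class below allocates a facet only for nodes that actually touch the array (node identifiers and facet sizes are hypothetical).

```python
class SparselyFacetedArray:
    """Toy model of an SFA: one global name, per-node facets allocated
    lazily on first access (a software sketch only; the thesis proposes
    hardware/OS mechanisms for this)."""
    def __init__(self, facet_size):
        self.facet_size = facet_size
        self.facets = {}            # node_id -> backing storage

    def facet(self, node_id):
        # Lazy allocation: only nodes that actually touch the SFA
        # ever consume memory for it.
        if node_id not in self.facets:
            self.facets[node_id] = [0] * self.facet_size
        return self.facets[node_id]

    def allocated_nodes(self):
        # Knowing which nodes hold facets is exactly what the
        # reference-counting garbage collector has to track.
        return set(self.facets)

sfa = SparselyFacetedArray(facet_size=4)
sfa.facet(node_id=3)[0] = 42        # only node 3 allocates a facet
print(sfa.allocated_nodes())        # {3}
```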

Relevance: 80.00%

Abstract:

Act2 is a highly concurrent programming language designed to exploit the processing power available from parallel computer architectures. The language supports advanced concepts in software engineering, providing high-level constructs suitable for implementing artificially-intelligent applications. Act2 is based on the Actor model of computation, consisting of virtual computational agents which communicate by message-passing. Act2 serves as a framework in which to integrate an actor language, a description and reasoning system, and a problem-solving and resource management system. This document describes issues in Act2's design and the implementation of an interpreter for the language.
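Act2 syntax is not shown in the abstract; the fragment below is only a generic illustration of the underlying Actor idea: an agent with private state and a mailbox that reacts to asynchronous messages.

```python
import queue, threading

class Counter:
    """Minimal actor: private state, reacts to messages from a mailbox.
    (A generic illustration of the Actor model, not Act2 code.)"""
    def __init__(self):
        self.mailbox = queue.Queue()
        self.count = 0
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, msg):
        self.mailbox.put(msg)        # asynchronous message passing

    def _run(self):
        while True:
            msg, reply_to = self.mailbox.get()
            if msg == "inc":
                self.count += 1
            elif msg == "get":
                reply_to.put(self.count)

counter = Counter()
counter.send(("inc", None))
counter.send(("inc", None))
reply = queue.Queue()
counter.send(("get", reply))
print(reply.get())                   # 2
```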

Relevance: 80.00%

Abstract:

It is now possible to use powerful general-purpose computer architectures to support post-production of both video and multimedia projects. By devising a suitable portable software architecture and using high-speed networking in an appropriate manner, a system has been constructed where editors are no longer tied to a specific location. New types of production, such as multi-threaded interactive video, are supported. Editors may also work remotely in locations where a very high-speed network connection is not currently provided. An object-oriented database is used for the comprehensive cataloging of material and to support automatic audio/video object migration and replication. Copyright © 1997 by the Society of Motion Picture and Television Engineers, Inc.

Relevance: 80.00%

Abstract:

The process of developing software that takes advantage of multiple processors is commonly referred to as parallel programming. For various reasons, this process is much harder than the sequential case. For decades, parallel programming was a problem for a small niche only: engineers parallelizing mostly numerical applications in High Performance Computing. This has changed with the advent of multi-core processors in mainstream computer architectures: parallel programming is now a problem for a much larger group of developers. The main objective of this thesis was to find ways to make parallel programming easier for them. Several aims were identified in order to reach this objective: survey the state of the art of parallel programming today, improve the education of software developers on the topic, and provide programmers with powerful abstractions to make their work easier.

To reach these aims, several key steps were taken. To start with, a survey was conducted among parallel programmers to find out about the state of the art. More than 250 people participated, yielding results about the parallel programming systems and languages in use, as well as about common problems with these systems. Furthermore, a study was conducted in university classes on parallel programming. It resulted in a list of frequently made mistakes that were analyzed and used to create a programmers' checklist to avoid them in the future. For programmers' education, an online resource was set up to collect experience and knowledge in the field of parallel programming, called the Parawiki. Another key step in this direction was the creation of the Thinking Parallel weblog, where more than 50,000 readers to date have read essays on the topic.

For the third aim (powerful abstractions), it was decided to concentrate on one parallel programming system: OpenMP. Its ease of use and high level of abstraction were the most important reasons for this decision. Two research directions were pursued. The first resulted in a parallel library called AthenaMP. It contains so-called generic components, derived from design patterns for parallel programming. These include functionality to enhance the locks provided by OpenMP, to perform operations on large amounts of data (data-parallel programming), and to enable the implementation of irregular algorithms using task pools. AthenaMP itself serves a triple role: the components are well documented and can be used directly in programs, developers can study the source code and learn from it, and compiler writers can use it as a testing ground for their OpenMP compilers. The second research direction was targeted at changing the OpenMP specification to make the system more powerful. The main contributions here were a proposal to enable thread cancellation and a proposal to avoid busy waiting. Both were implemented in a research compiler, shown to be useful in example applications, and proposed to the OpenMP Language Committee.
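AthenaMP itself is a C++/OpenMP library; as a language-neutral sketch of the task-pool pattern it provides for irregular algorithms, the fragment below lets running tasks enqueue further tasks until the dynamically discovered work is exhausted.

```python
# Task-pool pattern for irregular work (conceptual sketch only; the
# AthenaMP components described in the thesis target C++/OpenMP).
import queue, threading

tasks = queue.Queue()
results, lock = [], threading.Lock()

def worker():
    while True:
        node = tasks.get()
        if node is None:                 # shutdown signal
            tasks.task_done()
            return
        if isinstance(node, list):       # irregular: new work is discovered while working
            for child in node:
                tasks.put(child)
        else:
            with lock:
                results.append(node * node)
        tasks.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

tasks.put([1, [2, 3], [[4], 5]])         # seed the pool with one root task
tasks.join()                             # wait for all dynamically created tasks
for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()

print(sorted(results))                   # [1, 4, 9, 16, 25]
```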

Relevance: 80.00%

Abstract:

With the development of high-level languages for new computer architectures comes the need for appropriate debugging tools as well. One method for meeting this need would be to develop, from scratch, a symbolic debugger with the introduction of each new language implementation for any given architecture. This, however, seems to require unnecessary duplication of effort among developers. This paper describes Maygen, a "debugger generation system," designed to efficiently provide the desired language-dependent and architecture-dependent debuggers. A prototype of the Maygen system has been implemented and is able to handle the semantically different languages of C and OPAL.

Relevance: 80.00%

Abstract:

These notes have been issued on a small scale in 1983 and 1987 and on request at other times. This issue follows two items of news. First, Walter Colquitt and Luther Welsh found the 'missed' Mersenne prime M110503 and advanced the frontier of complete Mp-testing to 139,267. In so doing, they terminated Slowinski's significant string of four consecutive Mersenne primes. Secondly, a team of five established a non-Mersenne number as the largest known prime. This result terminated the 1952-89 reign of Mersenne primes. All the original Mersenne numbers with p < 258 were factorised some time ago. The Sandia Laboratories team of Davis, Holdridge & Simmons, with some little assistance from a CRAY machine, cracked M211 in 1983 and M251 in 1984. They contributed their results to the 'Cunningham Project', care of Sam Wagstaff. That project is now moving apace thanks to developments in technology, factorisation and primality testing. New levels of computer power and new computer architectures motivated by the open-ended promise of parallelism are now available. Once again, the suppliers may be offering free buildings with the computer. However, the Sandia '84 CRAY-1 implementation of the quadratic-sieve method is now outpowered by the number-field sieve technique. This is deployed on either purpose-built hardware or large syndicates, even distributed world-wide, of collaborating standard processors. New factorisation techniques of both special and general applicability have been defined and deployed. The elliptic-curve method finds large factors with helpful properties, while the number-field sieve approach is breaking down composites with over one hundred digits. The material is updated on an occasional basis to follow the latest developments in primality-testing large Mp and factorising smaller Mp; all dates derive from the published literature or referenced private communications. Minor corrections, additions and changes merely advance the issue number after the decimal point. The reader is invited to report any errors and omissions that have escaped the proof-reading, to answer the unresolved questions noted and to suggest additional material associated with this subject.
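For reference, the Lucas-Lehmer test that underlies the complete Mp-testing mentioned in these notes fits in a few lines; this is the standard algorithm, not code taken from the notes themselves.

```python
def lucas_lehmer(p):
    """Lucas-Lehmer test: M_p = 2**p - 1 is prime iff s_(p-2) == 0 (mod M_p),
    where s_0 = 4 and s_(k+1) = s_k**2 - 2.  Valid for odd prime p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# M_11 = 2047 = 23 * 89 is composite; M_13 = 8191 is prime.
print(lucas_lehmer(11), lucas_lehmer(13))   # False True
```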

Relevance: 80.00%

Abstract:

A new parallel algorithm for simultaneous untangling and smoothing of tetrahedral meshes is proposed in this paper. We provide a detailed analysis of its performance on shared-memory many-core computer architectures. This performance analysis includes the evaluation of execution time, parallel scalability, load balancing, and parallelism bottlenecks. Additionally, we compare the impact of three previously published graph coloring procedures on the performance of our parallel algorithm. We use six benchmark meshes with a wide range of sizes. Using these experimental data sets, we describe the behavior of the parallel algorithm for different data sizes. We demonstrate that this algorithm is highly scalable when it runs on two different high-performance many-core computers with up to 128 processors...
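The paper compares three previously published coloring procedures; as a generic illustration of why coloring matters, the sketch below greedily colors a (hypothetical) mesh adjacency graph so that same-colored vertices share no edge and can therefore be smoothed concurrently without conflicting updates.

```python
def greedy_coloring(adj):
    """Greedy vertex coloring of a mesh adjacency graph.  Vertices with
    the same color are pairwise non-adjacent, so an untangling/smoothing
    sweep may relocate all of them in parallel without data races.
    (Generic illustration; the paper evaluates three specific published
    coloring procedures.)"""
    color = {}
    for v in sorted(adj, key=lambda v: -len(adj[v])):   # highest degree first
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

# Hypothetical adjacency of 5 mesh vertices (an edge = shared element).
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 4}, 3: {1}, 4: {2}}
colors = greedy_coloring(adj)
print(colors)                        # e.g. {1: 0, 2: 1, 0: 2, 3: 1, 4: 0}

# Vertices grouped by color define the parallel smoothing batches:
batches = {}
for v, c in colors.items():
    batches.setdefault(c, []).append(v)
print(batches)
```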