9 resultados para Processor architecture

em Massachusetts Institute of Technology


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis describes Optimist, an optimizing compiler for the Concurrent Smalltalk language developed by the Concurrent VLSI Architecture Group. Optimist compiles Concurrent Smalltalk to the assembly language of the Message-Driven Processor (MDP). The compiler includes numerous optimization techniques such as dead code elimination, dataflow analysis, constant folding, move elimination, concurrency analysis, duplicate code merging, tail forwarding, use of register variables, as well as various MDP-specific optimizations in the code generator. The MDP presents some unique challenges and opportunities for compilation. Due to the MDP's small memory size, it is critical that the size of the generated code be as small as possible. The MDP is an inherently concurrent processor with efficient mechanisms for sending and receiving messages; the compiler takes advantage of these mechanisms. The MDP's tagged architecture allows very efficient support of object-oriented languages such as Concurrent Smalltalk. The initial goals for the MDP were to have the MDP execute about twenty instructions per method and contain 4096 words of memory. This compiler shows that these goals are too optimistic -- most methods are longer, both in terms of code size and running time. Thus, the memory size of the MDP should be increased.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Scheme86 and the HP Precision Architectures represent different trends in computer processor design. The former uses wide micro-instructions, parallel hardware, and a low latency memory interface. The latter encourages pipelined implementation and visible interlocks. To compare the merits of these approaches, algorithms frequently encountered in numerical and symbolic computation were hand-coded for each architecture. Timings were done in simulators and the results were evaluated to determine the speed of each design. Based on these measurements, conclusions were drawn as to which aspects of each architecture are suitable for a high- performance computer.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We consider the often-studied problem of sorting, for a parallel computer. Given an input array distributed evenly over p processors, the task is to compute the sorted output array, also distributed over the p processors. Many existing algorithms take the approach of approximately load-balancing the output, leaving each processor with Θ(n/p) elements. However, in many cases, approximate load-balancing leads to inefficiencies in both the sorting itself and in further uses of the data after sorting. We provide a deterministic parallel sorting algorithm that uses parallel selection to produce any output distribution exactly, particularly one that is perfectly load-balanced. Furthermore, when using a comparison sort, this algorithm is 1-optimal in both computation and communication. We provide an empirical study that illustrates the efficiency of exact data splitting, and shows an improvement over two sample sort algorithms.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This report describes a working autonomous mobile robot whose only goal is to collect and return empty soda cans. It operates in an unmodified office environment occupied by moving people. The robot is controlled by a collection of over 40 independent "behaviors'' distributed over a loosely coupled network of 24 processors. Together this ensemble helps the robot locate cans with its laser rangefinder, collect them with its on-board manipulator, and bring them home using a compass and an array of proximity sensors. We discuss the advantages of using such a multi-agent control system and show how to decompose the required tasks into component activities. We also examine the benefits and limitations of spatially local, stateless, and independent computation by the agents.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis defines Pi, a parallel architecture interface that separates model and machine issues, allowing them to be addressed independently. This provides greater flexibility for both the model and machine builder. Pi addresses a set of common parallel model requirements including low latency communication, fast task switching, low cost synchronization, efficient storage management, the ability to exploit locality, and efficient support for sequential code. Since Pi provides generic parallel operations, it can efficiently support many parallel programming models including hybrids of existing models. Pi also forms a basis of comparison for architectural components.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Concurrent Smalltalk is the primary language used for programming the J- Machine, a MIMD message-passing computer containing thousands of 36-bit processors connected by a very low latency network. This thesis describes in detail Concurrent Smalltalk and its implementation on the J-Machine, including the Optimist II global optimizing compiler and Cosmos fine-grain parallel operating system. Quantitative and qualitative results are presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The furious pace of Moore's Law is driving computer architecture into a realm where the the speed of light is the dominant factor in system latencies. The number of clock cycles to span a chip are increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has previously made migration an unserviceable solution so far. I present an architecture, implementation, and mechanisms that reduces the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Space Systems, Policy and Architecture Research Consortium (SSPARC) was formed to make substantial progress on problems of national importance. The goals of SSPARC were to: • Provide technologies and methods that will allow the creation of flexible, upgradable space systems, • Create a “clean sheet” approach to space systems architecture determination and design, including the incorporation of risk, uncertainty, and flexibility issues, and • Consider the impact of national space policy on the above. This report covers the last two goals, and demonstrates that the effort was largely successful.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The memory hierarchy is the main bottleneck in modern computer systems as the gap between the speed of the processor and the memory continues to grow larger. The situation in embedded systems is even worse. The memory hierarchy consumes a large amount of chip area and energy, which are precious resources in embedded systems. Moreover, embedded systems have multiple design objectives such as performance, energy consumption, and area, etc. Customizing the memory hierarchy for specific applications is a very important way to take full advantage of limited resources to maximize the performance. However, the traditional custom memory hierarchy design methodologies are phase-ordered. They separate the application optimization from the memory hierarchy architecture design, which tend to result in local-optimal solutions. In traditional Hardware-Software co-design methodologies, much of the work has focused on utilizing reconfigurable logic to partition the computation. However, utilizing reconfigurable logic to perform the memory hierarchy design is seldom addressed. In this paper, we propose a new framework for designing memory hierarchy for embedded systems. The framework will take advantage of the flexible reconfigurable logic to customize the memory hierarchy for specific applications. It combines the application optimization and memory hierarchy design together to obtain a global-optimal solution. Using the framework, we performed a case study to design a new software-controlled instruction memory that showed promising potential.