3 resultados para employing
em Massachusetts Institute of Technology
Resumo:
The Transit network provides high-speed, low-latency, fault-tolerant interconnect for high-performance, multiprocessor computers. The basic connection scheme for Transit uses bidelta style, multistage networks to support up to 256 processors. Scaling to larger machines by simply extending the bidelta network topology will result in a uniform degradation of network latency between all processors. By employing a fat-tree network structure in larger systems, the network provides locality and universality properties which can help minimize the impact of scaling on network latency. This report details the topology and construction issues associated with integrating Transit routing technology into fat-tree interconnect topologies.
Resumo:
The dynamic power requirement of CMOS circuits is rapidly becoming a major concern in the design of personal information systems and large computers. In this work we present a number of new CMOS logic families, Charge Recovery Logic (CRL) as well as the much improved Split-Level Charge Recovery Logic (SCRL), within which the transfer of charge between the nodes occurs quasistatically. Operating quasistatically, these logic families have an energy dissipation that drops linearly with operating frequency, i.e., their power consumption drops quadratically with operating frequency as opposed to the linear drop of conventional CMOS. The circuit techniques in these new families rely on constructing an explicitly reversible pipelined logic gate, where the information necessary to recover the energy used to compute a value is provided by computing its logical inverse. Information necessary to uncompute the inverse is available from the subsequent inverse logic stage. We demonstrate the low energy operation of SCRL by presenting the results from the testing of the first fully quasistatic 8 x 8 multiplier chip (SCRL-1) employing SCRL circuit techniques.
Resumo:
Caches are known to consume up to half of all system power in embedded processors. Co-optimizing performance and power of the cache subsystems is therefore an important step in the design of embedded systems, especially those employing application specific instruction processors. In this project, we propose an analytical cache model that succinctly captures the miss performance of an application over the entire cache parameter space. Unlike exhaustive trace driven simulation, our model requires that the program be simulated once so that a few key characteristics can be obtained. Using these application-dependent characteristics, the model can span the entire cache parameter space consisting of cache sizes, associativity and cache block sizes. In our unified model, we are able to cater for direct-mapped, set and fully associative instruction, data and unified caches. Validation against full trace-driven simulations shows that our model has a high degree of fidelity. Finally, we show how the model can be coupled with a power model for caches such that one can very quickly decide on pareto-optimal performance-power design points for rapid design space exploration.