4 resultados para Numerical power performance
em Massachusetts Institute of Technology
Resumo:
Caches are known to consume up to half of all system power in embedded processors. Co-optimizing performance and power of the cache subsystems is therefore an important step in the design of embedded systems, especially those employing application specific instruction processors. In this project, we propose an analytical cache model that succinctly captures the miss performance of an application over the entire cache parameter space. Unlike exhaustive trace driven simulation, our model requires that the program be simulated once so that a few key characteristics can be obtained. Using these application-dependent characteristics, the model can span the entire cache parameter space consisting of cache sizes, associativity and cache block sizes. In our unified model, we are able to cater for direct-mapped, set and fully associative instruction, data and unified caches. Validation against full trace-driven simulations shows that our model has a high degree of fidelity. Finally, we show how the model can be coupled with a power model for caches such that one can very quickly decide on pareto-optimal performance-power design points for rapid design space exploration.
Resumo:
The Scheme86 and the HP Precision Architectures represent different trends in computer processor design. The former uses wide micro-instructions, parallel hardware, and a low latency memory interface. The latter encourages pipelined implementation and visible interlocks. To compare the merits of these approaches, algorithms frequently encountered in numerical and symbolic computation were hand-coded for each architecture. Timings were done in simulators and the results were evaluated to determine the speed of each design. Based on these measurements, conclusions were drawn as to which aspects of each architecture are suitable for a high- performance computer.
Resumo:
This work demonstrates how partial evaluation can be put to practical use in the domain of high-performance numerical computation. I have developed a technique for performing partial evaluation by using placeholders to propagate intermediate results. For an important class of numerical programs, a compiler based on this technique improves performance by an order of magnitude over conventional compilation techniques. I show that by eliminating inherently sequential data-structure references, partial evaluation exposes the low-level parallelism inherent in a computation. I have implemented several parallel scheduling and analysis programs that study the tradeoffs involved in the design of an architecture that can effectively utilize this parallelism. I present these results using the 9- body gravitational attraction problem as an example.
Resumo:
This work presents detailed numerical calculations of the dielectrophoretic force in octupolar traps designed for single-cell trapping. A trap with eight planar electrodes is studied for spherical and ellipsoidal particles using an indirect implementation of the boundary element method (BEM). Multipolar approximations of orders one to three are compared with the full Maxwell stress tensor (MST) calculation of the electrical force on spherical particles. Ellipsoidal particles are also studied, but in their case only the dipolar approximation is available for comparison with the MST solution. The results show that the full MST calculation is only required in the study of non-spherical particles.