66 results for supercomputing


Abstract:

Large instruction windows and issue queues are key to exploiting greater instruction-level parallelism in out-of-order superscalar processors. However, conventional large monolithic issue queues have high cycle time and energy consumption. Previous efforts to reduce cycle time segment the issue queue and pipeline the wakeup logic; unfortunately, this results in significant IPC loss. Other proposals address energy efficiency by avoiding only the unnecessary tag comparisons; they do not reduce the number of tag broadcasts, and they also increase issue latency. To address both issues comprehensively, we propose the Scalable Lowpower Issue Queue (SLIQ). SLIQ augments a pipelined issue queue with direct indexing, which mitigates the problem of delayed wakeups while reducing cycle time. The SLIQ design also naturally leads to significant energy savings by reducing both the number of tag broadcasts and the number of tag comparisons required. For an 8-wide superscalar processor, a 2-segment SLIQ incurs an average IPC loss of 0.2% over the entire SPEC CPU2000 suite while reducing issue latency by 25.2% compared with a monolithic 128-entry issue queue. An 8-segment SLIQ improves scalability further, reducing issue latency by 38.3% while incurring an IPC loss of only 2.3%. In addition, the 8-segment SLIQ reduces energy consumption and energy-delay product by 48.3% and 67.4%, respectively, on average.
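To make the difference between tag broadcast and direct indexing concrete, here is a toy Python model; it is not SLIQ itself. A conventional wakeup drives a producer's tag to every entry and compares it against each entry's source tags, whereas direct indexing lets the producer update only the queue slots it already knows hold its consumers. The function names, the two-comparators-per-entry cost model, and the 128-entry queue are illustrative assumptions; segmentation and pipelined wakeup are not modeled.

```python
def broadcast_wakeup(queue, tag):
    """Conventional CAM-style wakeup (toy model).

    The producer's destination tag is broadcast to every entry and compared
    against both of its source-operand slots. Returns the number of tag
    comparisons performed."""
    comparisons = 0
    for pending in queue:         # pending = set of source tags still awaited
        comparisons += 2          # assumption: two source comparators per entry
        pending.discard(tag)      # a match marks that operand ready
    return comparisons


def direct_wakeup(queue, tag, consumer_slots):
    """Direct-indexed wakeup (toy model).

    The producer has recorded which queue slots hold its consumers, so only
    those slots are updated and no tag is broadcast. Returns the number of
    tag comparisons performed, which is zero in this model."""
    for slot in consumer_slots:
        queue[slot].discard(tag)
    return 0


if __name__ == "__main__":
    def fresh_queue():
        q = [set() for _ in range(128)]   # 128-entry queue
        q[5] = {"p1", "p2"}               # slot 5 waits on producers p1 and p2
        return q

    q = fresh_queue()
    print(broadcast_wakeup(q, "p1"), q[5])        # 256 {'p2'}
    q = fresh_queue()
    print(direct_wakeup(q, "p1", [5]), q[5])      # 0 {'p2'}
```

In this toy model, one wakeup costs 256 comparisons with a broadcast and none with direct indexing. The abstract reports that SLIQ reduces both broadcasts and comparisons, but it does not say how consumers that cannot be directly indexed are handled, so the model leaves that case out.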

Abstract:

The twin demands of energy efficiency and higher performance on DRAM are particularly pressing in multicore architectures. A variety of schemes have been proposed to address either the latency or the energy consumption of DRAM. These schemes typically require non-trivial hardware changes and end up improving latency at the cost of energy, or vice versa. One specific DRAM performance problem in multicores is that interleaved accesses from different cores can degrade row-buffer locality. In this paper, based on the temporal and spatial locality characteristics of memory accesses, we propose reorganizing the existing single large row buffer in a DRAM bank into multiple sub-row buffers (MSRB). This reorganization not only improves row-buffer hit rates, and hence average memory latency, but also reduces the energy consumed by the DRAM. The first major contribution of this work is achieving such a reorganization without requiring any significant changes to the existing, widely accepted DRAM specifications. Our proposed reorganization improves weighted speedup by 35.8%, 14.5% and 21.6% for quad-, eight- and sixteen-core workloads, respectively, along with 42%, 28% and 31% reductions in DRAM energy. The MSRB organization also lets the memory controller manage multiple row buffers. Because the memory controller is aware of the behaviour of individual cores, it can implement coordinated buffer-allocation schemes for different cores that take program behaviour into account. We demonstrate two such schemes, Fairness Oriented Allocation and Performance Oriented Allocation, which show the flexibility that memory controllers can exploit in the MSRB organization to improve overall performance and/or fairness. Further, the MSRB organization enables additional opportunities for DRAM intra-bank parallelism and for selective early precharging of the LRU row buffer, which further improve memory access latency. These two optimizations together provide an additional 5.9% performance improvement.
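The row-buffer locality problem described above can be illustrated with a toy single-bank model in Python. The model makes three assumptions. First, the sub-row buffers together have the same capacity as the original row buffer. Second, each sub-row buffer holds one contiguous slice of a row. Third, sub-row buffers are replaced in LRU order. The function name and its parameters are hypothetical, and the model does not cover timing, energy, or the allocation schemes.

```python
from collections import OrderedDict


def row_hits(accesses, num_buffers, row_cols):
    """Count row-buffer hits for a stream of (row, column) accesses to one bank.

    The bank's row buffer (row_cols columns in total) is split into
    num_buffers sub-row buffers of row_cols // num_buffers columns each;
    num_buffers == 1 models the conventional single row buffer.
    Sub-row buffers are managed with LRU replacement (an assumption)."""
    sub_cols = row_cols // num_buffers
    open_subrows = OrderedDict()            # (row, sub-row index) -> None, LRU first
    hits = 0
    for row, col in accesses:
        key = (row, col // sub_cols)
        if key in open_subrows:
            hits += 1
            open_subrows.move_to_end(key)   # mark as most recently used
        else:
            if len(open_subrows) == num_buffers:
                open_subrows.popitem(last=False)  # evict the LRU sub-row
            open_subrows[key] = None        # activate: load sub-row into a buffer
    return hits


if __name__ == "__main__":
    core_a = [(1, c) for c in range(4)]     # core A streams through row 1
    core_b = [(2, c) for c in range(4)]     # core B streams through row 2
    interleaved = [acc for pair in zip(core_a, core_b) for acc in pair]
    print(row_hits(interleaved, num_buffers=1, row_cols=8))  # 0
    print(row_hits(interleaved, num_buffers=2, row_cols=8))  # 6
```

Here two cores each stream through a different row. The single row buffer thrashes and never hits. Two half-row buffers keep both cores' slices open and hit on 6 of the 8 accesses.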

Abstract:

The defect formation energies of transition metals (Cr, Fe, and Ni) doped into pseudo-H-passivated ZnO nanowires and into bulk ZnO are systematically investigated using first-principles methods. The general chemical trends in the nanowires are similar to those in the bulk. We also show that the formation energy increases as the nanowire diameter decreases, indicating that doping magnetic ions into ZnO nanowires becomes more difficult as the diameter shrinks. We further calculate the ferromagnetic properties of transition metals doped into the ZnO nanowire and bulk, and find that Cr ions in the nanowire favor the ferromagnetic state, which is consistent with experimental results. We also find that the ferromagnetic coupling state of Cr is more stable in the nanowire than in the bulk, which may lead to a higher Curie temperature (T_C), useful for designing spintronic nanomaterials.
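For reference, first-principles studies of this kind commonly compute defect formation energies with the first expression below. The abstract does not state which form was used, so the block assumes a neutral transition-metal (TM) atom substituting for a Zn atom, with μ denoting chemical potentials. The relative stability of the ferromagnetic state is usually judged from the energy difference between the antiferromagnetic and ferromagnetic configurations of a dopant pair.

```latex
\begin{align*}
E_f(\mathrm{TM_{Zn}}) &= E_{\mathrm{tot}}(\mathrm{ZnO{:}TM_{Zn}}) - E_{\mathrm{tot}}(\mathrm{ZnO}) + \mu_{\mathrm{Zn}} - \mu_{\mathrm{TM}} \\
\Delta E &= E_{\mathrm{AFM}} - E_{\mathrm{FM}}, \qquad \Delta E > 0 \;\Rightarrow\; \text{ferromagnetic coupling is favored}
\end{align*}
```

In this form, a larger E_f means the dopant is harder to incorporate. That is how the statement that doping becomes more difficult at smaller diameters should be read. A larger ΔE for Cr in the nanowire than in the bulk corresponds to the more stable ferromagnetic coupling that the abstract reports.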

Abstract:

Pen-based user interfaces (PUIs) have drawn significant interest owing to their intuitiveness and convenience. While much of the research focuses on the technology, the usability of PUIs has remained relatively low because human factors have not been sufficiently considered. Scenario-centric design is an ideal way to improve usability; however, such designs have some problems in practical use. To address these design issues, the concept of “interface scenarios” is proposed to facilitate interface design and to help users understand the interaction process in such designs. The proposed scenario-focused development method for PUIs is applied to a practical application to demonstrate its effectiveness and usability.

Abstract:

Energy-aware scheduling must be considered in computer-controlled systems in order to improve control performance, make full use of limited computing resources, and reduce energy consumption so as to extend the lifetime of the whole system. In this paper, the scheduling problem of multiple control tasks is discussed for a processor with an adjustable supply voltage. A feedback fuzzy-DVS (dynamic voltage scaling) scheduling architecture is presented that combines feedback control with fuzzy DVS. Simulation results show that, by using the actual utilization as feedback to adjust the processor's supply voltage dynamically, high CPU utilization can be achieved while guaranteeing control performance, and low energy consumption can be achieved as well. The proposed method can be applied to the design of computer-controlled systems based on adjustable-voltage processors.
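The feedback loop described above, in which measured utilization drives the voltage setting, can be sketched in Python. This is not the paper's controller. The three-rule base, the triangular membership functions, the 0.7 utilization target, the step gain, and the normalized speed range [0.2, 1.0] (standing in for voltage/frequency levels) are all hypothetical choices made for illustration.

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def fuzzy_speed_step(utilization, target=0.7, gain=0.2):
    """Return a change in normalized processor speed from measured utilization.

    Hypothetical rule base over the error e = utilization - target:
      IF e is NEGATIVE (under-loaded) THEN lower speed/voltage by `gain`
      IF e is ZERO                    THEN hold
      IF e is POSITIVE (over-loaded)  THEN raise speed/voltage by `gain`
    Memberships are triangular/shoulder functions of width 0.5, and the
    output is the membership-weighted average of the rule outputs."""
    e = utilization - target
    neg = clamp(-e / 0.5, 0.0, 1.0)       # 1 for e <= -0.5, 0 for e >= 0
    pos = clamp(e / 0.5, 0.0, 1.0)        # 1 for e >= 0.5, 0 for e <= 0
    zero = max(0.0, 1.0 - abs(e) / 0.5)   # peaks at e == 0
    return (neg * -gain + zero * 0.0 + pos * gain) / (neg + zero + pos)


def run_feedback(demand, speed=1.0, periods=20):
    """Close the loop: each period, measure utilization at the current speed
    and let the fuzzy controller adjust the speed for the next period."""
    for _ in range(periods):
        utilization = min(1.0, demand / speed)   # busy fraction of the period
        speed = clamp(speed + fuzzy_speed_step(utilization), 0.2, 1.0)
    return speed
```

Consider a workload that needs half of the processor's full speed to sit at the target utilization (`demand=0.35` with a 0.7 target). `run_feedback(0.35)` settles close to a normalized speed of 0.5, so the processor runs slower, and therefore at a lower voltage, while utilization stays near the target. A heavier load, such as `run_feedback(0.9)`, simply saturates at full speed.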

Abstract:

For communication-intensive parallel applications, the maximum degree of concurrency achievable is limited by the communication throughput the network makes available. In previous work [HPS94], we showed experimentally that the performance of certain parallel applications running on a workstation network can be improved significantly if a congestion control protocol is used to enhance network performance. In this paper, we characterize and analyze the communication requirements of a large class of supercomputing applications that fall under the category of fixed-point problems amenable to solution by parallel iterative methods. This analysis yields a set of interface and architectural features sufficient for implementing these applications efficiently over a large-scale distributed system. In particular, we propose a direct link between the application and network layers, supporting congestion control actions at both ends. This in turn improves the system's responsiveness to network congestion, and hence its performance. Measurements are given showing the efficacy of our scheme in supporting large-scale parallel computations.
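As an illustration of a fixed-point problem solved by a parallel iterative method, the Python sketch below runs Jacobi iteration x = F(x) on a small 1-D Poisson system. The specific problem is an assumption and is not taken from the paper. Every new component depends only on the previous iterate, so contiguous blocks of components can be assigned to different workstations. The cost is that neighboring blocks must exchange their boundary values on every sweep, which is the kind of recurring traffic that makes such applications communication-intensive. The function names are hypothetical.

```python
def jacobi_sweep(x, b):
    """One Jacobi sweep for the 1-D Poisson system
    -x[i-1] + 2*x[i] - x[i+1] = b[i] with zero boundary values.
    Every new value uses only the previous iterate x, so the components
    can be computed in parallel by different nodes."""
    n = len(x)
    return [
        (b[i] + (x[i - 1] if i > 0 else 0.0) + (x[i + 1] if i < n - 1 else 0.0)) / 2.0
        for i in range(n)
    ]


def solve(b, tol=1e-10, max_sweeps=100_000):
    """Iterate x <- F(x) until successive iterates differ by less than tol.
    Returns the final iterate and the number of sweeps performed."""
    x = [0.0] * len(b)
    for sweep in range(1, max_sweeps + 1):
        new_x = jacobi_sweep(x, b)
        if max(abs(u - v) for u, v in zip(new_x, x)) < tol:
            return new_x, sweep
        x = new_x
    return x, max_sweeps


def halo_messages_per_sweep(num_nodes):
    """With a contiguous block partition of a 1-D problem, each of the
    num_nodes - 1 internal block boundaries needs one value sent in each
    direction per sweep."""
    return 2 * (num_nodes - 1)
```

For `b = [1.0, 0.0, 1.0]` the exact solution is `[1.0, 1.0, 1.0]`. Multiplying the sweep count returned by `solve` by `halo_messages_per_sweep(p)` gives the number of boundary messages a p-node run would exchange, which is why network throughput bounds the achievable concurrency.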

Abstract:

Task-based dataflow programming models and runtimes have emerged as promising candidates for programming multicore and manycore architectures. These programming models dynamically analyze task dependencies at runtime and schedule independent tasks concurrently on the processing elements. In such models, cache locality, which is critical for performance, becomes more challenging in the presence of fine-grained tasks and in architectures with many simple cores.
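The dynamic dependency analysis these runtimes perform can be sketched in Python. Tasks are submitted in program order with the data they read and write. The runtime derives read-after-write, write-after-read, and write-after-write dependencies from those declarations, and tasks whose predecessors have all finished can run concurrently. This is a simplified illustration that tracks dependencies at whole-datum granularity and uses string names for tasks and data. The functions are hypothetical and do not reproduce any particular runtime's API.

```python
from collections import defaultdict


def build_dependencies(tasks):
    """tasks: list of (name, inputs, outputs) in submission order.
    Returns {task: set of predecessor tasks}, tracking read-after-write,
    write-after-read and write-after-write hazards on each datum."""
    last_writer = {}
    readers_since_write = defaultdict(list)
    deps = {}
    for name, ins, outs in tasks:
        preds = set()
        for d in ins:
            if d in last_writer:
                preds.add(last_writer[d])            # RAW: read after last write
        for d in outs:
            if d in last_writer:
                preds.add(last_writer[d])            # WAW: ordered writes
            preds.update(readers_since_write[d])     # WAR: wait for earlier readers
        for d in ins:
            readers_since_write[d].append(name)
        for d in outs:
            last_writer[d] = name
            readers_since_write[d] = []
        deps[name] = preds
    return deps


def schedule_waves(deps):
    """Group tasks into waves; all tasks in one wave are independent and
    could run concurrently on different cores."""
    remaining = {t: set(p) for t, p in deps.items()}
    waves = []
    while remaining:
        ready = sorted(t for t, p in remaining.items() if not p)
        if not ready:
            raise ValueError("cyclic dependencies")
        waves.append(ready)
        for t in ready:
            del remaining[t]
        for p in remaining.values():
            p.difference_update(ready)
    return waves


if __name__ == "__main__":
    tasks = [
        ("A", [], ["x"]),
        ("B", [], ["y"]),
        ("C", ["x", "y"], ["z"]),
        ("D", ["x"], ["w"]),
        ("E", [], ["x"]),   # overwrites x: waits for A (WAW), C and D (WAR)
    ]
    print(schedule_waves(build_dependencies(tasks)))
    # [['A', 'B'], ['C', 'D'], ['E']]
```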

This paper presents a combined hardware-software approach that improves cache locality and offers better performance in terms of both execution time and memory-system energy. We propose the explicit bulk prefetcher (EBP) and epoch-based cache management (ECM) to help runtimes prefetch task data and guide cache replacement decisions. The runtime software can use this hardware support to expose its internal knowledge about the tasks to the architecture and achieve more efficient task-based execution. Our combined scheme outperforms hardware-only prefetchers and state-of-the-art replacement policies. It improves performance by 17% on average, generates on average 26% fewer L2 misses, and consumes on average 28% less energy in the memory-system components.
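The abstract does not describe how ECM makes its replacement decisions, so the Python toy below only illustrates the general idea of a runtime passing task knowledge to the cache; it is not the authors' design. Under a hypothetical policy, each line is tagged with the epoch (task) that last used it, and once the runtime marks an epoch as finished, that epoch's lines are evicted before any others. The class name, the epoch tags, and the `bulk_prefetch` method standing in for an explicit bulk prefetch are all assumptions.

```python
from collections import OrderedDict


class HintedCache:
    """Toy fully associative cache with software-guided replacement.

    HYPOTHETICAL policy (not the paper's ECM): every line is tagged with the
    epoch (task) that last touched it; once the runtime declares an epoch
    finished, its lines are evicted before any others. With use_hints=False
    the cache is plain LRU, for comparison."""

    def __init__(self, capacity, use_hints=True):
        self.capacity = capacity
        self.use_hints = use_hints
        self.lines = OrderedDict()   # address -> epoch, least recently used first
        self.finished = set()        # epochs the runtime has closed
        self.misses = 0

    def _victim(self):
        if self.use_hints:
            for addr, epoch in self.lines.items():   # oldest first
                if epoch in self.finished:
                    return addr
        return next(iter(self.lines))                # LRU fallback

    def _touch(self, addr, epoch):
        """Insert or refresh a line and return True on a hit."""
        hit = addr in self.lines
        if hit:
            self.lines.move_to_end(addr)
        elif len(self.lines) == self.capacity:
            del self.lines[self._victim()]
        self.lines[addr] = epoch
        return hit

    def access(self, addr, epoch):
        """Demand access issued by a task running in `epoch`."""
        if not self._touch(addr, epoch):
            self.misses += 1

    def bulk_prefetch(self, addrs, epoch):
        """Hypothetical explicit bulk prefetch: the runtime loads a task's
        whole footprint before the task starts (not counted as misses)."""
        for addr in addrs:
            self._touch(addr, epoch)

    def end_epoch(self, epoch):
        """The runtime signals that the task owning `epoch` has completed."""
        self.finished.add(epoch)


if __name__ == "__main__":
    for hints in (True, False):
        c = HintedCache(3, use_hints=hints)
        c.access("h", 0)        # long-lived datum
        c.access("x1", 1)       # task 1's private data
        c.access("x2", 1)
        c.end_epoch(1)          # task 1 finishes
        c.access("y1", 2)       # task 2 needs room...
        c.access("h", 2)        # ...and reuses h
        print(hints, c.misses)  # True 4, then False 5
```

In this trace, the hint lets the cache evict task 1's dead data instead of the still-useful line `h`, saving one miss relative to plain LRU.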