4 resultados para Memory hierarchy design
em Digital Commons - Michigan Tech
Resumo:
As the performance gap between microprocessors and memory continues to increase, main memory accesses result in long latencies which become a factor limiting system performance. Previous studies show that main memory access streams contain significant localities and SDRAM devices provide parallelism through multiple banks and channels. These locality and parallelism have not been exploited thoroughly by conventional memory controllers. In this thesis, SDRAM address mapping techniques and memory access reordering mechanisms are studied and applied to memory controller design with the goal of reducing observed main memory access latency. The proposed bit-reversal address mapping attempts to distribute main memory accesses evenly in the SDRAM address space to enable bank parallelism. As memory accesses to unique banks are interleaved, the access latencies are partially hidden and therefore reduced. With the consideration of cache conflict misses, bit-reversal address mapping is able to direct potential row conflicts to different banks, further improving the performance. The proposed burst scheduling is a novel access reordering mechanism, which creates bursts by clustering accesses directed to the same rows of the same banks. Subjected to a threshold, reads are allowed to preempt writes and qualified writes are piggybacked at the end of the bursts. A sophisticated access scheduler selects accesses based on priorities and interleaves accesses to maximize the SDRAM data bus utilization. Consequentially burst scheduling reduces row conflict rate, increasing and exploiting the available row locality. Using a revised SimpleScalar and M5 simulator, both techniques are evaluated and compared with existing academic and industrial solutions. With SPEC CPU2000 benchmarks, bit-reversal reduces the execution time by 14% on average over traditional page interleaving address mapping. Burst scheduling also achieves a 15% reduction in execution time over conventional bank in order scheduling. Working constructively together, bit-reversal and burst scheduling successfully achieve a 19% speedup across simulated benchmarks.
Resumo:
Though 3D computer graphics has seen tremendous advancement in the past two decades, most available mechanisms for computer interaction in 3D are high cost and targeted for industry and virtual reality applications. Recent advances in Micro-Electro-Mechanical-System (MEMS) devices have brought forth a variety of new low-cost, low-power, miniature sensors with high accuracy, which are well suited for hand-held devices. In this work a novel design for a 3D computer game controller using inertial sensors is proposed, and a prototype device based on this design is implemented. The design incorporates MEMS accelerometers and gyroscopes from Analog Devices to measure the three components of the acceleration and angular velocity. From these sensor readings, the position and orientation of the hand-held compartment can be calculated using numerical methods. The implemented prototype is utilizes a USB 2.0 compliant interface for power and communication with the host system. A Microchip dsPIC microcontroller is used in the design. This microcontroller integrates the analog to digital converters, the program memory flash, as well as the core processor, on a single integrated circuit. A PC running Microsoft Windows operating system is used as the host machine. Prototype firmware for the microcontroller is developed and tested to establish the communication between the design and the host, and perform the data acquisition and initial filtering of the sensor data. A PC front-end application with a graphical interface is developed to communicate with the device, and allow real-time visualization of the acquired data.
Resumo:
Virtualization has become a common abstraction layer in modern data centers. By multiplexing hardware resources into multiple virtual machines (VMs) and thus enabling several operating systems to run on the same physical platform simultaneously, it can effectively reduce power consumption and building size or improve security by isolating VMs. In a virtualized system, memory resource management plays a critical role in achieving high resource utilization and performance. Insufficient memory allocation to a VM will degrade its performance dramatically. On the contrary, over-allocation causes waste of memory resources. Meanwhile, a VM’s memory demand may vary significantly. As a result, effective memory resource management calls for a dynamic memory balancer, which, ideally, can adjust memory allocation in a timely manner for each VM based on their current memory demand and thus achieve the best memory utilization and the optimal overall performance. In order to estimate the memory demand of each VM and to arbitrate possible memory resource contention, a widely proposed approach is to construct an LRU-based miss ratio curve (MRC), which provides not only the current working set size (WSS) but also the correlation between performance and the target memory allocation size. Unfortunately, the cost of constructing an MRC is nontrivial. In this dissertation, we first present a low overhead LRU-based memory demand tracking scheme, which includes three orthogonal optimizations: AVL-based LRU organization, dynamic hot set sizing and intermittent memory tracking. Our evaluation results show that, for the whole SPEC CPU 2006 benchmark suite, after applying the three optimizing techniques, the mean overhead of MRC construction is lowered from 173% to only 2%. Based on current WSS, we then predict its trend in the near future and take different strategies for different prediction results. When there is a sufficient amount of physical memory on the host, it locally balances its memory resource for the VMs. Once the local memory resource is insufficient and the memory pressure is predicted to sustain for a sufficiently long time, a relatively expensive solution, VM live migration, is used to move one or more VMs from the hot host to other host(s). Finally, for transient memory pressure, a remote cache is used to alleviate the temporary performance penalty. Our experimental results show that this design achieves 49% center-wide speedup.
Resumo:
Neuromorphic computing has become an emerging field in wide range of applications. Its challenge lies in developing a brain-inspired architecture that can emulate human brain and can work for real time applications. In this report a flexible neural architecture is presented which consists of 128 X 128 SRAM crossbar memory and 128 spiking neurons. For Neuron, digital integrate and fire model is used. All components are designed in 45nm technology node. The core can be configured for certain Neuron parameters, Axon types and synapses states and are fully digitally implemented. Learning for this architecture is done offline. To train this circuit a well-known algorithm Restricted Boltzmann Machine (RBM) is used and linear classifiers are trained at the output of RBM. Finally, circuit was tested for handwritten digit recognition application. Future prospects for this architecture are also discussed.