16 results for memory access complexity
Abstract:
Non-Volatile Memory (NVM) technology holds promise to replace SRAM and DRAM at various levels of the memory hierarchy. Interest in NVM is motivated by the difficulty of scaling DRAM beyond 22 nm and, in the long term, by NVM's lower cost per bit. While offering higher density and negligible static power (leakage and refresh), NVM suffers from increased latency and energy per memory access. This paper develops energy and performance models of memory systems and applies them to understand the energy efficiency of replacing or complementing DRAM with NVM. Our analysis focuses on the application of NVM in main memory. We demonstrate that NVM such as STT-RAM and RRAM is energy-efficient for memory sizes commonly employed in servers and high-end workstations, but PCM is not. Furthermore, the model is well suited to quickly evaluating the impact of changes to the model parameters, which may be achieved through optimization of the memory architecture, and to determining the key parameters that impact system-level energy and performance.
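As a rough illustration of the trade-off this abstract describes, the Python sketch below compares total memory energy under a simple static-plus-dynamic model; the parameter values and function are invented placeholders, not the paper's model or data.

```python
# Minimal sketch (not the paper's model): total memory energy as the sum of
# static power over an execution interval and dynamic energy per access.
# All parameter values below are illustrative placeholders, not measured data.

def memory_energy(static_power_w, exec_time_s, accesses, energy_per_access_j):
    """Return total energy in joules for one memory technology."""
    return static_power_w * exec_time_s + accesses * energy_per_access_j

# Hypothetical technology parameters for a fixed capacity.
dram = {"static_power_w": 2.0,  "energy_per_access_j": 10e-9}
nvm  = {"static_power_w": 0.05, "energy_per_access_j": 30e-9}  # negligible leakage, costlier accesses

exec_time_s, accesses = 10.0, 1e9
for name, tech in (("DRAM", dram), ("NVM", nvm)):
    e = memory_energy(tech["static_power_w"], exec_time_s, accesses, tech["energy_per_access_j"])
    print(f"{name}: {e:.2f} J")
```

Under such a model, NVM pays off when static energy dominates, i.e. for large, lightly accessed memories, which matches the server-scale conclusion above.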
Abstract:
This paper presents a scalable, statistical ‘black-box’ model for predicting the performance of parallel programs on multi-core non-uniform memory access (NUMA) systems. We derive a model with low overhead by reducing data collection and model training time. The model can accurately predict the behaviour of parallel applications in response to changes in their concurrency, thread layout on NUMA nodes, and core voltage and frequency. We present a framework that applies the model to achieve significant energy and energy-delay-squared (ED2) savings (9% and 25%, respectively) along with a performance improvement (10% on average) on an actual 16-core NUMA system running realistic application workloads. Our prediction model proves substantially more accurate than previous efforts.
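A minimal sketch of what a statistical ‘black-box’ predictor of this kind could look like: performance is fitted to concurrency, NUMA-node usage and core frequency with ordinary least squares. The features, training samples and target are hypothetical and do not reproduce the paper's model.

```python
# Illustrative black-box regression sketch (not the paper's actual model).
import numpy as np

# Columns: threads, NUMA nodes used, frequency (GHz); target: throughput (ops/s).
X = np.array([[4, 1, 2.0], [8, 1, 2.0], [8, 2, 2.0], [16, 2, 2.6], [16, 4, 2.6]], float)
y = np.array([1.0e6, 1.8e6, 1.6e6, 2.9e6, 2.7e6])

# Fit least-squares coefficients with an intercept term.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(threads, nodes, freq_ghz):
    """Predict throughput for an unseen configuration."""
    return np.dot(coef, [threads, nodes, freq_ghz, 1.0])

print(predict(16, 1, 2.0))
```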
Abstract:
This paper introduces hybrid address spaces as a fundamental design methodology for implementing scalable runtime systems on many-core architectures without hardware support for cache coherence. We use hybrid address spaces for an implementation of MapReduce, a programming model for large-scale data processing, and for an implementation of a remote memory access (RMA) model. Both implementations are available on the Intel SCC and are portable to similar architectures. We present the design and implementation of HyMR, a MapReduce runtime system whereby different stages and the synchronization operations between them alternate between a distributed memory address space and a shared memory address space, to improve performance and scalability. We compare HyMR to a reference implementation and find that HyMR improves performance by a factor of 1.71× over a set of representative MapReduce benchmarks. We also compare HyMR with Phoenix++, a state-of-the-art implementation for systems with hardware-managed cache coherence, in terms of scalability and sustained-to-peak data processing bandwidth, where HyMR demonstrates improvements of a factor of 3.1× and 3.2×, respectively. We further evaluate our hybrid remote memory access (HyRMA) programming model and find its performance to be superior to that of message passing.
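As a loose analogy only (not the SCC or HyMR code), the sketch below alternates a map stage run in separate processes, i.e. private address spaces with explicit data exchange, with a combine stage performed within a single shared address space.

```python
# Loose, hedged analogy of the hybrid-address-space idea (not HyMR itself):
# the map stage runs in separate processes (private address spaces), while the
# combine stage merges partial results in one shared address space.
from multiprocessing import Pool
from collections import Counter

def map_stage(chunk):
    # Each worker counts words in its own private address space.
    return Counter(chunk.split())

if __name__ == "__main__":
    chunks = ["a b a", "b c", "a c c"]
    with Pool(3) as pool:
        partials = pool.map(map_stage, chunks)  # distributed-memory stage
    result = Counter()
    for p in partials:                          # shared-memory stage
        result.update(p)
    print(result)
```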
Abstract:
We present TProf, an energy profiling tool for OpenMP-like task-parallel programs. To compute the energy consumed by each task in a parallel application, TProf dynamically traces the parallel execution and uses a novel technique to estimate per-task energy consumption. To achieve this estimation, TProf apportions the total processor energy among cores, overcoming a limitation of existing approaches that would otherwise make per-task accounting in parallel executions impossible. We demonstrate the value of TProf by characterizing a set of task-parallel programs, where we find that data locality, memory access patterns and task working sets are responsible for significant variance in energy consumption between seemingly homogeneous tasks. In addition, we identify opportunities for fine-grain energy optimization by applying per-task Dynamic Voltage and Frequency Scaling (DVFS).
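A minimal sketch of one plausible apportioning scheme: measured package energy is split across cores in proportion to their busy time, and each core's share is attributed to the tasks it ran. This is not TProf's actual estimator, and all numbers are hypothetical.

```python
# Per-task energy apportioning sketch (illustrative only, not TProf's method).

def apportion(package_energy_j, core_busy_s, task_runtime_s):
    """task_runtime_s: {core: {task: seconds}}; returns {task: joules}."""
    total_busy = sum(core_busy_s.values()) or 1.0
    per_task = {}
    for core, busy in core_busy_s.items():
        core_energy = package_energy_j * busy / total_busy   # core's share of package energy
        tasks = task_runtime_s.get(core, {})
        core_total = sum(tasks.values()) or 1.0
        for task, t in tasks.items():                        # split the core's share by task runtime
            per_task[task] = per_task.get(task, 0.0) + core_energy * t / core_total
    return per_task

print(apportion(50.0, {0: 2.0, 1: 1.0}, {0: {"t1": 1.5, "t2": 0.5}, 1: {"t3": 1.0}}))
```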
Abstract:
The area and power consumption of low-density parity-check (LDPC) decoders are typically dominated by embedded memories. To alleviate these high memory costs, this paper exploits the fact that all internal memories of an LDPC decoder are frequently updated with new data. These memory access statistics are leveraged by replacing all static standard-cell based memories (SCMs) of a prior-art LDPC decoder implementation with dynamic SCMs (D-SCMs), which are designed to retain data just long enough to guarantee reliable operation. The use of D-SCMs leads to a 44% reduction in silicon area of the LDPC decoder compared to the use of static SCMs. The low-power LDPC decoder architecture with refresh-free D-SCMs was implemented in a 90 nm CMOS process, and silicon measurements show full functionality and an information bit throughput of up to 600 Mbps (as required by the IEEE 802.11n standard).
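A back-of-the-envelope sketch of the refresh-free design condition implied above: every memory word must be rewritten before the dynamic cells lose their data. The retention and timing values below are illustrative only, not the decoder's figures.

```python
# Refresh-free safety check sketch (illustrative numbers, not the actual design).

retention_time_ns = 2000.0           # hypothetical D-SCM data retention
iterations, cycle_ns = 10, 2.0       # decoding iterations and clock period
worst_update_interval_ns = iterations * cycle_ns  # assume every word is rewritten each iteration

assert worst_update_interval_ns < retention_time_ns, "refresh would be required"
print("refresh-free operation is safe:", worst_update_interval_ns, "<", retention_time_ns)
```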
On the complexity of solving polytree-shaped limited memory influence diagrams with binary variables
Abstract:
Influence diagrams are intuitive and concise representations of structured decision problems. When the problem is non-Markovian, an optimal strategy can be exponentially large in the size of the diagram. We can avoid this inherent intractability by constraining the size of admissible strategies, giving rise to limited memory influence diagrams. A natural question is then how small strategies need to be to enable efficient optimal planning. Arguably, the smallest strategies one can conceive simply prescribe an action for each time step, without considering past decisions or observations. Previous work has shown that finding such optimal strategies even for polytree-shaped diagrams with ternary variables and a single value node is NP-hard, but the case of binary variables was left open. In this paper we address that case, first noting that optimal strategies can be obtained in polynomial time for polytree-shaped diagrams with binary variables and a single value node. We then show that the same problem is NP-hard if the diagram has multiple value nodes. These two results close the fixed-parameter complexity analysis of optimal strategy selection in influence diagrams parameterized by the shape of the diagram, the number of value nodes and the maximum variable cardinality.
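To make the notion of such minimal strategies concrete, the toy sketch below brute-forces all "one action per time step" strategies for a made-up two-decision diagram with binary variables and a single value node. It illustrates the object of study, not the paper's polynomial-time algorithm; the probabilities and utilities are invented.

```python
# Toy illustration of memoryless strategies in a tiny influence diagram.
from itertools import product

p_s = {0: 0.4, 1: 0.6}                 # P(S) for a binary chance variable (hypothetical)
def utility(d1, d2, s):                # single value node (hypothetical utilities)
    return (2 if d1 == s else 0) + (1 if d2 != d1 else 0)

# Enumerate all strategies that fix one action per decision, ignoring observations.
best = max(
    (sum(p * utility(d1, d2, s) for s, p in p_s.items()), (d1, d2))
    for d1, d2 in product((0, 1), repeat=2)
)
print("best expected utility and strategy:", best)
```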
Abstract:
There are several factors which make the investigation and understanding of nanoscale ferroelectrics particularly timely and important. Firstly, there is market pressure, primarily from the electronics industry, to integrate ferroelectrics into devices with progressive decreases in size and increases in morphological complexity. This is perhaps best illustrated through the roadmaps for product development in FeRAM (Ferroelectric Random Access Memory), where the need for increases in bit density will require a move from 2D planar capacitor structures to 3D trenched capacitors in the next few years. Secondly, there is opportunity for novel exploration, as it is only relatively recently that developments in thin film growth of complex oxides, self-assembly techniques and high-resolution 'top-down' patterning have converged to allow the fabrication of isolated and well-defined ferroelectric nanoshapes, the properties of which are not known. Thirdly, there is an expectation that the behaviour of small-scale ferroelectrics will be different from bulk, as this group of functional materials is highly sensitive to boundary/surface conditions, which are expected to dominate the overall response when sizes are reduced into the nanoscale regime. This feature article attempts to introduce some of the current areas of discovery and debate surrounding studies on ferroelectrics at the nanoscale. The focus is directed primarily at the search for novel size-related properties and behaviour which are not necessarily observed in bulk.
Abstract:
Between 2006 and 2007, the Prisons Memory Archive (PMA) filmed participants, including former prisoners, prison staff, teachers, chaplains, visitors, solicitors and welfare workers, back inside the Maze/Long Kesh Prison and Armagh Gaol. They shared memories of the time spent in these prisons during the period of political violence from 1970 to 2000 in Northern Ireland, commonly known as the Troubles. Underpinning the overall methodology is co-ownership of the material, which gives participants the right to veto as well as to participate in the processes of editing and exhibiting their stories, so prioritising the value of co-authorship of their stories. The PMA adopted life-story interviewing techniques, with the empty sites stimulating participants’ memories as they walked and talked their way around them. A third feature is inclusivity: the archive holds stories from across the full spectrum of the prison experience. A selection of the material, with accompanying context and links, is available online at www.prisonsmemoryarchive.com
Further Information:
The protocols of inclusivity, co-ownership and life-story telling make this collection significant as an initiative that engages with contemporary problems of how to negotiate narratives about a conflicted past in a society emerging out of violence. Inclusivity means that prison staff, prisoners, governors, chaplains, tutors and visitors have participated, relating their individual and collective experiences, which sit side by side on the PMA website. Co-ownership addresses the issues of ethics and sensitivity, allowing key constituencies to be involved. Life-story telling, based on oral history methodologies, allows participants to be the authors of their own stories, which is crucial when dealing with sensitive issues from a violent past. The website hosts a selection of excerpts, e.g. the Armagh Stories page shows excerpts from 15 participants, while the Maze and Long Kesh Prison page offers interactive access to 24 participants from that prison. Using an interactive documentary structure, the site offers users opportunities to navigate their own way through the material and encourages them to hear and see the ‘other’, which is central to attempts at encouraging dialogue in a divided society. Further, public discussions have been held after screenings of excerpts with community groups in the following locations: Belfast, Newtownabbey, Derry, Armagh, Enniskillen, London, Cork, Maynooth, Clones, and Monaghan. Extracts have been screened at international academic conferences in Valencia, Australia, Tartu, Estonia, Prague, and York. A dataset of the content, with description and links, is available for REF purposes.
Abstract:
We propose a novel admission control policy for database queries. Our methodology uses system measurements of CPU utilization and query backlogs to determine interference between queries executing on the same database server. Query interference may arise from concurrent access to hardware and software resources and can affect performance both positively and negatively. Specifically, our admission control considers the mix of jobs in service and prioritizes the query classes that consume CPU resources more efficiently. The policy ignores I/O subsystems and is therefore well suited to in-memory databases. We validate our approach in trace-driven simulation and show improvements in query slowdowns and throughput compared to first-come first-served and shortest-expected-processing-time-first scheduling. Simulation experiments are parameterized from system traces of a SAP HANA in-memory database installation with TPC-H type workloads.
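A hedged sketch of the core idea: among classes with a backlog, admit next the one that uses CPU most efficiently (fewest CPU seconds per completed query). The class statistics are hypothetical, and the actual policy also models interference within the mix of jobs already in service.

```python
# Admission-control sketch (illustrative only, not the paper's exact policy).

classes = {
    "reporting": {"cpu_s_per_query": 4.0, "backlog": 3},
    "lookup":    {"cpu_s_per_query": 0.2, "backlog": 10},
    "analytics": {"cpu_s_per_query": 9.0, "backlog": 1},
}

def next_class_to_admit(classes):
    waiting = {c: s for c, s in classes.items() if s["backlog"] > 0}
    # Prioritize the class consuming CPU resources most efficiently.
    return min(waiting, key=lambda c: waiting[c]["cpu_s_per_query"])

print(next_class_to_admit(classes))  # -> 'lookup'
```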
Abstract:
On multiprocessors with explicitly managed memory hierarchies (EMM), software has the responsibility of moving data in and out of fast local memories. This task can be complex and error-prone even for expert programmers. Before we can allow compilers to handle this complexity for us, we must identify abstractions that are general enough to allow us to write applications with reasonable effort, yet specific enough to exploit the vast on-chip memory bandwidth of EMM multiprocessors. To this end, we compare two programming models against hand-tuned codes on the STI Cell, paying attention to programmability and performance. The first programming model, Sequoia, abstracts the memory hierarchy as private address spaces, each corresponding to a parallel task. The second, Cellgen, is a new framework which provides OpenMP-like semantics and the abstraction of a shared address space divided into private and shared data. We compare three applications programmed using these models against their hand-optimized counterparts in terms of abstractions, programming complexity, and performance.
Abstract:
This paper evaluates the viability of user-level software management of a hybrid DRAM/NVM main memory system. We propose an operating system (OS) and programming interface to place data from within the user application. We present a profiling tool to help programmers decide on the placement of application data in hybrid memory systems. Cycle-accurate simulation of modified applications confirms that our approach is more energy-efficient than state-of-the-art hardware or OS approaches at equivalent performance. Moreover, our results are validated on several candidate NVM technologies and a wide set of 14 benchmarks.
The key observation behind this work is that, for the workloads we evaluated, application objects are too short-lived to motivate migration. Exploiting this property significantly reduces the hardware complexity of hybrid memory systems.
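A minimal sketch of a profile-guided placement decision consistent with this observation: objects with high access density go to DRAM, the rest to NVM, with no later migration. The interface, threshold and profile records below are hypothetical and do not reproduce the paper's OS or programming interface.

```python
# Profile-guided DRAM/NVM placement sketch (illustrative only).

DRAM, NVM = "dram", "nvm"

def place(profile, accesses_per_kb_threshold=1000.0):
    """profile: {object_name: (accesses, size_kb)} -> {object_name: tier}."""
    placement = {}
    for name, (accesses, size_kb) in profile.items():
        density = accesses / max(size_kb, 1e-9)   # accesses per KB of object
        placement[name] = DRAM if density >= accesses_per_kb_threshold else NVM
    return placement

print(place({"hash_table": (5_000_000, 1024), "cold_log": (2_000, 4096)}))
```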
Abstract:
The worsening of process variations and the consequent increased spreads in circuit performance and power consumption hinder the satisfaction of targeted budgets and lead to yield loss. Corner-based design and the adoption of design guardbands may limit the yield loss. However, in many cases such methods cannot capture the real effects, which may be considerably better than the predicted ones, leading to increasingly pessimistic designs. The situation is even more severe in memories, which consist of substantially different individual building blocks, further complicating the accurate analysis of the impact of variations at the architecture level and leaving many potential issues uncovered and opportunities unexploited. In this paper, we develop a framework for capturing non-trivial statistical interactions among all the components of a memory/cache. The developed tool is able to find the optimum memory/cache configuration under various constraints, allowing designers to make the right choices early in the design cycle and consequently improve performance, energy, and especially yield. Our results indicate that considering the architectural interactions between the memory components allows the pessimistic access times predicted by existing techniques to be relaxed.
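An illustrative Monte Carlo sketch of the statistical treatment described above: component delays along the access path are sampled with variation and summed, rather than stacking per-component worst cases, and yield is estimated against a timing target. The distributions and target are invented, not taken from the paper.

```python
# Statistical timing-yield sketch for a memory access path (illustrative only).
import random

def sample_access_time_ns():
    decoder  = random.gauss(0.30, 0.03)
    bitcell  = random.gauss(0.50, 0.06)
    senseamp = random.gauss(0.20, 0.02)
    return decoder + bitcell + senseamp   # components combine statistically, not worst-case

target_ns, trials = 1.15, 100_000
yield_estimate = sum(sample_access_time_ns() <= target_ns for _ in range(trials)) / trials
print(f"estimated timing yield: {yield_estimate:.3f}")
```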
Abstract:
We study multicarrier multiuser multiple-input multiple-output (MU-MIMO) systems in which the base station employs an asymptotically large number of antennas. We analyze a fully correlated channel matrix and provide a beam domain channel model, where the channel gains are independent of sub-carriers. For this model, we first derive a closed-form upper bound on the achievable ergodic sum-rate, based on which we develop asymptotically necessary and sufficient conditions for optimal downlink transmission that require only statistical channel state information at the transmitter. Furthermore, we propose a beam division multiple access (BDMA) transmission scheme that simultaneously serves multiple users via different beams. By selecting users within non-overlapping beams, the MU-MIMO channels can be equivalently decomposed into multiple single-user MIMO channels; this scheme significantly reduces the overhead of channel estimation as well as the processing complexity at transceivers. For BDMA transmission, we derive an optimal pilot design criterion to minimize the mean square error (MSE) and provide optimal pilot sequences by utilizing Zadoff-Chu sequences. Simulations demonstrate the near-optimal performance of BDMA transmission and the advantages of the proposed pilot sequences.
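A small sketch of Zadoff-Chu pilot generation using the standard definition (not the paper's full pilot design), checking the constant-amplitude and zero cyclic autocorrelation properties that make these sequences attractive as pilots. The length and root below are illustrative.

```python
# Zadoff-Chu sequence sketch: x_u[n] = exp(-j*pi*u*n*(n+1)/N) for odd N,
# with root u coprime to N. Parameters are illustrative only.
import numpy as np

def zadoff_chu(root, length):
    n = np.arange(length)
    return np.exp(-1j * np.pi * root * n * (n + 1) / length)   # odd length assumed

N = 139                       # odd prime length (illustrative)
z = zadoff_chu(root=7, length=N)
print(np.allclose(np.abs(z), 1.0))                             # constant amplitude
shifted_corr = np.abs(np.vdot(z, np.roll(z, 3))) / N
print(shifted_corr < 1e-9)                                      # ideal cyclic autocorrelation
```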
Abstract:
This paper investigates processes and actions of diversifying memories of division in Northern Ireland’s political conflict, known as the Troubles. Societal division is manifested in the built fabric and territories that have been adopted by the predominant discourses of a fragmented society in Belfast: the unionist east and the nationalist west. The aim of the paper is to explore current approaches to planning contested spaces that have changed over time, leading to success in many cases. The argument is that divided cities like Belfast feature spatial images and memories of division that range from physical, clear-cut segregation to manifest acts of violence, and these have become influential representations in the community’s associative memory. While the promotion of ‘re-imaging’ by current councils amounts to a total erasure of the Troubles through a cleansing of the local collective memory, there remains an attempt to communicate a different tale of the city’s socio-economic past, to elaborate its supremacy for shaping future lived memories. Yet planning Belfast’s contested areas still suffers from a poor understanding of the context and its complexity against overambitious visions.