989 results for Main Memory
Abstract:
Searching in a dataset for elements that are similar to a given query element is a core problem in applications that manage complex data, and has been aided by metric access methods (MAMs). A growing number of applications require indices that must be built faster and repeatedly, also providing faster response for similarity queries. The increase in the main memory capacity and its lowering costs also motivate using memory-based MAMs. In this paper, we propose the Onion-tree, a new and robust dynamic memory-based MAM that slices the metric space into disjoint subspaces to provide quick indexing of complex data. It introduces three major characteristics: (i) a partitioning method that controls the number of disjoint subspaces generated at each node; (ii) a replacement technique that can change the leaf node pivots in insertion operations; and (iii) range and k-NN extended query algorithms to support the new partitioning method, including a new visit order of the subspaces in k-NN queries. Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. Comparisons of the Onion-tree with the MM-tree and a memory-based version of the Slim-tree showed that the Onion-tree was always faster to build the index. The experiments also showed that the Onion-tree significantly improved range and k-NN query processing performance and was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all the tests. (C) 2010 Elsevier B.V. All rights reserved.
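As a rough illustration of why pivots help metric access methods in general, the sketch below runs a range query that uses the triangle inequality to skip distance computations. It is not the Onion-tree's partitioning or query algorithm; the function names, the single pivot, and the sample points are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    # Any metric works here; Euclidean distance is just an assumed example.
    return math.dist(a, b)


def range_query(data, pivot_dists, pivot, query, radius, dist=euclidean):
    """Return elements within `radius` of `query`, plus how many distances were computed.

    pivot_dists[i] must hold dist(pivot, data[i]), computed once at index-build time.
    """
    d_qp = dist(query, pivot)
    hits = []
    computed = 0
    for x, d_px in zip(data, pivot_dists):
        # Triangle inequality: dist(query, x) >= |dist(query, pivot) - dist(pivot, x)|.
        # If that lower bound already exceeds the radius, x cannot qualify.
        if abs(d_qp - d_px) > radius:
            continue
        computed += 1
        if dist(query, x) <= radius:
            hits.append(x)
    return hits, computed


data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (9.0, 9.0)]
pivot = (0.0, 0.0)
pivot_dists = [euclidean(pivot, x) for x in data]
hits, computed = range_query(data, pivot_dists, pivot, (1.0, 1.0), 1.5)
```

In this toy run only two of the four elements need a real distance computation; the other two are discarded using the precomputed pivot distances alone.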
Abstract:
As the performance gap between microprocessors and memory continues to increase, main memory accesses result in long latencies which become a factor limiting system performance. Previous studies show that main memory access streams contain significant localities and SDRAM devices provide parallelism through multiple banks and channels. This locality and parallelism have not been exploited thoroughly by conventional memory controllers. In this thesis, SDRAM address mapping techniques and memory access reordering mechanisms are studied and applied to memory controller design with the goal of reducing observed main memory access latency. The proposed bit-reversal address mapping attempts to distribute main memory accesses evenly in the SDRAM address space to enable bank parallelism. As memory accesses to unique banks are interleaved, the access latencies are partially hidden and therefore reduced. With the consideration of cache conflict misses, bit-reversal address mapping is able to direct potential row conflicts to different banks, further improving the performance. The proposed burst scheduling is a novel access reordering mechanism, which creates bursts by clustering accesses directed to the same rows of the same banks. Subject to a threshold, reads are allowed to preempt writes and qualified writes are piggybacked at the end of the bursts. A sophisticated access scheduler selects accesses based on priorities and interleaves accesses to maximize the SDRAM data bus utilization. Consequently, burst scheduling reduces row conflict rate, increasing and exploiting the available row locality. Using a revised SimpleScalar and M5 simulator, both techniques are evaluated and compared with existing academic and industrial solutions. With SPEC CPU2000 benchmarks, bit-reversal reduces the execution time by 14% on average over traditional page interleaving address mapping. Burst scheduling also achieves a 15% reduction in execution time over conventional bank in-order scheduling. Working constructively together, bit-reversal and burst scheduling successfully achieve a 19% speedup across simulated benchmarks.
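The idea behind a bit-reversed bank index can be sketched as follows. This is a simplified illustration, not the thesis's exact mapping: the field widths (COLUMN_BITS, BANK_BITS, PAGE_BITS) and the function names are assumptions chosen for the example. It shows addresses that differ only in their high-order bits, as conflicting cache lines often do, landing in one bank under page interleaving but in different banks once the page number is bit-reversed.

```python
COLUMN_BITS = 10  # assumed: low bits select a column within a row
BANK_BITS = 2     # assumed: four banks
PAGE_BITS = 14    # assumed: width of the address field above the column offset


def reverse_bits(value, width):
    """Reverse the lowest `width` bits of `value`."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def bank_page_interleaved(addr):
    # Bank index taken directly from the bits just above the column offset.
    return (addr >> COLUMN_BITS) & ((1 << BANK_BITS) - 1)


def bank_bit_reversed(addr):
    # Reverse the page number first, so its high-order bits end up in the bank field.
    page = (addr >> COLUMN_BITS) & ((1 << PAGE_BITS) - 1)
    return reverse_bits(page, PAGE_BITS) & ((1 << BANK_BITS) - 1)


# Four addresses that differ only in the top two bits of the page number.
stride = 1 << (COLUMN_BITS + PAGE_BITS - BANK_BITS)
addrs = [k * stride for k in range(4)]

plain_banks = [bank_page_interleaved(a) for a in addrs]
reversed_banks = [bank_bit_reversed(a) for a in addrs]
```

With these assumed widths, `plain_banks` is `[0, 0, 0, 0]` (every access contends for one bank), while `reversed_banks` is `[0, 2, 1, 3]`, so the accesses can proceed in parallel across banks.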
Abstract:
The current industry trend is towards using Commercially available Off-The-Shelf (COTS) based multicores for developing real time embedded systems, as opposed to the usage of custom-made hardware. In a typical implementation of such COTS-based multicores, multiple cores access the main memory via a shared bus. This often leads to contention on this shared channel, which results in an increase of the response time of the tasks. Analyzing this increased response time, considering the contention on the shared bus, is challenging on COTS-based systems mainly because bus arbitration protocols are often undocumented and the exact instants at which the shared bus is accessed by tasks are not explicitly controlled by the operating system scheduler; they are instead a result of cache misses. This paper makes three contributions towards analyzing tasks scheduled on COTS-based multicores. Firstly, we describe a method to model the memory access patterns of a task. Secondly, we apply this model to analyze the worst case response time for a set of tasks. Although the required parameters to obtain the request profile can be obtained by static analysis, we provide an alternative method to experimentally obtain them by using performance monitoring counters (PMCs). We also compare our work against an existing approach and show that our approach outperforms it by providing a tighter upper bound on the number of bus requests generated by a task.
Abstract:
Software transactional memory is a promising programming model that adapts many concepts borrowed from the database world to control concurrent accesses to main memory (RAM) locations. This paper discusses how to support apparently irreversible operations, such as memory allocation and deallocation, within software libraries that will be used in (software memory) transactional contexts, and proposes a generic and elegant approach based on a handler system, which provides the means to create and execute compensation actions at key moments during the lifetime of a transaction.
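A handler-based compensation scheme of this general kind might look like the sketch below. It is only a toy model under stated assumptions: the `Transaction` and `Pool` classes and the `tx_alloc`/`tx_free` helpers are hypothetical names, not the paper's API, and no real concurrency control is modelled. Allocation takes effect immediately and registers an abort handler that undoes it; deallocation is deferred to a commit handler so that it never has to be undone.

```python
class Transaction:
    """Toy transaction that only tracks compensation handlers (hypothetical API)."""

    def __init__(self):
        self._on_commit = []  # actions deferred until the transaction commits
        self._on_abort = []   # actions that undo work if the transaction aborts

    def on_commit(self, action):
        self._on_commit.append(action)

    def on_abort(self, action):
        self._on_abort.append(action)

    def commit(self):
        for action in self._on_commit:
            action()
        self._on_commit.clear()
        self._on_abort.clear()

    def abort(self):
        # Undo in reverse order of registration.
        for action in reversed(self._on_abort):
            action()
        self._on_commit.clear()
        self._on_abort.clear()


class Pool:
    """Stand-in for a memory allocator: a set of live block ids."""

    def __init__(self):
        self.live = set()
        self._next = 0

    def alloc(self):
        block = self._next
        self._next += 1
        self.live.add(block)
        return block

    def free(self, block):
        self.live.discard(block)


def tx_alloc(tx, pool):
    block = pool.alloc()                    # takes effect right away...
    tx.on_abort(lambda: pool.free(block))   # ...and is compensated on abort
    return block


def tx_free(tx, pool, block):
    tx.on_commit(lambda: pool.free(block))  # deferred: nothing to undo on abort


pool = Pool()
kept = pool.alloc()                  # allocated outside any transaction

aborted = Transaction()
tx_alloc(aborted, pool)
tx_free(aborted, pool, kept)
aborted.abort()
live_after_abort = set(pool.live)    # only `kept` survives

committed = Transaction()
new_block = tx_alloc(committed, pool)
tx_free(committed, pool, kept)
committed.commit()
live_after_commit = set(pool.live)   # only `new_block` survives
```

After the aborted transaction the pool looks exactly as it did before the transaction started; after the committed one, the allocation persists and the deferred free has been applied.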
Abstract:
Master’s Thesis in Computer Engineering
Abstract:
The last decade has witnessed a major shift towards the deployment of embedded applications on multi-core platforms. However, real-time applications have not been able to fully benefit from this transition, as the computational gains offered by multi-cores are often offset by performance degradation due to shared resources, such as main memory. To efficiently use multi-core platforms for real-time systems, it is hence essential to tightly bound the interference when accessing shared resources. Although there has been much recent work in this area, a remaining key problem is to address the diversity of memory arbiters in the analysis to make it applicable to a wide range of systems. This work handles diverse arbiters by proposing a general framework to compute the maximum interference caused by the shared memory bus and its impact on the execution time of the tasks running on the cores, considering different bus arbiters. Our novel approach clearly demarcates the arbiter-dependent and independent stages in the analysis of these upper bounds. The arbiter-dependent phase takes the arbiter and the task memory-traffic pattern as inputs and produces a model of the availability of the bus to a given task. Then, based on the availability of the bus, the arbiter-independent phase determines the worst-case request-release scenario that maximizes the interference experienced by the tasks due to the contention for the bus. We show that the framework addresses the diversity problem by applying it to a memory bus shared by a fixed-priority arbiter, a time-division multiplexing (TDM) arbiter, and an unspecified work-conserving arbiter using applications from the MediaBench test suite. We also experimentally evaluate the quality of the analysis by comparison with a state-of-the-art TDM analysis approach and consistently show a considerable reduction in maximum interference.
Abstract:
Current computer systems have evolved from featuring only a single processing unit and limited RAM, in the order of kilobytes or a few megabytes, to include several multicore processors, offering in the order of several tens of concurrent execution contexts, and have main memory in the order of several tens to hundreds of gigabytes. This makes it possible to keep all data of many applications in the main memory, leading to the development of in-memory databases. Compared to disk-backed databases, in-memory databases (IMDBs) are expected to provide better performance by incurring less I/O overhead. In this dissertation, we present a scalability study of two general purpose IMDBs on multicore systems. The results show that current general purpose IMDBs do not scale on multicores, due to contention among threads running concurrent transactions. In this work, we explore different directions to overcome the scalability issues of IMDBs in multicores, while enforcing strong isolation semantics. First, we present a solution that requires no modification to either database systems or to the applications, called MacroDB. MacroDB replicates the database among several engines, using a master-slave replication scheme, where update transactions execute on the master, while read-only transactions execute on slaves. This reduces contention, allowing MacroDB to offer scalable performance under read-only workloads, while update-intensive workloads suffer from performance loss, when compared to the standalone engine. Second, we delve into the database engine and identify the concurrency control mechanism used by the storage sub-component as a scalability bottleneck. We then propose a new locking scheme that allows the removal of such mechanisms from the storage sub-component. This modification offers performance improvement under all workloads, when compared to the standalone engine, while scalability is limited to read-only workloads. Next, we address the scalability limitations for update-intensive workloads, and propose the reduction of locking granularity from the table level to the attribute level. This further improved performance for intensive and moderate update workloads, at a slight cost for read-only workloads. Scalability is limited to intensive-read and read-only workloads. Finally, we investigate the impact applications have on the performance of database systems, by studying how operation order inside transactions influences the database performance. We then propose a Read before Write (RbW) interaction pattern, under which transactions perform all read operations before executing write operations. The RbW pattern allowed TPC-C to achieve scalable performance on our modified engine for all workloads. Additionally, the RbW pattern allowed our modified engine to achieve scalable performance on multicores, almost up to the total number of cores, while enforcing strong isolation.
Abstract:
The quest for universal memory is driving the rapid development of memories with superior all-round capabilities in non-volatility, high speed, high endurance and low power. The memory subsystem accounts for a significant cost and power budget of a computer system. Current DRAM-based main memory systems are starting to hit the power and cost limit. To resolve this issue the industry is improving existing technologies such as Flash and exploring new ones. Among those new technologies is the Phase Change Memory (PCM), which overcomes some of the shortcomings of the Flash such as durability and scalability. This alternative non-volatile memory technology, which uses resistance contrast in phase-change materials, offers more density relative to DRAM, and can help to increase main memory capacity of future systems while remaining within the cost and power constraints. Chalcogenide materials can suitably be exploited for manufacturing phase-change memory devices. Charge transport in amorphous chalcogenide-GST used for memory devices is modeled using two contributions: hopping of trapped electrons and motion of band electrons in extended states. Crystalline GST exhibits an almost Ohmic I(V) curve. In contrast amorphous GST shows a high resistance at low biases while, above a threshold voltage, a transition takes place from a highly resistive to a conductive state, characterized by a negative differential-resistance behavior. A clear and complete understanding of the threshold behavior of the amorphous phase is fundamental for exploiting such materials in the fabrication of innovative nonvolatile memories. The type of feedback that produces the snapback phenomenon is described as a filamentation in energy that is controlled by electron–electron interactions between trapped electrons and band electrons. The model thus derived is implemented within a state-of-the-art simulator. 
An analytical version of the model is also derived and is useful for discussing the snapback behavior and the scaling properties of the device.
Abstract:
This paper explores potential for the RAMpage memory hierarchy to use a microkernel with a small memory footprint, in a specialized cache-speed static RAM (tightly-coupled memory, TCM). Dreamy memory is DRAM kept in low-power mode, unless referenced. Simulations show that a small microkernel suits RAMpage well, in that it achieves significantly better speed and energy gains than a standard hierarchy from adding TCM. RAMpage, in its best 128KB L2 case, gained 11% speed using TCM, and reduced energy 14%. Equivalent conventional hierarchy gains were under 1%. While 1MB L2 was significantly faster against lower-energy cases for the smaller L2, the larger SRAM's energy does not justify the speed gain. Using a 128KB L2 cache in a conventional architecture resulted in a best-case overall run time of 2.58s, compared with the best dreamy mode run time (RAMpage without context switches on misses) of 3.34s, a speed penalty of 29%. Energy in the fastest 128KB L2 case was 2.18J vs. 1.50J, a reduction of 31%. The same RAMpage configuration without dreamy mode took 2.83s as simulated, and used 2.39J, an acceptable trade-off (penalty under 10%) for being able to switch easily to a lower-energy mode.
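The quoted percentages can be checked directly from the run times and energy figures in the abstract. The short script below does that arithmetic; the helper name `pct_change` is an assumption, and it also assumes that the "penalty under 10%" is measured against the conventional 128KB L2 best case (2.58s, 2.18J), which the abstract does not state explicitly.

```python
def pct_change(new, base):
    """Relative change of `new` versus `base`, in percent (negative means a reduction)."""
    return (new - base) / base * 100


# Conventional 128KB L2 best case: 2.58 s, 2.18 J (figures from the abstract).
# Best dreamy-mode run: 3.34 s, 1.50 J.
dreamy_speed_penalty = pct_change(3.34, 2.58)
dreamy_energy_change = pct_change(1.50, 2.18)

# Same RAMpage configuration without dreamy mode: 2.83 s, 2.39 J.
# Assumption: the penalty is taken relative to the conventional best case.
plain_speed_penalty = pct_change(2.83, 2.58)
plain_energy_penalty = pct_change(2.39, 2.18)
```

Under that assumption the numbers are consistent with the text: roughly a 29% speed penalty and a 31% energy reduction for dreamy mode, and penalties a little under 10% in both time and energy without it.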
Abstract:
The use of multicores is becoming widespread in the field of embedded systems, many of which have real-time requirements. Hence, ensuring that real-time applications meet their timing constraints is a prerequisite before deploying them on these systems. This necessitates the consideration of the impact of the contention due to shared low-level hardware resources like the front-side bus (FSB) on the Worst-Case Execution Time (WCET) of the tasks. Towards this aim, this paper proposes a method to determine an upper bound on the number of bus requests that tasks executing on a core can generate in a given time interval. We show that our method yields tighter upper bounds in comparison with the state-of-the-art. We then apply our method to compute the extra contention delay incurred by tasks, when they are co-scheduled on different cores and access the shared main memory, using a shared bus, access to which is granted using a round-robin arbitration (RR) protocol.
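To see how a bound on the number of bus requests feeds into a contention delay under round-robin arbitration, consider the coarse textbook-style bound sketched below. It is not the paper's analysis. It assumes that each core has at most one outstanding request, that every bus transaction takes at most `slot_time`, and that each request from the analysed core waits for at most one request from every other core; the function name and the sample numbers are hypothetical.

```python
def rr_contention_delay(own_requests, other_requests, slot_time):
    """Coarse upper bound on the extra bus delay under round-robin arbitration.

    own_requests   -- upper bound on requests issued by the analysed core
    other_requests -- upper bounds on requests issued by each competing core
    slot_time      -- maximum duration of one bus transaction
    """
    # A competing core can delay the analysed core at most once per own request,
    # and never more often than it issues requests itself.
    interfering = sum(min(own_requests, n) for n in other_requests)
    return interfering * slot_time


# Hypothetical numbers: 100 own requests, three competing cores, 4 time units per transaction.
delay = rr_contention_delay(100, [50, 200, 10], 4)
```

With these made-up inputs the bound is (50 + 100 + 10) x 4 = 640 time units, which shows why a tighter bound on the request counts translates directly into a tighter delay estimate.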
Abstract:
Work in Progress Session, 21st IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2015). 13 to 16, Apr, 2015, pp 27-28. Seattle, U.S.A.
Abstract:
Integrated master's dissertation in Industrial Electronics and Computer Engineering
Abstract:
Multiresolution Triangular Mesh (MTM) models are widely used to improve the performance of large terrain visualization by replacing the original model with a simplified one. MTM models, which consist of both original and simplified data, are commonly stored in spatial database systems due to their size. The relatively slow access speed of disks makes data retrieval the bottleneck of such terrain visualization systems. Existing spatial access methods proposed to address this problem rely on main-memory MTM models, which leads to significant overhead during query processing. In this paper, we approach the problem from a new perspective and propose a novel MTM called direct mesh that is designed specifically for secondary storage. It supports available indexing methods natively and requires no modification to MTM structure. Experiment results, which are based on two real-world data sets, show an average performance improvement of 5-10 times over the existing methods.
Abstract:
Passive avoidance learning is advantageously studied in day-old chicks trained to distinguish between beads of two different colors, of which one at training was associated with aversive taste. During the first 30-min post-training, two periods of glutamate release occur in the forebrain. One period is immediately after the aversive experience, when glutamate release is confined to the left hemisphere. A second release, 30 min later, may be bilateral, perhaps with preponderance of the right hemisphere. The present study showed increased pool sizes of glutamate and glutamine, specifically in the left hemisphere, at the time when the first glutamate release occurs, indicating de novo synthesis of glutamate/glutamine from glucose or glycogen, which are the only possible substrates. Behavioral evidence that memory is extinguished by intracranial administration at this time of iodoacetate, an inhibitor of glycolysis and glycogenolysis, and that the extinction of memory is counteracted by injection of glutamine, supports this concept. A decrease in forebrain glycogen of similar magnitude and coinciding with the increase in glutamate and glutamine suggests that glycogen rather than glucose is the main source of newly synthesized glutamate/glutamine. The second activation of glutamatergic activity 30 min after training, when memory is consolidated into stable, long-term memory, is associated with a bilateral increase in pool size of glutamate/glutamine. No glycogenolysis was observed at this time, but again there is a temporal correlation with sensitivity to inhibition by iodoacetate and rescue by glutamine, indicating the importance of de novo synthesis of glutamate/glutamine from glucose or glycogen. (C) 2003 Elsevier B.V. All rights reserved.
Abstract:
The main idea of the article is to consider the interdependence between Politics of Memory (as a type of narrating the Past) and Stereotyping. The author suggests that, in a time of information revolution, we are still constructing images of others on the basis of simplification, overestimation of association between features, and illusory correlations, instead of basing them on knowledge and personal contact. The Politics of Memory, national remembrance, and the historical consciousness play a significant role in these processes, because – as the author argues – they transform historically based 'symbolic analogies' into 'illusory correlations' between national identity and the behavior of its members. To support his theoretical investigation, the author presents results of his draft experiment and two case studies: (a) a social construction of images of neighbors based on Polish narrations about the Past; and (b) various processes of stereotyping based on the Remembrance of the Holocaust. All these considerations lead him to state that the Politics of Memory should be recognized as an influential source of commonly shared stereotypes on other cultures and nations.