4 resultados para memories

em Massachusetts Institute of Technology


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Conventional floating gate non-volatile memories (NVMs) present critical issues for device scalability beyond the sub-90 nm node, such as gate length and tunnel oxide thickness reduction. Nanocrystalline germanium (nc-Ge) quantum dot flash memories are fully CMOS compatible technology based on discrete isolated charge storage nodules which have the potential of pushing further the scalability of conventional NVMs. Quantum dot memories offer lower operating voltages as compared to conventional floating-gate (FG) Flash memories due to thinner tunnel dielectrics which allow higher tunneling probabilities. The isolated charge nodules suppress charge loss through lateral paths, thereby achieving a superior charge retention time. Despite the considerable amount of efforts devoted to the study of nanocrystal Flash memories, the charge storage mechanism remains obscure. Interfacial defects of the nanocrystals seem to play a role in charge storage in recent studies, although storage in the nanocrystal conduction band by quantum confinement has been reported earlier. In this work, a single transistor memory structure with threshold voltage shift, Vth, exceeding ~1.5 V corresponding to interface charge trapping in nc-Ge, operating at 0.96 MV/cm, is presented. The trapping effect is eliminated when nc-Ge is synthesized in forming gas thus excluding the possibility of quantum confinement and Coulomb blockade effects. Through discharging kinetics, the model of deep level trap charge storage is confirmed. The trap energy level is dependent on the matrix which confines the nc-Ge.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

As the number of processors in distributed-memory multiprocessors grows, efficiently supporting a shared-memory programming model becomes difficult. We have designed the Protocol for Hierarchical Directories (PHD) to allow shared-memory support for systems containing massive numbers of processors. PHD eliminates bandwidth problems by using a scalable network, decreases hot-spots by not relying on a single point to distribute blocks, and uses a scalable amount of space for its directories. PHD provides a shared-memory model by synthesizing a global shared memory from the local memories of processors. PHD supports sequentially consistent read, write, and test- and-set operations. This thesis also introduces a method of describing locality for hierarchical protocols and employs this method in the derivation of an abstract model of the protocol behavior. An embedded model, based on the work of Johnson[ISCA19], describes the protocol behavior when mapped to a k-ary n-cube. The thesis uses these two models to study the average height in the hierarchy that operations reach, the longest path messages travel, the number of messages that operations generate, the inter-transaction issue time, and the protocol overhead for different locality parameters, degrees of multithreading, and machine sizes. We determine that multithreading is only useful for approximately two to four threads; any additional interleaving does not decrease the overall latency. For small machines and high locality applications, this limitation is due mainly to the length of the running threads. For large machines with medium to low locality, this limitation is due mainly to the protocol overhead being too large. Our study using the embedded model shows that in situations where the run length between references to shared memory is at least an order of magnitude longer than the time to process a single state transition in the protocol, applications exhibit good performance. If separate controllers for processing protocol requests are included, the protocol scales to 32k processor machines as long as the application exhibits hierarchical locality: at least 22% of the global references must be able to be satisfied locally; at most 35% of the global references are allowed to reach the top level of the hierarchy.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

General-purpose computing devices allow us to (1) customize computation after fabrication and (2) conserve area by reusing expensive active circuitry for different functions in time. We define RP-space, a restricted domain of the general-purpose architectural space focussed on reconfigurable computing architectures. Two dominant features differentiate reconfigurable from special-purpose architectures and account for most of the area overhead associated with RP devices: (1) instructions which tell the device how to behave, and (2) flexible interconnect which supports task dependent dataflow between operations. We can characterize RP-space by the allocation and structure of these resources and compare the efficiencies of architectural points across broad application characteristics. Conventional FPGAs fall at one extreme end of this space and their efficiency ranges over two orders of magnitude across the space of application characteristics. Understanding RP-space and its consequences allows us to pick the best architecture for a task and to search for more robust design points in the space. Our DPGA, a fine- grained computing device which adds small, on-chip instruction memories to FPGAs is one such design point. For typical logic applications and finite- state machines, a DPGA can implement tasks in one-third the area of a traditional FPGA. TSFPGA, a variant of the DPGA which focuses on heavily time-switched interconnect, achieves circuit densities close to the DPGA, while reducing typical physical mapping times from hours to seconds. Rigid, fabrication-time organization of instruction resources significantly narrows the range of efficiency for conventional architectures. To avoid this performance brittleness, we developed MATRIX, the first architecture to defer the binding of instruction resources until run-time, allowing the application to organize resources according to its needs. Our focus MATRIX design point is based on an array of 8-bit ALU and register-file building blocks interconnected via a byte-wide network. With today's silicon, a single chip MATRIX array can deliver over 10 Gop/s (8-bit ops). On sample image processing tasks, we show that MATRIX yields 10-20x the computational density of conventional processors. Understanding the cost structure of RP-space helps us identify these intermediate architectural points and may provide useful insight more broadly in guiding our continual search for robust and efficient general-purpose computing structures.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Most Artificial Intelligence (AI) work can be characterized as either ``high-level'' (e.g., logical, symbolic) or ``low-level'' (e.g., connectionist networks, behavior-based robotics). Each approach suffers from particular drawbacks. High-level AI uses abstractions that often have no relation to the way real, biological brains work. Low-level AI, on the other hand, tends to lack the powerful abstractions that are needed to express complex structures and relationships. I have tried to combine the best features of both approaches, by building a set of programming abstractions defined in terms of simple, biologically plausible components. At the ``ground level'', I define a primitive, perceptron-like computational unit. I then show how more abstract computational units may be implemented in terms of the primitive units, and show the utility of the abstract units in sample networks. The new units make it possible to build networks using concepts such as long-term memories, short-term memories, and frames. As a demonstration of these abstractions, I have implemented a simulator for ``creatures'' controlled by a network of abstract units. The creatures exist in a simple 2D world, and exhibit behaviors such as catching mobile prey and sorting colored blocks into matching boxes. This program demonstrates that it is possible to build systems that can interact effectively with a dynamic physical environment, yet use symbolic representations to control aspects of their behavior.