18 results for Memory Awareness
in Boston University Digital Common
Abstract:
Memorial Sermon preached in memory of the Rev. Walter Gardner Webster
Abstract:
Tribute to the Memory of President Fisk.
Abstract:
The proliferation of inexpensive workstations and networks has prompted several researchers to use such distributed systems for parallel computing. Attempts have been made to offer a shared-memory programming model on such distributed memory computers. Most systems provide a shared-memory that is coherent in that all processes that use it agree on the order of all memory events. This dissertation explores the possibility of a significant improvement in the performance of some applications when they use non-coherent memory. First, a new formal model to describe existing non-coherent memories is developed. I use this model to prove that certain problems can be solved using asynchronous iterative algorithms on shared-memory in which the coherence constraints are substantially relaxed. In the course of the development of the model I discovered a new type of non-coherent behavior called Local Consistency. Second, a programming model, Mermera, is proposed. It provides programmers with a choice of hierarchically related non-coherent behaviors along with one coherent behavior. Thus, one can trade off the ease of programming with coherent memory for improved performance with non-coherent memory. As an example, I present a program to solve a linear system of equations using an asynchronous iterative algorithm. This program uses all the behaviors offered by Mermera. Third, I describe the implementation of Mermera on a BBN Butterfly TC2000 and on a network of workstations. The performance of a version of the equation solving program that uses all the behaviors of Mermera is compared with that of a version that uses coherent behavior only. For a system of 1000 equations the former exhibits at least a 5-fold improvement in convergence time over the latter. 
The version using coherent behavior only does not benefit from employing more than one workstation to solve the problem while the program using non-coherent behavior continues to achieve improved performance as the number of workstations is increased from 1 to 6. This measurement corroborates our belief that non-coherent shared memory can be a performance boon for some applications.
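To illustrate why an asynchronous iterative solver can tolerate non-coherent memory, here is a minimal sequential simulation of asynchronous block-Jacobi iteration in which each hypothetical worker reads possibly stale copies of other workers' unknowns. This is a sketch under stated assumptions, not Mermera's API or the dissertation's program: the function name `async_jacobi`, the `refresh_every` staleness model, and the strictly diagonally dominant example matrix are all illustrative choices.

```python
import numpy as np

def async_jacobi(A, b, n_workers=2, refresh_every=3, tol=1e-10, max_sweeps=10_000):
    """Simulate asynchronous block-Jacobi iteration with stale reads.

    Each hypothetical worker owns a contiguous block of unknowns. It always
    sees its own latest values but only re-reads the other blocks every
    `refresh_every` sweeps, mimicking a relaxed (non-coherent) shared memory.
    """
    n = len(b)
    x = np.zeros(n)                                   # "shared" solution vector
    blocks = np.array_split(np.arange(n), n_workers)  # block of unknowns per worker
    views = [x.copy() for _ in range(n_workers)]      # per-worker, possibly stale copies
    D = np.diag(A)
    for sweep in range(1, max_sweeps + 1):
        for w, idx in enumerate(blocks):
            v = views[w]
            # Jacobi update of this worker's block using its local (stale) view:
            # x_i = (b_i - sum_{j != i} a_ij x_j) / a_ii
            v[idx] = (b[idx] - A[idx] @ v + D[idx] * v[idx]) / D[idx]
            x[idx] = v[idx]                           # publish own block
        if sweep % refresh_every == 0:
            for v in views:
                v[:] = x                              # occasional refresh of remote blocks
        if np.linalg.norm(A @ x - b) < tol:
            return x, sweep
    return x, max_sweeps

# Example: a small strictly diagonally dominant system (illustrative values)
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
x, sweeps = async_jacobi(A, b)
```

For diagonally dominant systems, iterations of this kind still converge when some reads are stale, which is the property that lets the program above relax coherence without losing correctness; staleness only slows convergence per sweep.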
Abstract:
Coherent shared memory is a convenient, but inefficient, method of inter-process communication for parallel programs. By contrast, message passing can be less convenient, but more efficient. To get the benefits of both models, several non-coherent memory behaviors have recently been proposed in the literature. We present an implementation of Mermera, a shared memory system that supports both coherent and non-coherent behaviors in a manner that enables programmers to mix multiple behaviors in the same program [HS93]. A programmer can debug a Mermera program using coherent memory, and then improve its performance by selectively reducing the level of coherence in the parts that are critical to performance. Mermera permits a trade-off of coherence for performance. We analyze this trade-off through measurements of our implementation, and by an example that illustrates the style of programming needed to exploit non-coherence. We find that, even on a small network of workstations, the performance advantage of non-coherence is compelling. Raw non-coherent memory operations perform 20 to 40 times better than coherent memory operations. An example application program is shown to run 5 to 11 times faster when permitted to exploit non-coherence. We conclude by commenting on our use of the Isis Toolkit of multicast protocols in implementing Mermera.
Abstract:
For a given TCP flow, exogenous losses are those occurring on links other than the flow's bottleneck link. Exogenous losses are typically viewed as introducing undesirable "noise" into TCP's feedback control loop, leading to inefficient network utilization and potentially severe global unfairness. This has prompted much research on mechanisms for hiding such losses from end-points. In this paper, we show through analysis and simulations that low levels of exogenous losses are surprisingly beneficial in that they improve stability and convergence, without sacrificing efficiency. Based on this, we argue that exogenous-loss awareness should be taken into account in any AQM design that aims to achieve global fairness. To that end, we propose an exogenous-loss-aware Queue Management (XQM) that actively accounts for and leverages exogenous losses. We use an equation-based approach to derive the quiescent loss rate for a connection based on the connection's profile and its global fair share. In contrast to other queue management techniques, XQM ensures that a connection sees its quiescent loss rate, not only by complementing already existing exogenous losses, but also by actively hiding exogenous losses, if necessary, to achieve global fairness. We establish the advantages of exogenous-loss awareness using extensive simulations in which we contrast the performance of XQM with that of a host of traditional exogenous-loss-unaware AQM techniques.
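The equation-based step can be sketched as follows. As an assumption for illustration, the sketch uses the simplified TCP throughput model rate = (MSS/RTT) * sqrt(3/2) / sqrt(p) and inverts it to obtain the loss rate at which a flow would run at its fair share; XQM's actual equation and connection profile may differ. The helper names `quiescent_loss_rate` and `xqm_action` are hypothetical. Treating exogenous and queue drops as independent, the extra drop probability needed to reach the target follows from 1 - (1 - p_exo)(1 - p_add) = p_target.

```python
def quiescent_loss_rate(fair_share_bps, rtt_s, mss_bytes=1460):
    """Loss rate at which a TCP flow's throughput equals its fair share.

    Assumes the simplified model rate = (MSS/RTT) * sqrt(3/2) / sqrt(p),
    solved for p. This is an illustrative stand-in, not XQM's own equation.
    """
    mss_bits = 8 * mss_bytes
    return 1.5 * (mss_bits / (rtt_s * fair_share_bps)) ** 2

def xqm_action(p_target, p_exo):
    """Decide how a hypothetical XQM-like queue steers a flow to p_target.

    Returns ("drop", p) with the extra drop probability that complements the
    existing exogenous losses, or ("hide", f) with the fraction of exogenous
    losses that would have to be hidden from the end-points.
    """
    if p_exo <= p_target:
        # independent losses combine as 1 - (1 - p_exo) * (1 - p_add) = p_target
        return "drop", (p_target - p_exo) / (1.0 - p_exo)
    # exogenous losses alone already exceed the target: hide the excess
    return "hide", (p_exo - p_target) / p_exo

# Example: 1 Mb/s fair share, 100 ms RTT, 1% exogenous loss (illustrative values)
p = quiescent_loss_rate(1e6, 0.1)
kind, amount = xqm_action(p, 0.01)
```

With these numbers the target loss rate is about 2.05%, so the queue would add drops on top of the 1% exogenous loss; a flow already seeing more than its target would instead require some exogenous losses to be hidden.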
Abstract:
Communication and synchronization stand as the dual bottlenecks in the performance of parallel systems, and especially those that attempt to alleviate the programming burden by incurring overhead in these two domains. We formulate the notions of communicable memory and lazy barriers to help achieve efficient communication and synchronization. These concepts are developed in the context of BSPk, a toolkit library for programming networks of workstations (and other distributed memory architectures in general) based on the Bulk Synchronous Parallel (BSP) model. BSPk emphasizes efficiency in communication by minimizing local memory-to-memory copying, and in barrier synchronization by not forcing a process to wait unless it needs remote data. Both the message passing (MP) and distributed shared memory (DSM) programming styles are supported in BSPk. MP helps processes efficiently exchange short-lived unnamed data values, when the identity of either the sender or receiver is known to the other party. By contrast, DSM supports communication between processes that may be mutually anonymous, so long as they can agree on variable names in which to store shared temporary or long-lived data.
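The lazy-barrier idea ("not forcing a process to wait unless it needs remote data") can be sketched with threads. This is a minimal illustration under stated assumptions, not BSPk's interface: the `LazyBarrier` class, its per-process superstep counters, and the ring exchange in `run` are hypothetical. Instead of a global barrier, each process records when it finishes a superstep, and a reader blocks only on the specific producer whose data it needs.

```python
import threading

class LazyBarrier:
    """Hypothetical lazy barrier: tracks the last superstep each process finished.

    A process blocks only when it reads data produced by a specific peer in a
    superstep that peer has not completed yet; other peers are never awaited.
    """
    def __init__(self, nprocs):
        self.done = [0] * nprocs            # supersteps completed per process
        self.cond = threading.Condition()

    def finish_superstep(self, pid):
        with self.cond:
            self.done[pid] += 1
            self.cond.notify_all()

    def wait_for(self, peer, superstep):
        """Block until `peer` has completed `superstep` (1-based)."""
        with self.cond:
            self.cond.wait_for(lambda: self.done[peer] >= superstep)

def run(nprocs=3):
    barrier = LazyBarrier(nprocs)
    slots = [None] * nprocs                 # one shared slot per process
    received = [None] * nprocs

    def proc(pid):
        slots[pid] = pid * 10               # superstep 1: publish a value
        barrier.finish_superstep(pid)
        left = (pid - 1) % nprocs
        barrier.wait_for(left, 1)           # wait only for the producer we need
        received[pid] = slots[left]         # superstep 2: read the left neighbor's data

    threads = [threading.Thread(target=proc, args=(p,)) for p in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

In a ring exchange each process depends on one neighbor only, so a fast process never waits for unrelated slow peers, which is the saving a lazy barrier aims for relative to a full barrier.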
Abstract:
This paper focuses on an efficient user-level method for the deployment of application-specific extensions, using commodity operating systems and hardware. A sandboxing technique is described that supports multiple extensions within a shared virtual address space. Applications can register sandboxed code with the system, so that it may be executed in the context of any process. Such code may be used to implement generic routines and handlers for a class of applications, or system service extensions that complement the functionality of the core kernel. Using our approach, application-specific extensions can be written like conventional user-level code, utilizing libraries and system calls, with the advantage that they may be executed without the traditional costs of scheduling and context-switching between process-level protection domains. No special hardware support such as segmentation or tagged translation look-aside buffers (TLBs) is required. Instead, our "user-level sandboxing" mechanism requires only page-based virtual memory support, given that sandboxed extensions are either written by a trusted source or are guaranteed to be memory-safe (e.g., using type-safe languages). Using a fast method of upcalls, we show how our mechanism provides significant performance improvements over traditional methods of invoking user-level services. As an application of our approach, we have implemented a user-level network subsystem that avoids data copying via the kernel and, in many cases, yields far greater network throughput than kernel-level approaches.
Abstract:
This paper is centered around the design of a thread- and memory-safe language, primarily for the compilation of application-specific services for extensible operating systems. We describe various issues that have influenced the design of our language, called Cuckoo, that guarantees safety of programs with potentially asynchronous flows of control. Comparisons are drawn between Cuckoo and related software safety techniques, including Cyclone and software-based fault isolation (SFI), and performance results suggest our prototype compiler is capable of generating safe code that executes with low runtime overheads, even without potential code optimizations. Compared to Cyclone, Cuckoo is able to safely guard accesses to memory when programs are multithreaded. Similarly, Cuckoo is capable of enforcing memory safety in situations that are potentially troublesome for techniques such as SFI.
Abstract:
Neural models have proposed how short-term memory (STM) storage in working memory and long-term memory (LTM) storage and recall are linked and interact, but are realized by different mechanisms that obey different laws. The authors' data can be understood in the light of these models, which suggest that the authors may have gone too far in obscuring the differences between these processes.
Abstract:
Most associative memory models perform a one-level mapping between predefined sets of input and output patterns and are unable to represent hierarchical knowledge. Complex AI systems allow hierarchical representation of concepts, but generally do not have learning capabilities. In this paper, a memory model is proposed which forms a concept hierarchy by learning sample relations between concepts. All concepts are represented in a concept layer. Relations between a concept and its defining lower-level concepts are chunked as cognitive codes represented in a coding layer. By updating memory contents in the concept layer through code firing in the coding layer, the system is able to perform an important class of commonsense reasoning, namely recognition and inheritance.
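The recognition and inheritance behavior described above can be illustrated symbolically. The sketch below is a hypothetical, non-neural simplification, not the paper's network: each "code" links a concept to the set of lower-level concepts that define it, recognition fires a code bottom-up once all its defining concepts are active, and inheritance fires codes top-down to activate everything a concept is defined by. The `ConceptMemory` class and the example concepts are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptMemory:
    """Toy concept layer plus coding layer (symbolic simplification).

    Each code chunks a higher-level concept with the lower-level concepts
    that define it: concept -> frozenset of defining concepts.
    """
    codes: dict = field(default_factory=dict)

    def learn(self, concept, parts):
        self.codes[concept] = frozenset(parts)

    def recognize(self, active):
        """Bottom-up: fire every code whose defining concepts are all active."""
        active = set(active)
        changed = True
        while changed:                      # repeat until no new concept fires
            changed = False
            for concept, parts in self.codes.items():
                if concept not in active and parts <= active:
                    active.add(concept)
                    changed = True
        return active

    def inherit(self, concept):
        """Top-down: activate every concept that (transitively) defines `concept`."""
        result, stack = set(), [concept]
        while stack:
            for part in self.codes.get(stack.pop(), ()):
                if part not in result:
                    result.add(part)
                    stack.append(part)
        return result

# Example hierarchy (illustrative)
m = ConceptMemory()
m.learn("bird", {"animal", "has_wings", "lays_eggs"})
m.learn("animal", {"living", "moves"})
m.learn("penguin", {"bird", "swims"})
```

Recognizing from {living, moves, has_wings, lays_eggs} activates "animal" and then "bird", while inheritance from "penguin" recovers every property down the hierarchy.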
Abstract:
A model which extends the adaptive resonance theory model to sequential memory is presented. This new model learns sequences of events and recalls a sequence when presented with parts of the sequence. A sequence can have repeated events and different sequences can share events. The ART model is modified by creating interconnected sublayers within ART's F2 layer. Nodes within F2 learn temporal patterns by forming recency gradients within LTM. Versions of the ART model such as ART 1, ART 2, and fuzzy ART can be used.
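A recency gradient encodes order as relative activation strength: the newest event is strongest and each older event is weaker. The sketch below shows only this encoding idea, assuming a geometric decay factor; it does not model the F2 sublayers or LTM learning of the proposed model, and the function names and `decay` value are illustrative. Storing (event, activation) pairs rather than a dictionary keeps repeated events distinct.

```python
def recency_gradient(events, decay=0.5):
    """Return (event, activation) pairs forming a recency gradient.

    The newest event gets activation 1; each older event is scaled down by
    `decay` per step back in time (decay < 1 is an illustrative assumption).
    """
    n = len(events)
    return [(e, decay ** (n - 1 - i)) for i, e in enumerate(events)]

def decode_order(gradient):
    """Recover presentation order by sorting from weakest to strongest trace."""
    return [e for e, _ in sorted(gradient, key=lambda pair: pair[1])]

g = recency_gradient(["a", "b", "c"])
order = decode_order(recency_gradient(["x", "y", "x", "z"]))
```

Because every position gets a distinct activation, the order of a sequence with repeated events such as x, y, x, z can still be read back from the gradient.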
Abstract:
We can recognize and memorize objects while continuously receiving large amounts of temporal information that include redundancy and noise. This paper proposes a neural network model which extracts pre-recognized patterns from temporally sequential patterns which include redundancy, and memorizes the patterns temporarily. This model consists of an adaptive resonance system and a recurrent time-delay network. The extraction is executed by the matching mechanism of the adaptive resonance system, and the temporal information is processed and stored by the recurrent network. Simple simulations illustrate the extraction property.
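The matching mechanism of an adaptive resonance system can be illustrated with a simplified ART 1 loop for binary patterns: categories are ranked by a choice function, the best one is tested against a vigilance threshold, and a failed test resets it so the next candidate is tried. This sketch covers only that matching step under fast learning; it omits the recurrent time-delay network, and the parameter values `rho` and `alpha` are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def art1(patterns, rho=0.7, alpha=0.001):
    """Simplified ART 1 clustering of non-empty binary patterns (fast learning)."""
    weights = []                            # one binary prototype per committed category
    labels = []
    for pattern in patterns:
        I = np.asarray(pattern, dtype=bool)
        if not I.any():
            raise ValueError("ART 1 sketch assumes non-empty input patterns")
        # rank committed categories by the choice function |I AND w| / (alpha + |w|)
        order = sorted(range(len(weights)),
                       key=lambda j: -np.sum(I & weights[j]) / (alpha + np.sum(weights[j])))
        for j in order:
            match = np.sum(I & weights[j]) / np.sum(I)
            if match >= rho:                # resonance: vigilance test passed
                weights[j] = I & weights[j] # fast learning shrinks the prototype
                labels.append(j)
                break
        else:                               # every category was reset: recruit a new one
            weights.append(I.copy())
            labels.append(len(weights) - 1)
    return labels, weights

patterns = [[1, 1, 1, 0, 0, 0],
            [1, 1, 0, 0, 0, 0],
            [0, 0, 0, 1, 1, 1],
            [1, 1, 1, 0, 0, 1]]
labels, prototypes = art1(patterns)
```

Raising the vigilance `rho` makes matching stricter and produces more, finer categories; lowering it lets partially matching inputs resonate with an existing prototype.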
Abstract:
Advanced Research Projects Agency (ONR N00014-92-J-4015); Office of Naval Research (N00014-91-J-4100, N00014-92-J-1309)
Abstract:
How do the layered circuits of prefrontal and motor cortex carry out working memory storage, sequence learning, and voluntary sequential item selection and performance? A neural model called LIST PARSE is presented to explain and quantitatively simulate cognitive data about both immediate serial recall and free recall, including bowing of the serial position performance curves, error-type distributions, temporal limitations upon recall, and list length effects. The model also qualitatively explains cognitive effects related to attentional modulation, temporal grouping, variable presentation rates, phonemic similarity, presentation of non-words, word frequency/item familiarity and list strength, distracters and modality effects. In addition, the model quantitatively simulates neurophysiological data from the macaque prefrontal cortex obtained during sequential sensory-motor imitation and planned performance. The article further develops a theory concerning how the cerebral cortex works by showing how variations of the laminar circuits that have previously clarified how the visual cortex sees can also support cognitive processing of sequentially organized behaviors.
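Sequential item selection from working memory is often illustrated with an item-and-order scheme: items are stored with a primacy gradient (earlier items more active), and performance repeatedly selects the most active item and then suppresses it. The sketch below shows that competitive-queuing idea in the spirit of such models; it is not LIST PARSE's laminar circuit or its equations, and the geometric `decay`, the Gaussian `noise`, and the function names are illustrative assumptions. Adding noise lets neighboring items with similar activations swap, a simple way to see how transposition errors can arise.

```python
import random

def store_primacy(items, decay=0.8):
    """Item-and-order storage: earlier items get larger activations (distinct items assumed)."""
    return {item: decay ** i for i, item in enumerate(items)}

def competitive_queuing_recall(activations, noise=0.0, rng=None):
    """Repeatedly choose the most active item, output it, then suppress it."""
    rng = rng or random.Random(0)
    wm = dict(activations)
    out = []
    while wm:
        noisy = {k: v + rng.gauss(0.0, noise) for k, v in wm.items()}
        winner = max(noisy, key=noisy.get)  # competitive choice
        out.append(winner)
        del wm[winner]                      # self-inhibition after performance
    return out

clean = competitive_queuing_recall(store_primacy(list("ABCDE")))
noisy = competitive_queuing_recall(store_primacy(list("ABCDE")), noise=0.1,
                                   rng=random.Random(1))
```

Without noise the list is recalled in order; with noise the output is always a permutation of the stored items, but adjacent items may be transposed.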
Abstract:
This paper characterizes working memory neural networks that encode the invariant temporal order of sequential events that may be presented at widely differing speeds, durations, and interstimulus intervals. This temporal order code is designed to enable all possible groupings of sequential events to be stably learned and remembered in real time, even as new events perturb the system. Such a competence is needed in neural architectures which self-organize learned codes for variable-rate speech perception, sensory-motor planning, or 3-D visual object recognition. Using such a working memory, a self-organizing architecture for invariant 3-D visual object recognition is described that is based on the model of Seibert and Waxman [1].
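One way to keep learned groupings stable as new events arrive is an invariance rule: when a new item enters working memory, all stored activations are rescaled by a common factor, so the ratios among previously stored items never change. The sketch below is a hypothetical, event-driven illustration of that rule, not the paper's network equations; `omega`, `mu`, and `store_invariant` are assumptions. Because storage depends only on the order of arrivals, presentation speed and interstimulus intervals do not enter the code. With `omega < 1` the stored pattern is a recency gradient; with `omega > 1` it is a primacy gradient.

```python
def store_invariant(items, omega=0.9, mu=1.0):
    """Event-driven working memory obeying a simple invariance rule.

    When a new item arrives, every stored activation is multiplied by the same
    factor `omega`, preserving ratios among earlier items; the new item enters
    with activation `mu`. Returns a snapshot of the memory after each arrival.
    """
    wm, history = {}, []
    for item in items:
        for key in wm:
            wm[key] *= omega                # common rescaling preserves ratios
        wm[item] = mu
        history.append(dict(wm))
    return history

h = store_invariant("ABCD")
primacy = store_invariant("ABCD", omega=1.2)[-1]
```

Comparing the snapshots after C and after D shows that the A:B ratio is unchanged by the arrival of D, so any grouping learned from the earlier pattern still applies.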