987 resultados para Software Transaction Memory
Resumo:
The Java Memory Model (JMM) provides a semantics of Java multithreading for any implementation platform. The JMM is defined in a declarative fashion with an allowed program execution being defined in terms of existence of "commit sequences" (roughly, the order in which actions in the execution are committed). In this work, we develop OpMM, an operational under-approximation of the JMM. The immediate motivation of this work lies in integrating a formal specification of the JMM with software model checkers. We show how our operational memory model description can be integrated into a Java Path Finder (JPF) style model checker for Java programs.
Resumo:
The proliferation of inexpensive workstations and networks has created a new era in distributed computing. At the same time, non-traditional applications such as computer-aided design (CAD), computer-aided software engineering (CASE), geographic-information systems (GIS), and office-information systems (OIS) have placed increased demands for high-performance transaction processing on database systems. The combination of these factors gives rise to significant challenges in the design of modern database systems. In this thesis, we propose novel techniques whose aim is to improve the performance and scalability of these new database systems. These techniques exploit client resources through client-based transaction management. Client-based transaction management is realized by providing logging facilities locally even when data is shared in a global environment. This thesis presents several recovery algorithms which utilize client disks for storing recovery related information (i.e., log records). Our algorithms work with both coarse and fine-granularity locking and they do not require the merging of client logs at any time. Moreover, our algorithms support fine-granularity locking with multiple clients permitted to concurrently update different portions of the same database page. The database state is recovered correctly when there is a complex crash as well as when the updates performed by different clients on a page are not present on the disk version of the page, even though some of the updating transactions have committed. This thesis also presents the implementation of the proposed algorithms in a memory-mapped storage manager as well as a detailed performance study of these algorithms using the OO1 database benchmark. The performance results show that client-based logging is superior to traditional server-based logging. This is because client-based logging is an effective way to reduce dependencies on server CPU and disk resources and, thus, prevents the server from becoming a performance bottleneck as quickly when the number of clients accessing the database increases.
Resumo:
This paper evaluates the viability of user-level software management of a hybrid DRAM/NVM main memory system. We propose an operating system (OS) and programming interface to place data from within the user application. We present a profiling tool to help programmers decide on the placement of application data in hybrid memory systems. Cycle-accurate simulation of modified applications confirms that our approach is more energy-efficient than state-of-the- art hardware or OS approaches at equivalent performance. Moreover, our results are validated on several candidate NVM technologies and a wide set of 14 benchmarks.
The key observation behind this work is that, for the work- loads we evaluated, application objects are too short-lived to motivate migration. Utilizing this property significantly reduces the hardware complexity of hybrid memory systems.
Resumo:
The memory hierarchy is the main bottleneck in modern computer systems as the gap between the speed of the processor and the memory continues to grow larger. The situation in embedded systems is even worse. The memory hierarchy consumes a large amount of chip area and energy, which are precious resources in embedded systems. Moreover, embedded systems have multiple design objectives such as performance, energy consumption, and area, etc. Customizing the memory hierarchy for specific applications is a very important way to take full advantage of limited resources to maximize the performance. However, the traditional custom memory hierarchy design methodologies are phase-ordered. They separate the application optimization from the memory hierarchy architecture design, which tend to result in local-optimal solutions. In traditional Hardware-Software co-design methodologies, much of the work has focused on utilizing reconfigurable logic to partition the computation. However, utilizing reconfigurable logic to perform the memory hierarchy design is seldom addressed. In this paper, we propose a new framework for designing memory hierarchy for embedded systems. The framework will take advantage of the flexible reconfigurable logic to customize the memory hierarchy for specific applications. It combines the application optimization and memory hierarchy design together to obtain a global-optimal solution. Using the framework, we performed a case study to design a new software-controlled instruction memory that showed promising potential.
Resumo:
Current industry proposals for Hardware Transactional Memory (HTM) focus on best-effort solutions (BE-HTM) where hardware limits are imposed on transactions. These designs may show a significant performance degradation due to high contention scenarios and different hardware and operating system limitations that abort transactions, e.g. cache overflows, hardware and software exceptions, etc. To deal with these events and to ensure forward progress, BE-HTM systems usually provide a software fallback path to execute a lock-based version of the code. In this paper, we propose a hardware implementation of an irrevocability mechanism as an alternative to the software fallback path to gain insight into the hardware improvements that could enhance the execution of such a fallback. Our mechanism anticipates the abort that causes the transaction serialization, and stalls other transactions in the system so that transactional work loss is mini- mized. In addition, we evaluate the main software fallback path approaches and propose the use of ticket locks that hold precise information of the number of transactions waiting to enter the fallback. Thus, the separation of transactional and fallback execution can be achieved in a precise manner. The evaluation is carried out using the Simics/GEMS simulator and the complete range of STAMP transactional suite benchmarks. We obtain significant performance benefits of around twice the speedup and an abort reduction of 50% over the software fallback path for a number of benchmarks.
Resumo:
The Dynamic Data eXchange (DDX) is our third generation platform for building distributed robot controllers. DDX allows a coalition of programs to share data at run-time through an efficient shared memory mechanism managed by a store. Further, stores on multiple machines can be linked by means of a global catalog and data is moved between the stores on an as needed basis by multi-casting. Heterogeneous computer systems are handled. We describe the architecture of DDX and the standard clients we have developed that let us rapidly build complex control systems with minimal coding.
Resumo:
The generation of a correlation matrix from a large set of long gene sequences is a common requirement in many bioinformatics problems such as phylogenetic analysis. The generation is not only computationally intensive but also requires significant memory resources as, typically, few gene sequences can be simultaneously stored in primary memory. The standard practice in such computation is to use frequent input/output (I/O) operations. Therefore, minimizing the number of these operations will yield much faster run-times. This paper develops an approach for the faster and scalable computing of large-size correlation matrices through the full use of available memory and a reduced number of I/O operations. The approach is scalable in the sense that the same algorithms can be executed on different computing platforms with different amounts of memory and can be applied to different problems with different correlation matrix sizes. The significant performance improvement of the approach over the existing approaches is demonstrated through benchmark examples.
Resumo:
Pavlovian fear conditioning is a robust technique for examining behavioral and cellular components of fear learning and memory. In fear conditioning, the subject learns to associate a previously neutral stimulus with an inherently noxious co-stimulus. The learned association is reflected in the subjects' behavior upon subsequent re-exposure to the previously neutral stimulus or the training environment. Using fear conditioning, investigators can obtain a large amount of data that describe multiple aspects of learning and memory. In a single test, researchers can evaluate functional integrity in fear circuitry, which is both well characterized and highly conserved across species. Additionally, the availability of sensitive and reliable automated scoring software makes fear conditioning amenable to high-throughput experimentation in the rodent model; thus, this model of learning and memory is particularly useful for pharmacological and toxicological screening. Due to the conserved nature of fear circuitry across species, data from Pavlovian fear conditioning are highly translatable to human models. We describe equipment and techniques needed to perform and analyze conditioned fear data. We provide two examples of fear conditioning experiments, one in rats and one in mice, and the types of data that can be collected in a single experiment. © 2012 Springer Science+Business Media, LLC.
Resumo:
This paper uses transaction cost theory to study cloud computing adoption. A model is developed and tested with data from an Australian survey. According to the results, perceived vendor opportunism and perceived legislative uncertainty around cloud computing were significantly associated with perceived cloud computing security risk. There was also a significant negative relationship between perceived cloud computing security risk and the intention to adopt cloud services. This study also reports on adoption rates of cloud computing in terms of applications, as well as the types of services used.
Resumo:
The literature contains many examples of digital procedures for the analytical treatment of electroencephalograms, but there is as yet no standard by which those techniques may be judged or compared. This paper proposes one method of generating an EEG, based on a computer program for Zetterberg's simulation. It is assumed that the statistical properties of an EEG may be represented by stationary processes having rational transfer functions and achieved by a system of software fillers and random number generators.The model represents neither the neurological mechanism response for generating the EEG, nor any particular type of EEG record; transient phenomena such as spikes, sharp waves and alpha bursts also are excluded. The basis of the program is a valid ‘partial’ statistical description of the EEG; that description is then used to produce a digital representation of a signal which if plotted sequentially, might or might not by chance resemble an EEG, that is unimportant. What is important is that the statistical properties of the series remain those of a real EEG; it is in this sense that the output is a simulation of the EEG. There is considerable flexibility in the form of the output, i.e. its alpha, beta and delta content, which may be selected by the user, the same selected parameters always producing the same statistical output. The filtered outputs from the random number sequences may be scaled to provide realistic power distributions in the accepted EEG frequency bands and then summed to create a digital output signal, the ‘stationary EEG’. It is suggested that the simulator might act as a test input to digital analytical techniques for the EEG, a simulator which would enable at least a substantial part of those techniques to be compared and assessed in an objective manner. The equations necessary to implement the model are given. The program has been run on a DEC1090 computer but is suitable for any microcomputer having more than 32 kBytes of memory; the execution time required to generate a 25 s simulated EEG is in the region of 15 s.
Resumo:
The design, implementation and evaluation are described of a dual-microcomputer system based on the concept of shared memory. Shared memory is useful for passing large blocks of data and it also provides a means to hold and work with shared data. In addition to the shared memory, a separate bus between the I/O ports of the microcomputers is provided. This bus is utilized for interprocessor synchronization. Software routines helpful in applying the dual-microcomputer system to realistic problems are presented. Performance evaluation of the system is carried out using benchmarks.
Resumo:
The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general purpose multi-core architectures. This model allows programmers to specify the structure of a program as a set of filters that act upon data, and a set of communication channels between them. The StreamIt graphs describe task, data and pipeline parallelism which can be exploited on modern Graphics Processing Units (GPUs), as they support abundant parallelism in hardware. In this paper, we describe the challenges in mapping StreamIt to GPUs and propose an efficient technique to software pipeline the execution of stream programs on GPUs. We formulate this problem - both scheduling and assignment of filters to processors - as an efficient Integer Linear Program (ILP), which is then solved using ILP solvers. We also describe a novel buffer layout technique for GPUs which facilitates exploiting the high memory bandwidth available in GPUs. The proposed scheduling utilizes both the scalar units in GPU, to exploit data parallelism, and multiprocessors, to exploit task and pipelin parallelism. Further it takes into consideration the synchronization and bandwidth limitations of GPUs, and yields speedups between 1.87X and 36.83X over a single threaded CPU.